A full-stack, event-driven Uber-like platform showcasing distributed systems design, load balancing, caching, and asynchronous workflows.
Built with a modern microservices architecture, combining backend, ML, infra, and frontend technologies into one scalable system.
From ride request to bill in seconds — horizontally scalable, cache-accelerated, event-driven.
This project is a distributed, microservices-based simulation of Uber, built to demonstrate how modern ride-hailing platforms are engineered for scalability, reliability, and performance.
- Customers can sign up, book rides, view history, and pay bills.
- Drivers can register, manage profiles, accept rides, and track earnings.
- Admins can oversee the entire system, add/manage drivers & customers, view revenue/ride statistics, and generate reports.
- A Dynamic Pricing Service predicts fares in real-time using machine learning.
- Billing is generated asynchronously using Kafka events to decouple ride completion from payment processing.
- Real-world relevance: ride-hailing systems like Uber and Lyft rely heavily on distributed architectures to handle massive concurrency, dynamic pricing, and system resilience.
- System Design strength: this project demonstrates event-driven architecture, load balancing, caching strategies, fault tolerance, and scalability principles expected in modern production systems.
- Interview-ready: showcases hands-on experience with Kafka, Redis, Docker, MongoDB, FastAPI, React, and JMeter — technologies used widely in industry.
- Event-driven billing: Kafka ensures billing is processed asynchronously and idempotently, preventing double charges.
- Performance-optimized: Redis caches accelerate driver search, revenue stats, and ride lookups with cache invalidation strategies.
- Scalable architecture: Docker/Kubernetes deployment enables horizontal scaling across services.
- Data-driven pricing: Machine learning (XGBoost via FastAPI) integrates with the ride lifecycle to provide surge/dynamic pricing predictions.
- End-to-end coverage: From frontend UI (React) to backend microservices, infra (Docker/K8s), messaging (Kafka), and testing (JMeter), the system covers all critical aspects of distributed systems engineering.
The platform is designed for three distinct roles — Customer, Driver, and Admin.
Each role is given tailored capabilities, interfaces, and responsibilities.
Below is a comprehensive walkthrough of how each role interacts with the system, enriched with screenshots and technical insights.
1. Sign Up & Authentication
- Customers start by registering with details such as name, email, phone, and credit card info.
- Passwords are hashed with bcrypt before storage, and a JWT is issued upon login (a sketch of the flow follows below).
- This token secures all subsequent requests (`Authorization: Bearer <token>`).
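A minimal sketch of that login flow, assuming an Express `app`, a Mongoose `Customer` model, and the `bcrypt` and `jsonwebtoken` packages (all names illustrative):

```js
import bcrypt from "bcrypt";
import jwt from "jsonwebtoken";

// POST /api/customers/login: verify the bcrypt hash, then issue a role-scoped JWT
app.post("/api/customers/login", async (req, res) => {
  const { email, password } = req.body;
  const customer = await Customer.findOne({ email });
  // bcrypt.compare hashes the submitted password and checks it against the stored hash
  if (!customer || !(await bcrypt.compare(password, customer.passwordHash))) {
    return res.status(401).json({ error: "Invalid credentials" });
  }
  // The role claim is what route middleware later checks for authorization
  const token = jwt.sign(
    { sub: customer.customerId, role: "customer" },
    process.env.JWT_SECRET,
    { expiresIn: "1h" }
  );
  res.json({ token });
});
```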
2. Booking a Ride
- The customer enters pickup and drop-off locations.
- The Rides Service queries the Drivers Service for available drivers within a 10-mile radius (using the Haversine formula; a sketch follows below).
- At this point, the Dynamic Pricing ML Service is called to predict an estimated fare (`/predict` with distance, time, passenger count, etc.).
- The ride is created in MongoDB with status = `in_progress`.
Behind the scenes:
- Request hits rides-service (4001) → calls drivers-service (4002) + ml-service (8000) → persists in MongoDB.
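A minimal sketch of the Haversine check (decimal-degree coordinates; names illustrative):

```js
// Great-circle distance between two { lat, lng } points, in miles
const EARTH_RADIUS_MI = 3958.8;

function haversineMiles(a, b) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLng = toRad(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_MI * Math.asin(Math.sqrt(h));
}

// Keep only drivers inside the 10-mile search radius
const nearby = drivers.filter((d) => haversineMiles(pickup, d.location) <= 10);
```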
3. Ride Tracking & Completion
- The customer can see live ride status (requested → accepted → in_progress → completed).
- When marked `completed`, the rides-service produces a `ride-completed` Kafka event.
- Billing-service consumes this event asynchronously to generate a final bill.
Behind the scenes:
- Kafka → ensures billing happens asynchronously, improving throughput.
- Redis caches common queries like `rides:byCustomer:{id}` to accelerate dashboard loads.
4. Billing & Payments
- Customers can view a detailed billing history: ride time, distance, predicted vs. actual fare.
- Data is fetched from the billing-service (with Redis caching for frequent lookups).
Behind the scenes:
- Billing-service ensures idempotency (one bill per rideId).
- Admins can also query the same bills for audits.
5. Feedback & Ratings
- After ride completion, customers can rate their driver (1–5 stars) and leave comments.
- Ratings update the driver’s aggregated score, influencing search results for future customers.
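One way the aggregate could be maintained, sketched as a MongoDB aggregation over the `reviews` collection (the surrounding update is illustrative; field names follow the data model later in this README):

```js
// Recompute a driver's average rating from reviews, then persist it on the profile
const [stats] = await db.collection("reviews").aggregate([
  { $match: { revieweeId: driverId } },
  { $group: { _id: "$revieweeId", avgRating: { $avg: "$rating" }, count: { $sum: 1 } } }
]).toArray();

if (stats) {
  await db.collection("drivers").updateOne(
    { driverId },
    { $set: { rating: Math.round(stats.avgRating * 10) / 10 } } // one decimal place
  );
}
```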
1. Sign Up & Authentication
- Drivers register with ID, car details, license, and insurance.
- Secure auth with JWT tokens; passwords hashed with bcrypt.
2. Profile Management & Media Uploads
- Drivers can upload a short video introduction and profile picture.
- Videos/images are stored locally (`/uploads`) with metadata in Mongo, and streamed with HTTP range requests.
- All profile changes are cached for faster reads via Redis.
Behind the scenes:
- File handling with Multer; cache invalidation triggered when profile is updated.
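A sketch of the range-request streaming path, assuming an Express `app`, a Mongoose `Driver` model, and MP4 files under `/uploads` (all names illustrative):

```js
import fs from "fs";
import path from "path";

// Serve the intro video with HTTP range support so players can seek
app.get("/api/drivers/:id/video", async (req, res) => {
  const driver = await Driver.findOne({ driverId: req.params.id });
  const filePath = path.join("/uploads", driver.videoPath);
  const { size } = fs.statSync(filePath);
  const range = req.headers.range; // e.g. "bytes=0-"

  if (!range) {
    res.writeHead(200, { "Content-Length": size, "Content-Type": "video/mp4" });
    return fs.createReadStream(filePath).pipe(res);
  }

  // Parse "bytes=start-end" and stream just that slice as 206 Partial Content
  const [startStr, endStr] = range.replace("bytes=", "").split("-");
  const start = Number(startStr);
  const end = endStr ? Number(endStr) : size - 1;
  res.writeHead(206, {
    "Content-Range": `bytes ${start}-${end}/${size}`,
    "Accept-Ranges": "bytes",
    "Content-Length": end - start + 1,
    "Content-Type": "video/mp4"
  });
  fs.createReadStream(filePath, { start, end }).pipe(res);
});
```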
3. Accepting Rides
- Drivers see ride requests in their area and accept assignments.
- System updates the ride: the `driverId` field is set and status = `accepted`.
Behind the scenes:
- Updates propagate to Redis (the `rides:byDriver:{id}` cache is refreshed).
- Rides-service ensures drivers cannot double-book (transaction checks).
4. Completing Rides & Earnings Summary
- After completion, the driver's account is updated with earnings, and a `ride-completed` Kafka event is fired.
- Drivers can view a summary dashboard showing completed rides, ratings, and earnings.
1. Secure Login
- Admins log in with special credentials.
- JWT auth with elevated role privileges.
2. Managing Drivers & Customers
- Admins can add new drivers/customers or deactivate accounts.
- These operations proxy calls to the drivers-service and customers-service.
3. Monitoring System Statistics
- View daily revenue, rides by area, rides per driver, rides per customer.
- Graphs generated via MongoDB aggregations + cached in Redis for dashboard speed.
4. Billing Oversight
- Admins can search bills, audit discrepancies, and ensure fairness in pricing.
Action / Feature | Customer | Driver | Admin |
---|---|---|---|
Register & Login | ✅ | ✅ | ✅ |
Book/Accept/Complete Ride | ✅ | ✅ | — |
Billing (view, history, audits) | ✅ | ✅ | ✅ |
Profile & Media | ✅ | ✅ | ✅ |
Ratings & Reviews | ✅ | ✅ | — |
Manage Users | — | — | ✅ |
View Revenue & Stats | — | — | ✅ |
While each role has its own journey, the entire system works together as a distributed, event-driven platform.
This high-level flow shows how Customers, Drivers, and Admins interact with the microservices, and how Kafka, Redis, and MongoDB glue everything together.
```mermaid
flowchart LR
%% Clients
subgraph Clients
CU[Customer UI]
DR[Driver UI]
AD[Admin UI]
end
%% Services
subgraph Services
R1[Rides 4001]
D1[Drivers 4002]
C1[Customers 4003]
B1[Billing 4004]
A1[Admin 4005]
end
%% ML
subgraph ML
ML1[Dynamic Pricing API 8000]
end
%% Infra
subgraph Infra
K[(Kafka)]
X[(Redis)]
M[(MongoDB)]
end
%% Flows
CU --> R1
DR --> D1
AD --> A1
R1 <--> D1
R1 --> ML1
A1 --> D1
A1 --> C1
A1 --> B1
R1 <--> M
D1 <--> M
C1 <--> M
B1 <--> M
A1 <--> M
R1 -.-> K
K -.-> B1
R1 <--> X
D1 <--> X
B1 <--> X
```
- Customers request rides → handled by Rides Service, which queries Drivers Service for nearby drivers and calls ML Service for dynamic fare prediction.
- Drivers accept rides via Drivers Service, which updates ride status in MongoDB and invalidates Redis caches.
- When a ride is completed, Rides Service emits a `ride-completed` event to Kafka.
- Billing Service consumes this event, generates the final bill, stores it in MongoDB, and caches frequent queries in Redis.
- Admins interact with Admin Service to manage users, audit bills, and view system-wide statistics (fueled by MongoDB aggregations + Redis cache).
- Redis ensures fast reads (driver search, revenue stats, ride lookups).
- Kafka decouples services, ensuring that ride completion and billing remain scalable and resilient.
- MongoDB stores all persistent entities (Drivers, Customers, Rides, Bills, Reviews, Media metadata).
Action / Feature | Customer | Driver | Admin |
---|---|---|---|
Register & Login | ✅ | ✅ | ✅ |
Book a Ride | ✅ | — | — |
Accept / Complete Ride | — | ✅ | — |
View Billing & History | ✅ | ✅ | ✅ |
Profile & Media | ✅ | ✅ | ✅ |
Ratings & Reviews | ✅ | ✅ | — |
Manage Users (CRUD) | — | — | ✅ |
Revenue & Ride Analytics | — | — | ✅ |
Audit Bills | — | — | ✅ |
At its core, the Uber Simulation is built on a microservices architecture.
Each domain (Rides, Drivers, Customers, Billing, Admin, Pricing) is implemented as a separate service, allowing for independent development, deployment, and scaling.
The services communicate via REST APIs (synchronous) and Kafka events (asynchronous), while Redis accelerates hot lookups and MongoDB persists system state.
```mermaid
flowchart TB
%% Layers
subgraph L0["Client Layer"]
FE["React + Redux (Frontend)"]
end
subgraph L1["Edge / Networking"]
GW["Ingress / Reverse Proxy (NGINX)"]
end
subgraph L2["Microservices Layer"]
R1["Rides svc :4001"]
D1["Drivers svc :4002"]
C1["Customers svc :4003"]
B1["Billing svc :4004"]
A1["Admin svc :4005"]
ML["Dynamic Pricing (FastAPI) :8000"]
end
subgraph L3["Data & Infra Layer"]
DB[(MongoDB)]
RD[(Redis)]
KF[(Kafka Broker)]
ZK[(Zookeeper)]
end
%% Flows
FE -->|HTTPS| GW
GW -->|HTTP| R1
GW -->|HTTP| D1
GW -->|HTTP| C1
GW -->|HTTP| B1
GW -->|HTTP| A1
R1 -->|"HTTP /predict"| ML
R1 --- DB
D1 --- DB
C1 --- DB
B1 --- DB
A1 --- DB
R1 --- RD
D1 --- RD
B1 --- RD
R1 -. "produce ride-completed" .-> KF
KF -. "consume" .-> B1
ZK --- KF
```
Service | Port | Responsibilities | Key Tech |
---|---|---|---|
Rides Service | 4001 | Core ride lifecycle: create, update, nearby driver search, reviews, statistics; produces `ride-completed` Kafka events; integrates with ML service for fare prediction. | Node.js, Express, MongoDB, Redis, Kafka Producer |
Drivers Service | 4002 | Driver auth & profile management, car/insurance details, intro videos, search (cached). | Node.js, Express, MongoDB, Redis |
Customers Service | 4003 | Customer auth, profile management, ride history links to rides/billing. | Node.js, Express, MongoDB |
Billing Service | 4004 | Bill generation & search; consumes `ride-completed` Kafka events; ensures idempotency. | Node.js, Express, MongoDB, Redis, Kafka Consumer |
Admin Service | 4005 | Admin auth, add/manage drivers & customers, revenue & ride statistics, billing audits. | Node.js, Express, MongoDB, Redis |
Dynamic Pricing Model | 8000 | Machine learning service providing `estimated_price` predictions during ride creation. | FastAPI, Python, XGBoost, Joblib |
- MongoDB (Atlas/local) → primary data store for all entities (Drivers, Customers, Rides, Billing, Reviews, Media metadata).
- Redis → cache for driver searches, revenue stats, ride/billing lookups.
- Kafka → event bus connecting rides → billing; ensures decoupled, resilient workflows.
- Docker → containerization of all services for local and cloud deployment.
- Kubernetes (K8s-ready) → orchestration layer for scalability and load balancing.
- JMeter → load/performance testing across scenarios (B, B+S, B+S+K).
- React + Redux → frontend client (Customer, Driver, Admin portals).
- Scalability: each service can be scaled independently (e.g., rides-service under heavy load).
- Resilience: failure in billing-service won’t block ride creation thanks to Kafka decoupling.
- Technology fit: Python/ML model isolated from Node.js services.
- Team productivity: each service can be developed & deployed by separate teams.
Understanding the internal workflows is crucial for evaluating system design.
Below are the four most important request lifecycles, illustrated with sequence diagrams.
```mermaid
sequenceDiagram
participant U as User (Customer/Driver/Admin)
participant S as Service (e.g., Drivers)
participant DB as MongoDB
participant J as JWT Middleware
U->>S: POST /login (email + password)
S->>DB: Verify credentials (bcrypt hash)
DB-->>S: User found + valid
S-->>U: 200 OK + { token: "Bearer <JWT>" }
U->>S: GET /protected (Authorization: Bearer <JWT>)
S->>J: Verify token + role
J-->>S: OK (req.user populated)
S-->>U: Protected resource JSON
```
Key Points
- Passwords stored as bcrypt hashes.
- JWT includes user role → used for authorization in role-specific routes.
- Stateless → services can scale horizontally without sticky sessions.
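A sketch of that middleware step, assuming Express and `jsonwebtoken` (the helper name is illustrative):

```js
import jwt from "jsonwebtoken";

// Verify the Bearer token and enforce an allowed role per route
function requireRole(...roles) {
  return (req, res, next) => {
    const token = (req.headers.authorization || "").replace("Bearer ", "");
    try {
      const claims = jwt.verify(token, process.env.JWT_SECRET);
      if (!roles.includes(claims.role)) {
        return res.status(403).json({ error: "Forbidden" });
      }
      req.user = claims; // downstream handlers read req.user
      next();
    } catch {
      res.status(401).json({ error: "Invalid or expired token" });
    }
  };
}

// Usage: app.get("/api/admin/statistics/revenue", requireRole("admin"), handler);
```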
```mermaid
sequenceDiagram
participant C as Customer
participant R as Rides Service
participant ML as ML Service
participant K as Kafka Broker
participant B as Billing Service
participant M as MongoDB
C->>R: POST /api/rides (pickup, dropoff)
R->>ML: POST /predict (distance, time, passengers, etc.)
ML-->>R: { estimated_price }
R->>M: Save ride { status: in_progress, estimatedPrice }
C->>R: PATCH /api/rides/:id/status completed
R->>K: produce("ride-completed", ride data)
K-->>B: consume("ride-completed")
B->>M: Insert/Upsert Bill (idempotent by rideId)
B-->>C: Bill generated & available
```
Key Points
- Asynchronous decoupling: rides-service doesn’t wait for billing → higher throughput.
- Idempotency: billing ensures no duplicate bills for same rideId.
- Scalability: Kafka can buffer load spikes (durable queue).
```mermaid
sequenceDiagram
participant C as Customer
participant R as Rides Service
participant D as Drivers Service
participant X as Redis
participant M as MongoDB
C->>R: GET /api/rides/nearby-drivers?lat=...&lng=...
R->>X: Check cache key driver:search:{lat,lng}
alt Cache hit
X-->>R: Cached list of drivers
else Cache miss
R->>D: Query drivers within 10 miles (Haversine formula)
D->>M: Geo query in MongoDB
M-->>D: List of drivers
D-->>R: Driver list
R->>X: Cache results with TTL=60s
end
R-->>C: List of available drivers
```
Key Points
- Redis reduces repeated queries for popular areas (e.g., airports).
- TTL ensures data freshness (drivers update frequently).
- Reduces MongoDB load under high concurrency.
```mermaid
sequenceDiagram
participant A as Admin
participant AS as Admin Service
participant B as Billing Service
participant X as Redis
participant M as MongoDB
A->>AS: GET /api/admin/statistics/revenue
AS->>X: Check cache key stats:revenue:day:{date}
alt Cache hit
X-->>AS: Cached revenue data
else Cache miss
AS->>B: Request billing summary
B->>M: Aggregate bills by date
M-->>B: { totalRevenue, ridesPerArea, ridesPerDriver }
B-->>AS: Aggregated stats
AS->>X: Cache stats for 5 mins
end
AS-->>A: Render revenue dashboard (charts)
```
Key Points
- MongoDB aggregation pipelines compute totals, grouped by day/area/driver.
- Redis ensures dashboards load fast (<200ms).
- Admin sees updated revenue with minimal DB load.
Each domain is implemented as an independent service with its own API, data model, caching strategy, and (where applicable) Kafka role.
Notation: `HTTP→` = outbound service call • K = Kafka role • R = Redis keys • DB = Mongo collections
Purpose
Core ride lifecycle: create/update, nearby driver search, ride statistics, reviews, and media metadata; produces `ride-completed` on finish. Integrates with ML for fare prediction.
Top Endpoints
Method | Path | Purpose |
---|---|---|
POST | `/api/rides` | Create ride (calls ML `/predict`, finds nearest driver) |
PATCH | `/api/rides/:id/status` | Update status; on `completed` → produce Kafka `ride-completed` |
GET | `/api/rides/nearby-drivers?lat=&lng=` | Haversine search for drivers around a point |
GET | `/api/rides/statistics` | Aggregated stats (revenue/day, rides/hour/area/driver/customer) |
GET / DELETE | `/api/rides/:id` | Get / delete ride |
POST | `/api/rides/:id/images` | Attach image metadata to ride |
POST | `/api/rides/reviews` | Create review (customer↔driver) |
GET | `/api/rides/reviews/user/:userId` | Reviews by user |
Inter-Service Calls (HTTP→)
- HTTP→ Drivers: fetch drivers for nearby search / details.
- HTTP→ ML: `POST /predict` to calculate `estimatedPrice`.
Kafka (K)
- Producer: `ride-completed` (payload includes `rideId`, `driverId`, `customerId`, `distanceKm`, `predictedPrice`, `actualPrice`, `startedAt`, `endedAt`).
- Idempotency target: Billing upserts by `rideId`.
Redis (R)
- Reads/Writes: `driver:search:{lat}:{lng}` (60s) – cached search results; `rides:byDriver:{driverId}` (60s) – recent rides for a driver.
- Invalidate on write (status change, create/delete).
Mongo (DB)
`rides` (ride lifecycle), `reviews` (ratings), `media` (metadata).
Sample cURL
```bash
# Create ride (server will call ML for estimated price)
curl -X POST http://localhost:4001/api/rides \
  -H "Content-Type: application/json" \
  -d '{"customerId":"CUS-42","pickup":{"lat":37.77,"lng":-122.42},"dropoff":{"lat":37.79,"lng":-122.39},"passengerCount":1}'
```
Driver identity & profile management:
- Signup / login (JWT, bcrypt)
- Car & insurance details
- Intro video upload/stream
- Cached search for nearby drivers
- Summaries for dashboards
Method | Path | Purpose |
---|---|---|
POST | `/api/drivers/signup` / `/login` | Auth (JWT), bcrypt password storage |
GET | `/api/drivers` / `/search?q=` | List drivers & cached search |
GET | `/api/drivers/:id` | Fetch profile |
PUT | `/api/drivers/:id` | Update profile |
DELETE | `/api/drivers/:id` | Delete profile |
POST | `/api/drivers/:id/video` | Upload intro video |
GET | `/api/drivers/:id/video` | Stream intro video (HTTP range) |
GET | `/api/drivers/:driverId/summary` | Aggregates for dashboards (earnings, ratings) |
- Serves data to Rides (nearby search)
- Serves data to Admin (management & dashboards)
Redis (R)
- `driver:search:{q}` (TTL 60s)
- `driver:summary:{driverId}` (TTL 60s)
- Invalidate on profile updates
Mongo (DB)
- `drivers` → profile, vehicle, insurance, location
- `videos` → paths, metadata
Customer authentication & profile management:
- Signup/login
- Profile CRUD
- Links to rides/billing via UI/API layer
Method | Path | Purpose |
---|---|---|
POST | `/api/customers/signup` / `/login` | Auth (JWT) |
GET | `/api/customers` | List customers |
GET | `/api/customers/:id` | Fetch details |
PUT | `/api/customers/:id` | Update profile |
DELETE | `/api/customers/:id` | Delete profile |
- Called by Admin for management
- UI fetches rides/billing directly from their services
Mongo (DB)
- `customers` → PII, address, masked card refs, preferences
- Generate & search bills
- Consume `ride-completed` events
- Guarantee idempotency (unique `rideId`)
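That guarantee can be enforced at the database level with a unique index (Mongo shell; collection name as in the data-model section):

```js
// One bill per ride: inserts for an existing rideId are rejected,
// so replayed ride-completed events cannot create duplicates
db.billing.createIndex({ rideId: 1 }, { unique: true });
```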
Method | Path | Purpose |
---|---|---|
POST | `/api/billing/rides/:rideId` | Manually create bill (checked by `rideId`) |
GET | `/api/billing/:billId` | Fetch single bill |
GET | `/api/billing/search?driverId=&customerId=&status=` | Search bills |
GET | `/api/billing/customer/:customerId` | Bills by customer |
GET | `/api/billing/driver/:driverId` | Bills by driver |
Kafka (K)
- Consumer: `ride-completed`
- Group: `billing-consumer-group`
- On consume → upsert into billing collection → cache hot queries
Redis (R)
- `billing:byUser:{userId}:{role}` (TTL 60s) → customer/driver lists
- `stats:revenue:day:{YYYY-MM-DD}` (TTL 300s) → precomputed daily revenue
Mongo (DB)
- `billing` → predicted vs. actual, totals, timestamps, status
Administrative plane:
- Privileged authentication
- Manage drivers/customers
- Financial & ride analytics
- Bill audits
Method | Path | Purpose |
---|---|---|
POST | `/api/admin/signup` / `/login` | Admin authentication |
POST | `/api/admin/drivers` / `/customers` | Proxy create via services |
GET | `/api/admin/statistics/revenue` | Revenue per day (charts) |
GET | `/api/admin/statistics/rides` | Rides per area/driver/customer |
GET | `/api/admin/bills/search` | Search bills |
GET | `/api/admin/bills/:billId` | Fetch single bill (audits) |
- HTTP → Drivers/Customers for management
- HTTP → Billing for audits & stats
- HTTP → Rides for ride metrics
- Reads cached stats & billing keys
- May set dashboard caches (TTL 5m)
- Minimal own state
- Mostly queries across other domains
Predict `estimated_price` during ride creation based on features:
- Distance (km)
- Time of day
- Weekend / night flag
- Passenger count
Method | Path | Body | Response |
---|---|---|---|
POST | `/predict` | `{ distance_km, passenger_count, hour, day_of_week, is_weekend, is_night }` | `{ "estimated_price": <float> }` |
```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "distance_km": 12.5,
    "passenger_count": 1,
    "hour": 18,
    "day_of_week": 5,
    "is_weekend": 1,
    "is_night": 0
  }'
```
The system uses MongoDB for operational data across domains (Drivers, Customers, Rides, Billing, Reviews, Media metadata).
Schemas are optimized for read performance (dashboards/search), idempotency (billing), and evolving structures (optional media/metadata).
```mermaid
erDiagram
DRIVER {
string _id
string driverId "external id (SSN-like)"
string email "unique"
string passwordHash
string name
object carDetails "make, model, plate"
object insurance "policyNo, expiresAt"
object location "lat, lng"
int rating
string videoPath
date createdAt
date updatedAt
}
CUSTOMER {
string _id
string customerId "external id"
string email "unique"
string passwordHash
string name
object address "city, state, zip"
object card "tokenized ref"
int rating
date createdAt
date updatedAt
}
RIDE {
string _id
string rideId "unique human id"
datetime dateTime
object pickup "lat, lng, address"
object dropoff "lat, lng, address"
string driverId
string customerId
number distanceKm
number estimatedPrice
number actualPrice
string status "requested|accepted|in_progress|completed|canceled"
array media "mediaId[]"
date createdAt
date updatedAt
}
BILLING {
string _id
string billingId "BILL-<timestamp>"
string rideId "unique (idempotency)"
string driverId
string customerId
number predictedPrice
number actualPrice
number distanceKm
datetime startedAt
datetime endedAt
string status "created|paid|void"
date createdAt
date updatedAt
}
REVIEW {
string _id
string rideId
string reviewerId
string revieweeId
string reviewerType "driver|customer"
int rating "1..5"
string comment
date createdAt
}
MEDIA {
string _id
string rideId
string ownerId "driverId|customerId"
string path "local/S3 url"
string type "image|video"
number sizeBytes
string contentType
date createdAt
}
DRIVER ||--o{ RIDE : drives
CUSTOMER ||--o{ RIDE : books
RIDE ||--o{ BILLING : generates
RIDE ||--o{ REVIEW : has
DRIVER ||--o{ REVIEW : receives
CUSTOMER ||--o{ REVIEW : receives
RIDE ||--o{ MEDIA : attaches
```
Collection | Key Fields | Recommended Indexes | Notes |
---|---|---|---|
`drivers` | _id, driverId, email, location(lat,lng), rating, videoPath | `email` (unique), `driverId` (unique), (future) 2dsphere on location | Cached search results in Redis via `driver:search:{q}`; summary cache `driver:summary:{driverId}`. |
`customers` | _id, customerId, email, address, rating | `email` (unique), `customerId` (unique) | Card data should be tokenized (never store PAN). |
`rides` | _id, rideId, driverId, customerId, status, dateTime, pickup/dropoff.lat/lng | `rideId` (unique), `driverId`, `customerId`, `status`, `dateTime`, (future) 2dsphere on pickup/dropoff | Hot path for dashboards; `rides:byDriver:{driverId}` cache with TTL. |
`billing` | _id, billingId, rideId, driverId, customerId, predictedPrice, actualPrice, status | `rideId` (unique), `driverId`, `customerId`, `status`, `createdAt` | Idempotency by `rideId` (prevents duplicate bills from repeated events). |
`reviews` | _id, rideId, reviewerId, revieweeId, reviewerType, rating | `rideId`, `revieweeId`, `rating` | Used to compute rating aggregates in the app/service layer. |
`media` | _id, rideId, ownerId, path, type, sizeBytes, contentType | `rideId`, `ownerId`, `type`, `createdAt` | Store only metadata in the DB; files on disk or S3; serve via signed URLs/range. |
- `billing.rideId` is unique.
- The Kafka consumer performs an upsert by `rideId` to avoid duplicate bills when the `ride-completed` event is replayed.
- Validate legal status transitions: `requested → accepted → in_progress → completed`
- Prevent double completion or invalid jumps (e.g., `in_progress → canceled`); a minimal guard is sketched below.
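A minimal sketch of such a guard, using a transition map (exactly which states may move to `canceled` is an assumption here):

```js
// Allowed ride status transitions (illustrative; mirrors the lifecycle above)
const TRANSITIONS = {
  requested: ["accepted", "canceled"],
  accepted: ["in_progress", "canceled"],
  in_progress: ["completed"],
  completed: [],
  canceled: []
};

function assertTransition(current, next) {
  // Rejects double completion and any jump not listed above
  if (!(TRANSITIONS[current] || []).includes(next)) {
    throw new Error(`Illegal status transition: ${current} → ${next}`);
  }
}
```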
- `rides.driverId` and `rides.customerId` must reference existing documents.
- Enforce via pre-create checks in services (and optional MongoDB schema validation).
- Passwords: store bcrypt hashes (never plain text).
- Emails & IDs: validate on input; email uniqueness enforced at DB-level.
- Coordinates: `lat ∈ [-90, 90]`, `lng ∈ [-180, 180]`.
- Reviews: enforce `rating ∈ [1..5]`.
- Media: enforce content types & size limits on upload; sanitize filenames; store paths only.
- JWT: required for protected endpoints; role claims (`admin|driver|customer`) checked per route.
- Rides/Billing: retain indefinitely for analytics; optionally archive to cold collections or a data lake for historical reporting.
- Media: apply TTL or move to cheaper storage (e.g., S3 Glacier) after N days.
- Caches (Redis): ephemeral; short TTLs (60s–5m) tuned per key; safe to flush during incidents.
- Soft deletes (optional): add `isActive`/`deletedAt` to drivers/customers to avoid hard deletes.
```js
// Daily revenue: sum paid bills per calendar day
db.billing.aggregate([
  { $match: { status: "paid" } },
  {
    $group: {
      _id: { $dateToString: { format: "%Y-%m-%d", date: "$createdAt" } },
      totalRevenue: { $sum: "$actualPrice" },
      rides: { $sum: 1 }
    }
  },
  { $sort: { _id: 1 } }
])
```
```js
// Top 10 drivers by revenue from paid bills
db.billing.aggregate([
  { $match: { status: "paid" } },
  {
    $group: {
      _id: "$driverId",
      revenue: { $sum: "$actualPrice" },
      rides: { $sum: 1 }
    }
  },
  { $sort: { revenue: -1 } },
  { $limit: 10 }
])
```
The system uses Apache Kafka to decouple ride completion from billing.
By emitting a `ride-completed` event, the Rides Service hands off billing work to the Billing Service asynchronously, improving throughput and resilience.
Topic | Purpose | Producer | Consumer | Partitions | Replication | Key |
---|---|---|---|---|---|---|
`ride-completed` | Notify that a ride finished (billable) | Rides Service | Billing Service (CG) | 3–6 (cfg) | 1–3 (cfg) | `rideId` (str) |
- Partitioning strategy: keying by `rideId` keeps the same ride's messages ordered on a single partition → simplifies idempotent upsert logic in Billing.
- Consumer group: `billing-consumer-group` (scales horizontally; each instance gets a subset of partitions).
- Delivery semantics: at-least-once (consumer commits after upsert). With upsert idempotency in Billing, duplicates are safe.
```json
{
  "eventType": "ride-completed",
  "version": 1,
  "rideId": "RIDE-2025-09-18-00123",
  "driverId": "DRV-9",
  "customerId": "CUS-42",
  "distanceKm": 12.1,
  "predictedPrice": 18.75,
  "actualPrice": 19.40,
  "startedAt": "2025-09-18T10:00:00Z",
  "endedAt": "2025-09-18T10:25:00Z",
  "metadata": {
    "source": "rides-service",
    "emittedAt": "2025-09-18T10:25:05Z",
    "traceId": "f5c9…"
  }
}
```
- Include `eventType` & `version` to evolve payloads safely.
- `traceId` helps correlate logs across Rides ↔ Kafka ↔ Billing.
- Emit exactly one event when status transitions to `completed`.
- Use synchronous confirmation (`await` the Kafka produce) or buffer with retry/backoff.
- Attach `rideId` as the message key.
Example (Node.js with kafkajs)
```js
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "rides", brokers: [process.env.KAFKA_BROKERS] });
const producer = kafka.producer();
await producer.connect();

// Key by rideId so every event for a ride lands on the same partition (ordering)
await producer.send({
  topic: "ride-completed",
  messages: [{
    key: ride.rideId,
    value: JSON.stringify({
      eventType: "ride-completed",
      version: 1,
      rideId: ride.rideId,
      driverId: ride.driverId,
      customerId: ride.customerId,
      distanceKm: ride.distanceKm,
      predictedPrice: ride.estimatedPrice,
      actualPrice: ride.actualPrice,
      startedAt: ride.startedAt,
      endedAt: ride.endedAt,
      metadata: { source: "rides-service", emittedAt: new Date().toISOString(), traceId }
    })
  }]
});
```
- At-least-once processing → commit offset only after successful upsert.
- Idempotency: the `billing` collection has a unique index on `rideId`; the consumer performs an upsert by `rideId` to avoid duplicates.
- Failure handling: retry with backoff; if still failing, log & (optionally) publish to a DLQ topic.
Example (Node.js with kafkajs)
```js
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "billing", brokers: [process.env.KAFKA_BROKERS] });
const consumer = kafka.consumer({ groupId: "billing-consumer-group" });
await consumer.connect();
await consumer.subscribe({ topic: "ride-completed", fromBeginning: false });

await consumer.run({
  eachMessage: async ({ topic, partition, message }) => {
    const payload = JSON.parse(message.value.toString());
    try {
      // Upsert by rideId (idempotent)
      await BillingModel.updateOne(
        { rideId: payload.rideId },
        {
          $set: {
            driverId: payload.driverId,
            customerId: payload.customerId,
            predictedPrice: payload.predictedPrice,
            actualPrice: payload.actualPrice,
            distanceKm: payload.distanceKm,
            startedAt: new Date(payload.startedAt),
            endedAt: new Date(payload.endedAt),
            status: "created",
            updatedAt: new Date()
          },
          $setOnInsert: { billingId: `BILL-${Date.now()}`, createdAt: new Date() }
        },
        { upsert: true }
      );
      // offset is auto-committed by kafkajs unless manual commit mode is enabled
    } catch (err) {
      console.error("Billing consume error:", err);
      // Optional Dead Letter Queue (DLQ)
      // await dlqProducer.send({
      //   topic: "ride-completed.DLQ",
      //   messages: [{ key: payload.rideId, value: JSON.stringify(payload) }]
      // });
    }
  }
});
```
- At-least-once + idempotent upsert → safe duplicates, never double-charge.
- Consumer group scaling: run N billing instances → Kafka partitions are divided → linear throughput gains (bounded by partitions).
- Backpressure: if Billing lags, Kafka buffers messages durably; Rides remains fast.
- Retries: exponential backoff on transient DB/Redis errors.
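A small helper of the kind that could implement those retries (a sketch, not the project's actual code):

```js
// Retry an async operation with exponential backoff plus jitter
async function withBackoff(fn, { retries = 5, baseMs = 100 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err; // give up; caller may route to a DLQ
      const delayMs = baseMs * 2 ** attempt + Math.random() * baseMs;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage: wrap the idempotent upsert from the consumer above
// await withBackoff(() => upsertBill(payload));
```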
Create topic (if auto-create disabled):
```bash
docker exec -it kafka kafka-topics \
  --create --topic ride-completed \
  --bootstrap-server kafka:9092 \
  --partitions 3 --replication-factor 1

# Describe the topic
docker exec -it kafka kafka-topics \
  --describe --topic ride-completed \
  --bootstrap-server kafka:9092

# Tail the topic
docker exec -it kafka kafka-console-consumer \
  --bootstrap-server kafka:9092 \
  --topic ride-completed --from-beginning

# Produce a test event
docker exec -it kafka kafka-console-producer \
  --broker-list kafka:9092 --topic ride-completed
# paste a JSON line and press Enter to send
```
In Docker Compose, services reach Kafka via the hostname `kafka:9092`. On host tools, use the advertised listener `localhost:29092` (if configured).
- Metrics to watch: consumer lag per partition, produce/consume rates, error counts, retry counts.
- Logging: include `traceId` across Rides → Kafka → Billing to correlate events.
- Future: add Prometheus exporters (e.g., Burrow for consumer lag) + Grafana dashboards.
- Why Kafka? Asynchronous decoupling improves ride throughput and system resilience vs. synchronous billing API calls.
- Why at-least-once (not exactly-once)? Simpler + robust with idempotent DB upserts; operationally safer than coordinated transactions.
- Why key by `rideId`? Ensures per-ride ordering and simplifies idempotent consumer logic.
- What about DLQ? Optional safety net for poison messages; keeps the main consumer healthy while problematic events are quarantined.
To reduce MongoDB query load and deliver sub-200ms responses on hot paths, the system uses Redis as a distributed in-memory cache.
Redis follows a cache-aside pattern:
- Service checks Redis first.
- On cache miss → query MongoDB → return result + populate Redis with TTL.
- On updates → invalidate affected cache keys.
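A sketch of the pattern with the node-redis v4 client (the Mongo query is illustrative; key name and TTL follow the table below):

```js
import { createClient } from "redis";

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

// Cache-aside read: Redis first, MongoDB on a miss, then repopulate with a TTL
async function getDriverSearch(q) {
  const key = `driver:search:${q}`;
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached); // cache hit: no DB round-trip

  const drivers = await db.collection("drivers")
    .find({ name: { $regex: q, $options: "i" } }) // illustrative query
    .toArray();
  await redis.set(key, JSON.stringify(drivers), { EX: 60 }); // 60s TTL
  return drivers;
}
```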
Key Pattern | Value | TTL | Writer Service | Reader Services |
---|---|---|---|---|
`driver:search:{q}` | JSON list of drivers for search | 60s | Drivers | Rides, Admin |
`driver:summary:{driverId}` | Aggregated stats for profile/dashboard | 60s | Drivers | Admin |
`rides:byDriver:{driverId}` | List of rides for driver | 60s | Rides | Admin |
`billing:byUser:{userId}:{role}` | Bills for customer/driver | 60s | Billing | Admin, Customer |
`stats:revenue:day:{YYYY-MM-DD}` | Revenue + ride counts for day | 300s | Billing/Admin | Admin dashboard |
- Driver profile update → `DEL driver:search:*` and `driver:summary:{driverId}`.
- New ride / status update → `DEL rides:byDriver:{driverId}`.
- Billing created/updated → `DEL billing:byUser:*` and `stats:revenue:day:*`.
- Admin dashboards refresh every 5 minutes → Redis keys expire naturally (a pattern-invalidation sketch follows below).
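Redis has no wildcard `DEL`, so the `*` patterns above are typically expanded by iterating `SCAN`; a sketch with the node-redis v4 client:

```js
// Delete every key matching a pattern, e.g. "driver:search:*"
// (SCAN iterates incrementally and avoids blocking Redis the way KEYS would)
async function invalidatePattern(redis, pattern) {
  for await (const key of redis.scanIterator({ MATCH: pattern, COUNT: 100 })) {
    await redis.del(key);
  }
}

// e.g. after a driver profile update:
// await invalidatePattern(redis, "driver:search:*");
// await redis.del(`driver:summary:${driverId}`);
```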
```mermaid
sequenceDiagram
participant C as Customer
participant R as Rides Service
participant D as Drivers Service
participant X as Redis
participant M as MongoDB
C->>R: GET /api/rides/nearby-drivers?lat=..&lng=..
R->>X: Check key driver:search:{lat}:{lng}
alt Cache hit
X-->>R: Cached drivers (ms response)
else Cache miss
R->>D: Fetch drivers in area
D->>M: Geo query (Haversine in Mongo)
M-->>D: Drivers list
D-->>R: Drivers list
R->>X: Cache result (TTL 60s)
end
R-->>C: Driver list
```
- Baseline (B): Every request hits MongoDB → latency spikes under load.
- B+S (Base + Redis Caching): Redis absorbs hot read traffic → p95 latency cut by ~40%, throughput ↑.
- B+S+K (Base + Redis + Kafka): Billing async + Redis caching → smoothest performance; Mongo load reduced drastically.
See performance graphs in Section 14 — Load Testing.
- Why cache-aside? Simple, widely used; services decide what to cache.
- Why short TTLs (60s–300s)? Keeps data fresh (drivers move constantly, revenue updates every few mins).
- Why Redis over in-process cache? Distributed; works across multiple service instances → safe for horizontal scaling.
- What about consistency? Slight staleness tolerated (e.g., nearby drivers list). Strong consistency maintained via invalidation on updates.
The platform includes a machine learning microservice to simulate Uber's dynamic pricing model.
This ensures fares reflect demand, supply, and context (time, location, conditions).
- Framework: FastAPI (Python) serving an XGBoost regression model.
- Trained on: Uber Fares Kaggle dataset (pickup/dropoff coordinates, datetime, passenger count, fare amount).
- Serialization: Model persisted via joblib for fast loading.
- Serving: Uvicorn ASGI server, containerized with Docker.
Feature | Type | Example | Why it matters |
---|---|---|---|
`distance_km` | float | `12.5` | Longer trips → higher base fare |
`passenger_count` | int | `1` | More passengers → adjusted pricing |
`hour` | int (0–23) | `18` (6 PM) | Captures rush-hour patterns |
`day_of_week` | int (0–6) | `5` (Friday) | Captures weekday vs. weekend demand |
`is_weekend` | binary | `1` | Surge more likely on weekends |
`is_night` | binary | `0` | Night trips may have premiums |
Method | Path | Body | Response |
---|---|---|---|
POST | `/predict` | JSON with feature set | `{ "estimated_price": <float> }` |
Example Request
```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "distance_km": 12.5,
    "passenger_count": 1,
    "hour": 18,
    "day_of_week": 5,
    "is_weekend": 1,
    "is_night": 0
  }'
```
Example Response
{ "estimated_price": 21.37 }
- Customer books a ride → `rides-service` calls the ML service with trip features.
- ML service returns the predicted fare (`estimatedPrice`).
- Rides-service persists the ride with this price.
- After ride completion → Billing compares predicted vs actual fare → stores both for auditing.
- Why separate ML service? Decouples Python stack from Node.js services; can scale independently.
- Why FastAPI? Lightweight, async-friendly, production-ready for ML serving.
- Why dynamic pricing? Simulates real Uber “surge” behavior where supply-demand elasticity impacts pricing.
- What if ML fails? `rides-service` can fall back to a static formula (`distance × rate`); a sketch follows below.
The system exposes REST APIs across microservices.
Below are the most important endpoints, grouped by service, with examples.
Create Ride
```http
POST /api/rides
Content-Type: application/json
Authorization: Bearer <JWT>

{
  "customerId": "CUS-42",
  "pickup": { "lat": 37.77, "lng": -122.42 },
  "dropoff": { "lat": 37.79, "lng": -122.39 },
  "passengerCount": 1
}
```
Example Response
```json
{
  "rideId": "RIDE-2025-0001",
  "status": "in_progress",
  "estimatedPrice": 21.37
}
```
Update Ride Status (→ triggers Kafka)
```http
PATCH /api/rides/:id/status
Authorization: Bearer <JWT>

{ "status": "completed", "actualPrice": 22.10 }
```
Signup
```http
POST /api/drivers/signup
Content-Type: application/json

{
  "driverId": "DRV-100",
  "email": "alex@demo.com",
  "password": "SafePass123!",
  "carDetails": { "make": "Toyota", "model": "Prius" }
}
```
Search Drivers (cached in Redis)
```http
GET /api/drivers/search?q=Prius
Authorization: Bearer <JWT>
```
Signup
```http
POST /api/customers/signup
Content-Type: application/json

{
  "customerId": "CUS-42",
  "email": "jane@demo.com",
  "password": "SafePass123!",
  "address": { "city": "San Jose", "state": "CA", "zip": "95123" }
}
```
Get Customer Profile
```http
GET /api/customers/CUS-42
Authorization: Bearer <JWT>
```
Get Bill by Ride
```http
GET /api/billing/rides/RIDE-2025-0001
Authorization: Bearer <JWT>
```
Example Response
```json
{
  "billingId": "BILL-17475391",
  "rideId": "RIDE-2025-0001",
  "predictedPrice": 21.37,
  "actualPrice": 22.10,
  "status": "created"
}
```
Search Bills
```http
GET /api/billing/search?driverId=DRV-100&status=created
Authorization: Bearer <JWT>
```
Admin Login
```http
POST /api/admin/login
Content-Type: application/json

{ "email": "admin@demo.com", "password": "AdminPass123!" }
```
Get Revenue Stats
```http
GET /api/admin/statistics/revenue
Authorization: Bearer <JWT>
```
Example Response
```json
{
  "date": "2025-09-18",
  "totalRevenue": 12340.75,
  "rides": 842
}
```
Predict Fare
```http
POST /predict
Content-Type: application/json

{
  "distance_km": 12.5,
  "passenger_count": 1,
  "hour": 18,
  "day_of_week": 5,
  "is_weekend": 1,
  "is_night": 0
}
```
Example Response
{ "estimated_price": 21.37 }
- All protected endpoints require `Authorization: Bearer <JWT>`.
- JWT tokens embed role claims (`customer`, `driver`, `admin`) → enforced in route middleware.
- Responses are always in JSON format.
- Errors follow the structure `{ "error": "Message" }` (a handler sketch follows below).
This section shows how to configure environments and run the full stack locally with Docker.
All services are 12-factor style: configuration comes from environment variables.
Var | Rides | Drivers | Customers | Billing | Admin | ML | Infra | Notes |
---|---|---|---|---|---|---|---|---|
`PORT` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | — | Service port (4001–4005, 8000) |
`NODE_ENV` | ✅ | ✅ | ✅ | ✅ | ✅ | — | — | `development` \| `production` |
`MONGO_URI` | ✅ | ✅ | ✅ | ✅ | ✅ | — | — | e.g. `mongodb://mongo:27017/uber_sim` or separate DBs per svc |
`REDIS_URL` | ✅ | ✅ | — | ✅ | ✅ | — | — | `redis://redis:6379` |
`KAFKA_BROKERS` | ✅ | — | — | ✅ | — | — | — | `kafka:9092` inside Docker |
`JWT_SECRET` | ✅ | ✅ | ✅ | ✅ | ✅ | — | — | Same secret across Node services |
`ML_URL` | ✅ | — | — | — | — | — | — | `http://ml-service:8000/predict` |
`ALLOWED_ORIGINS` | ✅ | ✅ | ✅ | ✅ | ✅ | — | — | CORS (comma-separated) |
`LOG_LEVEL` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | — | `info` \| `debug` |
`MODEL_PATH` | — | — | — | — | — | ✅ | — | e.g. `/app/models/xgb.joblib` |
Create `.env` files from `.env.example` under each service directory and populate these values.
Save as `docker-compose.yml` in the repo root (replace if you already have one).
This brings up Mongo, Redis, Zookeeper, Kafka, the ML service, and all Node services.
version: "3.9"
services:
mongo:
image: mongo:6
container_name: mongo
ports: [ "27017:27017" ]
volumes:
- mongo_data:/data/db
environment:
MONGO_INITDB_DATABASE: uber_sim
redis:
image: redis:7
container_name: redis
ports: [ "6379:6379" ]
zookeeper:
image: confluentinc/cp-zookeeper:7.6.1
container_name: zookeeper
environment:
ZOOKEEPER_CLIENT_PORT: 2181
ZOOKEEPER_TICK_TIME: 2000
kafka:
image: confluentinc/cp-kafka:7.6.1
container_name: kafka
depends_on: [ zookeeper ]
ports:
- "29092:29092" # host access
environment:
KAFKA_BROKER_ID: 1
KAFKA_ZOOKEEPER_CONNECT: "zookeeper:2181"
KAFKA_LISTENERS: "PLAINTEXT://kafka:9092,PLAINTEXT_HOST://0.0.0.0:29092"
KAFKA_ADVERTISED_LISTENERS: "PLAINTEXT://kafka:9092,PLAINTEXT_HOST://localhost:29092"
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
ml-service:
build:
context: ./services/dynamic_pricing_model
dockerfile: Dockerfile
container_name: ml-service
environment:
PORT: 8000
LOG_LEVEL: info
MODEL_PATH: /app/models/xgb.joblib
ports: [ "8000:8000" ]
depends_on: [ mongo ]
rides-service:
build:
context: ./services/rides
dockerfile: Dockerfile
container_name: rides-service
environment:
PORT: 4001
NODE_ENV: development
MONGO_URI: mongodb://mongo:27017/uber_rides
REDIS_URL: redis://redis:6379
KAFKA_BROKERS: kafka:9092
JWT_SECRET: change_me
ML_URL: http://ml-service:8000/predict
LOG_LEVEL: info
ALLOWED_ORIGINS: http://localhost:5173,http://localhost:3000
ports: [ "4001:4001" ]
depends_on: [ mongo, redis, kafka, ml-service ]
drivers-service:
build:
context: ./services/drivers
dockerfile: Dockerfile
container_name: drivers-service
environment:
PORT: 4002
NODE_ENV: development
MONGO_URI: mongodb://mongo:27017/uber_drivers
REDIS_URL: redis://redis:6379
JWT_SECRET: change_me
LOG_LEVEL: info
ALLOWED_ORIGINS: http://localhost:5173,http://localhost:3000
ports: [ "4002:4002" ]
depends_on: [ mongo, redis ]
customers-service:
build:
context: ./services/customers
dockerfile: Dockerfile
container_name: customers-service
environment:
PORT: 4003
NODE_ENV: development
MONGO_URI: mongodb://mongo:27017/uber_customers
JWT_SECRET: change_me
LOG_LEVEL: info
ALLOWED_ORIGINS: http://localhost:5173,http://localhost:3000
ports: [ "4003:4003" ]
depends_on: [ mongo ]
billing-service:
build:
context: ./services/billing
dockerfile: Dockerfile
container_name: billing-service
environment:
PORT: 4004
NODE_ENV: development
MONGO_URI: mongodb://mongo:27017/uber_billing
REDIS_URL: redis://redis:6379
KAFKA_BROKERS: kafka:9092
JWT_SECRET: change_me
LOG_LEVEL: info
ALLOWED_ORIGINS: http://localhost:5173,http://localhost:3000
ports: [ "4004:4004" ]
depends_on: [ mongo, redis, kafka ]
admin-service:
build:
context: ./services/admin
dockerfile: Dockerfile
container_name: admin-service
environment:
PORT: 4005
NODE_ENV: development
MONGO_URI: mongodb://mongo:27017/uber_admin
REDIS_URL: redis://redis:6379
JWT_SECRET: change_me
LOG_LEVEL: info
ALLOWED_ORIGINS: http://localhost:5173,http://localhost:3000
ports: [ "4005:4005" ]
depends_on: [ mongo, redis, billing-service, rides-service, drivers-service, customers-service ]
volumes:
mongo_data:
```bash
# Build & start everything
docker compose up -d --build

# Check logs of a service
docker compose logs -f rides-service

# Stop
docker compose down
```
You can run services individually without Docker for fast testing.
Prereqs
- Node.js 18+, Python 3.10+
- MongoDB & Redis running locally (or via `docker compose up mongo redis`)
- Kafka (optional locally, but required for billing events)
Start a Service Example: Rides Service
```bash
cd services/rides
npm install
npm run dev   # runs on http://localhost:4001
```
Repeat for other services (drivers → 4002, customers → 4003, billing → 4004, admin → 4005).
ML Service
```bash
cd services/dynamic_pricing_model
pip install -r requirements.txt
uvicorn app:app --port 8000 --reload
```
Seed Minimal Data
```bash
# Create customer
curl -X POST http://localhost:4003/api/customers/signup \
  -H "Content-Type: application/json" \
  -d '{"customerId":"CUS-42","email":"jane@demo.com","password":"Pass123"}'

# Create driver
curl -X POST http://localhost:4002/api/drivers/signup \
  -H "Content-Type: application/json" \
  -d '{"driverId":"DRV-100","email":"alex@demo.com","password":"Pass123"}'
```
Frontend
```bash
cd uber-frontend
npm install
npm run dev   # http://localhost:5173
```
Goal: demonstrate how caching (Redis) and asynchronous billing (Kafka) improve latency, throughput, and stability under load.
- Tool: JMeter (HTTP test plan)
- Workload: concurrent users hitting ride creation, search, status updates, billing queries
- Datasets: thousands of drivers/customers/rides preloaded
- Scenarios:
- B = Baseline (MongoDB only)
- B+S = Baseline + Redis cache
- B+S+K = Redis + Kafka (async billing)
- Caching pays first. Moving from B → B+S (adding Redis) delivers the largest drop in average & p95 latency on read-heavy paths (driver search, history, stats) and increases throughput by offloading MongoDB.
- Kafka stabilizes write flows. Moving from B+S → B+S+K (adding Kafka) makes completion→billing asynchronous, so ride completion latency stays low and predictable even during spikes; p95 tail improves and error rates drop.
- Dashboards stay snappy. Admin analytics backed by Redis remain fast (<~200ms typical) while still reflecting fresh data via short TTLs + invalidation.
- Resilience under load. With Kafka, backpressure is absorbed by the broker; Billing catches up without blocking ride flows.
Scenario | Avg Latency | p95 Latency | Throughput | Error Rate |
---|---|---|---|---|
B | higher | spiky | lower | higher |
B+S | lower | lower | higher | lower |
B+S+K | lowest | most stable | highest | lowest |
For exact values, see the three charts above.
- Why Redis first? Hot-path reads dominate; caching yields immediate wins.
- Why Kafka after Redis? It removes a synchronous dependency (billing) from the critical path, improving tail latency and reliability at peak.
- JWT auth with role claims (customer, driver, admin)
- Passwords hashed with bcrypt
- Input validation (IDs, ratings, geo coords, media size/type)
- CORS restricted to trusted origins
- Secrets from `.env` (never hardcoded)
- Idempotent billing (unique `rideId`)
- Kafka at-least-once + DB upsert → no double-charging
- Cache invalidation rules keep Redis consistent
- Health checks (`/healthz`) & structured logs with `traceId`
- Microservices scale independently (e.g., rides-service during spikes)
- Redis absorbs hot reads, offloading MongoDB
- Kafka buffers bursts → billing catches up asynchronously
- Docker/K8s ready for horizontal scaling
Kafka connection fails inside services
- Use `kafka:9092` inside Docker, not `localhost`.
- From host tools (JMeter, CLI), use `localhost:29092`.
Redis not reachable
- Check `REDIS_URL` (`redis://redis:6379` in Docker).
- Run `docker compose ps` to ensure the container is up.
JWT errors (401 Unauthorized)
- Token expired or missing `Authorization: Bearer <JWT>` header.
- Re-login to get a fresh token.
CORS issues in frontend
- Add your dev origin (`http://localhost:5173`) to `ALLOWED_ORIGINS` in each service.
MongoDB slow or errors under load
- Ensure indexes (`rideId`, `driverId`, `customerId`, `createdAt`) exist.
- Use the Redis cache for frequent reads (search, stats, billing lookups).
This project is licensed under the MIT License — see the LICENSE file for details.