A full-stack, event-driven Uber-like platform showcasing distributed systems design, load balancing, caching, and asynchronous workflows.
Built with a modern microservices architecture, combining backend, ML, infra, and frontend technologies into one scalable system.
From ride request to bill in seconds — horizontally scalable, cache-accelerated, event-driven.
This project is a distributed, microservices-based simulation of Uber, built to demonstrate how modern ride-hailing platforms are engineered for scalability, reliability, and performance.
- Customers can sign up, book rides, view history, and pay bills.
- Drivers can register, manage profiles, accept rides, and track earnings.
- Admins can oversee the entire system, add/manage drivers & customers, view revenue/ride statistics, and generate reports.
- A Dynamic Pricing Service predicts fares in real-time using machine learning.
- Billing is generated asynchronously using Kafka events to decouple ride completion from payment processing.
- Real-world relevance: ride-hailing systems like Uber and Lyft rely heavily on distributed architectures to handle massive concurrency, dynamic pricing, and system resilience.
- System Design strength: this project demonstrates event-driven architecture, load balancing, caching strategies, fault tolerance, and scalability principles expected in modern production systems.
- Interview-ready: showcases hands-on experience with Kafka, Redis, Docker, MongoDB, FastAPI, React, and JMeter — technologies used widely in industry.
- Event-driven billing: Kafka ensures billing is processed asynchronously and idempotently, preventing double charges.
- Performance-optimized: Redis caches accelerate driver search, revenue stats, and ride lookups with cache invalidation strategies.
- Scalable architecture: Docker/Kubernetes deployment enables horizontal scaling across services.
- Data-driven pricing: Machine learning (XGBoost via FastAPI) integrates with the ride lifecycle to provide surge/dynamic pricing predictions.
- End-to-end coverage: From frontend UI (React) to backend microservices, infra (Docker/K8s), messaging (Kafka), and testing (JMeter), the system covers all critical aspects of distributed systems engineering.
The platform is designed for three distinct roles — Customer, Driver, and Admin.
Each role is given tailored capabilities, interfaces, and responsibilities.
Below is a comprehensive walkthrough of how each role interacts with the system, enriched with screenshots and technical insights.
1. Sign Up & Authentication
- Customers start by registering with details such as name, email, phone, and credit card info.
- Passwords are hashed with bcrypt before storage, and a JWT is issued upon login (a sketch of the flow follows below).
- This token secures all subsequent requests (`Authorization: Bearer <token>`).
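A minimal sketch of that login flow, assuming an Express `app`, a Mongoose `Customer` model, and the `bcrypt` and `jsonwebtoken` packages (all names illustrative):

```js
import bcrypt from "bcrypt";
import jwt from "jsonwebtoken";

// POST /api/customers/login: verify the bcrypt hash, then issue a role-scoped JWT
app.post("/api/customers/login", async (req, res) => {
  const { email, password } = req.body;
  const customer = await Customer.findOne({ email });
  // bcrypt.compare hashes the submitted password and checks it against the stored hash
  if (!customer || !(await bcrypt.compare(password, customer.passwordHash))) {
    return res.status(401).json({ error: "Invalid credentials" });
  }
  // The role claim is what route middleware later checks for authorization
  const token = jwt.sign(
    { sub: customer.customerId, role: "customer" },
    process.env.JWT_SECRET,
    { expiresIn: "1h" }
  );
  res.json({ token });
});
```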
2. Booking a Ride
- The customer enters pickup and drop-off locations.
- The Rides Service queries the Drivers Service for available drivers within a 10-mile radius (using the Haversine formula; a sketch follows below).
- At this point, the Dynamic Pricing ML Service is called to predict an estimated fare (`/predict` with distance, time, passenger count, etc.).
- The ride is created in MongoDB with status = `in_progress`.
Behind the scenes:
- Request hits rides-service (4001) → calls drivers-service (4002) + ml-service (8000) → persists in MongoDB.
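A minimal sketch of the Haversine check (decimal-degree coordinates; names illustrative):

```js
// Great-circle distance between two { lat, lng } points, in miles
const EARTH_RADIUS_MI = 3958.8;

function haversineMiles(a, b) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLng = toRad(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_MI * Math.asin(Math.sqrt(h));
}

// Keep only drivers inside the 10-mile search radius
const nearby = drivers.filter((d) => haversineMiles(pickup, d.location) <= 10);
```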
3. Ride Tracking & Completion
- The customer can see live ride status (requested → accepted → in_progress → completed).
- When marked `completed`, the rides-service produces a `ride-completed` Kafka event.
- Billing-service consumes this event asynchronously to generate a final bill.
Behind the scenes:
- Kafka → ensures billing happens asynchronously, improving throughput.
- Redis caches common queries like `rides:byCustomer:{id}` to accelerate dashboard loads.
4. Billing & Payments
- Customers can view a detailed billing history: ride time, distance, predicted vs. actual fare.
- Data is fetched from the billing-service (with Redis caching for frequent lookups).
Behind the scenes:
- Billing-service ensures idempotency (one bill per rideId).
- Admins can also query the same bills for audits.
5. Feedback & Ratings
- After ride completion, customers can rate their driver (1–5 stars) and leave comments.
- Ratings update the driver’s aggregated score, influencing search results for future customers.
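One way the aggregate could be maintained, sketched as a MongoDB aggregation over the `reviews` collection (the surrounding update is illustrative; field names follow the data model later in this README):

```js
// Recompute a driver's average rating from reviews, then persist it on the profile
const [stats] = await db.collection("reviews").aggregate([
  { $match: { revieweeId: driverId } },
  { $group: { _id: "$revieweeId", avgRating: { $avg: "$rating" }, count: { $sum: 1 } } }
]).toArray();

if (stats) {
  await db.collection("drivers").updateOne(
    { driverId },
    { $set: { rating: Math.round(stats.avgRating * 10) / 10 } } // one decimal place
  );
}
```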
1. Sign Up & Authentication
- Drivers register with ID, car details, license, and insurance.
- Secure auth with JWT tokens; passwords hashed with bcrypt.
2. Profile Management & Media Uploads
- Drivers can upload a short video introduction and profile picture.
- Videos/images are stored locally (`/uploads`) with metadata in Mongo, and streamed with HTTP range requests.
- All profile changes are cached for faster reads via Redis.
Behind the scenes:
- File handling with Multer; cache invalidation triggered when profile is updated.
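A sketch of the range-request streaming path, assuming an Express `app`, a Mongoose `Driver` model, and MP4 files under `/uploads` (all names illustrative):

```js
import fs from "fs";
import path from "path";

// Serve the intro video with HTTP range support so players can seek
app.get("/api/drivers/:id/video", async (req, res) => {
  const driver = await Driver.findOne({ driverId: req.params.id });
  const filePath = path.join("/uploads", driver.videoPath);
  const { size } = fs.statSync(filePath);
  const range = req.headers.range; // e.g. "bytes=0-"

  if (!range) {
    res.writeHead(200, { "Content-Length": size, "Content-Type": "video/mp4" });
    return fs.createReadStream(filePath).pipe(res);
  }

  // Parse "bytes=start-end" and stream just that slice as 206 Partial Content
  const [startStr, endStr] = range.replace("bytes=", "").split("-");
  const start = Number(startStr);
  const end = endStr ? Number(endStr) : size - 1;
  res.writeHead(206, {
    "Content-Range": `bytes ${start}-${end}/${size}`,
    "Accept-Ranges": "bytes",
    "Content-Length": end - start + 1,
    "Content-Type": "video/mp4"
  });
  fs.createReadStream(filePath, { start, end }).pipe(res);
});
```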
3. Accepting Rides
- Drivers see ride requests in their area and accept assignments.
- System updates the ride: the `driverId` field is set and status = `accepted`.
Behind the scenes:
- Updates propagate to Redis (the `rides:byDriver:{id}` cache is refreshed).
- Rides-service ensures drivers cannot double-book (transaction checks).
4. Completing Rides & Earnings Summary
- After completion, the driver's account is updated with earnings, and a `ride-completed` Kafka event is fired.
- Drivers can view a summary dashboard showing completed rides, ratings, and earnings.
1. Secure Login
- Admins log in with special credentials.
- JWT auth with elevated role privileges.
2. Managing Drivers & Customers
- Admins can add new drivers/customers or deactivate accounts.
- These operations proxy calls to the drivers-service and customers-service.
3. Monitoring System Statistics
- View daily revenue, rides by area, rides per driver, rides per customer.
- Graphs generated via MongoDB aggregations + cached in Redis for dashboard speed.
4. Billing Oversight
- Admins can search bills, audit discrepancies, and ensure fairness in pricing.
Action / Feature | Customer | Driver | Admin |
---|---|---|---|
Register & Login | ✅ | ✅ | ✅ |
Book/Accept/Complete Ride | ✅ | ✅ | — |
Billing (view, history, audits) | ✅ | ✅ | ✅ |
Profile & Media | ✅ | ✅ | ✅ |
Ratings & Reviews | ✅ | ✅ | — |
Manage Users | — | — | ✅ |
View Revenue & Stats | — | — | ✅ |
While each role has its own journey, the entire system works together as a distributed, event-driven platform.
This high-level flow shows how Customers, Drivers, and Admins interact with the microservices, and how Kafka, Redis, and MongoDB glue everything together.
```mermaid
flowchart LR
%% Clients
subgraph Clients
CU[Customer UI]
DR[Driver UI]
AD[Admin UI]
end
%% Services
subgraph Services
R1[Rides 4001]
D1[Drivers 4002]
C1[Customers 4003]
B1[Billing 4004]
A1[Admin 4005]
end
%% ML
subgraph ML
ML1[Dynamic Pricing API 8000]
end
%% Infra
subgraph Infra
K[(Kafka)]
X[(Redis)]
M[(MongoDB)]
end
%% Flows
CU --> R1
DR --> D1
AD --> A1
R1 <--> D1
R1 --> ML1
A1 --> D1
A1 --> C1
A1 --> B1
R1 <--> M
D1 <--> M
C1 <--> M
B1 <--> M
A1 <--> M
R1 -.-> K
K -.-> B1
R1 <--> X
D1 <--> X
B1 <--> X
```
- Customers request rides → handled by Rides Service, which queries Drivers Service for nearby drivers and calls ML Service for dynamic fare prediction.
- Drivers accept rides via Drivers Service, which updates ride status in MongoDB and invalidates Redis caches.
- When a ride is completed, Rides Service emits a `ride-completed` event to Kafka.
- Billing Service consumes this event, generates the final bill, stores it in MongoDB, and caches frequent queries in Redis.
- Admins interact with Admin Service to manage users, audit bills, and view system-wide statistics (fueled by MongoDB aggregations + Redis cache).
- Redis ensures fast reads (driver search, revenue stats, ride lookups).
- Kafka decouples services, ensuring that ride completion and billing remain scalable and resilient.
- MongoDB stores all persistent entities (Drivers, Customers, Rides, Bills, Reviews, Media metadata).
Action / Feature | Customer | Driver | Admin |
---|---|---|---|
Register & Login | ✅ | ✅ | ✅ |
Book a Ride | ✅ | — | — |
Accept / Complete Ride | — | ✅ | — |
View Billing & History | ✅ | ✅ | ✅ |
Profile & Media | ✅ | ✅ | ✅ |
Ratings & Reviews | ✅ | ✅ | — |
Manage Users (CRUD) | — | — | ✅ |
Revenue & Ride Analytics | — | — | ✅ |
Audit Bills | — | — | ✅ |
At its core, the Uber Simulation is built on a microservices architecture.
Each domain (Rides, Drivers, Customers, Billing, Admin, Pricing) is implemented as a separate service, allowing for independent development, deployment, and scaling.
The services communicate via REST APIs (synchronous) and Kafka events (asynchronous), while Redis accelerates hot lookups and MongoDB persists system state.
```mermaid
flowchart TB
%% Layers
subgraph L0["Client Layer"]
FE["React + Redux (Frontend)"]
end
subgraph L1["Edge / Networking"]
GW["Ingress / Reverse Proxy (NGINX)"]
end
subgraph L2["Microservices Layer"]
R1["Rides svc :4001"]
D1["Drivers svc :4002"]
C1["Customers svc :4003"]
B1["Billing svc :4004"]
A1["Admin svc :4005"]
ML["Dynamic Pricing (FastAPI) :8000"]
end
subgraph L3["Data & Infra Layer"]
DB[(MongoDB)]
RD[(Redis)]
KF[(Kafka Broker)]
ZK[(Zookeeper)]
end
%% Flows
FE -->|HTTPS| GW
GW -->|HTTP| R1
GW -->|HTTP| D1
GW -->|HTTP| C1
GW -->|HTTP| B1
GW -->|HTTP| A1
R1 -->|"HTTP /predict"| ML
R1 --- DB
D1 --- DB
C1 --- DB
B1 --- DB
A1 --- DB
R1 --- RD
D1 --- RD
B1 --- RD
R1 -. "produce ride-completed" .-> KF
KF -. "consume" .-> B1
ZK --- KF
```
Service | Port | Responsibilities | Key Tech |
---|---|---|---|
Rides Service | 4001 | Core ride lifecycle: create, update, nearby driver search, reviews, statistics; produces `ride-completed` Kafka events; integrates with ML service for fare prediction. | Node.js, Express, MongoDB, Redis, Kafka Producer |
Drivers Service | 4002 | Driver auth & profile management, car/insurance details, intro videos, search (cached). | Node.js, Express, MongoDB, Redis |
Customers Service | 4003 | Customer auth, profile management, ride history links to rides/billing. | Node.js, Express, MongoDB |
Billing Service | 4004 | Bill generation & search; consumes `ride-completed` Kafka events; ensures idempotency. | Node.js, Express, MongoDB, Redis, Kafka Consumer |
Admin Service | 4005 | Admin auth, add/manage drivers & customers, revenue & ride statistics, billing audits. | Node.js, Express, MongoDB, Redis |
Dynamic Pricing Model | 8000 | Machine learning service providing `estimated_price` predictions during ride creation. | FastAPI, Python, XGBoost, Joblib |
- MongoDB (Atlas/local) → primary data store for all entities (Drivers, Customers, Rides, Billing, Reviews, Media metadata).
- Redis → cache for driver searches, revenue stats, ride/billing lookups.
- Kafka → event bus connecting rides → billing; ensures decoupled, resilient workflows.
- Docker → containerization of all services for local and cloud deployment.
- Kubernetes (K8s-ready) → orchestration layer for scalability and load balancing.
- JMeter → load/performance testing across scenarios (B, B+S, B+S+K).
- React + Redux → frontend client (Customer, Driver, Admin portals).
- Scalability: each service can be scaled independently (e.g., rides-service under heavy load).
- Resilience: failure in billing-service won’t block ride creation thanks to Kafka decoupling.
- Technology fit: Python/ML model isolated from Node.js services.
- Team productivity: each service can be developed & deployed by separate teams.
Understanding the internal workflows is crucial for evaluating system design.
Below are the four most important request lifecycles, illustrated with sequence diagrams.
```mermaid
sequenceDiagram
participant U as User (Customer/Driver/Admin)
participant S as Service (e.g., Drivers)
participant DB as MongoDB
participant J as JWT Middleware
U->>S: POST /login (email + password)
S->>DB: Verify credentials (bcrypt hash)
DB-->>S: User found + valid
S-->>U: 200 OK + { token: "Bearer <JWT>" }
U->>S: GET /protected (Authorization: Bearer <JWT>)
S->>J: Verify token + role
J-->>S: OK (req.user populated)
S-->>U: Protected resource JSON
```
Key Points
- Passwords stored as bcrypt hashes.
- JWT includes user role → used for authorization in role-specific routes.
- Stateless → services can scale horizontally without sticky sessions.
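A sketch of that middleware step, assuming Express and `jsonwebtoken` (the helper name is illustrative):

```js
import jwt from "jsonwebtoken";

// Verify the Bearer token and enforce an allowed role per route
function requireRole(...roles) {
  return (req, res, next) => {
    const token = (req.headers.authorization || "").replace("Bearer ", "");
    try {
      const claims = jwt.verify(token, process.env.JWT_SECRET);
      if (!roles.includes(claims.role)) {
        return res.status(403).json({ error: "Forbidden" });
      }
      req.user = claims; // downstream handlers read req.user
      next();
    } catch {
      res.status(401).json({ error: "Invalid or expired token" });
    }
  };
}

// Usage: app.get("/api/admin/statistics/revenue", requireRole("admin"), handler);
```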
```mermaid
sequenceDiagram
participant C as Customer
participant R as Rides Service
participant ML as ML Service
participant K as Kafka Broker
participant B as Billing Service
participant M as MongoDB
C->>R: POST /api/rides (pickup, dropoff)
R->>ML: POST /predict (distance, time, passengers, etc.)
ML-->>R: { estimated_price }
R->>M: Save ride { status: in_progress, estimatedPrice }
C->>R: PATCH /api/rides/:id/status completed
R->>K: produce("ride-completed", ride data)
K-->>B: consume("ride-completed")
B->>M: Insert/Upsert Bill (idempotent by rideId)
B-->>C: Bill generated & available
```
Key Points
- Asynchronous decoupling: rides-service doesn’t wait for billing → higher throughput.
- Idempotency: billing ensures no duplicate bills for same rideId.
- Scalability: Kafka can buffer load spikes (durable queue).
```mermaid
sequenceDiagram
participant C as Customer
participant R as Rides Service
participant D as Drivers Service
participant X as Redis
participant M as MongoDB
C->>R: GET /api/rides/nearby-drivers?lat=...&lng=...
R->>X: Check cache key driver:search:{lat,lng}
alt Cache hit
X-->>R: Cached list of drivers
else Cache miss
R->>D: Query drivers within 10 miles (Haversine formula)
D->>M: Geo query in MongoDB
M-->>D: List of drivers
D-->>R: Driver list
R->>X: Cache results with TTL=60s
end
R-->>C: List of available drivers
```
Key Points
- Redis reduces repeated queries for popular areas (e.g., airports).
- TTL ensures data freshness (drivers update frequently).
- Reduces MongoDB load under high concurrency.
```mermaid
sequenceDiagram
participant A as Admin
participant AS as Admin Service
participant B as Billing Service
participant X as Redis
participant M as MongoDB
A->>AS: GET /api/admin/statistics/revenue
AS->>X: Check cache key stats:revenue:day:{date}
alt Cache hit
X-->>AS: Cached revenue data
else Cache miss
AS->>B: Request billing summary
B->>M: Aggregate bills by date
M-->>B: { totalRevenue, ridesPerArea, ridesPerDriver }
B-->>AS: Aggregated stats
AS->>X: Cache stats for 5 mins
end
AS-->>A: Render revenue dashboard (charts)
```
Key Points
- MongoDB aggregation pipelines compute totals, grouped by day/area/driver.
- Redis ensures dashboards load fast (<200ms).
- Admin sees updated revenue with minimal DB load.
Each domain is implemented as an independent service with its own API, data model, caching strategy, and (where applicable) Kafka role.
Notation: `HTTP→` = outbound service call • K = Kafka role • R = Redis keys • DB = Mongo collections
Purpose
Core ride lifecycle: create/update, nearby driver search, ride statistics, reviews, and media metadata; produces `ride-completed` on finish. Integrates with ML for fare prediction.
Top Endpoints
Method | Path | Purpose |
---|---|---|
POST | `/api/rides` | Create ride (calls ML `/predict`, finds nearest driver) |
PATCH | `/api/rides/:id/status` | Update status; on `completed` → produce Kafka `ride-completed` |
GET | `/api/rides/nearby-drivers?lat=&lng=` | Haversine search for drivers around a point |
GET | `/api/rides/statistics` | Aggregated stats (revenue/day, rides/hour/area/driver/customer) |
GET / DELETE | `/api/rides/:id` | Get / delete ride |
POST | `/api/rides/:id/images` | Attach image metadata to ride |
POST | `/api/rides/reviews` | Create review (customer↔driver) |
GET | `/api/rides/reviews/user/:userId` | Reviews by user |
Inter-Service Calls (HTTP→)
- HTTP→ Drivers: fetch drivers for nearby search / details.
- HTTP→ ML: `POST /predict` to calculate `estimatedPrice`.
Kafka (K)
- Producer: `ride-completed` (payload includes `rideId`, `driverId`, `customerId`, `distanceKm`, `predictedPrice`, `actualPrice`, `startedAt`, `endedAt`).
- Idempotency target: Billing upserts by `rideId`.
Redis (R)
- Reads/Writes: `driver:search:{lat}:{lng}` (60s) – cached search results; `rides:byDriver:{driverId}` (60s) – recent rides for a driver.
- Invalidate on write (status change, create/delete).
Mongo (DB)
`rides` (ride lifecycle), `reviews` (ratings), `media` (metadata).
Sample cURL
```bash
# Create ride (server will call ML for estimated price)
curl -X POST http://localhost:4001/api/rides \
  -H "Content-Type: application/json" \
  -d '{"customerId":"CUS-42","pickup":{"lat":37.77,"lng":-122.42},"dropoff":{"lat":37.79,"lng":-122.39},"passengerCount":1}'
```
Driver identity & profile management:
- Signup / login (JWT, bcrypt)
- Car & insurance details
- Intro video upload/stream
- Cached search for nearby drivers
- Summaries for dashboards
Method | Path | Purpose |
---|---|---|
POST | `/api/drivers/signup` / `/login` | Auth (JWT), bcrypt password storage |
GET | `/api/drivers` / `/search?q=` | List drivers & cached search |
GET | `/api/drivers/:id` | Fetch profile |
PUT | `/api/drivers/:id` | Update profile |
DELETE | `/api/drivers/:id` | Delete profile |
POST | `/api/drivers/:id/video` | Upload intro video |
GET | `/api/drivers/:id/video` | Stream intro video (HTTP range) |
GET | `/api/drivers/:driverId/summary` | Aggregates for dashboards (earnings, ratings) |
- Serves data to Rides (nearby search)
- Serves data to Admin (management & dashboards)
Redis (R)
- `driver:search:{q}` (TTL 60s)
- `driver:summary:{driverId}` (TTL 60s)
- Invalidate on profile updates
Mongo (DB)
- `drivers` → profile, vehicle, insurance, location
- `videos` → paths, metadata
Customer authentication & profile management:
- Signup/login
- Profile CRUD
- Links to rides/billing via UI/API layer
Method | Path | Purpose |
---|---|---|
POST | `/api/customers/signup` / `/login` | Auth (JWT) |
GET | `/api/customers` | List customers |
GET | `/api/customers/:id` | Fetch details |
PUT | `/api/customers/:id` | Update profile |
DELETE | `/api/customers/:id` | Delete profile |
- Called by Admin for management
- UI fetches rides/billing directly from their services
Mongo (DB)
- `customers` → PII, address, masked card refs, preferences
- Generate & search bills
- Consume `ride-completed` events
- Guarantee idempotency (unique `rideId`)
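That guarantee can be enforced at the database level with a unique index (Mongo shell; collection name as in the data-model section):

```js
// One bill per ride: inserts for an existing rideId are rejected,
// so replayed ride-completed events cannot create duplicates
db.billing.createIndex({ rideId: 1 }, { unique: true });
```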
Method | Path | Purpose |
---|---|---|
POST | `/api/billing/rides/:rideId` | Manually create bill (checked by `rideId`) |
GET | `/api/billing/:billId` | Fetch single bill |
GET | `/api/billing/search?driverId=&customerId=&status=` | Search bills |
GET | `/api/billing/customer/:customerId` | Bills by customer |
GET | `/api/billing/driver/:driverId` | Bills by driver |
Kafka (K)
- Consumer: `ride-completed`
- Group: `billing-consumer-group`
- On consume → upsert into billing collection → cache hot queries
Redis (R)
- `billing:byUser:{userId}:{role}` (TTL 60s) → customer/driver lists
- `stats:revenue:day:{YYYY-MM-DD}` (TTL 300s) → precomputed daily revenue
Mongo (DB)
- `billing` → predicted vs. actual, totals, timestamps, status
Administrative plane:
- Privileged authentication
- Manage drivers/customers
- Financial & ride analytics
- Bill audits
Method | Path | Purpose |
---|---|---|
POST | `/api/admin/signup` / `/login` | Admin authentication |
POST | `/api/admin/drivers` / `/customers` | Proxy create via services |
GET | `/api/admin/statistics/revenue` | Revenue per day (charts) |
GET | `/api/admin/statistics/rides` | Rides per area/driver/customer |
GET | `/api/admin/bills/search` | Search bills |
GET | `/api/admin/bills/:billId` | Fetch single bill (audits) |
- HTTP → Drivers/Customers for management
- HTTP → Billing for audits & stats
- HTTP → Rides for ride metrics
- Reads cached stats & billing keys
- May set dashboard caches (TTL 5m)
- Minimal own state
- Mostly queries across other domains
Predict `estimated_price` during ride creation based on features:
- Distance (km)
- Time of day
- Weekend / night flag
- Passenger count
Method | Path | Body | Response |
---|---|---|---|
POST | `/predict` | `{ distance_km, passenger_count, hour, day_of_week, is_weekend, is_night }` | `{ "estimated_price": <float> }` |
```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "distance_km": 12.5,
    "passenger_count": 1,
    "hour": 18,
    "day_of_week": 5,
    "is_weekend": 1,
    "is_night": 0
  }'
```
The system uses MongoDB for operational data across domains (Drivers, Customers, Rides, Billing, Reviews, Media metadata).
Schemas are optimized for read performance (dashboards/search), idempotency (billing), and evolving structures (optional media/metadata).
```mermaid
erDiagram
DRIVER {
string _id
string driverId "external id (SSN-like)"
string email "unique"
string passwordHash
string name
object carDetails "make, model, plate"
object insurance "policyNo, expiresAt"
object location "lat, lng"
int rating
string videoPath
date createdAt
date updatedAt
}
CUSTOMER {
string _id
string customerId "external id"
string email "unique"
string passwordHash
string name
object address "city, state, zip"
object card "tokenized ref"
int rating
date createdAt
date updatedAt
}
RIDE {
string _id
string rideId "unique human id"
datetime dateTime
object pickup "lat, lng, address"
object dropoff "lat, lng, address"
string driverId
string customerId
number distanceKm
number estimatedPrice
number actualPrice
string status "requested|accepted|in_progress|completed|canceled"
array media "mediaId[]"
date createdAt
date updatedAt
}
BILLING {
string _id
string billingId "BILL-<timestamp>"
string rideId "unique (idempotency)"
string driverId
string customerId
number predictedPrice
number actualPrice
number distanceKm
datetime startedAt
datetime endedAt
string status "created|paid|void"
date createdAt
date updatedAt
}
REVIEW {
string _id
string rideId
string reviewerId
string revieweeId
string reviewerType "driver|customer"
int rating "1..5"
string comment
date createdAt
}
MEDIA {
string _id
string rideId
string ownerId "driverId|customerId"
string path "local/S3 url"
string type "image|video"
number sizeBytes
string contentType
date createdAt
}
DRIVER ||--o{ RIDE : drives
CUSTOMER ||--o{ RIDE : books
RIDE ||--o{ BILLING : generates
RIDE ||--o{ REVIEW : has
DRIVER ||--o{ REVIEW : receives
CUSTOMER ||--o{ REVIEW : receives
RIDE ||--o{ MEDIA : attaches
```
Collection | Key Fields | Recommended Indexes | Notes |
---|---|---|---|
`drivers` | _id, driverId, email, location(lat,lng), rating, videoPath | `email` (unique), `driverId` (unique), (future) 2dsphere on location | Cached search results in Redis via `driver:search:{q}`; summary cache `driver:summary:{driverId}`. |
`customers` | _id, customerId, email, address, rating | `email` (unique), `customerId` (unique) | Card data should be tokenized (never store PAN). |
`rides` | _id, rideId, driverId, customerId, status, dateTime, pickup/dropoff.lat/lng | `rideId` (unique), `driverId`, `customerId`, `status`, `dateTime`, (future) 2dsphere on pickup/dropoff | Hot path for dashboards; `rides:byDriver:{driverId}` cache with TTL. |
`billing` | _id, billingId, rideId, driverId, customerId, predictedPrice, actualPrice, status | `rideId` (unique), `driverId`, `customerId`, `status`, `createdAt` | Idempotency by `rideId` (prevents duplicate bills from repeated events). |
`reviews` | _id, rideId, reviewerId, revieweeId, reviewerType, rating | `rideId`, `revieweeId`, `rating` | Used to compute rating aggregates in the app/service layer. |
`media` | _id, rideId, ownerId, path, type, sizeBytes, contentType | `rideId`, `ownerId`, `type`, `createdAt` | Store only metadata in the DB; files on disk or S3; serve via signed URLs/range. |
- `billing.rideId` is unique.
- The Kafka consumer performs an upsert by `rideId` to avoid duplicate bills when the `ride-completed` event is replayed.
- Validate legal status transitions: `requested → accepted → in_progress → completed`
- Prevent double completion or invalid jumps (e.g., `in_progress → canceled`); a minimal guard is sketched below.
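A minimal sketch of such a guard, using a transition map (exactly which states may move to `canceled` is an assumption here):

```js
// Allowed ride status transitions (illustrative; mirrors the lifecycle above)
const TRANSITIONS = {
  requested: ["accepted", "canceled"],
  accepted: ["in_progress", "canceled"],
  in_progress: ["completed"],
  completed: [],
  canceled: []
};

function assertTransition(current, next) {
  // Rejects double completion and any jump not listed above
  if (!(TRANSITIONS[current] || []).includes(next)) {
    throw new Error(`Illegal status transition: ${current} → ${next}`);
  }
}
```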
- `rides.driverId` and `rides.customerId` must reference existing documents.
- Enforce via pre-create checks in services (and optional MongoDB schema validation).
- Passwords: store bcrypt hashes (never plain text).
- Emails & IDs: validate on input; email uniqueness enforced at DB-level.
- Coordinates: `lat ∈ [-90, 90]`, `lng ∈ [-180, 180]`.
- Reviews: enforce `rating ∈ [1..5]`.
- Media: enforce content types & size limits on upload; sanitize filenames; store paths only.
- JWT: required for protected endpoints; role claims (`admin|driver|customer`) checked per route.
- Rides/Billing: retain indefinitely for analytics; optionally archive to cold collections or a data lake for historical reporting.
- Media: apply TTL or move to cheaper storage (e.g., S3 Glacier) after N days.
- Caches (Redis): ephemeral; short TTLs (60s–5m) tuned per key; safe to flush during incidents.
- Soft deletes (optional): add `isActive`/`deletedAt` to drivers/customers to avoid hard deletes.
```js
// Daily revenue: sum paid bills per calendar day
db.billing.aggregate([
  { $match: { status: "paid" } },
  {
    $group: {
      _id: { $dateToString: { format: "%Y-%m-%d", date: "$createdAt" } },
      totalRevenue: { $sum: "$actualPrice" },
      rides: { $sum: 1 }
    }
  },
  { $sort: { _id: 1 } }
])
```
```js
// Top 10 drivers by revenue from paid bills
db.billing.aggregate([
  { $match: { status: "paid" } },
  {
    $group: {
      _id: "$driverId",
      revenue: { $sum: "$actualPrice" },
      rides: { $sum: 1 }
    }
  },
  { $sort: { revenue: -1 } },
  { $limit: 10 }
])
```
The system uses Apache Kafka to decouple ride completion from billing.
By emitting a `ride-completed` event, the Rides Service hands off billing work to the Billing Service asynchronously, improving throughput and resilience.
Topic | Purpose | Producer | Consumer | Partitions | Replication | Key |
---|---|---|---|---|---|---|
`ride-completed` | Notify that a ride finished (billable) | Rides Service | Billing Service (CG) | 3–6 (cfg) | 1–3 (cfg) | `rideId` (str) |
- Partitioning strategy: keying by `rideId` keeps the same ride's messages ordered on a single partition → simplifies idempotent upsert logic in Billing.
- Consumer group: `billing-consumer-group` (scales horizontally; each instance gets a subset of partitions).
- Delivery semantics: at-least-once (consumer commits after upsert). With upsert idempotency in Billing, duplicates are safe.
```json
{
  "eventType": "ride-completed",
  "version": 1,
  "rideId": "RIDE-2025-09-18-00123",
  "driverId": "DRV-9",
  "customerId": "CUS-42",
  "distanceKm": 12.1,
  "predictedPrice": 18.75,
  "actualPrice": 19.40,
  "startedAt": "2025-09-18T10:00:00Z",
  "endedAt": "2025-09-18T10:25:00Z",
  "metadata": {
    "source": "rides-service",
    "emittedAt": "2025-09-18T10:25:05Z",
    "traceId": "f5c9…"
  }
}
```
- Include `eventType` & `version` to evolve payloads safely.
- `traceId` helps correlate logs across Rides ↔ Kafka ↔ Billing.
- Emit exactly one event when status transitions to `completed`.
- Use synchronous confirmation (`await` the Kafka produce) or buffer with retry/backoff.
- Attach `rideId` as the message key.
Example (Node.js with kafkajs)
```js
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "rides", brokers: [process.env.KAFKA_BROKERS] });
const producer = kafka.producer();
await producer.connect();

// Key by rideId so every event for a ride lands on the same partition (ordering)
await producer.send({
  topic: "ride-completed",
  messages: [{
    key: ride.rideId,
    value: JSON.stringify({
      eventType: "ride-completed",
      version: 1,
      rideId: ride.rideId,
      driverId: ride.driverId,
      customerId: ride.customerId,
      distanceKm: ride.distanceKm,
      predictedPrice: ride.estimatedPrice,
      actualPrice: ride.actualPrice,
      startedAt: ride.startedAt,
      endedAt: ride.endedAt,
      metadata: { source: "rides-service", emittedAt: new Date().toISOString(), traceId }
    })
  }]
});
```
- At-least-once processing → commit offset only after successful upsert.
- Idempotency: the `billing` collection has a unique index on `rideId`; the consumer performs an upsert by `rideId` to avoid duplicates.
- Failure handling: retry with backoff; if still failing, log & (optionally) publish to a DLQ topic.
Example (Node.js with kafkajs)
```js
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "billing", brokers: [process.env.KAFKA_BROKERS] });
const consumer = kafka.consumer({ groupId: "billing-consumer-group" });
await consumer.connect();
await consumer.subscribe({ topic: "ride-completed", fromBeginning: false });

await consumer.run({
  eachMessage: async ({ topic, partition, message }) => {
    const payload = JSON.parse(message.value.toString());
    try {
      // Upsert by rideId (idempotent)
      await BillingModel.updateOne(
        { rideId: payload.rideId },
        {
          $set: {
            driverId: payload.driverId,
            customerId: payload.customerId,
            predictedPrice: payload.predictedPrice,
            actualPrice: payload.actualPrice,
            distanceKm: payload.distanceKm,
            startedAt: new Date(payload.startedAt),
            endedAt: new Date(payload.endedAt),
            status: "created",
            updatedAt: new Date()
          },
          $setOnInsert: { billingId: `BILL-${Date.now()}`, createdAt: new Date() }
        },
        { upsert: true }
      );
      // offset is auto-committed by kafkajs unless manual commit mode is enabled
    } catch (err) {
      console.error("Billing consume error:", err);
      // Optional Dead Letter Queue (DLQ)
      // await dlqProducer.send({
      //   topic: "ride-completed.DLQ",
      //   messages: [{ key: payload.rideId, value: JSON.stringify(payload) }]
      // });
    }
  }
});
```
- At-least-once + idempotent upsert → safe duplicates, never double-charge.
- Consumer group scaling: run N billing instances → Kafka partitions are divided → linear throughput gains (bounded by partitions).
- Backpressure: if Billing lags, Kafka buffers messages durably; Rides remains fast.
- Retries: exponential backoff on transient DB/Redis errors.
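A small helper of the kind that could implement those retries (a sketch, not the project's actual code):

```js
// Retry an async operation with exponential backoff plus jitter
async function withBackoff(fn, { retries = 5, baseMs = 100 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err; // give up; caller may route to a DLQ
      const delayMs = baseMs * 2 ** attempt + Math.random() * baseMs;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage: wrap the idempotent upsert from the consumer above
// await withBackoff(() => upsertBill(payload));
```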
Create topic (if auto-create disabled):
```bash
docker exec -it kafka kafka-topics \
  --create --topic ride-completed \
  --bootstrap-server kafka:9092 \
  --partitions 3 --replication-factor 1

# Describe the topic
docker exec -it kafka kafka-topics \
  --describe --topic ride-completed \
  --bootstrap-server kafka:9092

# Tail the topic
docker exec -it kafka kafka-console-consumer \
  --bootstrap-server kafka:9092 \
  --topic ride-completed --from-beginning

# Produce a test event
docker exec -it kafka kafka-console-producer \
  --broker-list kafka:9092 --topic ride-completed
# paste a JSON line and press Enter to send
```
In Docker Compose, services reach Kafka via the hostname `kafka:9092`. On host tools, use the advertised listener `localhost:29092` (if configured).
- Metrics to watch: consumer lag per partition, produce/consume rates, error counts, retry counts.
- Logging: include `traceId` across Rides → Kafka → Billing to correlate events.
- Future: add Prometheus exporters (e.g., Burrow for consumer lag) + Grafana dashboards.
- Why Kafka? Asynchronous decoupling improves ride throughput and system resilience vs. synchronous billing API calls.
- Why at-least-once (not exactly-once)? Simpler + robust with idempotent DB upserts; operationally safer than coordinated transactions.
- Why key by `rideId`? Ensures per-ride ordering and simplifies idempotent consumer logic.
- What about DLQ? Optional safety net for poison messages; keeps the main consumer healthy while problematic events are quarantined.
To reduce MongoDB query load and deliver sub-200ms responses on hot paths, the system uses Redis as a distributed in-memory cache.
Redis follows a cache-aside pattern:
- Service checks Redis first.
- On cache miss → query MongoDB → return result + populate Redis with TTL.
- On updates → invalidate affected cache keys.
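A sketch of the pattern with the node-redis v4 client (the Mongo query is illustrative; key name and TTL follow the table below):

```js
import { createClient } from "redis";

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

// Cache-aside read: Redis first, MongoDB on a miss, then repopulate with a TTL
async function getDriverSearch(q) {
  const key = `driver:search:${q}`;
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached); // cache hit: no DB round-trip

  const drivers = await db.collection("drivers")
    .find({ name: { $regex: q, $options: "i" } }) // illustrative query
    .toArray();
  await redis.set(key, JSON.stringify(drivers), { EX: 60 }); // 60s TTL
  return drivers;
}
```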
Key Pattern | Value | TTL | Writer Service | Reader Services |
---|---|---|---|---|
`driver:search:{q}` | JSON list of drivers for search | 60s | Drivers | Rides, Admin |
`driver:summary:{driverId}` | Aggregated stats for profile/dashboard | 60s | Drivers | Admin |
`rides:byDriver:{driverId}` | List of rides for driver | 60s | Rides | Admin |
`billing:byUser:{userId}:{role}` | Bills for customer/driver | 60s | Billing | Admin, Customer |
`stats:revenue:day:{YYYY-MM-DD}` | Revenue + ride counts for day | 300s | Billing/Admin | Admin dashboard |
- Driver profile update → `DEL driver:search:*` and `driver:summary:{driverId}`.
- New ride / status update → `DEL rides:byDriver:{driverId}`.
- Billing created/updated → `DEL billing:byUser:*` and `stats:revenue:day:*`.
- Admin dashboards refresh every 5 minutes → Redis keys expire naturally (a pattern-invalidation sketch follows below).
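Redis has no wildcard `DEL`, so the `*` patterns above are typically expanded by iterating `SCAN`; a sketch with the node-redis v4 client:

```js
// Delete every key matching a pattern, e.g. "driver:search:*"
// (SCAN iterates incrementally and avoids blocking Redis the way KEYS would)
async function invalidatePattern(redis, pattern) {
  for await (const key of redis.scanIterator({ MATCH: pattern, COUNT: 100 })) {
    await redis.del(key);
  }
}

// e.g. after a driver profile update:
// await invalidatePattern(redis, "driver:search:*");
// await redis.del(`driver:summary:${driverId}`);
```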
```mermaid
sequenceDiagram
participant C as Customer
participant R as Rides Service
participant D as Drivers Service
participant X as Redis
participant M as MongoDB
C->>R: GET /api/rides/nearby-drivers?lat=..&lng=..
R->>X: Check key driver:search:{lat}:{lng}
alt Cache hit
X-->>R: Cached drivers (ms response)
else Cache miss
R->>D: Fetch drivers in area
D->>M: Geo query (Haversine in Mongo)
M-->>D: Drivers list
D-->>R: Drivers list
R->>X: Cache result (TTL 60s)
end
R-->>C: Driver list
```
- Baseline (B): Every request hits MongoDB → latency spikes under load.
- B+S (Base + Redis Caching): Redis absorbs hot read traffic → p95 latency cut by ~40%, throughput ↑.
- B+S+K (Base + Redis + Kafka): Billing async + Redis caching → smoothest performance; Mongo load reduced drastically.
See performance graphs in Section 14 — Load Testing.
- Why cache-aside? Simple, widely used; services decide what to cache.
- Why short TTLs (60s–300s)? Keeps data fresh (drivers move constantly, revenue updates every few mins).
- Why Redis over in-process cache? Distributed; works across multiple service instances → safe for horizontal scaling.
- What about consistency? Slight staleness tolerated (e.g., nearby drivers list). Strong consistency maintained via invalidation on updates.
The platform includes a machine learning microservice to simulate Uber's dynamic pricing model.
This ensures fares reflect demand, supply, and context (time, location, conditions).
- Framework: FastAPI (Python) serving an XGBoost regression model.
- Trained on: Uber Fares Kaggle dataset (pickup/dropoff coordinates, datetime, passenger count, fare amount).
- Serialization: Model persisted via joblib for fast loading.
- Serving: Uvicorn ASGI server, containerized with Docker.
Feature | Type | Example | Why it matters |
---|---|---|---|
`distance_km` | float | `12.5` | Longer trips → higher base fare |
`passenger_count` | int | `1` | More passengers → adjusted pricing |
`hour` | int (0–23) | `18` (6 PM) | Captures rush-hour patterns |
`day_of_week` | int (0–6) | `5` (Friday) | Captures weekday vs. weekend demand |
`is_weekend` | binary | `1` | Surge more likely on weekends |
`is_night` | binary | `0` | Night trips may have premiums |
Method | Path | Body | Response |
---|---|---|---|
POST | `/predict` | JSON with feature set | `{ "estimated_price": <float> }` |
Example Request
```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "distance_km": 12.5,
    "passenger_count": 1,
    "hour": 18,
    "day_of_week": 5,
    "is_weekend": 1,
    "is_night": 0
  }'
```
Example Response
{ "estimated_price": 21.37 }
- Customer books a ride → `rides-service` calls the ML service with trip features.
- ML service returns the predicted fare (`estimatedPrice`).
- Rides-service persists the ride with this price.
- After ride completion → Billing compares predicted vs actual fare → stores both for auditing.
- Why separate ML service? Decouples Python stack from Node.js services; can scale independently.
- Why FastAPI? Lightweight, async-friendly, production-ready for ML serving.
- Why dynamic pricing? Simulates real Uber “surge” behavior where supply-demand elasticity impacts pricing.
- What if ML fails? `rides-service` can fall back to a static formula (`distance × rate`); a sketch follows below.
The system exposes REST APIs across microservices.
Below are the most important endpoints, grouped by service, with examples.
Create Ride
```http
POST /api/rides
Content-Type: application/json
Authorization: Bearer <JWT>

{
  "customerId": "CUS-42",
  "pickup": { "lat": 37.77, "lng": -122.42 },
  "dropoff": { "lat": 37.79, "lng": -122.39 },
  "passengerCount": 1
}
```
Example Response
```json
{
  "rideId": "RIDE-2025-0001",
  "status": "in_progress",
  "estimatedPrice": 21.37
}
```
Update Ride Status (→ triggers Kafka)
```http
PATCH /api/rides/:id/status
Authorization: Bearer <JWT>

{ "status": "completed", "actualPrice": 22.10 }
```
Signup
```http
POST /api/drivers/signup
Content-Type: application/json

{
  "driverId": "DRV-100",
  "email": "alex@demo.com",
  "password": "SafePass123!",
  "carDetails": { "make": "Toyota", "model": "Prius" }
}
```
Search Drivers (cached in Redis)
```http
GET /api/drivers/search?q=Prius
Authorization: Bearer <JWT>
```
Signup
```http
POST /api/customers/signup
Content-Type: application/json

{
  "customerId": "CUS-42",
  "email": "jane@demo.com",
  "password": "SafePass123!",
  "address": { "city": "San Jose", "state": "CA", "zip": "95123" }
}
```
Get Customer Profile
```http
GET /api/customers/CUS-42
Authorization: Bearer <JWT>
```
Get Bill by Ride
```http
GET /api/billing/rides/RIDE-2025-0001
Authorization: Bearer <JWT>
```
Example Response
```json
{
  "billingId": "BILL-17475391",
  "rideId": "RIDE-2025-0001",
  "predictedPrice": 21.37,
  "actualPrice": 22.10,
  "status": "created"
}
```
Search Bills
```http
GET /api/billing/search?driverId=DRV-100&status=created
Authorization: Bearer <JWT>
```
Admin Login
```http
POST /api/admin/login
Content-Type: application/json

{ "email": "admin@demo.com", "password": "AdminPass123!" }
```
Get Revenue Stats
```http
GET /api/admin/statistics/revenue
Authorization: Bearer <JWT>
```
Example Response
```json
{
  "date": "2025-09-18",
  "totalRevenue": 12340.75,
  "rides": 842
}
```
Predict Fare
```http
POST /predict
Content-Type: application/json

{
  "distance_km": 12.5,
  "passenger_count": 1,
  "hour": 18,
  "day_of_week": 5,
  "is_weekend": 1,
  "is_night": 0
}
```
Example Response
{ "estimated_price": 21.37 }
- All protected endpoints require `Authorization: Bearer <JWT>`.
- JWT tokens embed role claims (`customer`, `driver`, `admin`) → enforced in route middleware.
- Responses are always in JSON format.
- Errors follow the structure `{ "error": "Message" }` (a handler sketch follows below).
This section shows how to configure environments and run the full stack locally with Docker.
All services are 12-factor style: configuration comes from environment variables.
Var | Rides | Drivers | Customers | Billing | Admin | ML | Infra | Notes |
---|---|---|---|---|---|---|---|---|
`PORT` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | — | Service port (4001–4005, 8000) |
`NODE_ENV` | ✅ | ✅ | ✅ | ✅ | ✅ | — | — | `development` \| `production` |
`MONGO_URI` | ✅ | ✅ | ✅ | ✅ | ✅ | — | — | e.g. `mongodb://mongo:27017/uber_sim` or separate DBs per svc |
`REDIS_URL` | ✅ | ✅ | — | ✅ | ✅ | — | — | `redis://redis:6379` |
`KAFKA_BROKERS` | ✅ | — | — | ✅ | — | — | — | `kafka:9092` inside Docker |
`JWT_SECRET` | ✅ | ✅ | ✅ | ✅ | ✅ | — | — | Same secret across Node services |
`ML_URL` | ✅ | — | — | — | — | — | — | `http://ml-service:8000/predict` |
`ALLOWED_ORIGINS` | ✅ | ✅ | ✅ | ✅ | ✅ | — | — | CORS (comma-separated) |
`LOG_LEVEL` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | — | `info` \| `debug` |
`MODEL_PATH` | — | — | — | — | — | ✅ | — | e.g. `/app/models/xgb.joblib` |
Create `.env` files from `.env.example` under each service directory and populate these values.
Save as `docker-compose.yml` in the repo root (replace if you already have one).
This brings up Mongo, Redis, Zookeeper, Kafka, the ML service, and all Node services.
version: "3.9"
services:
mongo:
image: mongo:6
container_name: mongo
ports: [ "27017:27017" ]
volumes:
- mongo_data:/data/db
environment:
MONGO_INITDB_DATABASE: uber_sim
redis:
image: redis:7
container_name: redis
ports: [ "6379:6379" ]
zookeeper:
image: confluentinc/cp-zookeeper:7.6.1
container_name: zookeeper
environment:
ZOOKEEPER_CLIENT_PORT: 2181
ZOOKEEPER_TICK_TIME: 2000
kafka:
image: confluentinc/cp-kafka:7.6.1
container_name: kafka
depends_on: [ zookeeper ]
ports:
- "29092:29092" # host access
environment:
KAFKA_BROKER_ID: 1
KAFKA_ZOOKEEPER_CONNECT: "zookeeper:2181"
KAFKA_LISTENERS: "PLAINTEXT://kafka:9092,PLAINTEXT_HOST://0.0.0.0:29092"
KAFKA_ADVERTISED_LISTENERS: "PLAINTEXT://kafka:9092,PLAINTEXT_HOST://localhost:29092"
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
ml-service:
build:
context: ./services/dynamic_pricing_model
dockerfile: Dockerfile
container_name: ml-service
environment:
PORT: 8000
LOG_LEVEL: info
MODEL_PATH: /app/models/xgb.joblib
ports: [ "8000:8000" ]
depends_on: [ mongo ]
rides-service:
build:
context: ./services/rides
dockerfile: Dockerfile
container_name: rides-service
environment:
PORT: 4001
NODE_ENV: development
MONGO_URI: mongodb://mongo:27017/uber_rides
REDIS_URL: redis://redis:6379
KAFKA_BROKERS: kafka:9092
JWT_SECRET: change_me
ML_URL: http://ml-service:8000/predict
LOG_LEVEL: info
ALLOWED_ORIGINS: http://localhost:5173,http://localhost:3000
ports: [ "4001:4001" ]
depends_on: [ mongo, redis, kafka, ml-service ]
drivers-service:
build:
context: ./services/drivers
dockerfile: Dockerfile
container_name: drivers-service
environment:
PORT: 4002
NODE_ENV: development
MONGO_URI: mongodb://mongo:27017/uber_drivers
REDIS_URL: redis://redis:6379
JWT_SECRET: change_me
LOG_LEVEL: info
ALLOWED_ORIGINS: http://localhost:5173,http://localhost:3000
ports: [ "4002:4002" ]
depends_on: [ mongo, redis ]
customers-service:
build:
context: ./services/customers
dockerfile: Dockerfile
container_name: customers-service
environment:
PORT: 4003
NODE_ENV: development
MONGO_URI: mongodb://mongo:27017/uber_customers
JWT_SECRET: change_me
LOG_LEVEL: info
ALLOWED_ORIGINS: http://localhost:5173,http://localhost:3000
ports: [ "4003:4003" ]
depends_on: [ mongo ]
billing-service:
build:
context: ./services/billing
dockerfile: Dockerfile
container_name: billing-service
environment:
PORT: 4004
NODE_ENV: development
MONGO_URI: mongodb://mongo:27017/uber_billing
REDIS_URL: redis://redis:6379
KAFKA_BROKERS: kafka:9092
JWT_SECRET: change_me
LOG_LEVEL: info
ALLOWED_ORIGINS: http://localhost:5173,http://localhost:3000
ports: [ "4004:4004" ]
depends_on: [ mongo, redis, kafka ]
admin-service:
build:
context: ./services/admin
dockerfile: Dockerfile
container_name: admin-service
environment:
PORT: 4005
NODE_ENV: development
MONGO_URI: mongodb://mongo:27017/uber_admin
REDIS_URL: redis://redis:6379
JWT_SECRET: change_me
LOG_LEVEL: info
ALLOWED_ORIGINS: http://localhost:5173,http://localhost:3000
ports: [ "4005:4005" ]
depends_on: [ mongo, redis, billing-service, rides-service, drivers-service, customers-service ]
volumes:
mongo_data:
```bash
# Build & start everything
docker compose up -d --build

# Check logs of a service
docker compose logs -f rides-service

# Stop
docker compose down
```
You can run services individually without Docker for fast testing.
Prereqs
- Node.js 18+, Python 3.10+
- MongoDB & Redis running locally (or via `docker compose up mongo redis`)
- Kafka (optional locally, but required for billing events)
Start a Service Example: Rides Service
```bash
cd services/rides
npm install
npm run dev   # runs on http://localhost:4001
```
Repeat for other services (drivers → 4002, customers → 4003, billing → 4004, admin → 4005).
ML Service
```bash
cd services/dynamic_pricing_model
pip install -r requirements.txt
uvicorn app:app --port 8000 --reload
```
Seed Minimal Data
```bash
# Create customer
curl -X POST http://localhost:4003/api/customers/signup \
  -H "Content-Type: application/json" \
  -d '{"customerId":"CUS-42","email":"jane@demo.com","password":"Pass123"}'

# Create driver
curl -X POST http://localhost:4002/api/drivers/signup \
  -H "Content-Type: application/json" \
  -d '{"driverId":"DRV-100","email":"alex@demo.com","password":"Pass123"}'
```
Frontend
```bash
cd uber-frontend
npm install
npm run dev   # http://localhost:5173
```
Goal: demonstrate how caching (Redis) and asynchronous billing (Kafka) improve latency, throughput, and stability under load.
- Tool: JMeter (HTTP test plan)
- Workload: concurrent users hitting ride creation, search, status updates, billing queries
- Datasets: thousands of drivers/customers/rides preloaded
- Scenarios:
- B = Baseline (MongoDB only)
- B+S = Baseline + Redis cache
- B+S+K = Redis + Kafka (async billing)
- Caching pays first. Moving from B → B+S (adding Redis) delivers the largest drop in average & p95 latency on read-heavy paths (driver search, history, stats) and increases throughput by offloading MongoDB.
- Kafka stabilizes write flows. Moving from B+S → B+S+K (adding Kafka) makes completion→billing asynchronous, so ride completion latency stays low and predictable even during spikes; p95 tail improves and error rates drop.
- Dashboards stay snappy. Admin analytics backed by Redis remain fast (<~200ms typical) while still reflecting fresh data via short TTLs + invalidation.
- Resilience under load. With Kafka, backpressure is absorbed by the broker; Billing catches up without blocking ride flows.
Scenario | Avg Latency | p95 Latency | Throughput | Error Rate |
---|---|---|---|---|
B | higher | spiky | lower | higher |
B+S | lower | lower | higher | lower |
B+S+K | lowest | most stable | highest | lowest |
For exact values, see the three charts above.
- Why Redis first? Hot-path reads dominate; caching yields immediate wins.
- Why Kafka after Redis? It removes a synchronous dependency (billing) from the critical path, improving tail latency and reliability at peak.
- JWT auth with role claims (customer, driver, admin)
- Passwords hashed with bcrypt
- Input validation (IDs, ratings, geo coords, media size/type)
- CORS restricted to trusted origins
- Secrets from `.env` (never hardcoded)
- Idempotent billing (unique `rideId`)
- Kafka at-least-once + DB upsert → no double-charging
- Cache invalidation rules keep Redis consistent
- Health checks (`/healthz`) & structured logs with `traceId`
- Microservices scale independently (e.g., rides-service during spikes)
- Redis absorbs hot reads, offloading MongoDB
- Kafka buffers bursts → billing catches up asynchronously
- Docker/K8s ready for horizontal scaling
Kafka connection fails inside services
- Use `kafka:9092` inside Docker, not `localhost`.
- From host tools (JMeter, CLI), use `localhost:29092`.
Redis not reachable
- Check `REDIS_URL` (`redis://redis:6379` in Docker).
- Run `docker compose ps` to ensure the container is up.
JWT errors (401 Unauthorized)
- Token expired or missing `Authorization: Bearer <JWT>` header.
- Re-login to get a fresh token.
CORS issues in frontend
- Add your dev origin (`http://localhost:5173`) to `ALLOWED_ORIGINS` in each service.
MongoDB slow or errors under load
- Ensure indexes (`rideId`, `driverId`, `customerId`, `createdAt`) exist.
- Use the Redis cache for frequent reads (search, stats, billing lookups).
This project is licensed under the MIT License — see the LICENSE file for details.