Uber Simulation — Distributed Microservices with Kafka, Redis, Docker & ML

A full-stack, event-driven Uber-like platform showcasing distributed systems design, load balancing, caching, and asynchronous workflows.
Built with a modern microservices architecture, combining backend, ML, infra, and frontend technologies into one scalable system.



Tagline

From ride request to bill in seconds — horizontally scalable, cache-accelerated, event-driven.

1️⃣ Executive Summary — What & Why

This project is a distributed, microservices-based simulation of Uber, built to demonstrate how modern ride-hailing platforms are engineered for scalability, reliability, and performance.

What it does

  • Customers can sign up, book rides, view history, and pay bills.
  • Drivers can register, manage profiles, accept rides, and track earnings.
  • Admins can oversee the entire system, add/manage drivers & customers, view revenue/ride statistics, and generate reports.
  • A Dynamic Pricing Service predicts fares in real-time using machine learning.
  • Billing is generated asynchronously using Kafka events to decouple ride completion from payment processing.

Why it matters

  • Real-world relevance: ride-hailing systems like Uber and Lyft rely heavily on distributed architectures to handle massive concurrency, dynamic pricing, and system resilience.
  • System Design strength: this project demonstrates event-driven architecture, load balancing, caching strategies, fault tolerance, and scalability principles expected in modern production systems.
  • Interview-ready: showcases hands-on experience with Kafka, Redis, Docker, MongoDB, FastAPI, React, and JMeter — technologies used widely in industry.

What makes it unique

  • Event-driven billing: ride completion and billing are decoupled through Kafka, and the billing consumer is idempotent, so rides are never double-charged.
  • Performance-optimized: Redis caches accelerate driver search, revenue stats, and ride lookups with cache invalidation strategies.
  • Scalable architecture: Docker/Kubernetes deployment enables horizontal scaling across services.
  • Data-driven pricing: Machine learning (XGBoost via FastAPI) integrates with the ride lifecycle to provide surge/dynamic pricing predictions.
  • End-to-end coverage: From frontend UI (React) to backend microservices, infra (Docker/K8s), messaging (Kafka), and testing (JMeter), the system covers all critical aspects of distributed systems engineering.

2️⃣ Project Flow — Detailed Role-Centric Walkthrough

The platform is designed for three distinct roles — Customer, Driver, and Admin.
Each role is given tailored capabilities, interfaces, and responsibilities.
Below is a comprehensive walkthrough of how each role interacts with the system, enriched with screenshots and technical insights.


Customer Flow — "Requesting and Experiencing a Ride"

1. Sign Up & Authentication

  • Customers start by registering with details such as name, email, phone, and credit card info.
  • The credentials are hashed using bcrypt before storage, and a JWT token is issued upon login.
  • This token secures all subsequent requests (Authorization: Bearer <token>).

2. Booking a Ride

  • The customer enters pickup and drop-off locations.
  • The Rides Service queries the Drivers Service for available drivers within a 10-mile radius (using the Haversine formula).
  • At this moment, the Dynamic Pricing ML Service is called to predict an estimated fare (/predict with distance, time, passenger count, etc.).
  • The ride is created in MongoDB with status = in_progress.

Behind the scenes:

  • Request hits rides-service (4001) → calls drivers-service (4002) + ml-service (8000) → persists in MongoDB.

3. Ride Tracking & Completion

  • The customer can see live ride status (requested → accepted → in_progress → completed).
  • When marked completed, the rides-service produces a ride-completed Kafka event.
  • Billing-service consumes this event asynchronously to generate a final bill.

Behind the scenes:

  • Kafka → ensures billing happens asynchronously, improving throughput.
  • Redis caches common queries like rides:byCustomer:{id} to accelerate dashboard loads.

4. Billing & Payments

  • Customers can view a detailed billing history: ride time, distance, predicted vs actual fare.

  • Data is fetched from the billing-service (with Redis caching for frequent lookups).

Behind the scenes:

  • Billing-service ensures idempotency (one bill per rideId).
  • Admins can also query the same bills for audits.

5. Feedback & Ratings

  • After ride completion, customers can rate their driver (1–5 stars) and leave comments.
  • Ratings update the driver’s aggregated score, influencing search results for future customers.
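Behind the scenes, the aggregate score can be recomputed from the reviews collection. A minimal mongosh sketch (field names follow the data model in Section 7; recomputing on every new review is an assumption, not necessarily how the service does it):

// Recompute a driver's average rating from reviews and store it on the driver document
db.reviews.aggregate([
  { $match: { revieweeId: "DRV-9" } },
  { $group: { _id: "$revieweeId", avgRating: { $avg: "$rating" }, count: { $sum: 1 } } }
]).forEach(doc => {
  db.drivers.updateOne({ driverId: doc._id }, { $set: { rating: doc.avgRating } });
});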

Driver Flow — "Managing Profile & Completing Rides"

1. Sign Up & Authentication

  • Drivers register with ID, car details, license, and insurance.
  • Secure auth with JWT tokens; passwords hashed with bcrypt.

2. Profile Management & Media Uploads

  • Drivers can upload a short video introduction and profile picture.
  • Videos/images are stored locally (/uploads), with metadata in MongoDB, and streamed with HTTP range requests.
  • All profile changes are cached for faster reads via Redis.

Behind the scenes:

  • File handling with Multer; cache invalidation triggered when profile is updated.
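A minimal Express sketch of the range-request streaming mentioned above (the app, Driver model, and MP4 content type are assumptions for illustration; the repo's actual handler may differ):

const fs = require("fs");
const path = require("path");

// GET /api/drivers/:id/video — stream the intro video with HTTP range support
app.get("/api/drivers/:id/video", async (req, res) => {
  const driver = await Driver.findOne({ driverId: req.params.id });
  const filePath = path.resolve(driver.videoPath);          // e.g. ./uploads/<file>.mp4
  const { size } = fs.statSync(filePath);
  const range = req.headers.range;                          // e.g. "bytes=0-"

  if (!range) {
    res.writeHead(200, { "Content-Length": size, "Content-Type": "video/mp4" });
    return fs.createReadStream(filePath).pipe(res);
  }

  const start = Number(range.replace(/bytes=/, "").split("-")[0]);
  const end = Math.min(start + 1_000_000, size - 1);        // ~1 MB chunks
  res.writeHead(206, {
    "Content-Range": `bytes ${start}-${end}/${size}`,
    "Accept-Ranges": "bytes",
    "Content-Length": end - start + 1,
    "Content-Type": "video/mp4"
  });
  fs.createReadStream(filePath, { start, end }).pipe(res);
});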

3. Accepting Rides

  • Drivers see ride requests in their area and accept assignments.
  • The system updates the ride: the driverId field is set and status becomes accepted.

Behind the scenes:

  • Updates propagate to Redis (rides:byDriver:{id} cache refreshed).
  • Rides-service ensures drivers cannot double-book (transaction checks).

4. Completing Rides & Earnings Summary

  • After completion, driver’s account is updated with earnings, and ride-completed Kafka event is fired.
  • Drivers can view a summary dashboard showing completed rides, ratings, and earnings.


Admin Flow — "Overseeing the Ecosystem"

1. Secure Login

  • Admins log in with special credentials.
  • JWT auth with elevated role privileges.

2. Managing Drivers & Customers

  • Admins can add new drivers/customers or deactivate accounts.
  • These operations proxy calls to the drivers-service and customers-service.


3. Monitoring System Statistics

  • View daily revenue, rides by area, rides per driver, rides per customer.
  • Graphs generated via MongoDB aggregations + cached in Redis for dashboard speed.


4. Billing Oversight

  • Admins can search bills, audit discrepancies, and ensure fairness in pricing.

🔑 Role–Action Summary

| Action / Feature | Customer | Driver | Admin |
| --- | --- | --- | --- |
| Register & Login | ✔ | ✔ | ✔ |
| Book / Accept / Complete Ride | ✔ (book) | ✔ (accept, complete) | — |
| Billing (view, history, audits) | ✔ (view, history) | ✔ (view, history) | ✔ (audits) |
| Profile & Media | ✔ | ✔ | — |
| Ratings & Reviews | ✔ | ✔ | — |
| Manage Users | — | — | ✔ |
| View Revenue & Stats | — | — | ✔ |

3️⃣ Project Flow — At a Glance

While each role has its own journey, the entire system works together as a distributed, event-driven platform.
This high-level flow shows how Customers, Drivers, and Admins interact with the microservices, and how Kafka, Redis, and MongoDB glue everything together.


System Interaction Diagram

flowchart LR
  %% Clients
  subgraph Clients
    CU[Customer UI]
    DR[Driver UI]
    AD[Admin UI]
  end

  %% Services
  subgraph Services
    R1[Rides 4001]
    D1[Drivers 4002]
    C1[Customers 4003]
    B1[Billing 4004]
    A1[Admin 4005]
  end

  %% ML
  subgraph ML
    ML1[Dynamic Pricing API 8000]
  end

  %% Infra
  subgraph Infra
    K[(Kafka)]
    X[(Redis)]
    M[(MongoDB)]
  end

  %% Flows
  CU --> R1
  DR --> D1
  AD --> A1

  R1 <--> D1
  R1 --> ML1
  A1 --> D1
  A1 --> C1
  A1 --> B1

  R1 <--> M
  D1 <--> M
  C1 <--> M
  B1 <--> M
  A1 <--> M

  R1 -.-> K
  K -.-> B1

  R1 <--> X
  D1 <--> X
  B1 <--> X

Explanation

  • Customers request rides → handled by Rides Service, which queries Drivers Service for nearby drivers and calls ML Service for dynamic fare prediction.
  • Drivers accept rides via Drivers Service, which updates ride status in MongoDB and invalidates Redis caches.
  • When a ride is completed, Rides Service emits a ride-completed event to Kafka.
  • Billing Service consumes this event, generates the final bill, stores it in MongoDB, and caches frequent queries in Redis.
  • Admins interact with Admin Service to manage users, audit bills, and view system-wide statistics (fueled by MongoDB aggregations + Redis cache).
  • Redis ensures fast reads (driver search, revenue stats, ride lookups).
  • Kafka decouples services, ensuring that ride completion and billing remain scalable and resilient.
  • MongoDB stores all persistent entities (Drivers, Customers, Rides, Bills, Reviews, Media metadata).

Role–Action Matrix

| Action / Feature | Customer | Driver | Admin |
| --- | --- | --- | --- |
| Register & Login | ✔ | ✔ | ✔ |
| Book a Ride | ✔ | — | — |
| Accept / Complete Ride | — | ✔ | — |
| View Billing & History | ✔ | ✔ | ✔ |
| Profile & Media | ✔ | ✔ | — |
| Ratings & Reviews | ✔ | ✔ | — |
| Manage Users (CRUD) | — | — | ✔ |
| Revenue & Ride Analytics | — | — | ✔ |
| Audit Bills | — | — | ✔ |

4️⃣ System Architecture — High-Level Tech Overview

At its core, the Uber Simulation is built on a microservices architecture.
Each domain (Rides, Drivers, Customers, Billing, Admin, Pricing) is implemented as a separate service, allowing for independent development, deployment, and scaling.
The services communicate via REST APIs (synchronous) and Kafka events (asynchronous), while Redis accelerates hot lookups and MongoDB persists system state.


Architecture Diagram

flowchart TB
  %% Layers
  subgraph L0["Client Layer"]
    FE["React + Redux (Frontend)"]
  end

  subgraph L1["Edge / Networking"]
    GW["Ingress / Reverse Proxy (NGINX)"]
  end

  subgraph L2["Microservices Layer"]
    R1["Rides svc :4001"]
    D1["Drivers svc :4002"]
    C1["Customers svc :4003"]
    B1["Billing svc :4004"]
    A1["Admin svc :4005"]
    ML["Dynamic Pricing (FastAPI) :8000"]
  end

  subgraph L3["Data & Infra Layer"]
    DB[(MongoDB)]
    RD[(Redis)]
    KF[(Kafka Broker)]
    ZK[(Zookeeper)]
  end

  %% Flows
  FE -->|HTTPS| GW
  GW -->|HTTP| R1
  GW -->|HTTP| D1
  GW -->|HTTP| C1
  GW -->|HTTP| B1
  GW -->|HTTP| A1

  R1 -->|"HTTP /predict"| ML

  R1 --- DB
  D1 --- DB
  C1 --- DB
  B1 --- DB
  A1 --- DB

  R1 --- RD
  D1 --- RD
  B1 --- RD

  R1 -. "produce ride-completed" .-> KF
  KF  -. "consume" .-> B1
  ZK --- KF

Service Overview

| Service | Port | Responsibilities | Key Tech |
| --- | --- | --- | --- |
| Rides Service | 4001 | Core ride lifecycle: create, update, nearby driver search, reviews, statistics; produces ride-completed Kafka events; integrates with the ML service for fare prediction. | Node.js, Express, MongoDB, Redis, Kafka Producer |
| Drivers Service | 4002 | Driver auth & profile management, car/insurance details, intro videos, search (cached). | Node.js, Express, MongoDB, Redis |
| Customers Service | 4003 | Customer auth, profile management, ride history links to rides/billing. | Node.js, Express, MongoDB |
| Billing Service | 4004 | Bill generation & search; consumes ride-completed Kafka events; ensures idempotency. | Node.js, Express, MongoDB, Redis, Kafka Consumer |
| Admin Service | 4005 | Admin auth, add/manage drivers & customers, revenue & ride statistics, billing audits. | Node.js, Express, MongoDB, Redis |
| Dynamic Pricing Model | 8000 | Machine learning service providing estimated_price predictions during ride creation. | FastAPI, Python, XGBoost, Joblib |

Infrastructure Components

  • MongoDB (Atlas/local) → primary data store for all entities (Drivers, Customers, Rides, Billing, Reviews, Media metadata).
  • Redis → cache for driver searches, revenue stats, ride/billing lookups.
  • Kafka → event bus connecting rides → billing; ensures decoupled, resilient workflows.
  • Docker → containerization of all services for local and cloud deployment.
  • Kubernetes (K8s-ready) → orchestration layer for scalability and load balancing.
  • JMeter → load/performance testing across scenarios (B, B+S, B+S+K).
  • React + Redux → frontend client (Customer, Driver, Admin portals).

Why Microservices?

  • Scalability: each service can be scaled independently (e.g., rides-service under heavy load).
  • Resilience: failure in billing-service won’t block ride creation thanks to Kafka decoupling.
  • Technology fit: Python/ML model isolated from Node.js services.
  • Team productivity: each service can be developed & deployed by separate teams.

5️⃣ Request Lifecycles & Sequence Diagrams

Understanding the internal workflows is crucial for evaluating system design.
Below are the four most important request lifecycles, illustrated with sequence diagrams.


1. Authentication & JWT Flow

sequenceDiagram
  participant U as User (Customer/Driver/Admin)
  participant S as Service (e.g., Drivers)
  participant DB as MongoDB
  participant J as JWT Middleware

  U->>S: POST /login (email + password)
  S->>DB: Verify credentials (bcrypt hash)
  DB-->>S: User found + valid
  S-->>U: 200 OK + { token: "Bearer <JWT>" }
  U->>S: GET /protected (Authorization: Bearer <JWT>)
  S->>J: Verify token + role
  J-->>S: OK (req.user populated)
  S-->>U: Protected resource JSON

Key Points

  • Passwords stored as bcrypt hashes.
  • JWT includes user role → used for authorization in role-specific routes.
  • Stateless → services can scale horizontally without sticky sessions.
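A minimal sketch of the verification middleware implied by these points (Express + jsonwebtoken; the claim names and the requireRole helper are illustrative assumptions, not lifted from the repo):

const jwt = require("jsonwebtoken");

// Verify the Bearer token and enforce a role claim on protected routes
function requireRole(...roles) {
  return (req, res, next) => {
    const header = req.headers.authorization || "";
    const token = header.startsWith("Bearer ") ? header.slice(7) : null;
    if (!token) return res.status(401).json({ error: "Missing token" });
    try {
      const payload = jwt.verify(token, process.env.JWT_SECRET); // e.g. { sub, role, exp }
      if (roles.length && !roles.includes(payload.role)) {
        return res.status(403).json({ error: "Forbidden" });
      }
      req.user = payload; // downstream handlers read req.user
      next();
    } catch {
      return res.status(401).json({ error: "Invalid or expired token" });
    }
  };
}

// Usage: app.get("/api/admin/statistics/revenue", requireRole("admin"), handler);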

2. Ride Completion → Billing (Event-Driven with Kafka)

sequenceDiagram
  participant C as Customer
  participant R as Rides Service
  participant ML as ML Service
  participant K as Kafka Broker
  participant B as Billing Service
  participant M as MongoDB

  C->>R: POST /api/rides (pickup, dropoff)
  R->>ML: POST /predict (distance, time, passengers, etc.)
  ML-->>R: { estimated_price }
  R->>M: Save ride { status: in_progress, estimatedPrice }
  
  C->>R: PATCH /api/rides/:id/status completed
  R->>K: produce("ride-completed", ride data)
  K-->>B: consume("ride-completed")
  B->>M: Insert/Upsert Bill (idempotent by rideId)
  B-->>C: Bill generated & available

Key Points

  • Asynchronous decoupling: rides-service doesn’t wait for billing → higher throughput.
  • Idempotency: billing ensures no duplicate bills for same rideId.
  • Scalability: Kafka can buffer load spikes (durable queue).

3. Nearby Driver Search (with Redis Caching)

sequenceDiagram
  participant C as Customer
  participant R as Rides Service
  participant D as Drivers Service
  participant X as Redis
  participant M as MongoDB

  C->>R: GET /api/rides/nearby-drivers?lat=...&lng=...
  R->>X: Check cache key driver:search:{lat,lng}
  alt Cache hit
    X-->>R: Cached list of drivers
  else Cache miss
    R->>D: Query drivers within 10 miles (Haversine formula)
    D->>M: Geo query in MongoDB
    M-->>D: List of drivers
    D-->>R: Driver list
    R->>X: Cache results with TTL=60s
  end
  R-->>C: List of available drivers

Key Points

  • Redis reduces repeated queries for popular areas (e.g., airports).
  • TTL ensures data freshness (drivers update frequently).
  • Reduces MongoDB load under high concurrency.
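The 10-mile radius check above boils down to the Haversine great-circle distance. A small JavaScript helper (illustrative; the repo's exact implementation may differ):

// Great-circle distance between two { lat, lng } points in kilometres
function haversineKm(a, b) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const R = 6371; // mean Earth radius in km
  const dLat = toRad(b.lat - a.lat);
  const dLng = toRad(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLng / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(h));
}

// Keep drivers within 10 miles (~16.09 km) of the pickup point
const nearby = drivers.filter((d) => haversineKm(pickup, d.location) <= 16.09);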

4. Admin Analytics (Stats with MongoDB + Redis)

sequenceDiagram
  participant A as Admin
  participant AS as Admin Service
  participant B as Billing Service
  participant X as Redis
  participant M as MongoDB

  A->>AS: GET /api/admin/statistics/revenue
  AS->>X: Check cache key stats:revenue:day:{date}
  alt Cache hit
    X-->>AS: Cached revenue data
  else Cache miss
    AS->>B: Request billing summary
    B->>M: Aggregate bills by date
    M-->>B: { totalRevenue, ridesPerArea, ridesPerDriver }
    B-->>AS: Aggregated stats
    AS->>X: Cache stats for 5 mins
  end
  AS-->>A: Render revenue dashboard (charts)

Key Points

  • MongoDB aggregation pipelines compute totals, grouped by day/area/driver.
  • Redis ensures dashboards load fast (<200ms).
  • Admin sees updated revenue with minimal DB load.

6️⃣ Microservices Deep Dive

Each domain is implemented as an independent service with its own API, data model, caching strategy, and (where applicable) Kafka role.

Notation:
HTTP→ = outbound service call • K = Kafka role • R = Redis keys • DB = Mongo collections


Rides Service (4001)

Purpose
Core ride lifecycle: create/update, nearby driver search, ride statistics, reviews, media metadata, and produces ride-completed on finish. Integrates with ML for fare prediction.

Top Endpoints

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /api/rides | Create ride (calls ML /predict, finds nearest driver) |
| PATCH | /api/rides/:id/status | Update status; on completed, produce Kafka ride-completed |
| GET | /api/rides/nearby-drivers?lat=&lng= | Haversine search for drivers around a point |
| GET | /api/rides/statistics | Aggregated stats (revenue/day, rides/hour/area/driver/customer) |
| GET / DELETE | /api/rides/:id | Get / delete ride |
| POST | /api/rides/:id/images | Attach image metadata to ride |
| POST | /api/rides/reviews | Create review (customer↔driver) |
| GET | /api/rides/reviews/user/:userId | Reviews by user |

Inter-Service Calls (HTTP→)

  • HTTP→ Drivers: fetch drivers for nearby search / details.
  • HTTP→ ML: POST /predict to calculate estimatedPrice.

Kafka (K)

  • Producer: ride-completed (payload includes rideId, driverId, customerId, distanceKm, predictedPrice, actualPrice, startedAt, endedAt).
  • Idempotency target: Billing upserts by rideId.

Redis (R)

  • Reads/Writes:
    • driver:search:{lat}:{lng} (60s) – cached search results.
    • rides:byDriver:{driverId} (60s) – recent rides for driver.
  • Invalidate on write (status change, create/delete).

Mongo (DB)

  • rides (ride lifecycle), reviews (ratings), media (metadata).

Sample cURL

# Create ride (server will call ML for estimated price)
curl -X POST http://localhost:4001/api/rides \
  -H "Content-Type: application/json" \
  -d '{"customerId":"CUS-42","pickup":{"lat":37.77,"lng":-122.42},"dropoff":{"lat":37.79,"lng":-122.39},"passengerCount":1}'

Drivers Service (4002)

Purpose

Driver identity & profile management:

  • Signup / login (JWT, bcrypt)
  • Car & insurance details
  • Intro video upload/stream
  • Cached search for nearby drivers
  • Summaries for dashboards

Top Endpoints

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /api/drivers/signup, /api/drivers/login | Auth (JWT), bcrypt password storage |
| GET | /api/drivers, /api/drivers/search?q= | List drivers & cached search |
| GET | /api/drivers/:id | Fetch profile |
| PUT | /api/drivers/:id | Update profile |
| DELETE | /api/drivers/:id | Delete profile |
| POST | /api/drivers/:id/video | Upload intro video |
| GET | /api/drivers/:id/video | Stream intro video (HTTP range) |
| GET | /api/drivers/:driverId/summary | Aggregates for dashboards (earnings, ratings) |

Inter-Service Calls

  • Serves data to Rides (nearby search)
  • Serves data to Admin (management & dashboards)

Redis

  • driver:search:{q} (TTL 60s)
  • driver:summary:{driverId} (TTL 60s)
  • Invalidate on profile updates

MongoDB

  • drivers → profile, vehicle, insurance, location
  • videos → paths, metadata

Customers Service (4003)

Purpose

Customer authentication & profile management:

  • Signup/login
  • Profile CRUD
  • Links to rides/billing via UI/API layer

Top Endpoints

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /api/customers/signup, /api/customers/login | Auth (JWT) |
| GET | /api/customers | List customers |
| GET | /api/customers/:id | Fetch details |
| PUT | /api/customers/:id | Update profile |
| DELETE | /api/customers/:id | Delete profile |

Inter-Service Calls

  • Called by Admin for management
  • UI fetches rides/billing directly from their services

MongoDB

  • customers → PII, address, masked card refs, preferences

Billing Service (4004)

Purpose

  • Generate & search bills
  • Consume ride-completed events
  • Guarantee idempotency (unique rideId)

Top Endpoints

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /api/billing/rides/:rideId | Manually create bill (checked by rideId) |
| GET | /api/billing/:billId | Fetch single bill |
| GET | /api/billing/search?driverId=&customerId=&status= | Search bills |
| GET | /api/billing/customer/:customerId | Bills by customer |
| GET | /api/billing/driver/:driverId | Bills by driver |

Kafka

  • Consumer: ride-completed
  • Group: billing-consumer-group
  • On consume → upsert into billing collection → cache hot queries

Redis

  • billing:byUser:{userId}:{role} (TTL 60s) → customer/driver lists
  • stats:revenue:day:{YYYY-MM-DD} (TTL 300s) → precomputed daily revenue

MongoDB

  • billing → predicted vs actual, totals, timestamps, status

Admin Service (4005)

Purpose

Administrative plane:

  • Privileged authentication
  • Manage drivers/customers
  • Financial & ride analytics
  • Bill audits

Top Endpoints

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /api/admin/signup, /api/admin/login | Admin authentication |
| POST | /api/admin/drivers, /api/admin/customers | Proxy create via services |
| GET | /api/admin/statistics/revenue | Revenue per day (charts) |
| GET | /api/admin/statistics/rides | Rides per area/driver/customer |
| GET | /api/admin/bills/search | Search bills |
| GET | /api/admin/bills/:billId | Billing tools |

Inter-Service Calls

  • HTTP → Drivers/Customers for management
  • HTTP → Billing for audits & stats
  • HTTP → Rides for ride metrics

Redis

  • Reads cached stats & billing keys
  • May set dashboard caches (TTL 5m)

MongoDB

  • Minimal own state
  • Mostly queries across other domains

Dynamic Pricing Model (FastAPI, 8000)

Purpose

Predict estimated_price during ride creation based on features:

  • Distance (km)
  • Time of day
  • Weekend / night flag
  • Passenger count

Endpoint

| Method | Path | Body | Response |
| --- | --- | --- | --- |
| POST | /predict | { distance_km, passenger_count, hour, day_of_week, is_weekend, is_night } | { "estimated_price": <float> } |

Example

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
        "distance_km": 12.5,
        "passenger_count": 1,
        "hour": 18,
        "day_of_week": 5,
        "is_weekend": 1,
        "is_night": 0
      }'

7️⃣ Data Models & Persistence

The system uses MongoDB for operational data across domains (Drivers, Customers, Rides, Billing, Reviews, Media metadata).
Schemas are optimized for read performance (dashboards/search), idempotency (billing), and evolving structures (optional media/metadata).


Entity-Relationship (ER) Diagram

erDiagram
  DRIVER {
    string _id
    string driverId      "external id (SSN-like)"
    string email         "unique"
    string passwordHash
    string name
    object carDetails    "make, model, plate"
    object insurance     "policyNo, expiresAt"
    object location      "lat, lng"
    int    rating
    string videoPath
    date   createdAt
    date   updatedAt
  }

  CUSTOMER {
    string _id
    string customerId    "external id"
    string email         "unique"
    string passwordHash
    string name
    object address       "city, state, zip"
    object card          "tokenized ref"
    int    rating
    date   createdAt
    date   updatedAt
  }

  RIDE {
    string _id
    string rideId        "unique human id"
    datetime dateTime
    object pickup        "lat, lng, address"
    object dropoff       "lat, lng, address"
    string driverId
    string customerId
    number distanceKm
    number estimatedPrice
    number actualPrice
    string status        "requested|accepted|in_progress|completed|canceled"
    array  media         "mediaId[]"
    date   createdAt
    date   updatedAt
  }

  BILLING {
    string _id
    string billingId     "BILL-<timestamp>"
    string rideId        "unique (idempotency)"
    string driverId
    string customerId
    number predictedPrice
    number actualPrice
    number distanceKm
    datetime startedAt
    datetime endedAt
    string status        "created|paid|void"
    date   createdAt
    date   updatedAt
  }

  REVIEW {
    string _id
    string rideId
    string reviewerId
    string revieweeId
    string reviewerType  "driver|customer"
    int    rating        "1..5"
    string comment
    date   createdAt
  }

  MEDIA {
    string _id
    string rideId
    string ownerId       "driverId|customerId"
    string path          "local/S3 url"
    string type          "image|video"
    number sizeBytes
    string contentType
    date   createdAt
  }

  DRIVER   ||--o{ RIDE   : drives
  CUSTOMER ||--o{ RIDE   : books
  RIDE     ||--o| BILLING : generates
  RIDE     ||--o{ REVIEW : has
  DRIVER   ||--o{ REVIEW : receives
  CUSTOMER ||--o{ REVIEW : receives
  RIDE     ||--o{ MEDIA  : attaches

MongoDB Collections & Indexing

| Collection | Key Fields | Recommended Indexes | Notes |
| --- | --- | --- | --- |
| drivers | _id, driverId, email, location (lat, lng), rating, videoPath | email (unique), driverId (unique), (future) 2dsphere on location | Cached search results in Redis via driver:search:{q}; summary cache driver:summary:{driverId}. |
| customers | _id, customerId, email, address, rating | email (unique), customerId (unique) | Card data should be tokenized (never store PAN). |
| rides | _id, rideId, driverId, customerId, status, dateTime, pickup/dropoff lat/lng | rideId (unique), driverId, customerId, status, dateTime, (future) 2dsphere on pickup/dropoff | Hot path for dashboards; rides:byDriver:{driverId} cache with TTL. |
| billing | _id, billingId, rideId, driverId, customerId, predictedPrice, actualPrice, status | rideId (unique), driverId, customerId, status, createdAt | Idempotency by rideId (prevents duplicate bills from repeated events). |
| reviews | _id, rideId, reviewerId, revieweeId, reviewerType, rating | rideId, revieweeId, rating | Used to compute rating aggregates in the app/service layer. |
| media | _id, rideId, ownerId, path, type, sizeBytes, contentType | rideId, ownerId, type, createdAt | Store only metadata in the DB; file on disk or S3; serve via signed URLs/range. |

Idempotency & Referential Integrity

Billing Idempotency

  • billing.rideId is unique.
  • Kafka consumer performs upsert by rideId to avoid duplicate bills when the ride-completed event is replayed.
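The unique constraint itself is a one-liner in mongosh (the index name is arbitrary):

// One bill per ride: replayed ride-completed events hit the same document via upsert
db.billing.createIndex({ rideId: 1 }, { unique: true, name: "uniq_rideId" })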

Ride Lifecycle Checks

  • Validate legal status transitions:
    • requested → accepted → in_progress → completed
    • Prevent double completion or invalid jumps (e.g., in_progress → canceled).
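A small guard for these checks might look like this (plain JavaScript; treating cancellation as allowed only before the ride starts is an assumption):

// Allowed ride status transitions (terminal states have no outgoing edges)
const TRANSITIONS = {
  requested:   ["accepted", "canceled"],
  accepted:    ["in_progress", "canceled"],
  in_progress: ["completed"],
  completed:   [],
  canceled:    []
};

const canTransition = (from, to) => (TRANSITIONS[from] || []).includes(to);

// e.g. reject PATCH /api/rides/:id/status when the jump is illegal
if (!canTransition(ride.status, req.body.status)) {
  return res.status(409).json({ error: `Illegal transition: ${ride.status} -> ${req.body.status}` });
}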

Foreign Keys (Logical)

  • rides.driverId and rides.customerId must reference existing documents.
  • Enforce via pre-create checks in services (and optional MongoDB schema validation).

Validation & Security

  • Passwords: store bcrypt hashes (never plain text).
  • Emails & IDs: validate on input; email uniqueness enforced at DB-level.
  • Coordinates: lat ∈ [-90, 90], lng ∈ [-180, 180].
  • Reviews: enforce rating ∈ [1..5].
  • Media: enforce content types & size limits on upload; sanitize filenames; store paths only.
  • JWT: required for protected endpoints; role claims (admin|driver|customer) checked per route.

Data Lifecycle & Retention

  • Rides/Billing: retain indefinitely for analytics; optionally archive to cold collections or a data lake for historical reporting.
  • Media: apply TTL or move to cheaper storage (e.g., S3 Glacier) after N days.
  • Caches (Redis): ephemeral; short TTLs (60s–5m) tuned per key; safe to flush during incidents.
  • Soft deletes (optional): add isActive / deletedAt to drivers/customers to avoid hard deletes.

Aggregations for Admin Analytics (Examples)

Revenue per Day

db.billing.aggregate([
  { $match: { status: "paid" } },
  {
    $group: {
      _id: { $dateToString: { format: "%Y-%m-%d", date: "$createdAt" } },
      totalRevenue: { $sum: "$actualPrice" },
      rides: { $sum: 1 }
    }
  },
  { $sort: { _id: 1 } }
])

Top Drivers by Revenue

 db.billing.aggregate([
  { $match: { status: "paid" } },
  {
    $group: {
      _id: "$driverId",
      revenue: { $sum: "$actualPrice" },
      rides: { $sum: 1 }
    }
  },
  { $sort: { revenue: -1 } },
  { $limit: 10 }
])

8️⃣ Kafka & Eventing

The system uses Apache Kafka to decouple ride completion from billing.
By emitting a ride-completed event, the Rides Service hands off billing work to the Billing Service asynchronously, improving throughput and resilience.


Topic Specification

| Topic | Purpose | Producer | Consumer | Partitions | Replication | Key |
| --- | --- | --- | --- | --- | --- | --- |
| ride-completed | Notify that a ride finished (billable) | Rides Service | Billing Service (CG) | 3–6 (cfg) | 1–3 (cfg) | rideId (str) |
  • Partitioning strategy: keyBy(rideId) keeps the same ride’s messages ordered on a single partition → simplifies idempotent upsert logic in Billing.
  • Consumer group: billing-consumer-group (scales horizontally; each instance gets a subset of partitions).
  • Delivery semantics: at-least-once (consumer commits after upsert). With upsert idempotency in Billing, duplicates are safe.

Event Schema (JSON)

{
  "eventType": "ride-completed",
  "version": 1,
  "rideId": "RIDE-2025-09-18-00123",
  "driverId": "DRV-9",
  "customerId": "CUS-42",
  "distanceKm": 12.1,
  "predictedPrice": 18.75,
  "actualPrice": 19.40,
  "startedAt": "2025-09-18T10:00:00Z",
  "endedAt": "2025-09-18T10:25:00Z",
  "metadata": {
    "source": "rides-service",
    "emittedAt": "2025-09-18T10:25:05Z",
    "traceId": "f5c9…"
  }
}

Schema Notes

  • Include eventType & version to evolve payloads safely.
  • traceId helps correlate logs across Rides ↔ Kafka ↔ Billing.

Producer Logic (Rides Service)

  • Emit exactly one event when status transitions to completed.
  • Use synchronous confirmation (await Kafka produce) or buffered with retry/backoff.
  • Attach rideId as the message key.

Example (Node.js with kafkajs)

import { Kafka } from "kafkajs";
const kafka = new Kafka({ clientId: "rides", brokers: [process.env.KAFKA_BROKERS] });
const producer = kafka.producer();

await producer.connect();
await producer.send({
  topic: "ride-completed",
  messages: [{
    key: ride.rideId,
    value: JSON.stringify({
      eventType: "ride-completed",
      version: 1,
      rideId: ride.rideId,
      driverId: ride.driverId,
      customerId: ride.customerId,
      distanceKm: ride.distanceKm,
      predictedPrice: ride.estimatedPrice,
      actualPrice: ride.actualPrice,
      startedAt: ride.startedAt,
      endedAt: ride.endedAt,
      metadata: { source: "rides-service", emittedAt: new Date().toISOString(), traceId }
    })
  }]
});

Consumer Logic (Billing Service)

  • At-least-once processing → commit offset only after successful upsert.
  • Idempotency: billing collection has a unique index on rideId; consumer performs upsert by rideId to avoid duplicates.
  • Failure handling: retry with backoff; if still failing, log & (optional) publish to a DLQ topic.

Example (Node.js with kafkajs)

import { Kafka } from "kafkajs";
const kafka = new Kafka({ clientId: "billing", brokers: [process.env.KAFKA_BROKERS] });
const consumer = kafka.consumer({ groupId: "billing-consumer-group" });

await consumer.connect();
await consumer.subscribe({ topic: "ride-completed", fromBeginning: false });

await consumer.run({
  eachMessage: async ({ topic, partition, message }) => {
    const payload = JSON.parse(message.value.toString());
    try {
      // Upsert by rideId (idempotent)
      await BillingModel.updateOne(
        { rideId: payload.rideId },
        {
          $set: {
            driverId: payload.driverId,
            customerId: payload.customerId,
            predictedPrice: payload.predictedPrice,
            actualPrice: payload.actualPrice,
            distanceKm: payload.distanceKm,
            startedAt: new Date(payload.startedAt),
            endedAt: new Date(payload.endedAt),
            status: "created",
            updatedAt: new Date()
          },
          $setOnInsert: { billingId: `BILL-${Date.now()}`, createdAt: new Date() }
        },
        { upsert: true }
      );
      // offset is auto-committed by kafkajs unless manual commit mode is enabled
    } catch (err) {
      console.error("Billing consume error:", err);
      // Optional Dead Letter Queue (DLQ)
      // await dlqProducer.send({
      //   topic: "ride-completed.DLQ",
      //   messages: [{ key: payload.rideId, value: JSON.stringify(payload) }]
      // });
    }
  }
});

Reliability & Backpressure

  • At-least-once + idempotent upsert → safe duplicates, never double-charge.
  • Consumer group scaling: run N billing instances → Kafka partitions are divided → linear throughput gains (bounded by partitions).
  • Backpressure: if Billing lags, Kafka buffers messages durably; Rides remains fast.
  • Retries: exponential backoff on transient DB/Redis errors.

Local Operations (Docker)

Create topic (if auto-create disabled):

docker exec -it kafka kafka-topics \
  --create --topic ride-completed \
  --bootstrap-server kafka:9092 \
  --partitions 3 --replication-factor 1

Describe topic:

docker exec -it kafka kafka-topics \
  --describe --topic ride-completed \
  --bootstrap-server kafka:9092

Consume (debug):

docker exec -it kafka kafka-console-consumer \
  --bootstrap-server kafka:9092 \
  --topic ride-completed --from-beginning

Produce (debug):

docker exec -it kafka kafka-console-producer \
  --broker-list kafka:9092 --topic ride-completed
# paste a JSON line and press Enter to send

In Docker Compose, services reach Kafka via the hostname kafka:9092.
On host tools, use the advertised listener localhost:29092 (if configured).


Monitoring & Observability

  • Metrics to watch: consumer lag per partition, produce/consume rates, error counts, retry counts.
  • Logging: include traceId across Rides → Kafka → Billing to correlate events.
  • Future: add Prometheus exporters (e.g., Burrow for consumer lag) + Grafana dashboards.

Design Rationale (Interview Notes)

  • Why Kafka? Asynchronous decoupling improves ride throughput and system resilience vs. synchronous billing API calls.
  • Why at-least-once (not exactly-once)? Simpler + robust with idempotent DB upserts; operationally safer than coordinated transactions.
  • Why key by rideId? Ensures per-ride ordering and simplifies idempotent consumer logic.
  • What about DLQ? Optional safety net for poison messages; keeps main consumer healthy while problematic events are quarantined.

9️⃣ Redis Caching Strategy

To reduce MongoDB query load and deliver sub-200ms responses on hot paths, the system uses Redis as a distributed in-memory cache.

Redis follows a cache-aside pattern:

  1. Service checks Redis first.
  2. On cache miss → query MongoDB → return result + populate Redis with TTL.
  3. On updates → invalidate affected cache keys.
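A generic cache-aside helper covering these three steps (ioredis-style client; the helper and its usage are illustrative, with key names and TTLs taken from the table below):

// Cache-aside: read-through with TTL; writers invalidate keys separately
async function cacheAside(redis, key, ttlSeconds, loadFromDb) {
  const hit = await redis.get(key);
  if (hit) return JSON.parse(hit);                         // 1. serve from Redis
  const fresh = await loadFromDb();                        // 2. miss → query MongoDB
  await redis.set(key, JSON.stringify(fresh), "EX", ttlSeconds);
  return fresh;
}

// Usage: nearby-driver search with a 60s TTL
const drivers = await cacheAside(
  redis,
  `driver:search:${lat}:${lng}`,
  60,
  () => Driver.find({ /* geo filter */ }).lean()
);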

Cached Keys

| Key Pattern | Value | TTL | Writer Service | Reader Services |
| --- | --- | --- | --- | --- |
| driver:search:{q} | JSON list of drivers for search | 60s | Drivers | Rides, Admin |
| driver:summary:{driverId} | Aggregated stats for profile/dashboard | 60s | Drivers | Admin |
| rides:byDriver:{driverId} | List of rides for driver | 60s | Rides | Admin |
| billing:byUser:{userId}:{role} | Bills for customer/driver | 60s | Billing | Admin, Customer |
| stats:revenue:day:{YYYY-MM-DD} | Revenue + ride counts for day | 300s | Billing/Admin | Admin dashboard |

Cache Invalidation Rules

  • Driver profile update → DEL driver:search:* and driver:summary:{driverId}.
  • New ride / status update → DEL rides:byDriver:{driverId}.
  • Billing created/updated → DEL billing:byUser:* and stats:revenue:day:*.
  • Admin dashboards refresh every 5 minutes → Redis keys expire naturally.
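Because Redis DEL does not accept wildcards, pattern rules like DEL driver:search:* need a SCAN loop. A minimal sketch (ioredis-style client, illustrative):

// Delete all keys matching a pattern (e.g. "driver:search:*") without blocking Redis
async function invalidatePattern(redis, pattern) {
  let cursor = "0";
  do {
    const [next, keys] = await redis.scan(cursor, "MATCH", pattern, "COUNT", 100);
    if (keys.length) await redis.del(...keys);
    cursor = next;
  } while (cursor !== "0");
}

// On driver profile update:
await invalidatePattern(redis, "driver:search:*");
await redis.del(`driver:summary:${driverId}`);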

Flow Example — Nearby Driver Search

sequenceDiagram
  participant C as Customer
  participant R as Rides Service
  participant D as Drivers Service
  participant X as Redis
  participant M as MongoDB

  C->>R: GET /api/rides/nearby-drivers?lat=..&lng=..
  R->>X: Check key driver:search:{lat}:{lng}
  alt Cache hit
    X-->>R: Cached drivers (ms response)
  else Cache miss
    R->>D: Fetch drivers in area
    D->>M: Geo query (Haversine in Mongo)
    M-->>D: Drivers list
    D-->>R: Drivers list
    R->>X: Cache result (TTL 60s)
  end
  R-->>C: Driver list

Impact on Performance

  • Baseline (B): every request hits MongoDB → latency spikes under load.
  • B+S (Baseline + Redis caching): Redis absorbs hot read traffic → p95 latency cut by ~40%, throughput ↑.
  • B+S+K (Baseline + Redis + Kafka): asynchronous billing plus Redis caching → smoothest performance; MongoDB load reduced drastically.

See performance graphs in Section 14 — Load Testing.


Design Rationale

  • Why cache-aside? Simple, widely used; services decide what to cache.
  • Why short TTLs (60s–300s)? Keeps data fresh (drivers move constantly, revenue updates every few mins).
  • Why Redis over in-process cache? Distributed; works across multiple service instances → safe for horizontal scaling.
  • What about consistency? Slight staleness tolerated (e.g., nearby drivers list). Strong consistency maintained via invalidation on updates.

🔟 Dynamic Pricing (ML Service)

The platform includes a machine learning microservice to simulate Uber's dynamic pricing model.
This ensures fares reflect demand, supply, and context (time, location, conditions).


Model Overview

  • Framework: FastAPI (Python) serving an XGBoost regression model.
  • Trained on: Uber Fares Kaggle dataset (pickup/dropoff coordinates, datetime, passenger count, fare amount).
  • Serialization: Model persisted via joblib for fast loading.
  • Serving: Uvicorn ASGI server, containerized with Docker.

Feature Inputs

| Feature | Type | Example | Why it matters |
| --- | --- | --- | --- |
| distance_km | float | 12.5 | Longer trips → higher base fare |
| passenger_count | int | 1 | More passengers → adjusted pricing |
| hour | int (0–23) | 18 (6 PM) | Captures rush-hour patterns |
| day_of_week | int (0–6) | 5 (Friday) | Captures weekday vs weekend demand |
| is_weekend | binary | 1 | Surge more likely on weekends |
| is_night | binary | 0 | Night trips may carry premiums |

API Specification

| Method | Path | Body | Response |
| --- | --- | --- | --- |
| POST | /predict | JSON with the feature set above | { "estimated_price": <float> } |

Example Request

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "distance_km": 12.5,
    "passenger_count": 1,
    "hour": 18,
    "day_of_week": 5,
    "is_weekend": 1,
    "is_night": 0
  }'

Example Response

{ "estimated_price": 21.37 }

Integration in Ride Lifecycle

  1. Customer books a ride → rides-service calls the ML service with trip features.
  2. ML service returns the predicted fare (estimatedPrice).
  3. Rides-service persists the ride with this price.
  4. After ride completion → Billing compares predicted vs actual fare and stores both for auditing.

Design Rationale (Interview Notes)

  • Why separate ML service? Decouples Python stack from Node.js services; can scale independently.
  • Why FastAPI? Lightweight, async-friendly, production-ready for ML serving.
  • Why dynamic pricing? Simulates real Uber “surge” behavior where supply-demand elasticity impacts pricing.
  • What if ML fails? rides-service can fallback to a static formula (distance × rate).
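A sketch of the fallback idea from the last bullet (Node + axios; the flat per-km rate and the 2-second timeout are assumptions):

const axios = require("axios");

// Ask the ML service for a fare; fall back to a static formula if it is down or slow
async function estimateFare(features) {
  try {
    const { data } = await axios.post(process.env.ML_URL, features, { timeout: 2000 });
    return { estimatedPrice: data.estimated_price, source: "ml" };
  } catch (err) {
    const FALLBACK_RATE_PER_KM = 1.6; // hypothetical flat rate
    return {
      estimatedPrice: Number((features.distance_km * FALLBACK_RATE_PER_KM).toFixed(2)),
      source: "fallback"
    };
  }
}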

1️⃣1️⃣ API Reference (Selected)

The system exposes REST APIs across microservices.
Below are the most important endpoints, grouped by service, with examples.


Rides Service (4001)

Create Ride

POST /api/rides
Content-Type: application/json
Authorization: Bearer <JWT>

{
  "customerId": "CUS-42",
  "pickup": { "lat": 37.77, "lng": -122.42 },
  "dropoff": { "lat": 37.79, "lng": -122.39 },
  "passengerCount": 1
}

Example Response

{
  "rideId": "RIDE-2025-0001",
  "status": "in_progress",
  "estimatedPrice": 21.37
}

Update Ride Status (→ triggers Kafka)

PATCH /api/rides/:id/status
Authorization: Bearer <JWT>

{ "status": "completed", "actualPrice": 22.10 }

Drivers Service (4002)

Signup

POST /api/drivers/signup
Content-Type: application/json

{
  "driverId": "DRV-100",
  "email": "alex@demo.com",
  "password": "SafePass123!",
  "carDetails": { "make": "Toyota", "model": "Prius" }
}

Search Drivers (cached in Redis)

GET /api/drivers/search?q=Prius
Authorization: Bearer <JWT>

Customers Service (4003)

Signup

POST /api/customers/signup
Content-Type: application/json

{
  "customerId": "CUS-42",
  "email": "jane@demo.com",
  "password": "SafePass123!",
  "address": { "city": "San Jose", "state": "CA", "zip": "95123" }
}

Get Customer Profile

GET /api/customers/CUS-42
Authorization: Bearer <JWT>

Billing Service (4004)

Get Bill by Ride

GET /api/billing/rides/RIDE-2025-0001
Authorization: Bearer <JWT>

Example Response

{
  "billingId": "BILL-17475391",
  "rideId": "RIDE-2025-0001",
  "predictedPrice": 21.37,
  "actualPrice": 22.10,
  "status": "created"
}

Search Bills

GET /api/billing/search?driverId=DRV-100&status=created
Authorization: Bearer <JWT>

Admin Service (4005)

Admin Login

POST /api/admin/login
Content-Type: application/json

{ "email": "admin@demo.com", "password": "AdminPass123!" }

Get Revenue Stats

GET /api/admin/statistics/revenue
Authorization: Bearer <JWT>

Example Response

{
  "date": "2025-09-18",
  "totalRevenue": 12340.75,
  "rides": 842
}

Dynamic Pricing Service (8000)

Predict Fare

POST /predict
Content-Type: application/json

{
  "distance_km": 12.5,
  "passenger_count": 1,
  "hour": 18,
  "day_of_week": 5,
  "is_weekend": 1,
  "is_night": 0
}

Example Response

{ "estimated_price": 21.37 }

Notes

  • All protected endpoints require Authorization: Bearer <JWT>.
  • JWT tokens embed role claims (customer, driver, admin) → enforced in route middleware.
  • Responses are always in JSON format.
  • Errors follow the structure:
    { "error": "Message" }
    
    

1️⃣2️⃣ Deployment & Operations

This section shows how to configure environments and run the full stack locally with Docker.
All services are 12-factor style: configuration comes from environment variables.


Environment Variables (Matrix)

| Var | Rides | Drivers | Customers | Billing | Admin | ML | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| PORT | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | Service port (4001–4005, 8000) |
| NODE_ENV | ✔ | ✔ | ✔ | ✔ | ✔ | — | development / production |
| MONGO_URI | ✔ | ✔ | ✔ | ✔ | ✔ | — | e.g. mongodb://mongo:27017/uber_sim or separate DBs per svc |
| REDIS_URL | ✔ | ✔ | — | ✔ | ✔ | — | redis://redis:6379 |
| KAFKA_BROKERS | ✔ | — | — | ✔ | — | — | kafka:9092 inside Docker |
| JWT_SECRET | ✔ | ✔ | ✔ | ✔ | ✔ | — | Same secret across Node services |
| ML_URL | ✔ | — | — | — | — | — | http://ml-service:8000/predict |
| ALLOWED_ORIGINS | ✔ | ✔ | ✔ | ✔ | ✔ | — | CORS (comma-separated) |
| LOG_LEVEL | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | info / debug |
| MODEL_PATH | — | — | — | — | — | ✔ | e.g. /app/models/xgb.joblib |

Create .env files from .env.example under each service directory and populate these values.
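A minimal sketch of how a Node service might load and sanity-check this configuration at startup (dotenv in development, fail fast on missing values; the exact required list varies per service, see the matrix above):

require("dotenv").config(); // no-op in production, where real env vars are injected

const REQUIRED = ["PORT", "MONGO_URI", "JWT_SECRET"]; // per-service list varies
const missing = REQUIRED.filter((name) => !process.env[name]);
if (missing.length) {
  console.error(`Missing required env vars: ${missing.join(", ")}`);
  process.exit(1);
}

module.exports = {
  port: Number(process.env.PORT),
  mongoUri: process.env.MONGO_URI,
  redisUrl: process.env.REDIS_URL,                       // not used by customers-service
  kafkaBrokers: (process.env.KAFKA_BROKERS || "").split(","),
  allowedOrigins: (process.env.ALLOWED_ORIGINS || "").split(",")
};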


Docker Compose (Local)

Save as docker-compose.yml in repo root (replace if you already have one).
This brings up Mongo, Redis, Zookeeper, Kafka, the ML service, and all Node services.

version: "3.9"

services:
  mongo:
    image: mongo:6
    container_name: mongo
    ports: [ "27017:27017" ]
    volumes:
      - mongo_data:/data/db
    environment:
      MONGO_INITDB_DATABASE: uber_sim

  redis:
    image: redis:7
    container_name: redis
    ports: [ "6379:6379" ]

  zookeeper:
    image: confluentinc/cp-zookeeper:7.6.1
    container_name: zookeeper
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000

  kafka:
    image: confluentinc/cp-kafka:7.6.1
    container_name: kafka
    depends_on: [ zookeeper ]
    ports:
      - "29092:29092"   # host access
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: "zookeeper:2181"
      KAFKA_LISTENERS: "PLAINTEXT://kafka:9092,PLAINTEXT_HOST://0.0.0.0:29092"
      KAFKA_ADVERTISED_LISTENERS: "PLAINTEXT://kafka:9092,PLAINTEXT_HOST://localhost:29092"
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

  ml-service:
    build:
      context: ./services/dynamic_pricing_model
      dockerfile: Dockerfile
    container_name: ml-service
    environment:
      PORT: 8000
      LOG_LEVEL: info
      MODEL_PATH: /app/models/xgb.joblib
    ports: [ "8000:8000" ]
    depends_on: [ mongo ]

  rides-service:
    build:
      context: ./services/rides
      dockerfile: Dockerfile
    container_name: rides-service
    environment:
      PORT: 4001
      NODE_ENV: development
      MONGO_URI: mongodb://mongo:27017/uber_rides
      REDIS_URL: redis://redis:6379
      KAFKA_BROKERS: kafka:9092
      JWT_SECRET: change_me
      ML_URL: http://ml-service:8000/predict
      LOG_LEVEL: info
      ALLOWED_ORIGINS: http://localhost:5173,http://localhost:3000
    ports: [ "4001:4001" ]
    depends_on: [ mongo, redis, kafka, ml-service ]

  drivers-service:
    build:
      context: ./services/drivers
      dockerfile: Dockerfile
    container_name: drivers-service
    environment:
      PORT: 4002
      NODE_ENV: development
      MONGO_URI: mongodb://mongo:27017/uber_drivers
      REDIS_URL: redis://redis:6379
      JWT_SECRET: change_me
      LOG_LEVEL: info
      ALLOWED_ORIGINS: http://localhost:5173,http://localhost:3000
    ports: [ "4002:4002" ]
    depends_on: [ mongo, redis ]

  customers-service:
    build:
      context: ./services/customers
      dockerfile: Dockerfile
    container_name: customers-service
    environment:
      PORT: 4003
      NODE_ENV: development
      MONGO_URI: mongodb://mongo:27017/uber_customers
      JWT_SECRET: change_me
      LOG_LEVEL: info
      ALLOWED_ORIGINS: http://localhost:5173,http://localhost:3000
    ports: [ "4003:4003" ]
    depends_on: [ mongo ]

  billing-service:
    build:
      context: ./services/billing
      dockerfile: Dockerfile
    container_name: billing-service
    environment:
      PORT: 4004
      NODE_ENV: development
      MONGO_URI: mongodb://mongo:27017/uber_billing
      REDIS_URL: redis://redis:6379
      KAFKA_BROKERS: kafka:9092
      JWT_SECRET: change_me
      LOG_LEVEL: info
      ALLOWED_ORIGINS: http://localhost:5173,http://localhost:3000
    ports: [ "4004:4004" ]
    depends_on: [ mongo, redis, kafka ]

  admin-service:
    build:
      context: ./services/admin
      dockerfile: Dockerfile
    container_name: admin-service
    environment:
      PORT: 4005
      NODE_ENV: development
      MONGO_URI: mongodb://mongo:27017/uber_admin
      REDIS_URL: redis://redis:6379
      JWT_SECRET: change_me
      LOG_LEVEL: info
      ALLOWED_ORIGINS: http://localhost:5173,http://localhost:3000
    ports: [ "4005:4005" ]
    depends_on: [ mongo, redis, billing-service, rides-service, drivers-service, customers-service ]

volumes:
  mongo_data:

Usage

# Build & start everything
docker compose up -d --build

# Check logs of a service
docker compose logs -f rides-service

# Stop
docker compose down

1️⃣3️⃣ Local Development (Quick Guide)

You can run services individually without Docker for fast testing.

Prereqs

  • Node.js 18+, Python 3.10+
  • MongoDB & Redis running locally (or via docker compose up mongo redis)
  • Kafka (optional locally, required for billing events)

Start a Service Example: Rides Service

cd services/rides
npm install
npm run dev   # runs on http://localhost:4001

Repeat for other services (drivers → 4002, customers → 4003, billing → 4004, admin → 4005).

ML Service

cd services/dynamic_pricing_model
pip install -r requirements.txt
uvicorn app:app --port 8000 --reload

Seed Minimal Data

# Create customer
curl -X POST http://localhost:4003/api/customers/signup \
  -H "Content-Type: application/json" \
  -d '{"customerId":"CUS-42","email":"jane@demo.com","password":"Pass123"}'

# Create driver
curl -X POST http://localhost:4002/api/drivers/signup \
  -H "Content-Type: application/json" \
  -d '{"driverId":"DRV-100","email":"alex@demo.com","password":"Pass123"}'

Frontend

cd uber-frontend
npm install
npm run dev   # http://localhost:5173

1️⃣4️⃣ Performance & Load Testing

Goal: demonstrate how caching (Redis) and asynchronous billing (Kafka) improve latency, throughput, and stability under load.

Method (High Level)

  • Tool: JMeter (HTTP test plan)
  • Workload: concurrent users hitting ride creation, search, status updates, billing queries
  • Datasets: thousands of drivers/customers/rides preloaded
  • Scenarios:
    • B = Baseline (MongoDB only)
    • B+S = Baseline + Redis cache
    • B+S+K = Redis + Kafka (async billing)

Results (Visuals)

  • Avg Response Times — B vs B+S vs B+S+K

  • Aggregate Report — per-API response times & throughput

  • Summary Report — overall metrics & success %

What the charts show (key takeaways)

  • Caching pays first. Moving from B → B+S (adding Redis) delivers the largest drop in average & p95 latency on read-heavy paths (driver search, history, stats) and increases throughput by offloading MongoDB.
  • Kafka stabilizes write flows. Moving from B+S → B+S+K (adding Kafka) makes completion→billing asynchronous, so ride completion latency stays low and predictable even during spikes; p95 tail improves and error rates drop.
  • Dashboards stay snappy. Admin analytics backed by Redis remain fast (<~200ms typical) while still reflecting fresh data via short TTLs + invalidation.
  • Resilience under load. With Kafka, backpressure is absorbed by the broker; Billing catches up without blocking ride flows.

Comparative Summary (trend view)

| Scenario | Avg Latency | p95 Latency | Throughput | Error Rate |
| --- | --- | --- | --- | --- |
| B | higher | spiky | lower | higher |
| B+S | lower | lower | higher | lower |
| B+S+K | lowest | most stable | highest | lowest |

For exact values, see the three charts above.

Notes

  • Why Redis first? Hot-path reads dominate; caching yields immediate wins.
  • Why Kafka after Redis? It removes a synchronous dependency (billing) from the critical path, improving tail latency and reliability at peak.

1️⃣5️⃣ Security, Reliability & Scalability

Security

  • JWT auth with role claims (customer, driver, admin)
  • Passwords hashed with bcrypt
  • Input validation (IDs, ratings, geo coords, media size/type)
  • CORS restricted to trusted origins
  • Secrets from .env (never hardcoded)

Reliability

  • Idempotent billing (unique rideId)
  • Kafka at-least-once + DB upsert → no double-charging
  • Cache invalidation rules keep Redis consistent
  • Health checks (/healthz) & structured logs with traceId
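The /healthz probe mentioned above can stay tiny (Express; the mongoose and redis handles are assumptions, and the dependency pings are optional):

// Liveness/readiness probe for Docker/K8s and load balancers
app.get("/healthz", async (req, res) => {
  try {
    await mongoose.connection.db.admin().ping(); // MongoDB reachable?
    await redis.ping();                          // Redis reachable (where used)?
    res.json({ status: "ok", uptimeSec: Math.round(process.uptime()) });
  } catch (err) {
    res.status(503).json({ status: "degraded", error: err.message });
  }
});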

Scalability

  • Microservices scale independently (e.g., rides-service during spikes)
  • Redis absorbs hot reads, offloading MongoDB
  • Kafka buffers bursts → billing catches up asynchronously
  • Docker/K8s ready for horizontal scaling

1️⃣6️⃣ Troubleshooting & FAQ

Kafka connection fails inside services

  • Use kafka:9092 inside Docker, not localhost.
  • From host tools (JMeter, CLI), use localhost:29092.

Redis not reachable

  • Check REDIS_URL (redis://redis:6379 in Docker).
  • Run docker compose ps to ensure container is up.

JWT errors (401 Unauthorized)

  • Token expired or missing Authorization: Bearer <JWT>.
  • Re-login to get a fresh token.

CORS issues in frontend

  • Add your dev origin (http://localhost:5173) to ALLOWED_ORIGINS in each service.

MongoDB slow or errors under load

  • Ensure indexes (rideId, driverId, customerId, createdAt) exist.
  • Use Redis cache for frequent reads (search, stats, billing lookups).

License

This project is licensed under the MIT License — see the LICENSE file for details.
