This repo is a tiny practical project that demonstrates some core distributed systems concepts:
- Latency vs Throughput
- Horizontal vs Vertical Scaling
- How Load Balancers work (Reverse Proxy, Forward Proxy)
- A hands-on demo using NGINX as a load balancer across two simple Node servers
Latency
- Meaning: The time taken for one request to travel from client → server → back to the client.
- Think: How fast does one flashcard appear after I click “next”?
- Low latency = snappy, instant responses.
- High latency = laggy, delayed responses.
Throughput
- Meaning: The number of requests handled per second (capacity).
- Think: How many students can click “next card” at the same time without overloading the system?
- High throughput = can serve many users simultaneously.
- Low throughput = requests pile up, users wait.
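To make the difference concrete, here is one rough way to measure both against the demo described later in this README. This assumes the demo is already running on `http://localhost:8080/`, and that you have `curl` and ApacheBench (`ab`) installed; neither tool ships with this repo.

```bash
# Latency: time for a single request to complete
curl -s -o /dev/null -w 'total time: %{time_total}s\n' http://localhost:8080/

# Throughput: rough requests-per-second with 10 concurrent clients
# (look for the "Requests per second" line in the output)
ab -n 200 -c 10 http://localhost:8080/
```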
Vertical Scaling (Scale Up)
- Add more power (CPU, RAM) to a single machine.
- Easy to do, but limited — one machine can only get so big.
- Analogy: Hire one super-teacher for the entire school.
Horizontal Scaling (Scale Out)
- Add more machines with the same code and spread load across them.
- Virtually unlimited scaling.
- Needs load balancing.
- Analogy: Hire 10 normal teachers and share the students among them.
Benefits & Use Cases
- Vertical → good for small apps or quick fixes.
- Horizontal → essential for large systems like Google, Amazon, Netflix.
A Load Balancer (LB) is like a traffic cop 🚦:
- Clients send requests to the LB (one public address).
- LB forwards each request to one of many backend servers.
- Spreads the load, improves reliability, and hides server details.
Reverse Proxy
- Sits in front of backend servers.
- Clients don’t know about the actual servers, only the proxy.
- Can do load balancing, SSL termination, caching, etc.
- Example: NGINX in this repo.
Forward Proxy
- Sits in front of clients.
- Client sends a request to the proxy → the proxy fetches data on the client's behalf.
- Often used for privacy, filtering, or caching.
- Example: A school proxy that filters websites for students.
Load balancers use different strategies to decide which server should handle each request. The most popular ones are:
Round Robin (default in NGINX)
- Requests are sent to servers one by one in a cycle (A → B → C → A → B …).
- Simple and effective when servers are similar in capacity.
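In NGINX, round robin is simply what you get from a plain `upstream` block with no extra directive. A minimal sketch, assuming the two local backends used in this repo's demo (the upstream name `node_backend` is just an illustrative label):

```nginx
# Goes inside the http { } context of nginx.conf.
upstream node_backend {
    # No directive needed: NGINX cycles through these servers in order.
    server 127.0.0.1:3000;
    server 127.0.0.1:3001;
}
```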
Least Connections
- Sends traffic to the server with the fewest active connections.
- Useful when requests vary in length (some short, some long).
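In NGINX this is the `least_conn` directive, added inside the same kind of `upstream` block (sketch only):

```nginx
upstream node_backend {
    least_conn;              # route to the server with the fewest active connections
    server 127.0.0.1:3000;
    server 127.0.0.1:3001;
}
```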
IP Hash (Sticky Sessions)
- The client’s IP determines which server they always connect to.
- Ensures the same user keeps hitting the same server (important if server stores session in memory).
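In NGINX this is the `ip_hash` directive (sketch only):

```nginx
upstream node_backend {
    ip_hash;                 # the same client IP keeps hashing to the same server
    server 127.0.0.1:3000;
    server 127.0.0.1:3001;
}
```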
Weighted Round Robin
- Each server is assigned a “weight” (e.g., Server A = 2, Server B = 1).
- Higher-weighted servers get more requests.
- Useful if some servers are more powerful than others.
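In NGINX, weights are set per `server` line (sketch only; the numbers here are arbitrary examples):

```nginx
upstream node_backend {
    server 127.0.0.1:3000 weight=2;   # gets roughly two out of every three requests
    server 127.0.0.1:3001 weight=1;
}
```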
This is a mini demo of horizontal scaling with load balancing:
- Created two identical Node.js servers running on different ports (`3000` and `3001`).
  - Each server replies with its port number so we know which one handled the request (see the sketch below).
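A minimal version of such a server could look like this; the actual `backend/server.js` in this repo may differ in details such as the response text:

```js
// Minimal sketch of backend/server.js; the real file in this repo may differ.
const http = require('http');

const PORT = process.env.PORT || 3000;

http.createServer((req, res) => {
  // Reply with the port so we can tell which instance handled the request.
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end(`Hello from server on port ${PORT}\n`);
}).listen(PORT, () => {
  console.log(`Server listening on port ${PORT}`);
});
```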
- Configured NGINX as a reverse proxy / load balancer on port `8080`.
  - Requests to `http://localhost:8080/` are forwarded to Node server 3000 or 3001.
  - NGINX uses round robin (the default) to alternate between servers (see the config sketch below).
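The corresponding NGINX configuration is roughly the sketch below; the upstream name is an illustrative label, so check the repo's actual config file for the exact contents:

```nginx
# These blocks live inside the http { } context of nginx.conf (or an included file).
upstream node_backend {
    # Round robin by default: requests alternate between the two servers.
    server 127.0.0.1:3000;
    server 127.0.0.1:3001;
}

server {
    listen 8080;

    location / {
        proxy_pass http://node_backend;
    }
}
```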
- Tested with Postman and `curl`.
  - Repeated requests show alternating responses:
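With the server sketch above, the alternation looks roughly like this (the exact response text depends on what `backend/server.js` actually returns):

```bash
$ curl http://localhost:8080/
Hello from server on port 3000
$ curl http://localhost:8080/
Hello from server on port 3001
$ curl http://localhost:8080/
Hello from server on port 3000
```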
- This simulates horizontal scaling at a tiny level — multiple servers sharing load with a load balancer in front.
- Even though this demo runs only on your laptop, it mirrors how big companies (Google, Amazon, Netflix) scale to millions of users.
- The same principles (load balancing, horizontal scaling, reducing latency, increasing throughput) apply whether you’re running 2 servers or 20,000.
- Start two Node servers (each in its own terminal):

  ```bash
  PORT=3000 node backend/server.js
  PORT=3001 node backend/server.js
  ```