# **Chapter 6: Load Balancing & Traffic Management**

Modern distributed systems must handle millions of concurrent requests while maintaining high availability and low latency. Load balancing distributes traffic across multiple servers to prevent any single server from becoming a bottleneck. This chapter explores load balancing at different layers, algorithms for distributing traffic, health monitoring, and advanced patterns such as API gateways and service meshes.

---

## **6.1 Introduction to Load Balancing**

**Load Balancer**: A reverse proxy that distributes incoming network traffic across a group of backend servers (server pool or server farm). It acts as a traffic cop, routing client requests to available servers capable of fulfilling those requests.

**Why Load Balancing Matters**:
```
Without Load Balancer:
┌─────────────┐         ┌─────────────┐
│   Client 1  │────────>│  Server     │
│   Client 2  │────────>│  (Single)   │
│   Client 3  │────────>│  100% Load  │
│   ...       │         │  Overwhelmed│
│   Client N  │────────>│  Crashes    │
└─────────────┘         └─────────────┘

With Load Balancer:
                                         ┌─────────────┐
                                    ┌───>│  Server 1   │
┌─────────────┐    ┌─────────────┐  │    │  ~33% Load  │
│  Client 1   │    │             │  │    └─────────────┘
│  Client 2   │    │    Load     │  │    ┌─────────────┐
│  ...        │───>│  Balancer   │──┼───>│  Server 2   │
│  Client N   │    │             │  │    │  ~33% Load  │
└─────────────┘    └─────────────┘  │    └─────────────┘
                                    │    ┌─────────────┐
                                    └───>│  Server 3   │
                                         │  ~33% Load  │
                                         └─────────────┘

Benefits:
- No single point of overload
- Horizontal scaling (add more servers)
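- High availability (if one server fails, others continue; deploy the load balancer itself redundantly so it does not become a single point of failure)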
- Better performance (distribute load)
```

**Load Balancer Responsibilities**:
1. **Traffic Distribution**: Spread requests across servers according to a balancing algorithm (equally, by weight, or by current load)
2. **Health Monitoring**: Check server health, remove unhealthy servers
3. **Session Persistence**: Route same client to same server (if needed)
4. **SSL/TLS Termination**: Handle encryption/decryption
5. **Compression**: Compress responses to reduce bandwidth
6. **Caching**: Cache responses to reduce backend load

---

## **6.2 Layer 4 vs. Layer 7 Load Balancing**

Load balancers operate at different layers of the OSI model, offering different capabilities and performance characteristics.

### **Layer 4 (Transport Layer) Load Balancing**

**Concept**: Operates at the transport layer (TCP/UDP). Makes routing decisions based on IP address and port number, without inspecting the actual content of the message.

**How It Works**:
```
Client Request                    Layer 4 LB                     Backend Servers
┌─────────────────┐              ┌─────────────────┐            ┌─────────────┐
│ IP: 203.0.113.1 │─────────────>│ Inspect:        │            │ Server 1    │
│ Port: 54321     │   TCP SYN    │ - Source IP     │───────────>│ 10.0.1.10   │
│                 │              │ - Dest IP       │   TCP SYN  │             │
│                 │              │ - Source Port   │            │             │
│                 │              │ - Dest Port     │            └─────────────┘
│                 │              │                 │
│                 │              │ Decision: Route │            ┌─────────────┐
│                 │              │ based on IP:Port│            │ Server 2    │
│                 │              │ (No content     │            │ 10.0.1.11   │
│                 │              │ inspection)     │            │             │
└─────────────────┘              └─────────────────┘            └─────────────┘

Characteristics:
- Fast (no content inspection)
- Simple (NAT-based forwarding or plain TCP proxying; no payload parsing)
- Protocol agnostic (works with any TCP/UDP protocol)
- Can't make content-based decisions
```
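To make "routing based on IP:Port" concrete, here is a minimal sketch of one common approach: hashing the connection's 5-tuple so every packet of a connection lands on the same backend without reading any payload. The function name `pick_backend` and the load balancer address `198.51.100.10` are hypothetical, and real L4 balancers often keep a connection table instead of (or alongside) a pure hash.

```python
import hashlib

def pick_backend(five_tuple, backends):
    """Pick a backend from connection metadata only (no payload is read).

    five_tuple: (protocol, src_ip, src_port, dst_ip, dst_port)
    """
    key = "|".join(str(field) for field in five_tuple).encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an integer, then map it to a backend
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]

backends = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]
conn = ("tcp", "203.0.113.1", 54321, "198.51.100.10", 3306)  # hypothetical LB address
print(pick_backend(conn, backends))  # the same tuple always yields the same backend
```

Because the decision depends only on header fields, a new connection from the same client (with a different source port) can land on a different backend.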

**Implementation** (HAProxy Layer 4):
```haproxy
# HAProxy configuration for Layer 4 (TCP) load balancing
global
    maxconn 4096

defaults
    mode tcp                    # Layer 4 mode
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend tcp_frontend
    bind *:3306                 # Listen on port 3306 (MySQL)
    default_backend mysql_servers

backend mysql_servers
    balance roundrobin          # Round-robin algorithm
    server mysql1 10.0.1.10:3306 check
    server mysql2 10.0.1.11:3306 check
    server mysql3 10.0.1.12:3306 check
```

**Implementation** (NGINX Layer 4/Stream module):
```nginx
# nginx.conf - Layer 4 load balancing using stream module
stream {
    upstream mysql_backend {
        server 10.0.1.10:3306;
        server 10.0.1.11:3306;
        server 10.0.1.12:3306;
    }

    server {
        listen 3306;
        proxy_pass mysql_backend;
        proxy_timeout 10m;          # Idle timeout; very short values drop pooled MySQL connections
        proxy_connect_timeout 1s;
    }
}
```

**Implementation** (Python socket example):
```python
import socket
import select
import threading

class Layer4LoadBalancer:
    def __init__(self, listen_host, listen_port, backend_servers):
        self.listen_host = listen_host
        self.listen_port = listen_port
        self.backend_servers = backend_servers
        self.current_index = 0
        self.lock = threading.Lock()
    
    def get_next_backend(self):
        """Round-robin selection of backend server"""
        with self.lock:
            backend = self.backend_servers[self.current_index]
            self.current_index = (self.current_index + 1) % len(self.backend_servers)
            return backend
    
    def handle_client(self, client_socket):
        """Handle client connection by forwarding to backend"""
        backend_host, backend_port = self.get_next_backend()
        backend_socket = None
        
        try:
            # Connect to backend (fail fast if it is unreachable)
            backend_socket = socket.create_connection((backend_host, backend_port), timeout=5)
            backend_socket.settimeout(None)  # Blocking mode for the relay loop
            
            # Forward data in both directions
            self.forward_data(client_socket, backend_socket)
            
        except OSError as e:
            # Covers refused/timed-out connects and resets during the relay
            print(f"Error proxying to backend {backend_host}:{backend_port}: {e}")
        finally:
            client_socket.close()
            if backend_socket is not None:
                backend_socket.close()
    
    def forward_data(self, client_socket, backend_socket):
        """Relay bytes between client and backend until either side closes"""
        sockets = [client_socket, backend_socket]
        
        while True:
            readable, _, _ = select.select(sockets, [], [], 1)
            
            for sock in readable:
                data = sock.recv(4096)
                if not data:
                    return  # One side closed the connection
                
                # sendall() retries until every byte is written; send() may write only part
                if sock is client_socket:
                    backend_socket.sendall(data)  # Client -> Backend
                else:
                    client_socket.sendall(data)   # Backend -> Client
    
    def start(self):
        """Start the load balancer"""
        server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((self.listen_host, self.listen_port))
        server.listen(5)
        
        print(f"Layer 4 Load Balancer listening on {self.listen_host}:{self.listen_port}")
        
        while True:
            client_socket, address = server.accept()
            print(f"Connection from {address}")
            
            # Handle client in new thread
            thread = threading.Thread(target=self.handle_client, args=(client_socket,))
            thread.start()

# Usage
if __name__ == "__main__":
    backends = [
        ("10.0.1.10", 3306),  # MySQL Server 1
        ("10.0.1.11", 3306),  # MySQL Server 2
        ("10.0.1.12", 3306),  # MySQL Server 3
    ]
    
    lb = Layer4LoadBalancer("0.0.0.0", 3306, backends)
    lb.start()
```

**Advantages**:
- **High Performance**: No content inspection (just packet forwarding)
- **Low Latency**: Minimal processing overhead
- **Protocol Agnostic**: Works with any TCP/UDP protocol (MySQL, Redis, custom protocols)
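- **TLS Pass-Through**: Encrypted traffic can be forwarded without decryption, so the load balancer never sees plaintext
- **High Throughput**: Sustains more connections per second per machine, which helps when absorbing traffic floods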

**Disadvantages**:
- **No Content-Based Routing**: Can't route based on URL, headers, or content
- **No Caching**: Can't cache responses (doesn't understand HTTP)
- **No SSL Offloading**: Must pass encrypted traffic through (or terminate at backend)
- **No HTTP-Specific Features**: No cookie handling, compression, redirects

**Use Cases**:
- Database load balancing (MySQL, PostgreSQL, MongoDB)
- Cache clusters (Redis, Memcached)
- Custom TCP protocols
- Real-time gaming servers (UDP)
- When maximum performance is required

---

### **Layer 7 (Application Layer) Load Balancing**

**Concept**: Operates at the application layer (HTTP/HTTPS). Makes routing decisions based on content of the message—URL, headers, cookies, or body content.

**How It Works**:
```
Client Request                    Layer 7 LB                     Backend Servers
┌─────────────────┐              ┌─────────────────┐            ┌─────────────┐
│ GET /api/users  │─────────────>│ Inspect:        │            │ API Server  │
│ Host: api.com   │   HTTP       │ - URL (/api/*)  │───────────>│ 10.0.1.10   │
│ Cookie: sess=abc│   Request    │ - Host header   │            │             │
│                 │              │ - Cookies       │            │             │
│                 │              │ - Headers       │            └─────────────┘
│                 │              │ - Body content  │
│                 │              │                 │            ┌─────────────┐
│                 │              │ Decision: Route │            │ Web Server  │
│                 │              │ based on URL    │───────────>│ 10.0.1.11   │
│                 │              │ /static/* -> Web│            │ (Static)    │
│                 │              │ /api/* -> API   │            │             │
└─────────────────┘              └─────────────────┘            └─────────────┘

Characteristics:
- More processing per request (must parse HTTP before routing)
- Smart routing (based on URL, headers, cookies)
- Can cache responses
- SSL termination (required to inspect HTTPS traffic)
- Compression
- HTTP-specific features
```
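A URL-based routing decision can be sketched as a longest-prefix match over a route table, similar in spirit to how NGINX picks the longest matching prefix (non-regex) `location`. This is a minimal illustration; `choose_pool` and the pool names are hypothetical.

```python
def choose_pool(path, routes, default):
    """Return the pool for the longest route prefix that matches path."""
    best_prefix, best_pool = "", default
    for prefix, pool in routes.items():
        if path.startswith(prefix) and len(prefix) > len(best_prefix):
            best_prefix, best_pool = prefix, pool
    return best_pool

# Hypothetical pool names; the more specific /api/admin/ rule wins over /api/
ROUTES = {
    "/api/": "api_servers",
    "/api/admin/": "admin_servers",
    "/static/": "static_servers",
}

print(choose_pool("/api/users", ROUTES, "web_servers"))        # api_servers
print(choose_pool("/api/admin/stats", ROUTES, "web_servers"))  # admin_servers
print(choose_pool("/about", ROUTES, "web_servers"))            # web_servers
```

Longest-prefix matching keeps the outcome independent of rule order, so adding a more specific route never requires reshuffling existing ones.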

**Implementation** (NGINX Layer 7):
```nginx
# nginx.conf - Layer 7 (HTTP) load balancing
http {
    upstream api_servers {
        zone api_servers 64k;            # Shared memory zone (required by active health checks)
        least_conn;                      # Least connections algorithm
        server 10.0.1.10:8080 weight=3;  # API Server 1 (higher capacity)
        server 10.0.1.11:8080 weight=2;  # API Server 2
        server 10.0.1.12:8080 backup;    # API Server 3 (backup only)
        
        keepalive 32;                    # Idle upstream connections cached per worker
    }
    
    upstream static_servers {
        server 10.0.1.20:80;
        server 10.0.1.21:80;
    }
    
    # Rate limiting: 10 requests/second per client IP
    limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;
    
    # Response cache (proxy_cache_path is only valid in the http context)
    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=my_cache:10m;
    
    # Redirect HTTP to HTTPS
    server {
        listen 80;
        server_name api.example.com;
        return 301 https://$host$request_uri;
    }
    
    server {
        # SSL Termination
        listen 443 ssl;
        server_name api.example.com;
        ssl_certificate /etc/nginx/ssl/cert.pem;
        ssl_certificate_key /etc/nginx/ssl/key.pem;
        
        # Compression
        gzip on;
        gzip_types application/json text/css application/javascript;
        
        # Route based on URL path
        location /api/ {
            limit_req zone=api_limit burst=20 nodelay;
            
            proxy_pass http://api_servers;
            proxy_http_version 1.1;          # Required for upstream keepalive
            proxy_set_header Connection "";  # Don't forward "Connection: close"
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            
            # Active health check (NGINX Plus only; open-source NGINX uses
            # passive checks via max_fails/fail_timeout on each server line)
            health_check interval=5s fails=3 passes=2;
        }
        
        location /static/ {
            proxy_pass http://static_servers;
            proxy_cache my_cache;     # Cache upstream responses at the load balancer
            expires 30d;              # Browser caching for static assets
            add_header Cache-Control "public, immutable";
        }
    }
}
```

**Implementation** (HAProxy Layer 7):
```haproxy
# HAProxy Layer 7 configuration
global
    maxconn 4096
    daemon

defaults
    mode http                    # Layer 7 mode
    timeout connect 5s
    timeout client 50s
    timeout server 50s
    option httpchk GET /health   # HTTP health check

frontend http_frontend
    bind *:80
    bind *:443 ssl crt /etc/haproxy/certs.pem
    
    # ACLs (Access Control Lists) for routing
    acl is_api path_beg /api
    acl is_static path_beg /static
    acl is_mobile hdr_sub(User-Agent) -i mobile android iphone   # any substring matches
    
    # Routing rules
    use_backend api_servers if is_api
    use_backend static_servers if is_static
    use_backend mobile_servers if is_mobile
    
    default_backend web_servers

backend api_servers
    balance roundrobin
    option httpchk GET /api/health
    server api1 10.0.1.10:8080 check weight 3
    server api2 10.0.1.11:8080 check weight 2
    server api3 10.0.1.12:8080 check weight 2 backup

backend static_servers
    balance leastconn
    server static1 10.0.1.20:80 check
    server static2 10.0.1.21:80 check

backend mobile_servers
    server mobile1 10.0.1.30:80 check

backend web_servers
    balance source              # IP Hash (sticky sessions)
    server web1 10.0.1.40:80 check
    server web2 10.0.1.41:80 check
```

**Implementation** (Python Flask-based Layer 7 LB):
```python
import hashlib
import random
import threading

import requests
from flask import Flask, Response, request

app = Flask(__name__)

# Backend configurations
BACKENDS = {
    'api': [
        'http://10.0.1.10:8080',
        'http://10.0.1.11:8080',
        'http://10.0.1.12:8080'
    ],
    'static': [
        'http://10.0.1.20:80',
        'http://10.0.1.21:80'
    ]
}

# Hop-by-hop headers apply to a single connection and must not be forwarded
HOP_BY_HOP = {
    'connection', 'keep-alive', 'proxy-authenticate', 'proxy-authorization',
    'te', 'trailer', 'transfer-encoding', 'upgrade',
}
# requests decompresses response bodies and Flask recomputes the length,
# so these backend headers would be wrong if copied to the client
RESPONSE_EXCLUDED = HOP_BY_HOP | {'content-encoding', 'content-length'}

class Layer7LoadBalancer:
    def __init__(self):
        self.api_index = 0
        self.lock = threading.Lock()  # Flask may handle requests on several threads
    
    def get_backend(self, service, req):
        """Select backend based on routing logic"""
        if service == 'api':
            # Round-robin for API
            with self.lock:
                backend = BACKENDS['api'][self.api_index]
                self.api_index = (self.api_index + 1) % len(BACKENDS['api'])
            return backend
        
        elif service == 'static':
            # Random for static assets
            return random.choice(BACKENDS['static'])
        
        elif service == 'sticky':
            # IP Hash for session affinity
            client_ip = req.remote_addr
            hash_val = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
            index = hash_val % len(BACKENDS['api'])
            return BACKENDS['api'][index]
    
    def route_request(self, req):
        """Route request based on URL path"""
        path = req.path
        
        # Content-based routing
        if path.startswith('/api/'):
            backend = self.get_backend('api', req)
        elif path.startswith('/static/'):
            backend = self.get_backend('static', req)
        else:
            # Default to sticky session for web
            backend = self.get_backend('sticky', req)
        
        # Build the target URL, preserving the query string
        target_url = f"{backend}{path}"
        if req.query_string:
            target_url += "?" + req.query_string.decode()
        
        # Copy end-to-end headers (including Cookie); append to X-Forwarded-For
        headers = {k: v for k, v in req.headers if k.lower() not in HOP_BY_HOP}
        prior = req.headers.get('X-Forwarded-For')
        headers['X-Forwarded-For'] = f"{prior}, {req.remote_addr}" if prior else req.remote_addr
        
        # Forward request to backend
        try:
            resp = requests.request(
                method=req.method,
                url=target_url,
                headers=headers,
                data=req.get_data(),
                timeout=5,
                allow_redirects=False  # Return backend redirects to the client as-is
            )
        except requests.exceptions.RequestException as e:
            return Response(f"Backend error: {e}", status=502)
        
        # Create response, dropping headers that no longer match the body
        resp_headers = [(k, v) for k, v in resp.headers.items()
                        if k.lower() not in RESPONSE_EXCLUDED]
        return Response(resp.content, status=resp.status_code, headers=resp_headers)

lb = Layer7LoadBalancer()

METHODS = ['GET', 'POST', 'PUT', 'PATCH', 'DELETE', 'HEAD', 'OPTIONS']

@app.route('/', defaults={'path': ''}, methods=METHODS)
@app.route('/<path:path>', methods=METHODS)
def catch_all(path):
    """Catch-all route to load balancer"""
    return lb.route_request(request)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=80)
```

**Advantages**:
- **Smart Routing**: Route based on URL, headers, cookies, or content
- **SSL/TLS Termination**: Handle encryption at load balancer (reduces backend load)
- **Caching**: Cache responses to reduce backend load
- **Compression**: Compress responses (gzip/brotli)
- **Content Switching**: Different backends for different content types
- **Security**: WAF (Web Application Firewall), DDoS protection, rate limiting
- **Analytics**: Log and analyze HTTP traffic

**Disadvantages**:
- **Higher Latency**: Content inspection adds overhead
- **Resource Intensive**: More CPU/memory required than Layer 4
- **Protocol Specific**: Only works with HTTP/HTTPS (or specific L7 protocols)
- **Complexity**: More complex configuration and debugging

**Use Cases**:
- Web applications (HTTP/HTTPS)
- Microservices routing (path-based)
- SSL termination
- Content caching
- A/B testing (route based on headers/cookies)
- Mobile vs. Desktop routing
- API gateways

---

### **Comparison: Layer 4 vs. Layer 7**

```
┌───────────────────────┬────────────────────────┬────────────────────────┐
│ Feature               │ Layer 4 (Transport)    │ Layer 7 (Application)  │
├───────────────────────┼────────────────────────┼────────────────────────┤
│ OSI Layer             │ Transport (TCP/UDP)    │ Application (HTTP)     │
├───────────────────────┼────────────────────────┼────────────────────────┤
│ Routing Decision      │ IP + Port              │ URL, Headers, Cookies  │
├───────────────────────┼────────────────────────┼────────────────────────┤
│ Content Inspection    │ No                     │ Yes                    │
├───────────────────────┼────────────────────────┼────────────────────────┤
│ Performance           │ Higher (no parsing)    │ Lower (parses HTTP)    │
├───────────────────────┼────────────────────────┼────────────────────────┤
│ Added Latency         │ Minimal                │ Slightly higher        │
├───────────────────────┼────────────────────────┼────────────────────────┤
│ SSL/TLS Termination   │ Usually pass-through   │ Yes                    │
├───────────────────────┼────────────────────────┼────────────────────────┤
│ Caching               │ No                     │ Yes                    │
├───────────────────────┼────────────────────────┼────────────────────────┤
│ Compression           │ No                     │ Yes                    │
├───────────────────────┼────────────────────────┼────────────────────────┤
│ Protocol Support      │ Any TCP/UDP            │ HTTP/HTTPS primarily   │
├───────────────────────┼────────────────────────┼────────────────────────┤
│ Session Affinity      │ IP-based only          │ Cookie-based, IP-based │
├───────────────────────┼────────────────────────┼────────────────────────┤
│ Use Cases             │ Databases, Cache,      │ Web apps, APIs,        │
│                       │ Custom protocols       │ Microservices          │
└───────────────────────┴────────────────────────┴────────────────────────┘
```
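Cookie-based session affinity, listed above as a Layer 7 capability, can be sketched as follows. This is a minimal single-threaded illustration, assuming a hypothetical cookie named `lb_server` that stores a server *name* rather than its IP (so internal addresses are not exposed); HAProxy's `cookie` backend directive provides this behaviour natively.

```python
import itertools

STICKY_COOKIE = "lb_server"  # hypothetical cookie name

class CookieStickyBalancer:
    """Route by a sticky cookie when present and valid; otherwise round-robin."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._cycle = itertools.cycle(self.servers)

    def pick(self, cookies):
        """Return (server, cookie_to_set); cookie_to_set is None if already pinned."""
        pinned = cookies.get(STICKY_COOKIE)
        if pinned in self.servers:
            return pinned, None
        # No cookie, or it names a server that left the pool: pin a new one
        server = next(self._cycle)
        return server, {STICKY_COOKIE: server}

balancer = CookieStickyBalancer(["web1", "web2", "web3"])
server, set_cookie = balancer.pick({})                     # first visit: pinned to web1
server, set_cookie = balancer.pick({"lb_server": "web3"})  # returning client stays on web3
```

Unlike IP hashing, the pin survives client IP changes and spreads clients behind a shared NAT address, at the cost of requiring the balancer to parse HTTP.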

---

## **6.3 Load Balancing Algorithms**

Load balancers use different algorithms to determine which backend server receives the next request. The choice of algorithm depends on your use case, server capabilities, and traffic patterns.

### **Round Robin**

**Concept**: Distribute requests sequentially to each server in the list. After reaching the end, start again from the first server.

**How It Works**:
```
Request 1 -> Server 1
Request 2 -> Server 2
Request 3 -> Server 3
Request 4 -> Server 1
Request 5 -> Server 2
Request 6 -> Server 3
...
```

**Implementation**:
```python
import threading

class RoundRobinBalancer:
    def __init__(self, servers):
        self.servers = servers
        self.current_index = 0
        self.lock = threading.Lock()
    
    def get_server(self):
        with self.lock:
            server = self.servers[self.current_index]
            self.current_index = (self.current_index + 1) % len(self.servers)
            return server

# Usage
servers = ['10.0.1.10', '10.0.1.11', '10.0.1.12']
balancer = RoundRobinBalancer(servers)

for i in range(6):
    print(f"Request {i+1} -> {balancer.get_server()}")

# Output:
# Request 1 -> 10.0.1.10
# Request 2 -> 10.0.1.11
# Request 3 -> 10.0.1.12
# Request 4 -> 10.0.1.10
# Request 5 -> 10.0.1.11
# Request 6 -> 10.0.1.12
```

**Advantages**:
- **Simple**: Easy to understand and implement
- **Fair**: Equal distribution among servers
- **Minimal State**: Only a single counter to maintain

**Disadvantages**:
- **Ignores Server Load**: Doesn't consider current connections or response time
- **No Session Affinity**: Same client may hit different servers
- **Ignores Capacity Differences**: Treats a small server the same as a large one (see Weighted Round Robin)

**When to Use**:
- When all servers have equal capacity
- When requests are roughly equal in processing time
- Stateless applications

---

### **Weighted Round Robin**

**Concept**: Like round robin, but servers with higher weights receive proportionally more requests.

**How It Works**:
```
Servers:
- Server 1: Weight 3
- Server 2: Weight 2
- Server 3: Weight 1

Distribution:
Request 1 -> Server 1
Request 2 -> Server 1
Request 3 -> Server 1
Request 4 -> Server 2
Request 5 -> Server 2
Request 6 -> Server 3
Request 7 -> Server 1
...
```

**Implementation**:
```python
import threading

class WeightedRoundRobinBalancer:
    def __init__(self, servers_with_weights):
        """
        servers_with_weights: [('10.0.1.10', 3), ('10.0.1.11', 2), ('10.0.1.12', 1)]
        """
        self.servers = []
        for server, weight in servers_with_weights:
            # Add server 'weight' times to the list
            self.servers.extend([server] * weight)
        
        self.current_index = 0
        self.lock = threading.Lock()
    
    def get_server(self):
        with self.lock:
            server = self.servers[self.current_index]
            self.current_index = (self.current_index + 1) % len(self.servers)
            return server

# Usage
servers = [('10.0.1.10', 3), ('10.0.1.11', 2), ('10.0.1.12', 1)]
balancer = WeightedRoundRobinBalancer(servers)

distribution = {}
for i in range(60):
    server = balancer.get_server()
    distribution[server] = distribution.get(server, 0) + 1

print("Distribution:", distribution)
# Output: {'10.0.1.10': 30, '10.0.1.11': 20, '10.0.1.12': 10} (3:2:1 ratio)
```
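The expanded-list approach sends requests in bursts (three in a row to `10.0.1.10`). A minimal sketch of the "smooth" weighted round robin variant used by NGINX's upstream module interleaves picks while keeping the same 3:2:1 ratio; the class name is hypothetical.

```python
import threading

class SmoothWeightedRoundRobinBalancer:
    def __init__(self, servers_with_weights):
        self.servers = [server for server, _ in servers_with_weights]
        self.weights = [weight for _, weight in servers_with_weights]
        self.current = [0] * len(self.servers)  # Running score per server
        self.total = sum(self.weights)
        self.lock = threading.Lock()

    def get_server(self):
        with self.lock:
            # 1. Every server earns its weight
            for i, weight in enumerate(self.weights):
                self.current[i] += weight
            # 2. Highest running score wins (max() keeps the first on ties)
            best = max(range(len(self.servers)), key=lambda i: self.current[i])
            # 3. The winner pays back the total, letting the others catch up
            self.current[best] -= self.total
            return self.servers[best]

balancer = SmoothWeightedRoundRobinBalancer([('10.0.1.10', 3), ('10.0.1.11', 2), ('10.0.1.12', 1)])
print([balancer.get_server() for _ in range(6)])
# ['10.0.1.10', '10.0.1.11', '10.0.1.10', '10.0.1.12', '10.0.1.11', '10.0.1.10']
```

It also needs no memory proportional to the weights, which matters when weights are large (e.g. 100 vs. 250).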

**Advantages**:
- **Capacity Aware**: Higher capacity servers get more traffic
- **Simple**: Still relatively simple to implement
- **Flexible**: Easy to adjust weights based on server performance

**When to Use**:
- When servers have different capacities (CPU, memory)
- When you want to gradually introduce new servers (start with low weight)
- When some servers are faster than others

---

### **Least Connections**

**Concept**: Route request to the server with the fewest active connections. Assumes current connections indicate current load.

**How It Works**:
```
Current Connections:
- Server 1: 10 connections
- Server 2: 5 connections
- Server 3: 8 connections

Next Request -> Server 2 (fewest connections)

After routing:
- Server 1: 10 connections
- Server 2: 6 connections (+1)
- Server 3: 8 connections
```

**Implementation**:
```python
import threading

class LeastConnectionsBalancer:
    def __init__(self, servers):
        self.servers = {server: 0 for server in servers}  # connection counts
        self.lock = threading.Lock()
    
    def get_server(self):
        with self.lock:
            # Find server with minimum connections
            min_server = min(self.servers, key=self.servers.get)
            self.servers[min_server] += 1
            return min_server
    
    def release_connection(self, server):
        """Call when connection is closed"""
        with self.lock:
            if self.servers[server] > 0:
                self.servers[server] -= 1
    
    def get_stats(self):
        return dict(self.servers)

# Usage
servers = ['10.0.1.10', '10.0.1.11', '10.0.1.12']
balancer = LeastConnectionsBalancer(servers)

# Simulate connections
for i in range(10):
    server = balancer.get_server()
    print(f"Request {i+1} -> {server} (connections: {balancer.get_stats()})")

# Note: In real implementation, you'd call release_connection() when done
```
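Forgetting `release_connection()` (for example when a request raises) leaves counts inflated forever. One way to make release automatic is a context manager; this is a minimal sketch with a hypothetical `TrackedLeastConnections` class.

```python
import threading
from contextlib import contextmanager

class TrackedLeastConnections:
    """Least-connections picker whose counts are released automatically."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}
        self.lock = threading.Lock()

    @contextmanager
    def connection(self):
        with self.lock:
            server = min(self.active, key=self.active.get)
            self.active[server] += 1
        try:
            yield server
        finally:
            # Runs on normal exit and when the request raises
            with self.lock:
                self.active[server] -= 1

balancer = TrackedLeastConnections(['10.0.1.10', '10.0.1.11'])
with balancer.connection() as server:
    print(f"Proxying to {server}")  # ... forward the request here ...
print(balancer.active)  # {'10.0.1.10': 0, '10.0.1.11': 0}
```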

**Advantages**:
- **Load Aware**: Considers current server load
- **Dynamic**: Adapts to changing traffic patterns
- **Better for Long Connections**: Good for WebSocket, database connections

**Disadvantages**:
- **Connection Count ≠ Load**: A server with few connections might be processing heavy requests
- **Overhead**: Requires tracking connection counts
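- **Local View and Bursts**: Each load balancer instance counts only its own connections, so several instances can pile onto the same "least loaded" server; a newly added server (zero connections) may also receive a burst of traffic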

**When to Use**:
- When connections have variable durations
- Long-lived connections (WebSockets, streaming)
- When server load correlates with connection count

---

### **Least Response Time**

**Concept**: Route each request to the server with the lowest recent average response time. Some implementations also weigh in the number of active connections; the implementation below uses response time alone.

**How It Works**:
```
Metrics:
- Server 1: 5 connections, avg response time 100ms
- Server 2: 3 connections, avg response time 200ms
- Server 3: 4 connections, avg response time 50ms

Next Request -> Server 3 (fastest response time, despite having 4 connections)
```

**Implementation**:
```python
import time
import threading
from collections import deque

class LeastResponseTimeBalancer:
    def __init__(self, servers):
        self.servers = servers
        self.response_times = {server: deque(maxlen=10) for server in servers}  # Keep last 10
        self.lock = threading.Lock()
    
    def record_response_time(self, server, response_time):
        """Record response time for a server"""
        with self.lock:
            self.response_times[server].append(response_time)
    
    def get_average_response_time(self, server):
        """Calculate average response time"""
        times = self.response_times[server]
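        # No samples yet: report 0.0 so the server gets tried (float('inf')
        # here would mean an unmeasured server is never selected)
        return sum(times) / len(times) if times else 0.0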
    
    def get_server(self):
        with self.lock:
            # Find server with lowest average response time
            best_server = min(self.servers, key=self.get_average_response_time)
            return best_server

# Usage with timing
servers = ['10.0.1.10', '10.0.1.11', '10.0.1.12']
balancer = LeastResponseTimeBalancer(servers)

# Simulate requests
for i in range(10):
    server = balancer.get_server()
    
    start_time = time.perf_counter()  # Monotonic clock suited to measuring intervals
    # ... make request to server ...
    time.sleep(0.1)  # Simulated processing
    response_time = time.perf_counter() - start_time
    
    balancer.record_response_time(server, response_time)
    print(f"Request {i+1} -> {server} (avg response: {balancer.get_average_response_time(server):.3f}s)")
```
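One way to fold in-flight request counts into response-time routing is to score each server by an exponentially weighted moving average (EWMA) of its latency multiplied by its active requests plus one. This is an illustrative sketch, not a specific product's algorithm; the class name, `alpha`, and the assumed prior of 100 ms for unmeasured servers are all hypothetical choices.

```python
import threading

class EwmaLeastTimeBalancer:
    """Pick the server with the lowest EWMA latency scaled by in-flight requests."""

    def __init__(self, servers, alpha=0.3, initial_latency=0.1):
        self.alpha = alpha  # Weight of the newest sample (0 < alpha <= 1)
        # Assumed prior latency (seconds) for servers with no samples yet
        self.ewma = {server: initial_latency for server in servers}
        self.active = {server: 0 for server in servers}
        self.lock = threading.Lock()

    def _score(self, server):
        # (active + 1) so an idle server's score is just its latency estimate
        return self.ewma[server] * (self.active[server] + 1)

    def acquire(self):
        """Choose a server and count the request as in flight."""
        with self.lock:
            server = min(self.ewma, key=self._score)
            self.active[server] += 1
            return server

    def release(self, server, latency):
        """Record a finished request's latency and decrement its in-flight count."""
        with self.lock:
            self.active[server] -= 1
            self.ewma[server] = self.alpha * latency + (1 - self.alpha) * self.ewma[server]

balancer = EwmaLeastTimeBalancer(['10.0.1.10', '10.0.1.11', '10.0.1.12'])
server = balancer.acquire()
# ... send the request and time it ...
balancer.release(server, latency=0.05)
```

Counting in-flight requests dampens thrashing: a fast server's score rises as soon as requests pile onto it, before any slow response is recorded.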

**Advantages**:
- **Performance Aware**: Routes to fastest server
- **Adaptive**: Adapts to server performance changes
- **Optimal User Experience**: Users get fastest responses

**Disadvantages**:
- **Measurement Overhead**: Requires tracking response times
- **Fluctuation**: Response times can fluctuate, causing thrashing
- **Cold Start**: New servers have no history; treating them as infinitely slow starves them, while treating them as fastest can briefly flood them

**When to Use**:
- When server performance varies
- When you want to optimize for response time
- When servers have different hardware capabilities

---

### **IP Hash (Source IP Hash)**

**Concept**: Hash the client's IP address to determine which server receives the request. Same client always goes to same server (session affinity/sticky sessions).

**How It Works**:
```
Client IP: 203.0.113.45
Hash: hash(203.0.113.45) % 3 = 1

Request from 203.0.113.45 -> Server 2 (index 1)

All subsequent requests from same IP -> Server 2
```

**Implementation**:
```python
import hashlib

class IPHashBalancer:
    def __init__(self, servers):
        self.servers = servers
        self.num_servers = len(servers)
    
    def get_server(self, client_ip):
        """Get server based on client IP hash"""
        # Create hash of IP address
        hash_obj = hashlib.md5(client_ip.encode())
        hash_val = int(hash_obj.hexdigest(), 16)
        
        # Map hash to server index
        server_index = hash_val % self.num_servers
        return self.servers[server_index]

# Usage
servers = ['10.0.1.10', '10.0.1.11', '10.0.1.12']
balancer = IPHashBalancer(servers)

# Test with different IPs
test_ips = ['203.0.113.1', '203.0.113.2', '203.0.113.1', '198.51.100.5', '203.0.113.1']

for ip in test_ips:
    server = balancer.get_server(ip)
    print(f"Client {ip} -> {server}")

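# Output (which server each IP gets depends on its MD5 value; what is
# guaranteed is that a repeated IP always maps to the same server):
# Client 203.0.113.1 -> server X
# Client 203.0.113.2 -> (some server)
# Client 203.0.113.1 -> server X  (same as first!)
# Client 198.51.100.5 -> (some server)
# Client 203.0.113.1 -> server X  (same again - sticky!)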
```

**Advantages**:
- **Session Affinity**: Same client always hits same server (good for sessions)
- **Stateless LB**: No need to track sessions at load balancer
- **Even Distribution**: Good distribution if IP addresses are random

**Disadvantages**:
- **Uneven Distribution**: If many clients behind NAT (same public IP), one server gets overloaded
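- **Fragile on Pool Changes**: If a server fails, its clients lose their sessions; and because the index is `hash % N`, removing (or adding) a server changes N and reassigns most other clients too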
- **Caching Issues**: If server cache differs, users see inconsistent data

**When to Use**:
- When session state is stored on server (not in database)
- When you need sticky sessions without cookies
- When clients are well-distributed across IP ranges
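A short experiment shows the cost of `hash % N` when the pool size changes. Growing from 3 to 4 servers, a key keeps its server only when `h % 3 == h % 4` (that is, `h mod 12` is 0, 1, or 2), so about three quarters of clients move. The helper names and synthetic client keys are hypothetical.

```python
import hashlib

def modulo_pick(key, servers):
    """IP-hash style selection: MD5 of the key, modulo the pool size."""
    hash_val = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return servers[hash_val % len(servers)]

def fraction_remapped(keys, old_pool, new_pool):
    """Share of keys whose server changes when the pool changes."""
    moved = sum(1 for key in keys
                if modulo_pick(key, old_pool) != modulo_pick(key, new_pool))
    return moved / len(keys)

clients = [f"client_{i}" for i in range(1000)]  # synthetic client keys
old_pool = ['10.0.1.10', '10.0.1.11', '10.0.1.12']
new_pool = old_pool + ['10.0.1.13']
print(f"{fraction_remapped(clients, old_pool, new_pool):.0%} of clients moved")  # roughly 75%
```

Consistent hashing is designed to shrink that figure to roughly the new server's share of the ring.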

---

### **Consistent Hashing (for Distributed Systems)**

**Concept**: We covered this in Chapter 2, but it's crucial for load balancing too. Maps clients to servers on a hash ring, minimizing remapping when servers are added/removed.

**Use Case**: Distributed caching, distributed databases, CDNs.

**Implementation**:
```python
import hashlib
import bisect

class ConsistentHashBalancer:
    def __init__(self, servers=None, replicas=100):
        self.replicas = replicas  # Virtual nodes per server
        self.ring = []  # Sorted list of hash values
        self.nodes = {}  # hash -> server mapping
        
        if servers:
            for server in servers:
                self.add_server(server)
    
    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)
    
    def add_server(self, server):
        """Add server to ring"""
        for i in range(self.replicas):
            virtual_node = f"{server}:{i}"
            hash_val = self._hash(virtual_node)
            self.nodes[hash_val] = server
            bisect.insort(self.ring, hash_val)
    
    def remove_server(self, server):
        """Remove server from ring"""
        for i in range(self.replicas):
            virtual_node = f"{server}:{i}"
            hash_val = self._hash(virtual_node)
            if hash_val in self.nodes:
                del self.nodes[hash_val]
                self.ring.remove(hash_val)
    
    def get_server(self, client_key):
        """Get server for client key"""
        if not self.ring:
            return None
        
        hash_val = self._hash(client_key)
        
        # Find first server clockwise from hash
        idx = bisect.bisect_right(self.ring, hash_val)
        if idx == len(self.ring):
            idx = 0
        
        return self.nodes[self.ring[idx]]

# Usage
servers = ['10.0.1.10', '10.0.1.11', '10.0.1.12']
balancer = ConsistentHashBalancer(servers)

# Record each client's current server and the overall distribution
before = {}
distribution = {}
for i in range(1000):
    client = f"client_{i}"
    server = balancer.get_server(client)
    before[client] = server
    distribution[server] = distribution.get(server, 0) + 1

print("Distribution:", distribution)

# Add new server (minimal remapping)
balancer.add_server('10.0.1.13')

# Count how many clients now map to a different server
moved = sum(1 for client, old_server in before.items()
            if balancer.get_server(client) != old_server)
print(f"Clients moved: {moved} of 1000")  # Expect roughly a quarter, all to 10.0.1.13
```

---

## **6.4 Health Checks and Circuit Breakers**

Load balancers must know which backend servers are healthy and available. Health checks monitor server status, while circuit breakers prevent cascading failures.

### **Health Checks**

**Active Health Checks**: Load balancer periodically pings servers to check health.

**Types**:
1. **TCP Check**: Try to establish TCP connection
2. **HTTP Check**: Send HTTP request, check response code
3. **Custom Check**: Application-specific health endpoint
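A TCP check is the cheapest of the three: it only confirms that something is accepting connections on the port. A minimal Python sketch using only the standard library (the `tcp_check` name and its 2-second default timeout are illustrative choices, not from any particular load balancer):

```python
import socket


def tcp_check(host, port, timeout=2.0):
    """TCP health check: healthy if a TCP connection opens within `timeout`.

    This proves only that the port accepts connections; it says nothing
    about whether the application behind it can actually serve requests.
    """
    try:
        # create_connection completes the TCP handshake (SYN, SYN-ACK, ACK)
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, unreachable host, DNS failure, or timeout
        return False
```

An HTTP check goes one step further by requesting an endpoint such as `/health` and inspecting the status code, which catches an application that is listening but broken.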

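**Implementation** (NGINX Plus, which provides the `health_check` directive; it is not available in open-source NGINX):
```nginx
upstream backend {
    zone backend 64k;   # Shared memory zone, required for active health checks
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
}

# Treat a server as healthy only if /health returns HTTP 200
match server_ok {
    status 200;
}

server {
    location / {
        proxy_pass http://backend;

        # Probe every 5s; mark down after 3 failures, up again after 2 passes
        health_check interval=5s fails=3 passes=2 uri=/health match=server_ok;
    }
}
```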

**Implementation** (Python):
```python
import threading
import requests
import time
from enum import Enum

class ServerStatus(Enum):
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"
    CHECKING = "checking"

class HealthChecker:
    def __init__(self, servers, check_interval=5, timeout=2):
        self.servers = {server: ServerStatus.HEALTHY for server in servers}
        self.check_interval = check_interval
        self.timeout = timeout
        self.healthy_servers = set(servers)
        self.lock = threading.Lock()
        
        # Start health check thread
        self.checker_thread = threading.Thread(target=self._health_check_loop, daemon=True)
        self.checker_thread.start()
    
    def _check_server(self, server):
        """Check if server is healthy"""
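        try:
            # HTTP health check: only a 200 from /health counts as healthy
            response = requests.get(f"http://{server}/health", timeout=self.timeout)
            return response.status_code == 200
        except requests.RequestException:
            # Timeouts, refused connections, DNS failures, etc.
            return False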
    
    def _health_check_loop(self):
        """Continuously check server health"""
        while True:
            for server in list(self.servers.keys()):
                is_healthy = self._check_server(server)
                
                with self.lock:
                    current_status = self.servers[server]
                    
                    if is_healthy and current_status != ServerStatus.HEALTHY:
                        # Server recovered
                        self.servers[server] = ServerStatus.HEALTHY
                        self.healthy_servers.add(server)
                        print(f"Server {server} is now HEALTHY")
                    
                    elif not is_healthy and current_status == ServerStatus.HEALTHY:
                        # Server failed
                        self.servers[server] = ServerStatus.UNHEALTHY
                        self.healthy_servers.discard(server)
                        print(f"Server {server} is now UNHEALTHY")
            
            time.sleep(self.check_interval)
    
    def get_healthy_servers(self):
        """Return list of healthy servers"""
        with self.lock:
            return list(self.healthy_servers)

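# Usage
servers = ['10.0.1.10:8080', '10.0.1.11:8080', '10.0.1.12:8080']
health_checker = HealthChecker(servers)

# Route only to healthy servers. Every server starts as HEALTHY until its first
# probe completes, and a one-time snapshot goes stale as servers fail or recover,
# so re-read the list per request (or whenever it changes).
# RoundRobinBalancer is the round-robin class defined earlier in this chapter.
healthy_servers = health_checker.get_healthy_servers()
balancer = RoundRobinBalancer(healthy_servers)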
```
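The `HealthChecker` above flips a server's status on a single probe result, so one dropped packet evicts a server and one lucky response readmits it. The NGINX configuration's `fails=3 passes=2` parameters add hysteresis instead. A minimal sketch of that idea, assuming probe results are fed in by a loop like `_health_check_loop` (the `ThresholdHealth` class is hypothetical, not part of any library):

```python
class ThresholdHealth:
    """Track one server's health with fall/rise thresholds.

    A healthy server is marked down only after `fails` consecutive failed
    probes; a down server is marked up only after `passes` consecutive
    successful probes.
    """

    def __init__(self, fails=3, passes=2):
        self.fails = fails
        self.passes = passes
        self.healthy = True
        self.streak = 0  # Consecutive results that contradict the current status

    def record(self, probe_ok):
        """Feed one probe result; return True if the status changed."""
        if probe_ok == self.healthy:
            # Result agrees with the current status: reset the streak
            self.streak = 0
            return False
        self.streak += 1
        threshold = self.fails if self.healthy else self.passes
        if self.streak >= threshold:
            self.healthy = not self.healthy
            self.streak = 0
            return True
        return False
```

Using one `ThresholdHealth` per server inside the check loop, instead of comparing a single result against the current status, would make the Python checker behave like the NGINX configuration.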

---

### **Circuit Breakers**

**Concept**: Prevent cascading failures by temporarily rejecting requests to failing services, giving them time to recover.

**States**:
1. **Closed**: Normal operation, requests pass through
2. **Open**: Failure threshold exceeded, requests fail fast (no call to service)
3. **Half-Open**: Testing if service recovered, limited requests allowed

**Implementation** (using `pybreaker` library):
```python
import pybreaker
import requests
from flask import Flask, jsonify

app = Flask(__name__)

# Circuit breaker configuration. pybreaker counts every exception raised by the
# wrapped function as a failure unless it is listed in `exclude`.
circuit_breaker = pybreaker.CircuitBreaker(
    fail_max=5,        # Open after 5 consecutive failures
    reset_timeout=60,  # After 60 seconds, allow a trial call (half-open)
)

@circuit_breaker
def call_external_service():
    """Call external API with circuit breaker protection"""
    response = requests.get('http://external-api.com/data', timeout=2)
    response.raise_for_status()  # Raise on HTTP 4xx/5xx so error responses count as failures
    return response.json()

@app.route('/data')
def get_data():
    try:
        data = call_external_service()
        return jsonify(data)
    except pybreaker.CircuitBreakerError:
        # Circuit is open - fail fast
        return jsonify({"error": "Service temporarily unavailable"}), 503
    except requests.RequestException:
        # Other request errors
        return jsonify({"error": "Service error"}), 502

# Manual implementation
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=60):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failure_count = 0
        self.last_failure_time = None
        self.state = 'CLOSED'  # CLOSED, OPEN, HALF_OPEN

    def call(self, func, *args, **kwargs):
        """Call function with circuit breaker protection"""
        if self.state == 'OPEN':
            if time.monotonic() - self.last_failure_time >= self.recovery_timeout:
                # Recovery timeout elapsed: let a trial call through
                self.state = 'HALF_OPEN'
                print("Circuit breaker entering HALF_OPEN state")
            else:
                # Fail fast without calling the service
                raise CircuitOpenError("Circuit breaker is OPEN")

        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise  # Re-raise, preserving the original traceback
        self._on_success()
        return result

    def _on_success(self):
        """Handle successful call"""
        if self.state == 'HALF_OPEN':
            self.state = 'CLOSED'
            print("Circuit breaker CLOSED (service recovered)")
        self.failure_count = 0

    def _on_failure(self):
        """Handle failed call"""
        self.failure_count += 1
        self.last_failure_time = time.monotonic()

        # A failed trial call in HALF_OPEN reopens the circuit immediately
        if self.state == 'HALF_OPEN' or self.failure_count >= self.failure_threshold:
            self.state = 'OPEN'
            print(f"Circuit breaker OPENED after {self.failure_count} failures")

# Usage
cb = CircuitBreaker(failure_threshold=3, recovery_timeout=30)

def make_request():
    # This might fail: timeouts, connection errors, or HTTP error statuses
    response = requests.get('http://unreliable-service.com', timeout=2)
    response.raise_for_status()
    return response.json()

try:
    result = cb.call(make_request)
except CircuitOpenError:
    print("Request rejected: circuit is open")
except Exception as e:
    print(f"Request failed: {e}")
```

---

## **6.5 Global Server Load Balancing (GSLB) and Geo-DNS**

**GSLB**: Distributes traffic across multiple geographically distributed data centers based on proximity, load, or health.

**How It Works**:
```
User in Tokyo requests api.example.com
    │
    ▼
DNS Server (GSLB)
    │
    ├─► Checks user's location (via DNS resolver IP)
    ├─► Checks health of data centers
    ├─► Checks load of data centers
    │
    └─► Returns IP of Tokyo Data Center (closest, healthy, low load)

User connects to Tokyo Data Center (low latency: 20ms)

Alternative:
User in New York requests api.example.com
    │
    ▼
DNS Server (GSLB)
    │
    └─► Returns IP of Virginia Data Center (closest to New York)

User connects to Virginia Data Center (low latency: 30ms)
```
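The decision the GSLB DNS server makes in this walkthrough can be sketched as a pure function. This is a simplified model, not how Route 53 or any particular product works internally: the `DataCenter` record, the per-resolver latency table passed in as `latency_ms`, and the 80% load cutoff are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    name: str
    ip: str
    healthy: bool
    load: float  # Fraction of capacity in use, 0.0 to 1.0


def resolve(latency_ms, data_centers, max_load=0.8):
    """Pick the data center IP to return in the DNS answer.

    latency_ms: estimated latency from the querying resolver to each data
    center, keyed by name (an assumed input; a real GSLB derives it from
    the resolver's IP address).
    """
    # 1. Drop data centers that fail health checks
    candidates = [dc for dc in data_centers if dc.healthy]
    if not candidates:
        return None
    # 2. Prefer data centers below the load cutoff, if any exist
    not_overloaded = [dc for dc in candidates if dc.load < max_load]
    if not_overloaded:
        candidates = not_overloaded
    # 3. Among the rest, return the one closest to the user
    best = min(candidates, key=lambda dc: latency_ms.get(dc.name, float('inf')))
    return best.ip
```

Falling back to a healthy but busy data center, rather than returning no answer, is a design choice in this sketch; real systems may instead spill traffic by weight or return several addresses.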

**Implementation** (AWS Route 53 latency-based routing):
```python
import boto3

route53 = boto3.client('route53')

# Latency-based routing: one record per region, all with the same name and a
# distinct SetIdentifier. Route 53 answers with the healthy record whose AWS
# region has the lowest latency to the user's network.
# 'ZONE_ID' and the health-check IDs are placeholders for real identifiers.
data_centers = [
    # (SetIdentifier, AWS region,     health check ID,            public IP)
    ('Tokyo',    'ap-northeast-1', 'tokyo-health-check-id',    '203.0.113.10'),
    ('Virginia', 'us-east-1',      'virginia-health-check-id', '198.51.100.10'),
    ('Ireland',  'eu-west-1',      'ireland-health-check-id',  '192.0.2.10'),
]

changes = [
    {
        'Action': 'UPSERT',  # CREATE would fail if the record already exists
        'ResourceRecordSet': {
            'Name': 'api.example.com',
            'Type': 'A',
            'SetIdentifier': set_id,
            'Region': region,
            'HealthCheckId': health_check_id,
            'TTL': 60,  # Short TTL so clients pick up failover quickly
            'ResourceRecords': [{'Value': ip}],
        },
    }
    for set_id, region, health_check_id, ip in data_centers
]

response = route53.change_resource_record_sets(
    HostedZoneId='ZONE_ID',
    ChangeBatch={'Changes': changes},
)
```

---

## **6.6 API Gateway Pattern**

**API Gateway**: A single entry point for all clients, handling cross-cutting concerns like authentication, rate limiting, caching, and routing to microservices.

**Architecture**:
```
Clients (Web, Mobile, 3rd Party)
    │
    ▼
┌───────────────────────────────────────┐
│              API Gateway              │
│  ┌─────────────────────────────────┐  │
│  │ Authentication & Authorization  │  │
│  │ Rate Limiting                   │  │
│  │ SSL Termination                 │  │
│  │ Caching                         │  │
│  │ Request/Response Transformation │  │
│  │ Routing                         │  │
│  └─────────────────────────────────┘  │
└────────┬──────────┬───────────────────┘
         │          │
         ▼          ▼
   User Service  Order Service
   (Users)       (Orders)
         │
         ▼
   Inventory Service
   (Inventory)
```

**Implementation** (Kong API Gateway):
```yaml
# kong.yml declarative configuration
_format_version: "3.0"  # Declarative config format for Kong 3.x

services:
  - name: user-service
    url: http://user-service:8080
    routes:
      - name: user-routes
        paths:
          - /api/users
    plugins:
      - name: rate-limiting
        config:
          minute: 100
      # jwt validates tokens against JWT credentials registered on Kong consumers
      - name: jwt
        config:
          uri_param_names: []
          cookie_names: []
          key_claim_name: iss
          secret_is_base64: false
          claims_to_verify:
            - exp

  - name: order-service
    url: http://order-service:8080
    routes:
      - name: order-routes
        paths:
          - /api/orders
    plugins:
      - name: rate-limiting
        config:
          minute: 50
      - name: proxy-cache
        config:
          content_type:
            - application/json
          cache_ttl: 300
          strategy: memory
```
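To make the routing and rate-limiting steps concrete, here is a toy gateway in Python. It is a hypothetical sketch (`MiniGateway` is not a Kong API): routes are matched by longest path prefix, each (route, client) pair gets a fixed one-minute request budget similar to the `minute:` limits above, and instead of proxying it returns the upstream URL it would forward to.

```python
import time
from collections import defaultdict


class MiniGateway:
    """Toy API gateway: longest-prefix routing plus a per-client,
    per-route fixed-window rate limit (requests per minute)."""

    def __init__(self, routes, clock=time.monotonic):
        # routes: {path_prefix: (upstream_base_url, requests_per_minute)}
        self.routes = routes
        self.clock = clock
        # (prefix, client_id, window) -> request count. Old windows are never
        # evicted here; a real gateway would expire them or use a shared store.
        self.counters = defaultdict(int)

    def _match(self, path):
        """Return the longest route prefix matching path, or None."""
        candidates = [
            p for p in self.routes
            if path == p or path.startswith(p.rstrip('/') + '/')
        ]
        return max(candidates, key=len) if candidates else None

    def handle(self, client_id, path):
        """Return (status_code, upstream_url or None) for one request."""
        prefix = self._match(path)
        if prefix is None:
            return 404, None  # No route configured
        upstream, limit = self.routes[prefix]
        window = int(self.clock() // 60)  # Index of the current one-minute window
        key = (prefix, client_id, window)
        if self.counters[key] >= limit:
            return 429, None  # Too Many Requests
        self.counters[key] += 1
        # A real gateway would proxy here; we just report the target URL
        return 200, upstream + path


gateway = MiniGateway({
    '/api/users': ('http://user-service:8080', 100),
    '/api/orders': ('http://order-service:8080', 50),
})
print(gateway.handle('client-1', '/api/orders/7'))
# (200, 'http://order-service:8080/api/orders/7')
```

A fixed window is the simplest limiter to show; it allows short bursts at window boundaries, which is why sliding-window or token-bucket variants are common in practice.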

---

## **6.7 Service Mesh Introduction**

**Service Mesh**: Infrastructure layer that handles service-to-service communication, providing observability, security, and reliability without changing application code.

**Architecture** (Istio shown; Linkerd uses the same sidecar model with its own proxy instead of Envoy):
```
Application Pod A                       Application Pod B
┌───────────────────────┐               ┌───────────────────────┐
│  App Container        │               │  App Container        │
│  (Your Application)   │               │  (Your Application)   │
│          │            │               │          ▲            │
│          ▼            │               │          │            │
│  Sidecar Proxy (Envoy)│───── mTLS ───>│  Sidecar Proxy (Envoy)│
└──────────┬────────────┘               └──────────┬────────────┘
           │        config and certificates        │
           └───────────────────┬───────────────────┘
                               │
                        ┌──────▼──────┐
                        │  Control    │
                        │  Plane      │
                        │ (Istiod)    │
                        └─────────────┘

Features:
- mTLS (automatic encryption between services)
- Traffic routing (canary deployments, A/B testing)
- Observability (metrics, tracing, logs)
- Circuit breaking, retries, timeouts
- No code changes required in application
```

**Example** (Istio Traffic Management):
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: user-service
spec:
  hosts:
  - user-service
  http:
  - route:
    - destination:
        host: user-service
        subset: v1
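      weight: 90   # 90% of requests stay on the stable version
    - destination:
        host: user-service
        subset: v2
      weight: 10   # 10% go to the canary
    timeout: 5s
    retries:
      attempts: 3
      perTryTimeout: 2s
---
# Subsets v1 and v2 must be defined in a DestinationRule; this assumes the
# pods carry a `version` label
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: user-service
spec:
  host: user-service
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2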
```
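The `weight: 90` / `weight: 10` split means that, roughly, each request independently lands on `v1` with probability 0.9. Envoy makes this choice inside the sidecar; the sketch below only models the per-request weighted choice in Python so the effect is easy to see (the `pick_subset` helper is hypothetical):

```python
import random
from collections import Counter


def pick_subset(routes, rng=random):
    """Choose a destination subset in proportion to its weight.

    routes: list of (subset_name, weight) pairs, e.g. [('v1', 90), ('v2', 10)].
    """
    names = [name for name, _ in routes]
    weights = [weight for _, weight in routes]
    # random.choices does a weighted draw; k=1 returns a one-element list
    return rng.choices(names, weights=weights, k=1)[0]


# Simulate 10,000 requests against the 90/10 canary split
rng = random.Random(42)  # Seeded so the run is reproducible
counts = Counter(pick_subset([('v1', 90), ('v2', 10)], rng) for _ in range(10_000))
print(counts)  # Roughly 9,000 v1 and 1,000 v2
```

Because each request is drawn independently, one user can hit both versions on consecutive requests; pinning a user to the canary requires header- or cookie-based `match` rules rather than weights alone.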

---

## **6.8 Key Takeaways**

1. **Layer 4 for performance, Layer 7 for features**: Use Layer 4 (TCP/UDP) for maximum performance and protocol flexibility. Use Layer 7 (HTTP) for content-based routing, SSL termination, and caching.

2. **Choose algorithms wisely**: Round Robin for equal capacity, Weighted Round Robin for heterogeneous servers, Least Connections for long-lived connections, IP Hash for session affinity, and Consistent Hashing when keys must stay on the same server as the pool grows or shrinks.

3. **Health checks are essential**: Active health checks prevent routing to failed servers. Circuit breakers prevent cascading failures.

4. **GSLB for global scale**: Use Geo-DNS or latency-based routing to direct users to the nearest healthy data center.

5. **API Gateway for cross-cutting concerns**: Centralize authentication, rate limiting, and routing in the gateway, keeping microservices focused on business logic.

6. **Service Mesh for microservices**: When you have many services, use a service mesh (Istio, Linkerd) to handle communication, security, and observability without code changes.

---

## **Chapter Summary**

In this chapter, we explored load balancing and traffic management—the critical infrastructure that enables high availability and scalability in distributed systems. We compared Layer 4 (transport) and Layer 7 (application) load balancing, understanding when each is appropriate.

We examined various load balancing algorithms (Round Robin, Weighted, Least Connections, IP Hash) and their use cases. We covered health checks and circuit breakers for maintaining system resilience, and explored Global Server Load Balancing (GSLB) for distributing traffic across geographic regions.

We introduced the API Gateway pattern for centralizing cross-cutting concerns, and touched on Service Mesh (Istio, Linkerd) as a modern approach to handling service-to-service communication in microservices architectures.

**Coming up next**: In Chapter 7, we'll explore communication protocols and data formats.

---

**Exercises**:

1. **Load Balancer Selection**: For each scenario, would you use Layer 4 or Layer 7 load balancing? Which algorithm?
   - Database cluster (PostgreSQL read replicas)
   - Web application with static assets and API endpoints
   - WebSocket chat application requiring session affinity
   - Microservices architecture with 50 different services

2. **Circuit Breaker Implementation**: Implement a circuit breaker for a payment service that:
   - Opens after 3 consecutive failures
   - Enters half-open state after 30 seconds
   - Closes after 2 consecutive successes in half-open state
   - Tracks failure statistics

3. **GSLB Design**: You're designing a global application with users in North America, Europe, and Asia. Each region has a data center. Design a GSLB strategy that:
   - Routes users to nearest healthy data center
   - Fails over to next nearest if primary is down
   - Handles data sovereignty (EU data stays in EU)

4. **API Gateway Configuration**: Design an API Gateway configuration for an e-commerce platform with:
   - Public APIs (rate limited, cached)
   - Partner APIs (authenticated, different rate limits)
   - Internal APIs (no rate limiting, service mesh)

5. **Load Balancing Math**: You have 3 servers with capacities:
   - Server A: 32 CPU cores, 64GB RAM (high capacity)
   - Server B: 16 CPU cores, 32GB RAM (medium capacity)
   - Server C: 8 CPU cores, 16GB RAM (low capacity)
   
   If using Weighted Round Robin, what weights would you assign? If you expect 10,000 requests and Server B fails after 5,000 requests, how many requests does each server handle?

---


<div style='width:100%; display:flex; justify-content:space-between; align-items:center; margin: 1em 0;'>
  <a href='5. message_queues_and_event_driven_architecture.ipynb' style='font-weight:bold; font-size:1.05em;'>&larr; Previous</a>
  <a href='../TOC.md' style='font-weight:bold; font-size:1.05em; text-align:center;'>Table of Contents</a>
  <a href='../3. Distributes_systems_fundamentals/7. communication_protocols_and_data_formats.ipynb' style='font-weight:bold; font-size:1.05em;'>Next &rarr;</a>
</div>
