# **Chapter 4: Caching – Speed at Scale**

In system design, caching is often one of the most impactful optimizations you can make. A well-designed caching strategy can cut read latency by orders of magnitude for requests served from cache while reducing load on, and the cost of, backend infrastructure. This chapter explores caching patterns, eviction policies, distributed caching, CDNs, and cache consistency strategies.

---

## **4.1 Introduction to Caching**

**Caching**: Storing frequently accessed data in fast storage to reduce access time and improve performance.

### **Why Cache Matters**

**The Speed Gap**:
```
Storage Type        Access Time    Cost per GB
────────────────────────────────────────────────
CPU Registers       0.1-1 ns       $100,000+
CPU L1 Cache        1-4 ns         $10,000
CPU L2 Cache        4-10 ns        $5,000
CPU L3 Cache        10-50 ns       $500
RAM (Main Memory)   100 ns         $10
SSD                 100,000 ns     $0.50
HDD                 10,000,000 ns  $0.05

Insight: By these order-of-magnitude figures, RAM is about 1,000x faster than SSD and 100,000x faster than HDD.
```

**Illustrative Example**: A Prime Day-scale traffic spike (round numbers, not measured figures)
```
Without caching:
- 1 million requests per second
- Each request queries database (100ms)
- Total database load: 100,000 concurrent queries (database crashes)

With caching (95% cache hit rate):
- 1 million requests per second
- 950,000 requests served from cache (0.1ms)
- 50,000 requests query database (100ms)
- Total database load: 5,000 concurrent queries (database happy)

Result: 20x less database load, 1000x faster response time for 95% of requests
```
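The arithmetic behind these figures can be checked with a short sketch. It assumes Little's Law (queries in flight = arrival rate × time per query) and uses the round numbers from the example; the function names are illustrative only.

```python
def concurrent_db_queries(requests_per_sec, hit_rate, db_latency_s):
    """Little's Law: queries in flight = arrival rate at the database x time per query."""
    db_requests_per_sec = requests_per_sec * (1 - hit_rate)
    return db_requests_per_sec * db_latency_s

def average_latency_ms(hit_rate, cache_latency_ms, db_latency_ms):
    """Expected latency per request, weighting hits and misses by how often they occur."""
    return hit_rate * cache_latency_ms + (1 - hit_rate) * db_latency_ms

no_cache = concurrent_db_queries(1_000_000, 0.0, 0.100)
with_cache = concurrent_db_queries(1_000_000, 0.95, 0.100)
avg_ms = average_latency_ms(0.95, 0.1, 100)

print(f"Concurrent queries without cache: {no_cache:,.0f}")
print(f"Concurrent queries at 95% hits:   {with_cache:,.0f}")
print(f"Average latency at 95% hits:      {avg_ms:.3f} ms")
```

Note that the average latency works out to roughly 5.1 ms, about 20x better than 100 ms rather than 1000x: the 5% of requests that miss still dominate the mean, which is why raising the hit rate matters so much.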

### **The Caching Hierarchy**

```
┌─────────────────────────────────────────────────────────────┐
│                    Application Layer                         │
├─────────────────────────────────────────────────────────────┤
│                   Browser Cache (Client)                     │
│  - Static assets (images, CSS, JS)                           │
│  - HTTP cache headers (Cache-Control, ETag)                  │
├─────────────────────────────────────────────────────────────┤
│                  CDN Cache (Edge)                            │
│  - Geographically distributed edge servers                   │
│  - Static content and API responses                          │
├─────────────────────────────────────────────────────────────┤
│              Application Server Cache (Local)                │
│  - In-memory cache (LRU, LFU)                                │
│  - Hot data frequently accessed                              │
├─────────────────────────────────────────────────────────────┤
│                Distributed Cache (Redis/Memcached)            │
│  - Shared cache across application servers                    │
│  - Session data, user profiles, computed results             │
├─────────────────────────────────────────────────────────────┤
│                    Database Cache                            │
│  - Query cache (MySQL query cache)                           │
│  - Buffer pool (PostgreSQL shared buffers)                   │
│  - Index cache (B-Tree nodes in memory)                      │
├─────────────────────────────────────────────────────────────┤
│                    Disk Cache (OS)                           │
│  - Page cache (Linux file system cache)                      │
│  - Buffer cache (frequently accessed disk blocks)            │
├─────────────────────────────────────────────────────────────┤
│                   Storage (SSD/HDD)                          │
│  - Persistent storage (slowest layer)                        │
└─────────────────────────────────────────────────────────────┘

Each layer closer to the top is faster to reach from the client but smaller (or more expensive per GB).
Optimal strategy: Serve data from the highest layer that can hold it.
```
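The lookup order implied by the hierarchy can be sketched with plain dictionaries standing in for each tier. This is a minimal illustration, not a production design: `local_cache`, `shared_cache`, and `tiered_get` are hypothetical names, and the sketch ignores expiry and invalidation.

```python
local_cache = {}    # per-process tier (fastest, smallest)
shared_cache = {}   # stand-in for a shared cache such as Redis or Memcached
database = {"user:123": {"name": "Alice Johnson"}}  # stand-in for the source of truth

def tiered_get(key):
    """Return (value, tier that served it), filling the faster tiers on the way back."""
    if key in local_cache:
        return local_cache[key], "local"
    if key in shared_cache:
        local_cache[key] = shared_cache[key]   # promote to the faster tier
        return shared_cache[key], "shared"
    value = database.get(key)
    if value is not None:
        shared_cache[key] = value
        local_cache[key] = value
    return value, "database"

first = tiered_get("user:123")    # served by the database, fills both caches
second = tiered_get("user:123")   # served by the local tier
local_cache.clear()               # simulate a process restart
third = tiered_get("user:123")    # served by the shared tier
print(first[1], second[1], third[1])
```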

### **Benefits of Caching**

1. **Reduced Latency**: Serving from RAM (0.1ms) vs. disk (10ms) or network (100ms)
2. **Reduced Database Load**: Fewer queries mean less expensive database infrastructure
3. **Improved Scalability**: Application servers can handle more requests with less backend pressure
4. **Cost Reduction**: Serving a read from a cache node (e.g., Redis) usually costs less than scaling the database to serve it
5. **Better User Experience**: Faster response times mean happier users

### **Caching Trade-offs**

1. **Complexity**: Caching adds complexity to application code
2. **Stale Data**: Caches may serve outdated data
3. **Consistency**: Keeping caches synchronized with data sources is challenging
4. **Memory Costs**: In-memory caching requires significant RAM
5. **Cache Invalidation**: Determining when to invalidate caches is non-trivial

---

## **4.2 Caching Patterns**

Caching patterns describe how data moves between the cache, application, and primary data store.

### **Cache-Aside (Lazy Loading)**

**Concept**: Application code manages the cache. On cache miss, load data from database and populate cache.

**How It Works**:
```
1. Application needs data
2. Check cache
   ├─→ Cache hit: Return data from cache (fast!)
   └─→ Cache miss: Query database, populate cache, return data
3. Next request: Data is in cache (cache hit)
```

**Implementation**:
```python
import redis
import json
import time

# Redis connection
redis_client = redis.Redis(host='localhost', port=6379, db=0)

# Database connection (simulated)
database = {
    'user:123': {
        'id': 123,
        'name': 'Alice Johnson',
        'email': 'alice@example.com',
        'created_at': '2024-01-15T08:30:00Z'
    }
}

def get_user(user_id):
    cache_key = f'user:{user_id}'
    
    # Step 1: Try to get from cache
    cached_data = redis_client.get(cache_key)
    if cached_data:
        print(f"Cache HIT for {cache_key}")
        return json.loads(cached_data)
    
    # Step 2: Cache miss - get from database
    print(f"Cache MISS for {cache_key}")
    user_data = database.get(f'user:{user_id}')
    if user_data is None:
        return None  # User doesn't exist
    
    # Step 3: Populate cache for next request
    # Set expiration to 1 hour (3600 seconds)
    redis_client.setex(cache_key, 3600, json.dumps(user_data))
    
    return user_data

# First request: Cache miss (loads from database)
user = get_user(123)
print(f"User: {user['name']}")  # Output: "User: Alice Johnson"

# Second request: Cache hit (loads from Redis)
user = get_user(123)
print(f"User: {user['name']}")  # Output: "User: Alice Johnson"
```

**Advantages**:
- **Simple**: Easy to implement and understand
- **On-demand**: Only cache what's actually accessed
- **Flexible**: Application controls cache population

**Disadvantages**:
- **Thundering herd**: Many simultaneous cache misses can overwhelm database
- **Stale data**: Cache may contain outdated data until expiration
- **Code duplication**: Cache logic embedded in application code

**When to Use**:
- Read-heavy workloads
- Data accessed frequently
- When you need control over what's cached
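One common mitigation for the thundering-herd problem listed above is to let only one caller per key load from the database while the others wait for its result. The sketch below shows the idea within a single process, using an in-memory dict in place of Redis; `SingleFlightLoader` and `slow_loader` are hypothetical names, and coordinating across several application servers would need a distributed lock instead.

```python
import threading
import time

class SingleFlightLoader:
    """Cache-aside reads where concurrent misses for one key trigger a single load."""

    def __init__(self, loader):
        self._loader = loader
        self._cache = {}                  # stand-in for the real cache
        self._locks = {}                  # one lock per key
        self._guard = threading.Lock()    # protects the _locks dict

    def _lock_for(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key):
        value = self._cache.get(key)
        if value is not None:
            return value
        with self._lock_for(key):
            value = self._cache.get(key)  # re-check: another thread may have loaded it
            if value is None:
                value = self._loader(key)
                if value is not None:
                    self._cache[key] = value
            return value

calls = []

def slow_loader(key):
    calls.append(key)      # record each database hit
    time.sleep(0.05)       # simulate a slow query
    return {"id": 123}

sf = SingleFlightLoader(slow_loader)
threads = [threading.Thread(target=sf.get, args=("user:123",)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"Database loads for 20 concurrent requests: {len(calls)}")
```

The trade-off is that waiting callers are delayed by one load instead of each issuing their own query, which is usually the better outcome when the query is expensive.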

---

### **Read-Through**

**Concept**: Cache library manages cache population. Application only interacts with cache; cache handles cache misses.

**How It Works**:
```
1. Application requests data from cache
2. Cache checks if data exists
   ├─→ Cache hit: Return data
   └─→ Cache miss: Cache loads from database, populates itself, returns data
3. Application doesn't know about database (only interacts with cache)
```

**Implementation** (a thin read-through wrapper around Redis; Redis itself does not load from a database on a miss):
```python
class ReadThroughCache:
    def __init__(self, redis_client, database):
        self.redis_client = redis_client
        self.database = database
        self.default_ttl = 3600  # 1 hour
    
    def get(self, key, loader=None, ttl=None):
        # Check cache
        cached_data = self.redis_client.get(key)
        if cached_data:
            print(f"Cache HIT for {key}")
            return json.loads(cached_data)
        
        # Cache miss - load from database
        print(f"Cache MISS for {key}")
        
        # Use provided loader or default database lookup
        data = loader(key) if loader else self.database.get(key)
        if data is None:
            return None
        
        # Populate cache
        self.redis_client.setex(key, ttl or self.default_ttl, json.dumps(data))
        return data
    
    def set(self, key, data, ttl=None):
        self.redis_client.setex(key, ttl or self.default_ttl, json.dumps(data))
    
    def delete(self, key):
        self.redis_client.delete(key)

# Usage
cache = ReadThroughCache(redis_client, database)

# Custom loader function (receives the full cache key, e.g. 'user:123')
def load_user(key):
    # Simulate database query
    return database.get(key)

# Application only interacts with cache (doesn't know about database)
user = cache.get('user:123', loader=load_user)
print(f"User: {user['name']}")
```

**Advantages**:
- **Simpler application code**: Application doesn't handle cache misses
- **Consistent caching behavior**: All cache operations go through cache library
- **Reduced code duplication**: Cache logic centralized in cache library

**Disadvantages**:
- **Less control**: Application has less control over cache behavior
- **Complex cache library**: Cache library must handle database interaction
- **Potential bottlenecks**: Cache library becomes single point of failure

**When to Use**:
- When you want to simplify application code
- When multiple parts of application access same data
- When you need consistent caching behavior across application

---

### **Write-Through**

**Concept**: Write to both the cache and the database synchronously, so the cache reflects every successful write.

**How It Works**:
```
1. Application updates data
2. Application writes to database (source of truth)
3. Application writes the same value to cache
4. Both writes complete synchronously before returning

Result: Cache and database stay in sync after every successful write
```

**Implementation**:
```python
class WriteThroughCache:
    def __init__(self, redis_client, database):
        self.redis_client = redis_client
        self.database = database
    
    def update_user(self, user_id, user_data):
        cache_key = f'user:{user_id}'
        
        # Update database first: if this write fails, the cache is left
        # untouched and never holds data the database does not have
        print(f"Updating database for {cache_key}")
        self.database[cache_key] = user_data
        
        # Update cache synchronously
        print(f"Updating cache for {cache_key}")
        self.redis_client.setex(cache_key, 3600, json.dumps(user_data))
        
        # Both updates complete before returning
        return True
    
    def get_user(self, user_id):
        cache_key = f'user:{user_id}'
        
        # Try cache first
        cached_data = self.redis_client.get(cache_key)
        if cached_data:
            print(f"Reading from cache: {cache_key}")
            return json.loads(cached_data)
        
        # Cache miss - read from database
        print(f"Reading from database: user:{user_id}")
        user_data = self.database.get(f'user:{user_id}')
        if user_data:
            # Populate cache (optional, for future reads)
            self.redis_client.setex(cache_key, 3600, json.dumps(user_data))
        
        return user_data

# Usage
cache = WriteThroughCache(redis_client, database)

# Update user (writes to both cache and database)
cache.update_user(123, {
    'id': 123,
    'name': 'Alice Johnson',
    'email': 'alice@example.com',
    'updated_at': time.time()
})

# Read user (gets from cache - always up-to-date)
user = cache.get_user(123)
print(f"User: {user['name']}")
```

**Advantages**:
- **Data consistency**: Cache always reflects latest data
- **Read performance**: Reads always fast (data in cache)
- **Simple mental model**: Easy to reason about data consistency

**Disadvantages**:
- **Slower writes**: Each write requires two operations (cache + database)
- **Higher latency**: Write latency = cache write time + database write time
- **Potential cache pollution**: Unneeded data cached (if data never read)

**When to Use**:
- When data consistency is critical
- When reads significantly outnumber writes
- When you need guaranteed up-to-date data on reads

---

### **Write-Behind (Write-Back)**

**Concept**: Write to cache immediately, asynchronously persist to database. Database is updated later in batches.

**How It Works**:
```
1. Application updates data
2. Application writes to cache (synchronous, fast)
3. Cache queues write to database (asynchronous, slower)
4. Database updated later (batched)
5. Application returns immediately (doesn't wait for database write)

Result: Fast writes, but potential data loss if cache fails before database write
```

**Implementation**:
```python
import queue
import threading

class WriteBehindCache:
    def __init__(self, redis_client, database):
        self.redis_client = redis_client
        self.database = database
        self.write_queue = queue.Queue()
        self.worker_thread = threading.Thread(target=self._write_worker, daemon=True)
        self.worker_thread.start()
    
    def _write_worker(self):
        """Background thread to process write queue"""
        while True:
            operation = self.write_queue.get()
            if operation is None:  # Poison pill
                break
            
            # Process write operation
            op_type, key, data = operation
            if op_type == 'set':
                self.database[key] = data
                print(f"Persisted to database: {key}")
            
            self.write_queue.task_done()
    
    def set(self, key, data):
        """Write to cache immediately (synchronous)"""
        # Write to cache (synchronous)
        self.redis_client.setex(key, 3600, json.dumps(data))
        
        # Queue database write (asynchronous)
        self.write_queue.put(('set', key, data))
        
        # Return immediately (don't wait for database write)
        return True
    
    def get(self, key):
        """Read from cache (should be up-to-date)"""
        cached_data = self.redis_client.get(key)
        if cached_data:
            return json.loads(cached_data)
        
        # Cache miss - read from database
        data = self.database.get(key)
        if data:
            # Populate cache (for future reads)
            self.redis_client.setex(key, 3600, json.dumps(data))
        
        return data
    
    def shutdown(self):
        """Wait for pending writes to complete"""
        self.write_queue.join()
        self.write_queue.put(None)  # Poison pill
        self.worker_thread.join()

# Usage
cache = WriteBehindCache(redis_client, database)

# Update user (fast - doesn't wait for database write)
cache.set('user:123', {
    'id': 123,
    'name': 'Alice Johnson',
    'email': 'alice@example.com'
})
print("User updated (immediate return)")

# Read user (gets from cache)
user = cache.get('user:123')
print(f"User: {user['name']}")

# Later, when shutting down, ensure all writes are persisted
cache.shutdown()
```

**Advantages**:
- **Very fast writes**: No database write latency (returns immediately)
- **Batched writes**: Database writes can be batched for efficiency
- **Reduced database load**: Fewer database operations when writes are batched or coalesced

**Disadvantages**:
- **Data loss risk**: If cache fails before database write, data is lost
- **Complexity**: Requires background thread/process for writes
- **Eventual consistency**: Data in database may be stale until write completes
- **Write ordering**: With more than one worker, writes may reach the database out of order (a single FIFO queue and worker preserves order)

**When to Use**:
- When write latency is critical (real-time systems)
- When you can tolerate potential data loss (non-critical data)
- When you need to reduce database load (write-heavy workloads)
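The batching advantage mentioned above depends on coalescing: if a key is written many times between flushes, only its latest value needs to reach the database. The following is a minimal single-threaded sketch of that idea; `CoalescingWriteBuffer` and `bulk_write` are hypothetical names, and a real implementation would also flush on a timer and guard the buffer with a lock.

```python
class CoalescingWriteBuffer:
    """Collect writes in memory and persist only the latest value per key."""

    def __init__(self, flush_fn, max_pending=100):
        self._flush_fn = flush_fn        # called with a dict of key -> latest value
        self._pending = {}
        self._max_pending = max_pending

    def write(self, key, value):
        self._pending[key] = value       # a later write to the same key replaces the earlier one
        if len(self._pending) >= self._max_pending:
            self.flush()

    def flush(self):
        """Persist all pending keys in one batch; return how many keys were written."""
        if not self._pending:
            return 0
        batch, self._pending = self._pending, {}
        self._flush_fn(batch)
        return len(batch)

database = {}
batches = []

def bulk_write(batch):
    batches.append(dict(batch))          # record each batch for inspection
    database.update(batch)               # stand-in for one bulk INSERT/UPDATE

buffer = CoalescingWriteBuffer(bulk_write)
for i in range(1000):
    buffer.write(f"counter:{i % 3}", i)  # 1000 writes spread over 3 keys
flushed = buffer.flush()
print(f"{flushed} keys persisted in {len(batches)} batch(es)")
```

Everything still sitting in the buffer is lost if the process dies before `flush()`, which is the data-loss risk listed under the disadvantages.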

---

### **Refresh-Ahead**

**Concept**: Proactively refresh cache entries before they expire, ensuring cache is always populated.

**How It Works**:
```
1. Application reads from cache
2. If cache entry is about to expire (e.g., less than 10% of its TTL remains), refresh it
3. Refresh happens asynchronously in background
4. Next request gets fresh data (no cache miss)

Result: Hot data stays cached as long as it is read at least once inside each refresh window
```

**Implementation**:
```python
import time
import threading

class RefreshAheadCache:
    def __init__(self, redis_client, database, refresh_threshold=0.1):
        self.redis_client = redis_client
        self.database = database
        self.refresh_threshold = refresh_threshold  # Refresh when 10% of TTL remaining
    
    def _refresh_if_needed(self, key, loader, ttl):
        """Reload the entry if less than refresh_threshold of its TTL remains"""
        ttl_remaining = self.redis_client.ttl(key)
        
        # ttl() returns -1 (no expiry) or -2 (missing key); skip both
        if 0 < ttl_remaining < (ttl * self.refresh_threshold):
            print(f"Refreshing cache for {key} (TTL remaining: {ttl_remaining}s)")
            
            # Reload with the same loader the read path uses
            data = loader(key) if loader else self.database.get(key)
            if data is not None:
                # Refresh cache with new TTL
                self.redis_client.setex(key, ttl, json.dumps(data))
    
    def get(self, key, loader=None, ttl=3600):
        # Try to get from cache
        cached_data = self.redis_client.get(key)
        if cached_data:
            print(f"Cache HIT for {key}")
            
            # Refresh in a background thread so this read is not delayed
            threading.Thread(
                target=self._refresh_if_needed, args=(key, loader, ttl), daemon=True
            ).start()
            
            return json.loads(cached_data)
        
        # Cache miss - load from database
        print(f"Cache MISS for {key}")
        
        # Load data
        data = loader(key) if loader else self.database.get(key)
        if data is None:
            return None
        
        # Populate cache
        self.redis_client.setex(key, ttl, json.dumps(data))
        return data

# Usage
cache = RefreshAheadCache(redis_client, database)

def load_user(key):
    user_id = key.split(':')[1]
    return database.get(f'user:{user_id}')

# First request: Cache miss
user = cache.get('user:123', loader=load_user, ttl=60)  # 60 second TTL

# Wait 55 seconds (5 seconds remain, below the 6-second refresh threshold: 10% of 60)
time.sleep(55)

# Next request: Cache hit, but refresh triggered in background
user = cache.get('user:123', loader=load_user, ttl=60)
print(f"User: {user['name']}")

# Wait 10 more seconds (the original TTL has elapsed, but the refresh reset it to 60 seconds)
time.sleep(10)

# Next request: Cache hit (no miss because of refresh-ahead)
user = cache.get('user:123', loader=load_user, ttl=60)
print(f"User: {user['name']}")
```

**Advantages**:
- **Fewer cache misses**: Hot data is refreshed before it expires
- **Better user experience**: No latency spikes from cache misses
- **Reduced database load**: Fewer cache misses mean fewer database queries

**Disadvantages**:
- **Complexity**: Requires background refresh logic
- **Wasted refreshes**: May refresh data that's never accessed again
- **Prediction difficulty**: Hard to predict which data will be accessed next

**When to Use**:
- For hot data accessed frequently
- When cache misses are expensive (slow queries)
- When user experience is critical (no latency spikes)

---

### **Pattern Comparison**

```
┌──────────────────┬───────────────┬──────────────┬────────────────┐
│     Pattern      │ Read Latency  │ Write Latency│ Consistency    │
├──────────────────┼───────────────┼──────────────┼────────────────┤
│ Cache-Aside      │ Miss: Slow    │ Normal       │ Eventual       │
│                  │ Hit: Fast     │              │                │
├──────────────────┼───────────────┼──────────────┼────────────────┤
│ Read-Through     │ Miss: Slow    │ Normal       │ Eventual       │
│                  │ Hit: Fast     │              │                │
├──────────────────┼───────────────┼──────────────┼────────────────┤
│ Write-Through    │ Fast          │ Slow         │ Strong         │
├──────────────────┼───────────────┼──────────────┼────────────────┤
│ Write-Behind     │ Fast          │ Very Fast    │ Eventual       │
├──────────────────┼───────────────┼──────────────┼────────────────┤
│ Refresh-Ahead    │ Fast          │ Normal       │ Eventual       │
└──────────────────┴───────────────┴──────────────┴────────────────┘
```

---

## **4.3 Cache Eviction Policies**

Cache eviction policies determine which data to remove when cache is full. Understanding these policies is critical for maintaining cache efficiency.

### **LRU (Least Recently Used)**

**Concept**: Evict the item that hasn't been accessed for the longest time. Based on temporal locality: recently accessed items are likely to be accessed again soon.

**How It Works**:
```
Cache capacity: 3 items
Items: A, B, C, D, E

1. Access A: Cache: [A]
2. Access B: Cache: [A, B]
3. Access C: Cache: [A, B, C]  (cache full)
4. Access D: Evict A (least recently used) → Cache: [B, C, D]
5. Access B: Cache: [C, D, B]  (B moved to most recently used)
6. Access E: Evict C (least recently used) → Cache: [D, B, E]

Access order: A → B → C → D → B → E
Eviction order: A, C, ...
```

**Implementation** (using Redis, which uses an approximation of LRU):
```python
import redis

redis_client = redis.Redis(host='localhost', port=6379, db=0)

# Set maxmemory and eviction policy
redis_client.config_set('maxmemory', '100mb')
redis_client.config_set('maxmemory-policy', 'allkeys-lru')  # LRU eviction

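# Add items to cache. These keys are far below the 100 MB limit, so nothing
# is evicted here; eviction starts only once used memory reaches maxmemory
redis_client.set('user:1', '{"name": "Alice"}')
redis_client.set('user:2', '{"name": "Bob"}')
redis_client.set('user:3', '{"name": "Charlie"}')
redis_client.set('user:4', '{"name": "Dave"}')

# Access user:2 (marks it as recently used, so it is a less likely eviction candidate)
redis_client.get('user:2')

# Under memory pressure, Redis samples a few keys and evicts the least recently
# used key in the sample, so the exact eviction order is not guaranteed
redis_client.set('user:5', '{"name": "Eve"}')
redis_client.set('user:6', '{"name": "Frank"}')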
```

**LRU Implementation from Scratch**:
```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()  # Maintains insertion order
    
    def get(self, key):
        # Move to end (most recently used)
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key]
        return None
    
    def set(self, key, value):
        # Update existing or add new
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        
        # Evict if over capacity
        if len(self.cache) > self.capacity:
            # Pop from front (least recently used)
            self.cache.popitem(last=False)

# Usage
cache = LRUCache(3)

cache.set('A', 'Data A')
cache.set('B', 'Data B')
cache.set('C', 'Data C')
# Cache: [A, B, C]

cache.set('D', 'Data D')  # Evicts A
# Cache: [B, C, D]

print(cache.get('B'))  # Returns 'Data B', B becomes most recently used
# Cache: [C, D, B]

cache.set('E', 'Data E')  # Evicts C
# Cache: [D, B, E]
```

**Advantages**:
- **Intuitive**: Simple to understand—least recently used is least likely needed
- **Good for temporal locality**: Works well when recent access predicts future access
- **Efficient**: O(1) operations with proper implementation

**Disadvantages**:
- **Not scan-resistant**: A one-off sequential scan can evict the entire hot set
- **Requires access tracking**: Need to track access order
- **Memory overhead**: Maintains access order structure

**When to Use**:
- Workloads with strong temporal locality (access patterns cluster in time)
- General-purpose caching (good default choice)
- When you can't predict access patterns
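The scan weakness listed under the disadvantages is easy to reproduce. The sketch below replays two access sequences against a small LRU cache (built on `OrderedDict`, like the implementation above) and counts how many of the final hot-key reads are hits; `run_lru` and the key names are illustrative.

```python
from collections import OrderedDict

def run_lru(capacity, accesses):
    """Replay an access sequence against an LRU cache; return a hit/miss flag per access."""
    cache = OrderedDict()
    results = []
    for key in accesses:
        if key in cache:
            cache.move_to_end(key)          # mark as most recently used
            results.append(True)
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict least recently used
            results.append(False)
    return results

hot = ["h1", "h2", "h3"]
scan = [f"s{i}" for i in range(5)]          # keys read once and never again

without_scan = run_lru(3, hot * 10 + hot)
with_scan = run_lru(3, hot * 10 + scan + hot)

print("Hot-key hits after warm-up, no scan:", sum(without_scan[-3:]))
print("Hot-key hits after warm-up + scan:  ", sum(with_scan[-3:]))
```

After the scan, every hot key has been pushed out and must be reloaded, even though the scanned keys are never read again.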

---

### **LFU (Least Frequently Used)**

**Concept**: Evict the item with the lowest access frequency. Based on frequency locality—frequently accessed items are likely to be accessed again.

**How It Works**:
```
Cache capacity: 3 items
Items: A, B, C, D, E (shown as key:frequency; ties broken by least recent access)

1. Access A: Cache: [A:1]
2. Access A: Cache: [A:2]
3. Access B: Cache: [A:2, B:1]
4. Access C: Cache: [A:2, B:1, C:1]  (cache full)
5. Access D: B and C tie at 1; evict B (accessed less recently) → Cache: [A:2, C:1, D:1]
6. Access A: Cache: [A:3, C:1, D:1]
7. Access E: C and D tie at 1; evict C (accessed less recently) → Cache: [A:3, D:1, E:1]

Access frequency: A (highest) stays; B and C were evicted with the lowest counts

**Implementation** (Redis LFU):
```python
import redis

redis_client = redis.Redis(host='localhost', port=6379, db=0)

# Configure Redis for LFU eviction
redis_client.config_set('maxmemory', '100mb')
redis_client.config_set('maxmemory-policy', 'allkeys-lfu')  # LFU eviction

# Add items
redis_client.set('user:1', '{"name": "Alice"}')
redis_client.set('user:2', '{"name": "Bob"}')
redis_client.set('user:3', '{"name": "Charlie"}')

# Access user:1 multiple times (increases its LFU counter)
for _ in range(10):
    redis_client.get('user:1')

# Access user:2 a few times
for _ in range(3):
    redis_client.get('user:2')

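# Add new items. With a 100 MB limit nothing is evicted yet; once memory
# fills, Redis prefers to evict keys with low LFU counters
redis_client.set('user:4', '{"name": "Dave"}')
redis_client.set('user:5', '{"name": "Eve"}')

# Expected under memory pressure (Redis LFU counters are approximate and decay over time):
# user:1 (high frequency) most likely to stay in cache
# user:2 (medium frequency) likely to stay in cache
# user:3 (never read) among the first eviction candidates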
```

**LFU Implementation from Scratch**:
```python
import heapq
import time

class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = {}  # key → (value, frequency, timestamp)
        self.frequency_heap = []  # (frequency, timestamp, key)
        self.timestamp = 0  # Monotonic counter for tie-breaking
    
    def _update_frequency(self, key):
        """Update frequency and reinsert into heap"""
        value, frequency, _ = self.cache[key]
        self.timestamp += 1
        self.cache[key] = (value, frequency + 1, self.timestamp)
        heapq.heappush(self.frequency_heap, (frequency + 1, self.timestamp, key))
    
    def get(self, key):
        if key not in self.cache:
            return None
        
        self._update_frequency(key)
        return self.cache[key][0]
    
    def _evict(self):
        """Pop heap entries until one matches the live state of its key, then evict that key"""
        while self.frequency_heap:
            freq, ts, k = heapq.heappop(self.frequency_heap)
            # Entries left behind by earlier frequency updates are stale; skip them
            if k in self.cache and self.cache[k][1] == freq and self.cache[k][2] == ts:
                del self.cache[k]
                return
    
    def set(self, key, value):
        # Update existing
        if key in self.cache:
            self.cache[key] = (value, self.cache[key][1], self.cache[key][2])
            self._update_frequency(key)
            return
        
        # Evict before inserting so the new item (frequency 1) cannot evict itself
        if len(self.cache) >= self.capacity:
            self._evict()
        
        # Add new
        self.timestamp += 1
        self.cache[key] = (value, 1, self.timestamp)
        heapq.heappush(self.frequency_heap, (1, self.timestamp, key))

# Usage
cache = LFUCache(3)

cache.set('A', 'Data A')
cache.set('B', 'Data B')
cache.set('C', 'Data C')

# Access A multiple times
for _ in range(5):
    cache.get('A')

# Access B a few times
for _ in range(2):
    cache.get('B')

# Add D (evicts C - least frequently used)
cache.set('D', 'Data D')

# Access A (still in cache - high frequency)
print(cache.get('A'))  # Returns 'Data A'

# Add E (evicts D: its frequency of 1 is the lowest; B has frequency 3)
cache.set('E', 'Data E')

print(cache.get('A'))  # 'Data A' (highest frequency)
print(cache.get('D'))  # None (evicted when E was added)
print(cache.get('E'))  # 'Data E'
print(cache.get('B'))  # 'Data B' (frequency 3 protected it)
print(cache.get('C'))  # None (evicted when D was added)
```

**Advantages**:
- **Frequency-aware**: Keeps popular items in cache longer
- **Scan resistant**: Sequential scans don't evict popular items
- **Good for long-term patterns**: Works well when access patterns are stable

**Disadvantages**:
- **Memory overhead**: Tracks access frequency for each item
- **Cold start problem**: New items have low frequency, get evicted quickly
- **No aging by default**: Items that were popular in the past keep high counts and can linger after they stop being accessed

**When to Use**:
- Workloads with stable access patterns (some items always popular)
- When you want to protect popular items from eviction
- When access patterns are frequency-based (not time-based)
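A common answer to stale popular items is to age the counters, for example by halving every count after a fixed number of accesses. The sketch below shows only the counting side under that assumption; `AgingCounter` and `decay_every` are hypothetical names, and this is not how any particular cache implements decay.

```python
class AgingCounter:
    """Access counts that are halved periodically so old popularity fades."""

    def __init__(self, decay_every=10):
        self.counts = {}
        self.decay_every = decay_every
        self._since_decay = 0

    def record(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        self._since_decay += 1
        if self._since_decay >= self.decay_every:
            self._decay()

    def _decay(self):
        # Halve every count and drop keys whose count reaches zero
        self.counts = {k: c // 2 for k, c in self.counts.items() if c // 2 > 0}
        self._since_decay = 0

    def coldest(self):
        """Key with the lowest aged count (the eviction candidate)."""
        return min(self.counts, key=self.counts.get)

aged = AgingCounter(decay_every=10)
plain = {}
for key in ["old"] * 100 + ["new"] * 30:     # 'old' was popular, 'new' is popular now
    aged.record(key)
    plain[key] = plain.get(key, 0) + 1

print("Plain counts:", plain, "-> evict", min(plain, key=plain.get))
print("Aged counts: ", aged.counts, "-> evict", aged.coldest())
```

With plain counts the currently popular key is the eviction candidate; with aging, the formerly popular key is.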

---

### **TTL (Time To Live)**

**Concept**: Each cache entry has an expiration time. When TTL expires, entry is evicted automatically.

**How It Works**:
```
Cache capacity: Unlimited (but each item expires)

1. t=0s: Set A with TTL 60 seconds: Cache: [A (60s)]
2. t=10s: Set B with TTL 120 seconds: Cache: [A (50s), B (120s)]
3. t=70s: Cache: [B (60s)] (A expired at t=60s)
4. t=70s: Set C with TTL 30 seconds: Cache: [B (60s), C (30s)]
5. t=100s: Cache: [B (30s)] (C expired)
```

**Implementation**:
```python
import time

class TTLCache:
    def __init__(self):
        self.cache = {}  # key → (value, expiration_time)
    
    def set(self, key, value, ttl_seconds):
        """Set key with TTL (time to live)"""
        expiration_time = time.time() + ttl_seconds
        self.cache[key] = (value, expiration_time)
    
    def get(self, key):
        """Get key if not expired"""
        if key not in self.cache:
            return None
        
        value, expiration_time = self.cache[key]
        
        # Check if expired
        if time.time() > expiration_time:
            del self.cache[key]
            return None
        
        return value
    
    def cleanup_expired(self):
        """Remove all expired entries"""
        current_time = time.time()
        expired_keys = [
            key for key, (_, exp_time) in self.cache.items()
            if current_time > exp_time
        ]
        for key in expired_keys:
            del self.cache[key]
        return len(expired_keys)

# Usage
cache = TTLCache()

# Set items with different TTLs
cache.set('user:1', '{"name": "Alice"}', ttl_seconds=60)  # 1 minute
cache.set('user:2', '{"name": "Bob"}', ttl_seconds=120)   # 2 minutes

# Get item before expiration
print(cache.get('user:1'))  # Returns '{"name": "Alice"}'

# Wait 70 seconds
time.sleep(70)

# user:1 expired, user:2 still valid
print(cache.get('user:1'))  # Returns None (expired)
print(cache.get('user:2'))  # Returns '{"name": "Bob"}'

# Cleanup expired entries
expired_count = cache.cleanup_expired()
print(f"Cleaned up {expired_count} expired entries")
```

**Redis TTL Example**:
```python
import redis
import time

redis_client = redis.Redis(host='localhost', port=6379, db=0)

# Set key with TTL
redis_client.setex('session:123', 3600, 'user_data')  # Expires in 1 hour

# Check TTL
ttl = redis_client.ttl('session:123')
print(f"Session expires in {ttl} seconds")

# Update TTL (refresh session)
redis_client.expire('session:123', 7200)  # Extend to 2 hours

# Wait and check again
time.sleep(60)
ttl = redis_client.ttl('session:123')
print(f"Session expires in {ttl} seconds")  # Should be ~7140 seconds
```

**Advantages**:
- **Automatic expiration**: No manual eviction needed
- **Freshness guarantee**: Data never older than TTL
- **Simple**: Easy to understand and implement

**Disadvantages**:
- **TTL selection**: Choosing optimal TTL is difficult
- **Cache stampede**: Many items expiring simultaneously causes load spikes
- **No reuse**: Items evicted even if they would be accessed again

**When to Use**:
- Data with natural expiration (sessions, tokens)
- Time-sensitive data (stock prices, weather data)
- When you want guaranteed freshness
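The cache stampede listed under the disadvantages often comes from many keys being written at the same moment with the same TTL. One simple mitigation is to add random jitter to each TTL so expirations spread out. The helper below is a sketch; `jittered_ttl` and its parameters are hypothetical names, and the 10% spread is an arbitrary starting point.

```python
import random

def jittered_ttl(base_ttl, jitter_fraction=0.1, rng=random):
    """Return base_ttl plus or minus up to jitter_fraction, in whole seconds (minimum 1)."""
    spread = base_ttl * jitter_fraction
    return max(1, int(base_ttl + rng.uniform(-spread, spread)))

rng = random.Random(42)   # seeded only to make the demo repeatable
ttls = [jittered_ttl(3600, 0.1, rng) for _ in range(1000)]
print(f"TTLs range from {min(ttls)} to {max(ttls)} seconds "
      f"({len(set(ttls))} distinct values)")
```

The result can be passed wherever a fixed TTL was used before, for example as the `ttl_seconds` argument of `TTLCache.set`.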

---

### **Random Replacement**

**Concept**: Randomly select an item to evict when cache is full. Simple but surprisingly effective in some scenarios.

**How It Works**:
```
Cache capacity: 3 items
Items: A, B, C, D, E

1. Add A: Cache: [A]
2. Add B: Cache: [A, B]
3. Add C: Cache: [A, B, C]  (cache full)
4. Add D: Randomly evict C → Cache: [A, B, D]
5. Add E: Randomly evict B → Cache: [A, D, E]

Random eviction: C, B, ...
```

**Implementation**:
```python
import random

class RandomReplacementCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = {}  # key → value
    
    def get(self, key):
        return self.cache.get(key)
    
    def set(self, key, value):
        # Update existing entry in place (no eviction needed)
        if key in self.cache:
            self.cache[key] = value
            return
        
        # Evict a random existing item first, so the new key is never the victim
        if len(self.cache) >= self.capacity:
            key_to_evict = random.choice(list(self.cache.keys()))
            del self.cache[key_to_evict]
        
        self.cache[key] = value

# Usage
cache = RandomReplacementCache(3)

cache.set('A', 'Data A')
cache.set('B', 'Data B')
cache.set('C', 'Data C')

cache.set('D', 'Data D')  # Randomly evicts one of A, B, or C

print(cache.get('A'))  # Might be None if A was evicted
print(cache.get('B'))  # Might be None if B was evicted
print(cache.get('C'))  # Might be None if C was evicted
print(cache.get('D'))  # Returns 'Data D'
```

**Advantages**:
- **Simplest to implement**: No tracking needed
- **Fast**: No per-access bookkeeping, and eviction can be O(1) (the simple implementation above copies the key list on eviction, which is O(n))
- **Fair**: All items have equal chance of staying in cache

**Disadvantages**:
- **No intelligence**: Evicts potentially useful items
- **Lower hit rate**: Usually worse than LRU/LFU when access patterns have locality
- **Unpredictable**: Hard to reason about cache contents

**When to Use**:
- When simplicity is more important than performance
- When access patterns are truly random (no patterns)
- As a baseline for comparison with smarter policies
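
Random eviction can be made O(1) by keeping the keys in a list and swapping the victim with the last element before removing it. The following is a minimal sketch of that idea; the class name `ConstantTimeRandomCache` and its attribute names are assumptions chosen for illustration.

```python
import random

class ConstantTimeRandomCache:
    """Random-replacement cache with O(1) get, set, and eviction.

    Keys live in a list so a random victim can be picked by index;
    one dict maps each key to its list position, another to its value.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.keys = []    # keys in arbitrary order
        self.index = {}   # key -> position in self.keys
        self.values = {}  # key -> value

    def get(self, key):
        return self.values.get(key)

    def set(self, key, value):
        if key in self.values:
            self.values[key] = value
            return
        if len(self.keys) >= self.capacity:
            self._evict_random()
        self.index[key] = len(self.keys)
        self.keys.append(key)
        self.values[key] = value

    def _evict_random(self):
        # Pick a victim, move the last key into its position, then pop the tail
        victim_pos = random.randrange(len(self.keys))
        victim = self.keys[victim_pos]
        last = self.keys[-1]
        self.keys[victim_pos] = last
        self.index[last] = victim_pos
        self.keys.pop()
        del self.index[victim]
        del self.values[victim]

# Usage
cache = ConstantTimeRandomCache(3)
for name in "ABCDE":
    cache.set(name, f"Data {name}")
print(len(cache.keys))  # 3
print(cache.get("E"))   # Data E (the newest key is inserted after eviction)
```

The swap-and-pop step avoids shifting list elements, which is what keeps eviction constant time.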

---

### **Eviction Policy Comparison**

```
┌───────────────────┬──────────────┬──────────────┬──────────────────┐
│ Policy            │ Hit Rate     │ Complexity   │ Best For         │
├───────────────────┼──────────────┼──────────────┼──────────────────┤
│ LRU               │ Good         │ Medium       │ General-purpose  │
│ LFU               │ Very Good    │ High         │ Stable patterns  │
│ TTL               │ Variable     │ Low          │ Time-sensitive   │
│ Random            │ Poor         │ Very Low     │ Random access    │
│ FIFO              │ Poor         │ Low          │ Simple use cases │
└───────────────────┴──────────────┴──────────────┴──────────────────┘
```
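
The hit-rate ratings in the table are rough guides; actual results depend on the workload. A small simulation can make the difference between two policies concrete. The sketch below compares LRU and FIFO on a short access sequence; the function name `simulate_hits` is an assumption for illustration.

```python
from collections import OrderedDict

def simulate_hits(accesses, capacity, policy):
    """Count cache hits for a sequence of keys under 'lru' or 'fifo' eviction."""
    if policy not in ("lru", "fifo"):
        raise ValueError("policy must be 'lru' or 'fifo'")
    cache = OrderedDict()  # oldest entry first
    hits = 0
    for key in accesses:
        if key in cache:
            hits += 1
            if policy == "lru":
                cache.move_to_end(key)  # a hit refreshes recency under LRU only
            continue
        if len(cache) >= capacity:
            cache.popitem(last=False)  # evict the oldest entry
        cache[key] = True
    return hits

# "A" is accessed repeatedly; LRU keeps it, FIFO evicts it by insertion age
accesses = ["A", "B", "A", "C", "A", "D"]
print(simulate_hits(accesses, 2, "lru"))   # 2
print(simulate_hits(accesses, 2, "fifo"))  # 1
```

Replaying a sample of real traffic through a function like this is a cheap way to choose a policy before committing to one.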

---

## **4.4 Distributed Caching**

**Distributed Cache**: A cache that spans multiple machines, sharing the cache load and providing fault tolerance.

### **Why Distributed Caching?**

**Problem**: Single-machine cache has limits.
```
Single Cache Server:
- Memory: 64 GB (max affordable RAM)
- QPS: 100,000 (single machine limit)
- SPOF: Single point of failure (if it crashes, all cached data lost)

Distributed Cache:
- Memory: 64 GB × 10 servers = 640 GB (10x more)
- QPS: 100,000 × 10 servers = 1,000,000 (10x more)
- High availability: If one server fails, others continue serving
```

### **Redis Cluster: Distributed In-Memory Cache**

**Redis Cluster**: Distributed Redis implementation that automatically shards data across multiple nodes.

**Architecture**:
```
Application Servers
    │
    ├───► Redis Cluster
    │       │
    │       ├─── Node 1 (Shards 0-5460)
    │       ├─── Node 2 (Shards 5461-10922)
    │       └─── Node 3 (Shards 10923-16383)
    │
    └───► Application Servers (each knows which node has which shard)

Sharding: 16384 hash slots (shards)
- Each key is hashed to determine its slot
- Each node manages a range of slots
- Automatic failover if node fails
```

**Implementation**:
```python
from redis.cluster import ClusterNode, RedisCluster

# Connect to Redis Cluster (redis-py expects ClusterNode objects here, not dicts)
redis_cluster = RedisCluster(
    startup_nodes=[
        ClusterNode("redis-node1", 6379),
        ClusterNode("redis-node2", 6379),
        ClusterNode("redis-node3", 6379),
    ],
    decode_responses=True
)

# Set value (automatically routed to correct node)
redis_cluster.set('user:123', '{"name": "Alice"}')

# Get value (automatically routed to correct node)
user_data = redis_cluster.get('user:123')

# Pipeline (multiple operations, distributed across cluster)
pipe = redis_cluster.pipeline()
pipe.set('user:1', '{"name": "Alice"}')
pipe.set('user:2', '{"name": "Bob"}')
pipe.set('user:3', '{"name": "Charlie"}')
results = pipe.execute()

# Transaction (all keys must hash to the same slot; older redis-py releases
# raise RedisClusterException for transaction=True in cluster mode)
pipe = redis_cluster.pipeline(transaction=True)
pipe.set('counter:views', 0)
pipe.incr('counter:views')
pipe.incr('counter:views')
results = pipe.execute()

# Check which node has a key
slot = redis_cluster.keyslot('user:123')
node_info = redis_cluster.cluster_nodes()
print(f"user:123 is on slot {slot}")
```

**Redis Cluster Features**:
- **Automatic sharding**: Data distributed across nodes using hash slots
- **Automatic failover**: If master fails, replica promoted to master
- **Horizontal scaling**: Add nodes to increase capacity
- **Redis compatibility**: Same Redis commands and data structures
- **High availability**: Multiple replicas for each master

**Redis Cluster Limitations**:
- **Multi-key operations**: Keys must be in same slot (use hash tags)
- **Cross-slot transactions**: Not supported
- **Large values**: Very large values slow slot migration during resharding, so keep individual values small (a single Redis string is capped at 512 MB)
- **Network partitions**: Requires majority of masters to be available

**Hash Tags**: Ensuring related keys are on same node.
```python
# Use hash tags {} to force related keys into the same hash slot.
# When a key contains {...}, only the text inside the braces is hashed.

# Without a hash tag, these keys can land in different slots (and nodes)
redis_cluster.set('user:123:name', 'Alice')
redis_cluster.set('user:123:email', 'alice@example.com')
# Problem: multi-key operations and transactions fail across slots

# With a hash tag, both keys hash to the same slot
redis_cluster.set('{user:123}:name', 'Alice')
redis_cluster.set('{user:123}:email', 'alice@example.com')
# Only "user:123" is hashed → same slot → same node → transaction works

# Transaction (multi-key operation on same node)
pipe = redis_cluster.pipeline(transaction=True)
pipe.set('{user:123}:name', 'Alice')
pipe.set('{user:123}:email', 'alice@example.com')
results = pipe.execute()  # Works!
```
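
To make the slot mapping concrete, the sketch below computes a key's hash slot locally: CRC16 of the key (or of its hash tag, when present) modulo 16384. It relies on `binascii.crc_hqx` with an initial value of 0, which matches the CRC-16/XMODEM variant used by Redis Cluster. The function name `key_hash_slot` is an assumption for illustration.

```python
import binascii

def key_hash_slot(key):
    """Compute the Redis Cluster hash slot (0-16383) for a key."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    # crc_hqx with initial value 0 is CRC-16/XMODEM
    return binascii.crc_hqx(data, 0) % 16384

print(key_hash_slot("foo"))  # 12182
print(key_hash_slot("{user:123}:name") == key_hash_slot("{user:123}:email"))  # True
```

The result can be checked against a live cluster with the `CLUSTER KEYSLOT` command.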

---

### **Memcached: Simple Distributed Cache**

**Memcached**: High-performance, distributed memory object caching system. Simpler than Redis but less feature-rich.

**Architecture**:
```
Application Servers
    │
    ├───► Memcached Client
    │       │
    │       ├───► Consistent Hashing
    │       │       │
    │       │       ├─── Node 1
    │       │       ├─── Node 2
    │       │       └─── Node 3
    │       │
    │       └───► Routes key to correct node
    │
    └───► Application Servers (each has client library)

Key difference from Redis Cluster:
- No built-in clustering (client-side sharding)
- No persistence (purely in-memory)
- Simpler data structures (only key-value)
- Faster for simple use cases
```

**Implementation**:
```python
import memcache

# Connect to Memcached cluster
mc = memcache.Client(['memcached-node1:11211', 'memcached-node2:11211', 'memcached-node3:11211'])

# Set value
mc.set('user:123', '{"name": "Alice"}', time=3600)  # 1 hour TTL

# Get value
user_data = mc.get('user:123')

# Set multiple values
mc.set_multi({
    'user:1': '{"name": "Alice"}',
    'user:2': '{"name": "Bob"}',
    'user:3': '{"name": "Charlie"}'
}, time=3600)

# Get multiple values
users = mc.get_multi(['user:1', 'user:2', 'user:3'])

# Increment (atomic operation)
mc.set('counter:views', 0)
mc.incr('counter:views')
mc.incr('counter:views')
views = mc.get('counter:views')  # Returns 2

# Add (only sets if key doesn't exist)
mc.add('lock:resource:123', 'locked', time=10)  # 10 second lock

# Delete
mc.delete('user:123')
```

**Memcached vs. Redis**:
```
┌───────────────────────┬──────────────────┬──────────────────────┐
│ Feature               │ Memcached        │ Redis                │
├───────────────────────┼──────────────────┼──────────────────────┤
│ Data Structures       │ Key-value only   │ Rich (lists, sets,   │
│                       │                  │ hashes, sorted sets) │
├───────────────────────┼──────────────────┼──────────────────────┤
│ Persistence           │ None             │ RDB, AOF             │
├───────────────────────┼──────────────────┼──────────────────────┤
│ Replication           │ None             │ Master-slave         │
├───────────────────────┼──────────────────┼──────────────────────┤
│ Clustering            │ Client-side      │ Built-in             │
├───────────────────────┼──────────────────┼──────────────────────┤
│ Memory Usage          │ Lower            │ Higher               │
├───────────────────────┼──────────────────┼──────────────────────┤
│ Performance           │ Very Fast        │ Fast                 │
├───────────────────────┼──────────────────┼──────────────────────┤
│ Complexity            │ Simple           │ More complex         │
└───────────────────────┴──────────────────┴──────────────────────┘
```

---

### **Consistent Hashing in Distributed Caches**

**Problem**: With naive `hash(key) % N` placement, adding or removing a cache node changes N, so most keys are remapped to a different node.

**Solution**: Consistent hashing—minimizes key remapping when nodes change.

**How It Works**:
```
Ring Visualization:

                    Key: user:123
                          ↓
    ┌────────────────────────────────────────┐
    │                  Ring                  │
    │                                        │
    │  Key: user:456      Node A             │
    │       ↓               ↑                │
    │  Node B ───► Node C ───► Node D        │
    │      ↑                   ↓             │
    │  Key: user:789   Key: user:000         │
    │                                        │
    └────────────────────────────────────────┘

Key assignment rule: Each key assigned to next node clockwise

Adding Node E:
- Node E added between C and D
- Only keys between C and E move to E
- Other keys stay on same nodes (minimal remapping)
```

**Implementation**:
```python
import hashlib
from bisect import bisect_left

class ConsistentHash:
    def __init__(self, nodes=None, replicas=100):
        """Initialize consistent hash ring
        
        Args:
            nodes: List of node identifiers
            replicas: Number of virtual nodes per physical node
        """
        self.replicas = replicas
        self.ring = []
        self.nodes = {}  # hash → node
        
        if nodes:
            for node in nodes:
                self.add_node(node)
    
    def _hash(self, key):
        """Hash key to integer"""
        return int(hashlib.md5(key.encode('utf-8')).hexdigest(), 16)
    
    def add_node(self, node):
        """Add node to ring"""
        for i in range(self.replicas):
            # Create virtual node: "node:0", "node:1", etc.
            virtual_node = f"{node}:{i}"
            hash_value = self._hash(virtual_node)
            self.nodes[hash_value] = node
            self.ring.append(hash_value)
        
        self.ring.sort()  # Keep ring sorted
    
    def remove_node(self, node):
        """Remove node from ring"""
        for i in range(self.replicas):
            virtual_node = f"{node}:{i}"
            hash_value = self._hash(virtual_node)
            if hash_value in self.nodes:
                del self.nodes[hash_value]
                self.ring.remove(hash_value)
    
    def get_node(self, key):
        """Get node for key"""
        if not self.ring:
            return None
        
        hash_value = self._hash(key)
        
        # Find first node clockwise from hash_value
        index = bisect_left(self.ring, hash_value)
        if index == len(self.ring):
            # Wrap around to first node
            index = 0
        
        hash_value = self.ring[index]
        return self.nodes[hash_value]

# Usage
cache_ring = ConsistentHash()

# Add nodes
cache_ring.add_node('cache-node-1')
cache_ring.add_node('cache-node-2')
cache_ring.add_node('cache-node-3')

# Get node for key
node = cache_ring.get_node('user:123')
print(f"user:123 should be cached on {node}")

node = cache_ring.get_node('user:456')
print(f"user:456 should be cached on {node}")

# Add new node (minimal key remapping)
cache_ring.add_node('cache-node-4')

# Most keys still on same nodes
node = cache_ring.get_node('user:123')
print(f"user:123 still on {node}")

# Remove node (minimal key remapping)
cache_ring.remove_node('cache-node-2')

# Most keys still on same nodes (except those on cache-node-2)
node = cache_ring.get_node('user:123')
print(f"user:123 now on {node}")
```

**Benefits of Consistent Hashing**:
- **Minimal remapping**: Adding/removing nodes only affects nearby keys
- **Balanced distribution**: Virtual nodes smooth out uneven key distribution across physical nodes
- **Scalability**: Easy to add/remove nodes dynamically
- **Fault tolerance**: If node fails, its keys are redistributed to neighbors
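
The minimal-remapping benefit can be measured directly. The sketch below counts how many of 10,000 keys change owner when a fifth node joins, once with modulo placement and once with a hash ring. It is self-contained and uses its own compact ring helpers (`build_ring`, `ring_node`), which are assumptions for this example rather than the class shown earlier.

```python
import hashlib
from bisect import bisect_left

def stable_hash(text):
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)

def modulo_node(key, nodes):
    return nodes[stable_hash(key) % len(nodes)]

def build_ring(nodes, replicas=100):
    """Return (sorted hashes, owners) with `replicas` virtual nodes per node."""
    points = sorted((stable_hash(f"{node}:{i}"), node)
                    for node in nodes for i in range(replicas))
    return [h for h, _ in points], [n for _, n in points]

def ring_node(key, ring):
    hashes, owners = ring
    # First virtual node clockwise from the key, wrapping around at the end
    idx = bisect_left(hashes, stable_hash(key)) % len(hashes)
    return owners[idx]

def moved_fraction(keys, before, after):
    return sum(1 for k in keys if before(k) != after(k)) / len(keys)

keys = [f"user:{i}" for i in range(10_000)]
old_nodes = ["cache-1", "cache-2", "cache-3", "cache-4"]
new_nodes = old_nodes + ["cache-5"]
old_ring, new_ring = build_ring(old_nodes), build_ring(new_nodes)

modulo_moved = moved_fraction(keys,
                              lambda k: modulo_node(k, old_nodes),
                              lambda k: modulo_node(k, new_nodes))
ring_moved = moved_fraction(keys,
                            lambda k: ring_node(k, old_ring),
                            lambda k: ring_node(k, new_ring))
print(f"modulo: {modulo_moved:.0%} moved, ring: {ring_moved:.0%} moved")
```

Going from 4 to 5 nodes, modulo placement should move roughly four keys out of five, while the ring should move only about the one-fifth share that the new node takes over.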

---

### **Client-Side vs. Server-Side Sharding**

**Client-Side Sharding**: Application determines which cache node to use.

**Implementation**:
```python
import redis

# Reuses the ConsistentHash class defined earlier in this section
class ClientSideShardedCache:
    def __init__(self, nodes):
        """Initialize with list of cache nodes
        
        Args:
            nodes: List of (host, port) tuples
        """
        self.nodes = nodes
        self.ring = ConsistentHash([f"{host}:{port}" for host, port in nodes])
        self.connections = {f"{host}:{port}": redis.Redis(host=host, port=port)
                           for host, port in nodes}
    
    def _get_connection(self, key):
        """Get Redis connection for key"""
        node = self.ring.get_node(key)
        return self.connections[node]
    
    def set(self, key, value, ttl=3600):
        """Set key-value pair (routes to correct node)"""
        conn = self._get_connection(key)
        conn.setex(key, ttl, value)
    
    def get(self, key):
        """Get value for key (routes to correct node)"""
        conn = self._get_connection(key)
        return conn.get(key)

# Usage
cache = ClientSideShardedCache([
    ('cache-node-1', 6379),
    ('cache-node-2', 6379),
    ('cache-node-3', 6379)
])

# These operations are routed to correct nodes automatically
cache.set('user:123', '{"name": "Alice"}')
user = cache.get('user:123')
```

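**Server-Side Sharding**: Cache infrastructure determines node placement (for example Redis Cluster). Memcached itself has no built-in clustering, so it is normally sharded by the client library unless a proxy such as twemproxy or mcrouter sits in front of it.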

**Comparison**:
```
┌────────────────────────┬─────────────────────┬──────────────────────┐
│ Feature                │ Client-Side         │ Server-Side          │
├────────────────────────┼─────────────────────┼──────────────────────┤
│ Complexity             │ Application         │ Cache infrastructure │
│                        │ manages sharding    │ manages sharding     │
├────────────────────────┼─────────────────────┼──────────────────────┤
│ Flexibility            │ High (custom        │ Low (fixed           │
│                        │ sharding logic)     │ sharding logic)      │
├────────────────────────┼─────────────────────┼──────────────────────┤
│ Transparency           │ Low (app knows      │ High (app sees       │
│                        │ about nodes)        │ single cache)        │
├────────────────────────┼─────────────────────┼──────────────────────┤
│ Failover               │ Manual (app must    │ Automatic (cache     │
│                        │ handle)             │ handles)             │
├────────────────────────┼─────────────────────┼──────────────────────┤
│ Multi-key operations   │ Difficult (keys     │ Possible if keys     │
│                        │ on different nodes) │ on same node         │
├────────────────────────┼─────────────────────┼──────────────────────┤
│ Example                │ Memcached clients,  │ Redis Cluster        │
│                        │ custom sharding     │                      │
└────────────────────────┴─────────────────────┴──────────────────────┘
```

---

## **4.5 CDN Architecture**

**CDN (Content Delivery Network)**: Geographically distributed network of servers that deliver content to users based on their geographic location, reducing latency and improving performance.

### **What is a CDN?**

**Concept**: Store copies of content on edge servers closer to users. When user requests content, it's served from the nearest edge server, not the origin server.

**How It Works**:
```
User in Tokyo requests video
    │
    ▼
DNS resolves to Tokyo CDN edge (not origin in US)
    │
    ▼
CDN edge checks if content cached
    ├─→ Cache hit: Serve from Tokyo edge (10ms)
    └─→ Cache miss: Fetch from origin, cache, serve (200ms)

Next request from Tokyo user: Served from Tokyo edge (10ms)

User in New York requests same video
    │
    ▼
DNS resolves to New York CDN edge
    │
    ▼
CDN edge checks if content cached
    ├─→ Cache hit: Serve from New York edge (20ms)
    └─→ Cache miss: Fetch from origin, cache, serve (150ms)
```

**Without CDN**:
```
User in Tokyo → Origin server in US (200ms)
User in London → Origin server in US (100ms)
User in New York → Origin server in US (30ms)
Average latency: 110ms
```

**With CDN**:
```
User in Tokyo → Tokyo CDN edge (10ms)
User in London → London CDN edge (15ms)
User in New York → New York CDN edge (20ms)
Average latency: 15ms (7x improvement!)
```
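
The "With CDN" figures assume every request is an edge hit. In practice the average depends on the hit ratio, since each miss pays the full trip to the origin. The sketch below works this out for the Tokyo numbers used earlier (10 ms hit, 200 ms miss); the function name `expected_latency_ms` is an assumption for illustration.

```python
def expected_latency_ms(hit_ratio, hit_ms, miss_ms):
    """Average latency for a cache with the given hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms

for ratio in (0.50, 0.90, 0.99):
    print(f"hit ratio {ratio:.0%}: {expected_latency_ms(ratio, 10, 200):.1f} ms")
# hit ratio 50%: 105.0 ms
# hit ratio 90%: 29.0 ms
# hit ratio 99%: 11.9 ms
```

The steep drop between 50% and 99% is why edge hit ratio is usually the first CDN metric worth monitoring.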

---

### **CDN Caching Strategies**

**1. Static Content Caching**

**Static Content**: Content that doesn't change frequently (images, CSS, JavaScript, videos).

**Implementation**:
```html
<!-- HTML file -->
<!DOCTYPE html>
<html>
<head>
    <title>My Website</title>
    
    <!-- CSS - cached for 1 year -->
    <link rel="stylesheet" href="https://cdn.example.com/styles.css">
    
    <!-- JavaScript - cached for 1 year with cache busting -->
    <script src="https://cdn.example.com/app.v1.2.3.js"></script>
</head>
<body>
    <!-- Images - cached for 1 year -->
    <img src="https://cdn.example.com/logo.png" alt="Logo">
    
    <!-- Content -->
    <h1>Hello, World!</h1>
</body>
</html>
```

**HTTP Headers for Static Content**:
```http
HTTP/1.1 200 OK
Content-Type: text/css
Cache-Control: public, max-age=31536000, immutable
ETag: "abc123"
Last-Modified: Mon, 01 Jan 2024 00:00:00 GMT

Explanation:
- Cache-Control: public (can be cached by CDNs), max-age=31536000 (1 year)
- immutable (content never changes, browsers won't revalidate)
- ETag and Last-Modified: For revalidation (if needed)
```
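
Revalidation with an ETag works like this: the client sends the ETag it holds in an `If-None-Match` header, and the server answers `304 Not Modified` with no body if the content is unchanged. The sketch below shows that decision in plain Python, independent of any web framework; the function names and the choice of a truncated SHA-256 as the ETag are assumptions for illustration.

```python
import hashlib

def make_etag(body):
    """Derive a quoted ETag from a hash of the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def conditional_response(body, if_none_match=None):
    """Return (status, headers, payload) for a GET with an optional If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "public, max-age=3600"}
    if if_none_match is not None:
        candidates = []
        for tag in if_none_match.split(","):
            tag = tag.strip()
            # If-None-Match uses weak comparison, so ignore a W/ prefix
            candidates.append(tag[2:] if tag.startswith("W/") else tag)
        if "*" in candidates or etag in candidates:
            return 304, headers, b""  # cached copy is still valid: no body sent
    return 200, headers, body

css = b"body { color: #333; }"
status, headers, _ = conditional_response(css)
print(status)  # 200
status, _, payload = conditional_response(css, if_none_match=headers["ETag"])
print(status, payload)  # 304 b''
```

A 304 still costs a round trip, which is why long-lived `immutable` assets skip revalidation altogether.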

**2. Dynamic Content Caching**

**Dynamic Content**: Content that changes frequently (API responses, personalized content).

**Implementation**:
```python
from flask import Flask, jsonify
import redis
import json

app = Flask(__name__)
redis_client = redis.Redis(host='localhost', port=6379, db=0)

@app.route('/api/user/<int:user_id>')
def get_user(user_id):
    # Generate cache key based on request parameters
    cache_key = f"user:{user_id}"
    
    # Check cache
    cached_data = redis_client.get(cache_key)
    if cached_data:
        # Return cached response with CDN-friendly headers
        response = jsonify(json.loads(cached_data))
        response.headers['Cache-Control'] = 'public, max-age=60, s-maxage=300'
        response.headers['Vary'] = 'Accept-Encoding'
        return response
    
    # Cache miss - generate response
    user_data = {
        'id': user_id,
        'name': 'Alice Johnson',
        'email': 'alice@example.com'
    }
    
    # Store in Redis for 300 seconds (origin-side cache); browser and CDN
    # lifetimes are controlled separately by the Cache-Control header below
    redis_client.setex(cache_key, 300, json.dumps(user_data))
    
    # Return response with CDN-friendly headers
    response = jsonify(user_data)
    response.headers['Cache-Control'] = 'public, max-age=60, s-maxage=300'
    response.headers['Vary'] = 'Accept-Encoding'
    return response

# Explanation of headers:
# - Cache-Control: public (can be cached by CDNs)
# - max-age=60 (browser cache for 60 seconds)
# - s-maxage=300 (CDN cache for 300 seconds)
# - Vary: Accept-Encoding (cache based on Accept-Encoding header)
```

**3. Cache Busting**

**Problem**: How to invalidate cached content when it changes?

**Solution 1: URL Versioning**
```html
<!-- Version in filename -->
<script src="https://cdn.example.com/app.v1.2.3.js"></script>

<!-- When updating, change version -->
<script src="https://cdn.example.com/app.v1.2.4.js"></script>
<!-- New URL = new cached version (CDN treats as different file) -->
```

**Solution 2: Query Parameter Versioning**
```html
<!-- Version in query parameter -->
<script src="https://cdn.example.com/app.js?v=1.2.3"></script>

<!-- When updating, change version -->
<script src="https://cdn.example.com/app.js?v=1.2.4"></script>
<!-- New URL = new cached version -->
```

**Solution 3: Content Hash (Automatic Cache Busting)**
```html
<!-- Hash based on content -->
<link rel="stylesheet" href="https://cdn.example.com/styles.abc123def456.css">

<!-- When content changes, hash changes -->
<link rel="stylesheet" href="https://cdn.example.com/styles.xyz789ghi012.css">
<!-- Different hash = different URL = new cache entry -->
```
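
Build tools normally generate these hashed filenames. The sketch below shows the underlying idea with the standard library only; the function name `hashed_asset_name`, the SHA-256 choice, and the 12-character digest length are assumptions, not what any particular bundler does.

```python
import hashlib
from pathlib import PurePosixPath

def hashed_asset_name(filename, content, digest_length=12):
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:digest_length]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))

old = hashed_asset_name("styles.css", b"body { color: red; }")
new = hashed_asset_name("styles.css", b"body { color: blue; }")
print(old)         # styles.<12 hex chars>.css
print(old != new)  # True: changed content yields a new URL
```

Because the name changes only when the bytes change, unchanged assets keep their cached copies across deployments.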

---

### **CDN Cache Invalidation**

**Problem**: How to remove outdated content from CDN edge servers?

**1. Time-Based Expiration**

**Concept**: Content automatically expires after configured TTL.

**Implementation**:
```python
# Set cache headers with expiration
response.headers['Cache-Control'] = 'public, max-age=3600'  # Expire after 1 hour

# After 1 hour the cached copy is stale; the CDN refetches or revalidates it on the next request
```

**2. Manual Invalidation**

**Concept**: Explicitly invalidate cached content.

**Implementation** (AWS CloudFront):
```python
import boto3

# Create CloudFront client
cloudfront = boto3.client('cloudfront')

# Invalidate specific paths
response = cloudfront.create_invalidation(
    DistributionId='E1234567890AB',  # Your CloudFront distribution ID
    InvalidationBatch={
        'CallerReference': 'invalidate-styles-2024-01-15',  # Unique ID
        'Paths': {
            'Quantity': 2,
            'Items': [
                '/styles.css',  # Specific file
                '/images/*'     # Wildcard pattern
            ]
        }
    }
)

# Get invalidation ID
invalidation_id = response['Invalidation']['Id']
print(f"Invalidation created: {invalidation_id}")

# Check invalidation status
response = cloudfront.get_invalidation(
    DistributionId='E1234567890AB',
    Id=invalidation_id
)

status = response['Invalidation']['Status']
print(f"Invalidation status: {status}")  # InProgress or Completed
```

**3. Tag-Based Invalidation**

**Concept**: Tag content, invalidate by tag.

**Implementation** (Cloudflare):
```python
# Set cache tags on response
response.headers['Cache-Tag'] = 'user-profile,user:123'

# Later, invalidate all content with tag 'user:123'
# (via Cloudflare API or dashboard)

# When user data changes:
# 1. Update user data in database
# 2. Invalidate all cached content with tag 'user:123'
# 3. Next request fetches fresh data and re-caches
```

**4. Purge All**

**Concept**: Invalidate all cached content (nuclear option).

**Implementation**:
```python
# Invalidate all content in CDN (use carefully!)
response = cloudfront.create_invalidation(
    DistributionId='E1234567890AB',
    InvalidationBatch={
        'CallerReference': 'purge-all-2024-01-15',
        'Paths': {
            'Quantity': 1,
            'Items': ['/*']  # Wildcard matches all paths
        }
    }
)
```

**Warning**: Purging all content removes all cached items, causing massive load on origin server. Use sparingly.

---

### **Popular CDN Providers**

**1. Cloudflare**

**Features**:
- Free tier available
- Global network (200+ locations)
- DDoS protection
- DNS services
- Edge functions (Cloudflare Workers)

**Implementation**:
```python
# Cloudflare CDN via API
import requests

# Purge cache by URL
response = requests.post(
    'https://api.cloudflare.com/client/v4/zones/ZONE_ID/purge_cache',
    headers={
        'Authorization': 'Bearer API_TOKEN',
        'Content-Type': 'application/json'
    },
    json={
        'files': [
            'https://example.com/styles.css',
            'https://example.com/app.js'
        ]
    }
)

print(response.json())
```

**2. AWS CloudFront**

**Features**:
- Integrated with AWS (S3, EC2, Lambda@Edge)
- Global network (400+ locations)
- Custom SSL certificates
- Lambda@Edge (run code at edge)

**Implementation**:
```python
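# CloudFront signed URLs (for private content)
# Requires the third-party `cryptography` package for RSA signing
from datetime import datetime, timedelta

from botocore.signers import CloudFrontSigner
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

# CloudFront key pair ID and private key
key_pair_id = 'APKAEXAMPLEKEY'
with open('private_key.pem', 'rb') as key_file:
    private_key = serialization.load_pem_private_key(key_file.read(), password=None)

def rsa_signer(message):
    """Sign the policy bytes; CloudFront signed URLs use RSA with SHA-1"""
    return private_key.sign(message, padding.PKCS1v15(), hashes.SHA1())

# Create signer (the second argument must be a callable that returns a signature)
signer = CloudFrontSigner(key_pair_id, rsa_signer)

# Generate signed URL
url = 'https://d1234567890.cloudfront.net/private/video.mp4'
expire_time = datetime.utcnow() + timedelta(hours=1)
signed_url = signer.generate_presigned_url(url, date_less_than=expire_time)

print(f"Signed URL (expires in 1 hour): {signed_url}")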
```

**3. Fastly**

**Features**:
- High-performance edge cloud
- Edge dictionary (key-value store at edge)
- VCL (Varnish Configuration Language) for advanced caching
- Real-time logging

**Implementation**:
```python
# Fastly API
import requests

# Purge specific URL
response = requests.request(
    'PURGE',
    'https://example.com/api/user/123',
    headers={
        'Fastly-Key': 'API_KEY',
        'Accept': 'application/json'
    }
)

print(f"Status: {response.status_code}")
```

---

### **Origin Shield: Protecting Your Origin Server**

**Problem**: CDN edge servers requesting content from origin simultaneously can overwhelm it.

**Solution**: Origin Shield—regional cache that protects origin server from thundering herd.

**Architecture**:
```
User Request
    │
    ▼
CDN Edge (Tokyo)
    │
    ▼    Cache Miss
Origin Shield (Asia Pacific)
    │
    ▼    Cache Miss
Origin Server (US)

Benefits:
- CDN edges don't all hit origin simultaneously
- Origin Shield caches content regionally
- Origin sees fewer requests (reduced load)
```
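
The effect on origin load can be illustrated with a toy model in which each cache tier forwards misses to the tier above it. This is a simplified sketch with hypothetical class names (`Origin`, `CacheTier`); it processes requests one at a time and ignores TTLs, whereas a real shield also collapses concurrent requests.

```python
class Origin:
    """Stands in for the origin server and counts how often it is hit."""
    def __init__(self):
        self.fetches = 0

    def fetch(self, path):
        self.fetches += 1
        return f"content of {path}"

class CacheTier:
    """A cache layer that forwards misses to an upstream tier (or the origin)."""
    def __init__(self, upstream):
        self.upstream = upstream
        self.store = {}

    def fetch(self, path):
        if path not in self.store:
            self.store[path] = self.upstream.fetch(path)
        return self.store[path]

def origin_fetches(edge_count, use_shield):
    origin = Origin()
    upstream = CacheTier(origin) if use_shield else origin
    edges = [CacheTier(upstream) for _ in range(edge_count)]
    for edge in edges:
        edge.fetch("/video.mp4")  # every edge starts cold
    return origin.fetches

print(origin_fetches(20, use_shield=False))  # 20
print(origin_fetches(20, use_shield=True))   # 1
```

Without the shield, origin load grows with the number of cold edges; with it, the origin serves each object once per shield.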

**Implementation** (AWS CloudFront Origin Shield):
```python
# Create distribution with Origin Shield
response = cloudfront.create_distribution(
    DistributionConfig={
        'CallerReference': 'distribution-with-origin-shield',
        'Origins': {
            'Quantity': 1,
            'Items': [{
                'Id': 'my-origin',
                'DomainName': 'origin.example.com',
                'OriginShield': {
                    'Enabled': True,
                    'OriginShieldRegion': 'us-east-1'  # Origin Shield region
                }
            }]
        },
        # ... other configuration
    }
)
```

---

## **4.6 Cache Consistency**

Cache consistency ensures cached data matches the source of truth. Inconsistencies arise when data changes but cache isn't updated.

### **Cache Invalidation Strategies**

**Strategy 1: Time-Based Expiration**

**Concept**: Cache entries expire automatically after TTL.

**Implementation**:
```python
def get_user(user_id):
    cache_key = f'user:{user_id}'
    
    # Check cache
    cached_data = redis_client.get(cache_key)
    if cached_data:
        return json.loads(cached_data)
    
    # Cache miss - load from database
    # Parameterized query avoids SQL injection (placeholder syntax depends on the driver)
    user_data = database.query('SELECT * FROM users WHERE id = %s', (user_id,))
    
    # Cache with 1 hour TTL
    redis_client.setex(cache_key, 3600, json.dumps(user_data))
    
    return user_data

def update_user(user_id, user_data):
    # Update database
    database.update(user_id, user_data)
    
    # No cache invalidation - let it expire naturally
    # Next cache miss (within 1 hour) loads fresh data
```

**Pros**: Simple, no manual invalidation

**Cons**: Stale data until expiration; the inconsistency window can be as long as the full TTL

---

**Strategy 2: Write-Through Invalidation**

**Concept**: Update cache when data changes (write-through).

**Implementation**:
```python
def update_user(user_id, user_data):
    # Update database
    database.update(user_id, user_data)
    
    # Update cache synchronously
    cache_key = f'user:{user_id}'
    redis_client.setex(cache_key, 3600, json.dumps(user_data))
    
    # Cache always up-to-date
```

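**Pros**: Cache is refreshed on every write, so reads rarely see stale data

**Cons**: Slower writes (two systems updated per write); the database and cache updates are not atomic, so a failed cache write or racing writers can still leave stale entries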

---

**Strategy 3: Write-Back (Delayed) Invalidation**

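**Concept**: Invalidate cache asynchronously after data changes. Despite the name, this differs from a classic write-back cache, which writes to the cache first and flushes to the database later; here the database is still written synchronously and only the invalidation is deferred.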

**Implementation**:
```python
import queue
import threading

class WriteBackCache:
    def __init__(self):
        self.invalidate_queue = queue.Queue()
        self.worker_thread = threading.Thread(target=self._invalidate_worker, daemon=True)
        self.worker_thread.start()
    
    def _invalidate_worker(self):
        """Background thread to process invalidations"""
        while True:
            cache_key = self.invalidate_queue.get()
            redis_client.delete(cache_key)
            print(f"Invalidated {cache_key}")
            self.invalidate_queue.task_done()
    
    def update_user(self, user_id, user_data):
        # Update database (synchronous)
        database.update(user_id, user_data)
        
        # Queue cache invalidation (asynchronous)
        cache_key = f'user:{user_id}'
        self.invalidate_queue.put(cache_key)
        
        # Return immediately (cache invalidated in background)

# Usage
cache = WriteBackCache()

# Update user (fast - doesn't wait for cache invalidation)
cache.update_user(123, {'name': 'Alice Updated'})

# User might see stale data for a few milliseconds (until invalidation completes)
```

**Pros**: Fast writes, reduced contention

**Cons**: Brief inconsistency window, complexity

---

**Strategy 4: Cache Tagging**

**Concept**: Tag cache entries, invalidate by tag.

**Implementation**:
```python
def get_user(user_id):
    cache_key = f'user:{user_id}'
    
    # Check cache
    cached_data = redis_client.get(cache_key)
    if cached_data:
        return json.loads(cached_data)
    
    # Cache miss - load from database
    # Parameterized query avoids SQL injection (placeholder syntax depends on the driver)
    user_data = database.query('SELECT * FROM users WHERE id = %s', (user_id,))
    
    # Cache with tags
    redis_client.hset('cache_tags', cache_key, 'user')  # Tag this entry
    redis_client.setex(cache_key, 3600, json.dumps(user_data))
    
    return user_data

def invalidate_all_users():
    # Invalidate all entries tagged with 'user'
    cache_keys = redis_client.hgetall('cache_tags')
    for key, tag in cache_keys.items():
        if tag == b'user':
            redis_client.delete(key.decode())
            redis_client.hdel('cache_tags', key.decode())

# When user schema changes (affects all users)
invalidate_all_users()  # Invalidate all user caches
```

**Pros**: Batch invalidation, flexible

**Cons**: Additional metadata storage, complexity
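
The metadata cost is easier to see in a small in-memory model that keeps one set of keys per tag, so invalidating a tag touches only its own members instead of scanning every entry. The sketch below is illustrative only; the class name `TaggedCache` and its methods are assumptions. In Redis, the same layout could be kept as one set per tag.

```python
class TaggedCache:
    """In-memory cache that can invalidate every entry carrying a tag."""

    def __init__(self):
        self.entries = {}      # key -> value
        self.keys_by_tag = {}  # tag -> set of keys carrying that tag
        self.tags_by_key = {}  # key -> set of tags on that key

    def set(self, key, value, tags=()):
        self.delete(key)  # drop stale tag links if the key is overwritten
        self.entries[key] = value
        self.tags_by_key[key] = set(tags)
        for tag in tags:
            self.keys_by_tag.setdefault(tag, set()).add(key)

    def get(self, key):
        return self.entries.get(key)

    def delete(self, key):
        self.entries.pop(key, None)
        for tag in self.tags_by_key.pop(key, ()):
            members = self.keys_by_tag.get(tag)
            if members is not None:
                members.discard(key)

    def invalidate_tag(self, tag):
        """Delete every entry carrying `tag`; return how many were removed."""
        keys = self.keys_by_tag.pop(tag, set())
        for key in keys:
            self.delete(key)
        return len(keys)

# Usage
cache = TaggedCache()
cache.set("user:1", {"name": "Alice"}, tags=["user"])
cache.set("user:2", {"name": "Bob"}, tags=["user", "premium"])
cache.set("product:9", {"name": "Widget"}, tags=["product"])
print(cache.invalidate_tag("user"))  # 2
print(cache.get("user:1"))           # None
print(cache.get("product:9"))        # {'name': 'Widget'}
```

Keeping the reverse map (`tags_by_key`) is what lets a single-key delete clean up its tag memberships without a scan.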

---

### **Thundering Herd Problem**

**Problem**: Many simultaneous cache misses cause overwhelming load on backend.

**Scenario**:
```
Time 0: Popular item (user:123) expires from cache
Time 1: 1000 users request user:123 simultaneously
Time 2: All 1000 requests check cache → miss
Time 3: All 1000 requests query database simultaneously
Time 4: Database overwhelmed (crashes)

Result: Database failure due to thundering herd
```

**Solution 1: Cache Locking**

**Concept**: Only one request populates cache; others wait.

**Implementation**:
```python
import time

def get_user_with_lock(user_id, max_retries=50):
    cache_key = f'user:{user_id}'
    lock_key = f'lock:{cache_key}'
    
    # Bounded retry loop instead of unbounded recursion
    for _ in range(max_retries):
        # Check cache
        cached_data = redis_client.get(cache_key)
        if cached_data:
            return json.loads(cached_data)
        
        # Try to acquire lock (NX = only if absent, EX = auto-expire after 10s)
        lock_acquired = redis_client.set(lock_key, '1', nx=True, ex=10)
        if lock_acquired:
            try:
                # We have the lock - load from database (parameterized query)
                user_data = database.query('SELECT * FROM users WHERE id = %s', (user_id,))
                
                # Populate cache
                redis_client.setex(cache_key, 3600, json.dumps(user_data))
                
                return user_data
            finally:
                # Release lock
                redis_client.delete(lock_key)
        
        # Someone else has the lock - wait 100ms, then check the cache again
        time.sleep(0.1)
    
    raise TimeoutError(f'Timed out waiting for {cache_key} to be populated')

# When user:123 expires:
# Request 1: Acquires lock, loads from database
# Requests 2-1000: Wait for lock, then get from cache
# Result: Only 1 database query (not 1000)
```

**Solution 2: Probabilistic Early Expiration**

**Concept**: Some requests refresh cache early to prevent mass expiration.

**Implementation**:
```python
import random
import threading

def get_user_with_early_expiration(user_id):
    cache_key = f'user:{user_id}'
    
    # Check cache
    cached_data = redis_client.get(cache_key)
    if cached_data:
        # Check TTL (Redis returns -1 for "no expiry" and -2 for "key missing")
        ttl = redis_client.ttl(cache_key)
        
        # 10% chance to refresh early if TTL < 60 seconds
        if 0 <= ttl < 60 and random.random() < 0.1:
            print(f"Early refresh for {cache_key}")
            # Refresh cache in a background thread so this request is not delayed
            threading.Thread(
                target=refresh_user_async, args=(user_id,), daemon=True
            ).start()
        
        return json.loads(cached_data)
    
    # Cache miss - load from database
    user_data = database.query(f'SELECT * FROM users WHERE id = {user_id}')
    redis_client.setex(cache_key, 3600, json.dumps(user_data))
    return user_data

def refresh_user_async(user_id):
    """Refresh user cache asynchronously"""
    user_data = database.query(f'SELECT * FROM users WHERE id = {user_id}')
    redis_client.setex(f'user:{user_id}', 3600, json.dumps(user_data))

# Result: Cache entries refreshed before mass expiration
# Fewer simultaneous cache misses
```
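
The fixed 10% probability behaves very differently for hot and cold keys, and a little arithmetic makes that visible. The helper below is a hypothetical illustration that assumes each request in the final 60 seconds makes an independent refresh decision; the request counts are made-up inputs.

```python
def early_refresh_stats(requests_in_window, refresh_probability):
    """Return (expected number of early refreshes, P(at least one refresh))."""
    expected = requests_in_window * refresh_probability
    p_at_least_one = 1 - (1 - refresh_probability) ** requests_in_window
    return expected, p_at_least_one


# Hot key: 1000 requests arrive during the final 60 seconds of the TTL
hot_expected, hot_p = early_refresh_stats(1000, 0.1)

# Cold key: only 5 requests arrive in the same window
cold_expected, cold_p = early_refresh_stats(5, 0.1)

print(f"hot key:  ~{hot_expected:.0f} refreshes, P(refresh) = {hot_p:.4f}")
print(f"cold key: ~{cold_expected:.1f} refreshes, P(refresh) = {cold_p:.4f}")
```

Under these assumptions the hot key is almost certain to be refreshed before it expires, but it triggers roughly 100 redundant refresh queries, so a lower probability or a lock around the refresh may be worth considering. The cold key is refreshed early only about 41% of the time and otherwise falls back to the normal cache-miss path.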

**Solution 3: Request Coalescing**

**Concept**: Combine multiple requests for same data into single backend request.

**Implementation**:
```python
import threading
from concurrent.futures import ThreadPoolExecutor

class RequestCoalescer:
    def __init__(self):
        self.pending_requests = {}
        self.lock = threading.Lock()  # Guards pending_requests
        self.executor = ThreadPoolExecutor(max_workers=10)
    
    def get_user(self, user_id):
        cache_key = f'user:{user_id}'
        
        # Check cache
        cached_data = redis_client.get(cache_key)
        if cached_data:
            return json.loads(cached_data)
        
        # Check and register under a lock so two threads cannot both submit a load
        with self.lock:
            future = self.pending_requests.get(cache_key)
            is_leader = future is None
            if is_leader:
                # Create new request
                future = self.executor.submit(self._load_user, user_id)
                self.pending_requests[cache_key] = future
        
        if not is_leader:
            print(f"Coalescing request for {cache_key}")
            # Wait for existing request to complete
            return future.result()
        
        try:
            return future.result()
        finally:
            # Clean up (coalescing only covers requests within this process)
            with self.lock:
                self.pending_requests.pop(cache_key, None)
    
    def _load_user(self, user_id):
        """Load user from database"""
        user_data = database.query(f'SELECT * FROM users WHERE id = {user_id}')
        cache_key = f'user:{user_id}'
        redis_client.setex(cache_key, 3600, json.dumps(user_data))
        return user_data

# Usage
coalescer = RequestCoalescer()

# When 1000 users request user:123 simultaneously:
# Request 1: Loads from database
# Requests 2-1000: Wait for request 1's result
# Result: Only 1 database query
```
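
To see coalescing work without Redis or a database, the following self-contained sketch runs 20 threads against a slow loader and counts how often the loader actually executes. `SingleFlight`, `slow_load`, and the 0.3 second delay are all hypothetical stand-ins chosen for the demonstration.

```python
import threading
import time


class SingleFlight:
    """Let one caller per key run the loader; concurrent callers share its result."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}  # key -> {'done': Event, 'value': ..., 'error': ...}

    def do(self, key, loader):
        with self._lock:
            call = self._in_flight.get(key)
            leader = call is None
            if leader:
                call = {'done': threading.Event(), 'value': None, 'error': None}
                self._in_flight[key] = call

        if not leader:
            call['done'].wait()           # block until the leader finishes
            if call['error'] is not None:
                raise call['error']
            return call['value']

        try:
            call['value'] = loader()
            return call['value']
        except Exception as exc:
            call['error'] = exc           # waiters see the same failure
            raise
        finally:
            with self._lock:
                del self._in_flight[key]
            call['done'].set()


db_queries = 0
count_lock = threading.Lock()


def slow_load():
    """Stand-in for a slow database query."""
    global db_queries
    with count_lock:
        db_queries += 1
    time.sleep(0.3)
    return {'id': 123}


flight = SingleFlight()
results = []
start_together = threading.Barrier(20)


def worker():
    start_together.wait()                 # release all threads at once
    results.append(flight.do('user:123', slow_load))


threads = [threading.Thread(target=worker) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"{len(results)} callers served by {db_queries} database query(ies)")
```

Because all callers arrive while the first load is still running, the loader normally runs once. A caller that arrives after the load has finished starts a new load, which is why this pattern is usually paired with a cache write inside the loader.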

---

### **Cache Stampede Problem**

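**Problem**: Similar to thundering herd, but triggered by mass invalidation (a bug, a bad deploy, or a malicious purge) that empties many keys at once, rather than by a single hot key expiring. Many sources use "cache stampede" and "thundering herd" interchangeably; this chapter separates them by cause.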

**Scenario**:
```
Normal operation:
- 1000 requests/second for popular content
- 95% cache hit rate
- 50 requests/second to database (cache misses)

Cache stampede:
- Bug or invalidation causes all cache entries to be invalidated
- 1000 requests/second become cache misses
- Database overwhelmed with 1000 requests/second
- System crashes
```

**Solution: Rate Limiting Cache Misses**

**Implementation**:
```python
def get_user_with_rate_limit(user_id):
    cache_key = f'user:{user_id}'
    miss_key = f'miss:{cache_key}'
    
    # Check cache
    cached_data = redis_client.get(cache_key)
    if cached_data:
        return json.loads(cached_data)
    
    # Check if too many recent cache misses
    miss_count = redis_client.incr(miss_key)
    if miss_count == 1:
        redis_client.expire(miss_key, 60)  # Count misses in last 60 seconds
    
    # If more than 10 cache misses in last 60 seconds, block
    if miss_count > 10:
        print(f"Rate limiting cache misses for {cache_key}")
        # Return stale data or error
        return {'error': 'Too many requests, please retry'}
    
    # Load from database
    user_data = database.query(f'SELECT * FROM users WHERE id = {user_id}')
    redis_client.setex(cache_key, 3600, json.dumps(user_data))
    return user_data

# Result: Limits cache misses, prevents database overload
```
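
The counting logic can be exercised without Redis. The sketch below reimplements the same fixed-window idea in memory with an injectable clock so the window reset is easy to observe; `MissRateLimiter` and its parameters are hypothetical names, not part of any library.

```python
import time


class MissRateLimiter:
    """Fixed-window counter: allow at most `limit` backend loads per key per window."""

    def __init__(self, limit=10, window_seconds=60, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self._windows = {}  # key -> (window_start, count)

    def allow(self, key):
        now = self.clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0   # window expired: start a new one
        count += 1
        self._windows[key] = (start, count)
        return count <= self.limit


# A fake clock makes the example deterministic
fake_now = [0.0]
limiter = MissRateLimiter(limit=3, window_seconds=60, clock=lambda: fake_now[0])

first_window = [limiter.allow('user:123') for _ in range(5)]
# first_window == [True, True, True, False, False]

fake_now[0] = 61.0                  # move past the end of the window
after_reset = limiter.allow('user:123')
# after_reset == True
```

A fixed window lets up to twice the limit through around a window boundary; a sliding window or token bucket smooths that out at the cost of more state.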

---

### **Cache Penetration Problem**

**Problem**: Repeated requests for non-existent data bypass cache and hit database.

**Scenario**:
```
Attacker requests:
GET /api/user/999999 (doesn't exist)
GET /api/user/999998 (doesn't exist)
GET /api/user/999997 (doesn't exist)
...

Each request:
1. Check cache → miss
2. Query database → miss (user doesn't exist)
3. Return 404

Problem:
- All requests bypass cache (nothing to cache for non-existent data)
- Database overwhelmed with queries for non-existent data
```

**Solution 1: Cache Null Values**

**Implementation**:
```python
def get_user_with_null_caching(user_id):
    cache_key = f'user:{user_id}'
    
    # Check cache
    cached_data = redis_client.get(cache_key)
    if cached_data:
        if cached_data == b'NULL':
            # Cached null value - user doesn't exist
            return None
        return json.loads(cached_data)
    
    # Query database
    user_data = database.query(f'SELECT * FROM users WHERE id = {user_id}')
    
    if user_data is None:
        # Cache null value (short TTL)
        redis_client.setex(cache_key, 60, 'NULL')  # Cache for 1 minute
        return None
    
    # Cache user data
    redis_client.setex(cache_key, 3600, json.dumps(user_data))
    return user_data

# Result: Non-existent users cached (shorter TTL)
# Fewer database queries for non-existent data
```
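
The effect of caching "not found" results is easy to measure with a counter. This is an in-memory sketch that uses a sentinel object in place of the `'NULL'` string and omits the short TTL for brevity; `NegativeCache` and `_MISSING` are hypothetical names.

```python
_MISSING = object()  # sentinel meaning "we looked, and the record does not exist"


class NegativeCache:
    """Cache both found and not-found lookups, counting calls to the loader."""

    def __init__(self, loader):
        self.loader = loader
        self.store = {}
        self.loader_calls = 0

    def get(self, key):
        if key in self.store:
            value = self.store[key]
            return None if value is _MISSING else value

        self.loader_calls += 1
        value = self.loader(key)
        # Store the sentinel so a later lookup is a hit, not another query
        self.store[key] = _MISSING if value is None else value
        return value


users = {1: {'name': 'Ada'}}      # stand-in for the users table
cache = NegativeCache(users.get)  # dict.get returns None for unknown IDs

for _ in range(100):
    cache.get(999999)             # repeated requests for a non-existent user

calls_for_missing_user = cache.loader_calls
existing_user = cache.get(1)
```

Null caching helps when the same missing key is requested repeatedly. An attacker who uses a fresh ID on every request still reaches the database each time and fills the cache with sentinels, which is the case a Bloom filter addresses.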

**Solution 2: Bloom Filter**

**Implementation**:
```python
from pybloom_live import ScalableBloomFilter

# Create bloom filter
user_bloom_filter = ScalableBloomFilter(initial_capacity=1000000, error_rate=0.001)

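# Populate bloom filter with existing user IDs
# (IDs of newly created users must also be added at signup time,
# otherwise those users would be rejected as non-existent)
for user_id in get_all_user_ids():
    user_bloom_filter.add(user_id)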

def get_user_with_bloom_filter(user_id):
    # Check bloom filter
    if user_id not in user_bloom_filter:
        # Definitely not in database (bloom filter guarantees no false negatives)
        return None
    
    # Might be in database (bloom filter has false positives)
    # Query database to confirm
    cache_key = f'user:{user_id}'
    
    cached_data = redis_client.get(cache_key)
    if cached_data:
        return json.loads(cached_data)
    
    user_data = database.query(f'SELECT * FROM users WHERE id = {user_id}')
    if user_data:
        redis_client.setex(cache_key, 3600, json.dumps(user_data))
        return user_data
    
    # User doesn't exist (false positive from bloom filter)
    return None

# Result:
# - Non-existent users filtered out by bloom filter
# - No database queries for guaranteed non-existent users
# - Minimal database queries for false positives (rare)
```
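
For intuition about why a Bloom filter can have false positives but never false negatives, here is a minimal pure-Python version. It is a teaching sketch with arbitrarily chosen sizes, not a replacement for `pybloom_live`; `TinyBloomFilter` is a hypothetical name.

```python
import hashlib


class TinyBloomFilter:
    """Fixed-size Bloom filter backed by a bytearray (illustrative only)."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size_bits = size_bits          # must be a multiple of 8
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive k bit positions by salting SHA-256 with the hash index
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f'{i}:{item}'.encode()).digest()
            yield int.from_bytes(digest[:8], 'big') % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # Every bit set by add() stays set, so an added item always matches
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


bloom = TinyBloomFilter()
for user_id in range(1, 501):       # 500 "existing" users
    bloom.add(user_id)

false_negatives = [u for u in range(1, 501) if u not in bloom]
false_positives = [u for u in range(100000, 101000) if u in bloom]

print(f"false negatives: {len(false_negatives)}")
print(f"false positives among 1000 unknown IDs: {len(false_positives)}")
```

Bits are only ever set, never cleared, so membership tests for added items always succeed. Unknown IDs occasionally collide on all four bits, and the collision rate grows as the filter fills up.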

---

## **4.7 Application-Level Caching**

### **Browser Caching**

**Concept**: Store resources in user's browser to reduce server requests.

**HTTP Cache Headers**:
```http
# Strong caching (no revalidation)
Cache-Control: public, max-age=31536000, immutable
ETag: "abc123"

# Revalidation (check if changed)
Cache-Control: public, max-age=3600
ETag: "abc123"
Last-Modified: Mon, 01 Jan 2024 00:00:00 GMT

# No caching
Cache-Control: no-store, no-cache, must-revalidate
Pragma: no-cache
```

**Implementation**:
```python
import hashlib
import json

from flask import Flask, jsonify, request, make_response

app = Flask(__name__)

@app.route('/api/user/<int:user_id>')
def get_user(user_id):
    user_data = database.get_user(user_id)
    
    # Generate ETag (hash of response); HTTP requires the value to be double-quoted
    digest = hashlib.md5(json.dumps(user_data, sort_keys=True).encode()).hexdigest()
    etag = f'"{digest}"'
    
    # Check if client has cached version
    if request.headers.get('If-None-Match') == etag:
        # Client has latest version - return 304 (Not Modified)
        response = make_response('', 304)
        response.headers['ETag'] = etag
        return response
    
    # Return user data with cache headers
    # ('private' keeps per-user data out of shared caches such as CDNs)
    response = jsonify(user_data)
    response.headers['Cache-Control'] = 'private, max-age=60'
    response.headers['ETag'] = etag
    return response

# Flow:
# 1. Client requests user data
# 2. Server returns data with ETag and Cache-Control headers
# 3. Client caches data locally
# 4. Next request: Client sends If-None-Match header with ETag
# 5. Server checks if data changed (by comparing ETags)
# 6. If unchanged: Returns 304 (client uses cached data)
# 7. If changed: Returns new data with new ETag
```
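
Step 5 of the flow, comparing ETags, has a few details that a plain string comparison misses: the header may list several ETags, may carry a weak `W/` prefix, or may be `*`. The framework-independent sketch below covers those cases; `make_etag` and `is_not_modified` are hypothetical helper names, and splitting on commas is a simplification that assumes the ETag values themselves contain no commas.

```python
import hashlib
import json


def make_etag(payload):
    """Build a quoted ETag from a JSON-serializable payload."""
    # sort_keys makes the hash independent of dictionary ordering
    body = json.dumps(payload, sort_keys=True).encode()
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def is_not_modified(if_none_match, current_etag):
    """Return True when the If-None-Match header matches the current ETag."""
    if not if_none_match:
        return False
    if if_none_match.strip() == '*':
        return True                       # '*' matches any current representation
    for candidate in if_none_match.split(','):
        candidate = candidate.strip()
        if candidate.startswith('W/'):
            candidate = candidate[2:]     # weak comparison ignores the W/ prefix
        if candidate == current_etag:
            return True
    return False


etag = make_etag({'id': 123, 'name': 'Ada'})

same_version = is_not_modified(etag, etag)
listed_with_weak_prefix = is_not_modified(f'"old-version", W/{etag}', etag)
stale_version = is_not_modified('"old-version"', etag)
```

A handler would return 304 with an empty body when `is_not_modified` is true and the full response with the new ETag otherwise.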

---

### **Local Application Cache**

**Concept**: Cache data in application process memory (fastest, but not shared between processes).

**Implementation**:
```python
from functools import lru_cache
import time

# Python LRU cache (in-memory)
@lru_cache(maxsize=1000)
def get_user_local(user_id):
    """Cache user data in local process memory"""
    print(f"Loading user {user_id} from database")
    return database.get_user(user_id)

# Usage
# First call: Loads from database
user = get_user_local(123)

# Second call: Returns from cache (no database query)
user = get_user_local(123)

# Clear cache (if needed)
get_user_local.cache_clear()

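# Note: This cache is per-process (not shared across multiple application servers)
# lru_cache has no TTL: an entry stays until it is evicted or cache_clear() is called,
# so updates made in the database are not picked up automatically
# Use for data that doesn't change often and is accessed frequently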
```
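
When local entries need to expire, a small cache that combines LRU eviction with a per-entry TTL is one option. The class below is a sketch under stated assumptions: `TTLCache` is a hypothetical name, the clock is injectable so the example is deterministic, and the code is not thread-safe.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Small per-process cache with LRU eviction and per-entry expiry."""

    def __init__(self, maxsize=1000, ttl_seconds=60, clock=time.monotonic):
        self.maxsize = maxsize
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = OrderedDict()  # key -> (expires_at, value)

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._entries[key]          # expired: drop it and report a miss
            return default
        self._entries.move_to_end(key)      # mark as most recently used
        return value

    def set(self, key, value):
        self._entries[key] = (self.clock() + self.ttl_seconds, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self.maxsize:
            self._entries.popitem(last=False)  # evict least recently used


fake_now = [0.0]
local = TTLCache(maxsize=2, ttl_seconds=30, clock=lambda: fake_now[0])

local.set('user:1', 'Ada')
local.set('user:2', 'Lin')
local.get('user:1')            # touch user:1, so user:2 becomes least recently used
local.set('user:3', 'Sam')     # cache is full: user:2 is evicted

evicted = local.get('user:2')      # None
still_cached = local.get('user:1') # 'Ada'

fake_now[0] = 31.0                 # advance past the 30-second TTL
expired = local.get('user:1')      # None
```

A short TTL bounds how long a process can serve a value that has since changed in the shared cache or the database.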

---

### **Distributed Cache Hierarchy**

**Concept**: Layered caching—try faster caches first, fall back to slower ones.

**Architecture**:
```
Request
    │
    ▼
Browser Cache (Client)
    │
    ▼    Cache Miss
CDN Cache (Edge)
    │
    ▼    Cache Miss
Local Application Cache (Process Memory)
    │
    ▼    Cache Miss
Distributed Cache (Redis Cluster)
    │
    ▼    Cache Miss
Database (PostgreSQL)

Each layer down is slower but larger.
A miss falls through to the next layer; on the way back, the result populates the layers above it.
```

**Implementation**:
```python
class HierarchicalCache:
    def __init__(self, redis_client):
        self.local_cache = {}  # In-memory cache
        self.redis_client = redis_client
    
    def get(self, key):
        # Level 1: Local cache
        if key in self.local_cache:
            print(f"Local cache HIT for {key}")
            return self.local_cache[key]
        
        # Level 2: Distributed cache
        cached_data = self.redis_client.get(key)
        if cached_data:
            print(f"Distributed cache HIT for {key}")
            data = json.loads(cached_data)
            # Populate local cache
            self.local_cache[key] = data
            return data
        
        # Level 3: Database
        print(f"Cache MISS for {key} - loading from database")
        data = self.load_from_database(key)
        
        # Populate distributed cache
        self.redis_client.setex(key, 3600, json.dumps(data))
        
        # Populate local cache
        self.local_cache[key] = data
        
        return data
    
    def set(self, key, data, ttl=3600):
        # Update all cache levels
        self.local_cache[key] = data
        self.redis_client.setex(key, ttl, json.dumps(data))
    
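    def invalidate(self, key):
        # Invalidate all cache levels
        # (only this process's local cache is cleared; other servers keep
        # their local copies until those are invalidated too)
        self.local_cache.pop(key, None)
        self.redis_client.delete(key)
    
    def load_from_database(self, key):
        # Keys look like 'user:123'; look the record up by its ID
        _, user_id = key.split(':', 1)
        return database.get_user(int(user_id))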

# Usage
cache = HierarchicalCache(redis_client)

# First request: Loads from database, populates all cache levels
data = cache.get('user:123')

# Subsequent requests: Returns from local cache (fastest)
data = cache.get('user:123')

# Invalidates all cache levels
cache.invalidate('user:123')
```

---

## **4.8 Caching Best Practices**

### **Cache Key Design**

**Principle**: Design cache keys that are descriptive, consistent, and support invalidation.

**Good Cache Keys**:
```python
# Hierarchical keys (namespace organization)
cache_key = f'user:{user_id}:profile'  # User profile
cache_key = f'user:{user_id}:settings'  # User settings
cache_key = f'product:{product_id}:details'  # Product details

# Keys with version (for schema changes)
cache_key = f'user:{user_id}:profile:v2'  # Version 2 of user profile

# Keys with parameters (for query result caching)
cache_key = f'users:page:{page}:limit:{limit}:sort:{sort}'

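# Keys with hash tags (for distributed caching)
# Redis Cluster hashes only the text inside {...}; the doubled braces in the
# f-string emit literal braces, producing e.g. '{user:123}:profile'
cache_key = f'{{user:{user_id}}}:profile'  # Ensures related keys on same node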
```

**Bad Cache Keys**:
```python
# Too generic (hard to invalidate)
cache_key = 'data'  # What data? When to invalidate?

# Inconsistent naming
cache_key = f'userProfile{user_id}'  # Mixes camelCase and snake_case

# Too long (inefficient)
cache_key = f'user:{user_id}:profile:including:all:details:and:metadata:which:is:very:long'
```

**Cache Key Versioning**:
```python
def get_cache_key(user_id, version=2):
    """Generate cache key with version"""
    return f'user:{user_id}:profile:v{version}'

# When schema changes, increment version
# Old version cached data ignored (treated as different key)
cache_key_v2 = get_cache_key(123, version=2)
cache_key_v3 = get_cache_key(123, version=3)
```
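
The conventions above (namespace, version, parameters) can be collected into one helper so that every caller builds keys the same way. This is a sketch: `build_cache_key`, its argument names, and the 200-character limit are assumptions made for illustration.

```python
import hashlib


def build_cache_key(namespace, *parts, version=1, params=None, max_length=200):
    """Build a namespaced, versioned cache key; hash it if it grows too long."""
    segments = [namespace, *[str(p) for p in parts], f'v{version}']
    if params:
        # Sort parameters so the same query always maps to the same key
        segments.extend(f'{name}={params[name]}' for name in sorted(params))
    key = ':'.join(segments)

    if len(key) > max_length:
        # Keep the namespace and version readable, hash the rest
        digest = hashlib.sha256(key.encode()).hexdigest()[:32]
        key = f'{namespace}:v{version}:sha256:{digest}'
    return key


profile_key = build_cache_key('user', 123, 'profile', version=2)
# 'user:123:profile:v2'

list_key_a = build_cache_key('users', 'list', params={'page': 2, 'limit': 20})
list_key_b = build_cache_key('users', 'list', params={'limit': 20, 'page': 2})
# both are 'users:list:v1:limit=20:page=2'

long_key = build_cache_key('search', 'x' * 500)
```

Sorting the parameters avoids storing the same query result under two keys that differ only in argument order.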

---

### **Cache Warming**

**Concept**: Pre-populate cache with expected data before users request it.

**Implementation**:
```python
def warm_user_cache():
    """Pre-populate cache with popular user data"""
    # Get most frequently accessed user IDs
    popular_user_ids = database.query("""
        SELECT user_id 
        FROM access_logs 
        WHERE access_time > NOW() - INTERVAL '1 day'
        GROUP BY user_id 
        ORDER BY COUNT(*) DESC 
        LIMIT 1000
    """)
    
    # Load and cache each user
    for user_id in popular_user_ids:
        user_data = database.get_user(user_id)
        cache_key = f'user:{user_id}'
        redis_client.setex(cache_key, 3600, json.dumps(user_data))
        print(f"Warmed cache for user {user_id}")

# Run cache warming during off-peak hours
# (e.g., 3 AM when traffic is low)
warm_user_cache()
```

**Benefits**:
- Reduced cache misses during peak hours
- Better user experience (faster response times)
- Lower database load during peak traffic
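
The selection step can be tried without a database. The sketch below picks the most requested keys from an in-memory access log and warms a plain dictionary; `pick_keys_to_warm`, `warm`, and the sample log are hypothetical stand-ins for the SQL query and Redis calls above.

```python
from collections import Counter


def pick_keys_to_warm(access_log, top_n):
    """Return the top_n most frequently accessed keys, most popular first."""
    counts = Counter(access_log)
    return [key for key, _ in counts.most_common(top_n)]


def warm(cache, keys, loader):
    """Load keys that are not cached yet; return how many were loaded."""
    loaded = 0
    for key in keys:
        if key not in cache:        # do not overwrite entries that are already cached
            cache[key] = loader(key)
            loaded += 1
    return loaded


access_log = ['user:1', 'user:2', 'user:1', 'user:3', 'user:1', 'user:2']
keys = pick_keys_to_warm(access_log, top_n=2)
# keys == ['user:1', 'user:2']

warm_cache = {}
loaded_count = warm(warm_cache, keys, loader=lambda key: {'key': key})
```

Skipping keys that are already cached keeps the warming job from resetting entries that live traffic has just refreshed.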

---

### **Cache Monitoring**

**Metrics to Monitor**:
```python
import time
from functools import wraps

def monitor_cache(func):
    """Decorator to monitor cache performance"""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start_time = time.time()
        
        # Call original function
        result = func(*args, **kwargs)
        
        # Calculate duration
        duration = time.time() - start_time
        
        # Log metrics
        print(f"Cache operation: {func.__name__}, Duration: {duration:.3f}s")
        
        # Send to monitoring system
        # monitoring_system.gauge('cache.operation.duration', duration)
        # monitoring_system.increment('cache.operation.count')
        
        return result
    return wrapper

@monitor_cache
def get_user(user_id):
    cache_key = f'user:{user_id}'
    
    cached_data = redis_client.get(cache_key)
    if cached_data:
        # Cache hit
        monitoring_system.increment('cache.hits')
        return json.loads(cached_data)
    
    # Cache miss
    monitoring_system.increment('cache.misses')
    user_data = database.get_user(user_id)
    redis_client.setex(cache_key, 3600, json.dumps(user_data))
    return user_data

# Important metrics:
# - Cache hit rate: hits / (hits + misses)
# - Cache latency: Time to get data from cache
# - Cache size: Memory usage of cache
# - Eviction rate: How often items are evicted
# - TTL distribution: How long items stay in cache
```
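
The hit-rate formula in the comment above can be turned into a small counter object. `CacheStats` is a hypothetical helper; in production these counters would normally live in a metrics system rather than in process memory.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    """Track cache hits and misses and derive the hit rate."""
    hits: int = 0
    misses: int = 0

    def record(self, hit):
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        # Avoid division by zero before any request has been recorded
        return self.hits / total if total else 0.0


stats = CacheStats()
for was_hit in [True] * 95 + [False] * 5:
    stats.record(was_hit)

print(f"hit rate: {stats.hit_rate:.2%}")
```

Tracking the rate per key prefix (for example `user:` versus `product:`) often shows which data is worth caching and which is not.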

---

### **Common Caching Pitfalls**

**Pitfall 1: Caching Everything**

**Problem**: Caching too much data wastes memory and increases complexity.

**Solution**: Cache judiciously based on access patterns.
```python
# Bad: Cache all data regardless of access frequency
def get_any_data(key):
    cached_data = redis_client.get(key)
    if cached_data:
        return json.loads(cached_data)
    
    data = load_from_source(key)
    redis_client.setex(key, 3600, json.dumps(data))
    return data

# Good: Cache only frequently accessed data
def get_frequently_accessed_data(key):
    # Check if key is in "frequently accessed" set
    if not redis_client.sismember('frequently_accessed', key):
        # Not frequently accessed - don't cache
        return load_from_source(key)
    
    # Frequently accessed - cache it
    cached_data = redis_client.get(key)
    if cached_data:
        return json.loads(cached_data)
    
    data = load_from_source(key)
    redis_client.setex(key, 3600, json.dumps(data))
    return data
```

---

**Pitfall 2: Incorrect TTL**

**Problem**: TTL too short (cache ineffective) or too long (stale data).

**Solution**: Choose TTL based on data freshness requirements.
```python
# Bad: Fixed TTL for all data
def cache_data(key, data):
    redis_client.setex(key, 3600, json.dumps(data))  # 1 hour for everything

# Good: TTL based on data characteristics
def cache_data(key, data, data_type):
    ttl_by_type = {
        'static': 86400,      # 24 hours (static assets)
        'session': 3600,      # 1 hour (user sessions)
        'dynamic': 60,        # 1 minute (dynamic data)
        'realtime': 5         # 5 seconds (real-time data)
    }
    
    ttl = ttl_by_type.get(data_type, 3600)
    redis_client.setex(key, ttl, json.dumps(data))
```

---

**Pitfall 3: Cache Inconsistency**

**Problem**: Cache and database out of sync.

**Solution**: Implement cache invalidation strategy.
```python
# Bad: No cache invalidation
def update_user(user_id, user_data):
    database.update(user_id, user_data)
    # Cache not updated - serves stale data!

# Good: Invalidate the cache on every write
def update_user(user_id, user_data):
    database.update(user_id, user_data)
    
    # Invalidate cache; the next read repopulates it from the database
    cache_key = f'user:{user_id}'
    redis_client.delete(cache_key)
    
    # Alternative (write-through): overwrite the entry instead of deleting it.
    # Only safe if user_data is the complete, final record.
    # redis_client.setex(cache_key, 3600, json.dumps(user_data))
```

---

## **4.9 Real-World Caching Examples**

### **Instagram's Caching Architecture**

**Challenge**: Instagram serves billions of photos and videos per day.

**Solution**: Multi-layer caching strategy.
```
1. CDN: Static assets (images, videos) cached globally
2. Edge Cache: API responses cached at edge locations
3. Application Cache: User profiles, feed data cached in Redis
4. Database Cache: Frequently read data pages kept in memory by PostgreSQL's shared buffers

Result:
- 95%+ cache hit rate for static assets
- 80%+ cache hit rate for API responses
- Reduced database load by 90%
- Sub-second response times for most requests
```

---

### **Twitter's Timeline Caching**

**Challenge**: Generate personalized timelines for 300+ million users.

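**Solution**: Pre-compute and cache timelines. The simplified version here builds a timeline on first read and invalidates it when a followed user tweets; Twitter's production design is generally described as pushing new tweets into followers' cached timelines at write time (fan-out on write).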
```python
# User timeline generation
def generate_timeline(user_id):
    timeline_key = f'timeline:{user_id}'
    
    # Check cache
    cached_timeline = redis_client.get(timeline_key)
    if cached_timeline:
        return json.loads(cached_timeline)
    
    # Generate timeline (expensive operation)
    followed_users = get_followed_users(user_id)
    tweets = []
    for followed_user in followed_users:
        user_tweets = get_recent_tweets(followed_user, limit=10)
        tweets.extend(user_tweets)
    
    # Sort by timestamp (most recent first)
    tweets.sort(key=lambda x: x['timestamp'], reverse=True)
    
    # Cache timeline (5 minute TTL)
    redis_client.setex(timeline_key, 300, json.dumps(tweets))
    
    return tweets

# When a user tweets:
def post_tweet(user_id, tweet_content):
    # Save tweet
    tweet_id = save_tweet(user_id, tweet_content)
    
    # Invalidate followers' timelines
    followers = get_followers(user_id)
    for follower_id in followers:
        timeline_key = f'timeline:{follower_id}'
        redis_client.delete(timeline_key)
    
    # Alternatively, update timelines directly (faster for followers)
    # (but more complex)
```
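
The alternative mentioned in the last comment, updating followers' timelines directly, is usually called fan-out on write. The sketch below shows the idea with in-memory structures only; the follower graph, `TIMELINE_LIMIT`, and the function names are hypothetical and not taken from Twitter's implementation.

```python
from collections import defaultdict, deque

TIMELINE_LIMIT = 3  # hypothetical cap, kept tiny so trimming is visible

# author -> list of followers (stand-in for the follower graph)
followers = {'alice': ['bob', 'carol'], 'bob': ['carol']}

# follower -> cached timeline, newest first, bounded in length
timelines = defaultdict(lambda: deque(maxlen=TIMELINE_LIMIT))


def post_tweet_fanout(author, text):
    """Push the new tweet into every follower's cached timeline."""
    tweet = {'author': author, 'text': text}
    for follower in followers.get(author, []):
        # appendleft keeps newest first; the oldest entry drops off when full
        timelines[follower].appendleft(tweet)
    return tweet


def read_timeline(user):
    """Reading is a cheap lookup: no per-request merge of followed users."""
    return list(timelines[user])


post_tweet_fanout('alice', 'hello')
post_tweet_fanout('bob', 'hi')
post_tweet_fanout('alice', 'caching is fun')

carol_timeline = [t['text'] for t in read_timeline('carol')]
# ['caching is fun', 'hi', 'hello']
bob_timeline = [t['text'] for t in read_timeline('bob')]
# ['caching is fun', 'hello']
```

The trade-off is write amplification: a tweet from an account with millions of followers means millions of timeline updates, which is why such accounts are often handled at read time instead.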

---

### **Netflix's Content Delivery Caching**

**Challenge**: Stream high-quality video to millions of users simultaneously.

**Solution**: Hierarchical CDN caching.
```
1. Open Connect: Netflix's own CDN (deployed in ISP networks)
2. Regional Caches: Cache popular content regionally
3. Edge Caches: Cache content at the edge (close to users)
4. Client Caching: Buffer content in user's device

Optimizations:
- Adaptive bitrate streaming (adjust quality based on bandwidth)
- Pre-fetching (download next segment before user watches)
- Predictive caching (predict what user will watch next)
- Proactive fill (push popular titles to Open Connect appliances during off-peak hours)

Result:
- 99.9%+ uptime
- Sub-second start times
- High-quality streaming (4K, HDR)
- Reduced bandwidth costs (90%+ cached)
```

---

## **4.10 Key Takeaways**

1. **Caching is impactful**: Well-implemented caching can improve performance by 10-1000x while reducing costs.

2. **Choose the right pattern**: Cache-aside for simplicity, read-through for reduced application complexity, write-through for consistency, write-behind for performance.

3. **Eviction policies matter**: LRU for general use, LFU for stable patterns, TTL for time-sensitive data.

4. **Distributed caching scales**: Use Redis Cluster or Memcached for large-scale deployments.

5. **CDNs reduce latency**: Edge caching significantly improves user experience globally.

6. **Cache consistency is complex**: Implement proper invalidation strategies (time-based, write-through, tagging).

7. **Avoid caching pitfalls**: Don't cache everything, choose appropriate TTLs, implement invalidation, monitor cache performance.

---

## **Chapter Summary**

In this chapter, we explored caching—a critical optimization for system design. We covered caching patterns (cache-aside, read-through, write-through, write-behind, refresh-ahead), eviction policies (LRU, LFU, TTL, random), distributed caching (Redis Cluster, Memcached), and CDN architecture.

We examined cache consistency challenges and their solutions (invalidation strategies, thundering herd, cache stampede, cache penetration). We explored application-level caching (browser, local, distributed cache hierarchy) and caching best practices (cache key design, warming, monitoring).

Finally, we examined real-world caching examples from Instagram, Twitter, and Netflix, understanding how top companies implement caching at scale.

**Coming up next**: In Chapter 5, we'll explore Message Queues & Event-Driven Architecture, covering synchronous vs. asynchronous communication, message queue patterns, Apache Kafka, RabbitMQ, event sourcing, and backpressure handling.

---

**Exercises**:

1. **Caching Pattern Selection**: For each scenario, which caching pattern would you use and why?
   - A banking application where account balances must always be accurate
   - A social media news feed where speed is more important than perfect consistency
   - An analytics system processing large datasets (write-heavy)
   - A real-time multiplayer game where low latency is critical

2. **Cache Eviction Policy**: You're building a music streaming service with 10 million songs. Users frequently listen to popular songs (top 1000) but also explore new songs. Which eviction policy would you use? Why?

3. **Distributed Cache Design**: You need to design a distributed cache for a global e-commerce platform with 100 million products. Requirements:
   - Products accessed from all regions
   - Some products are very popular (hot spots)
   - System must remain available if one cache node fails
   How would you design this? Which distributed caching solution would you use?

4. **Cache Invalidation Strategy**: You're building a collaborative document editor (like Google Docs). Multiple users can edit the same document simultaneously. How would you handle cache invalidation? What strategy would you use to ensure all users see consistent document state?

5. **CDN Caching Strategy**: You're launching a video streaming service. Requirements:
   - Videos are large (1-10 GB each)
   - Some videos are very popular (millions of views)
   - Videos are updated occasionally (new versions released)
   How would you design CDN caching for this service? What cache invalidation strategy would you use?

---