# Chapter 36: DSA in Production Systems

> *"In production, data structures are not just academic exercises—they are the backbone of systems that serve millions of users. Efficiency, reliability, and scalability are paramount."* — Anonymous

---

## 36.1 Introduction

While the previous chapters focused on the theory and implementation of data structures, this chapter bridges the gap to real‑world engineering. In production systems, we face constraints such as massive scale, concurrency, fault tolerance, and hardware limitations. The data structures we choose must not only be asymptotically efficient but also perform well under these conditions.

### 36.1.1 Why Production Systems Differ

```
┌─────────────────────────────────────────────────────────────────────┐
│                    PRODUCTION CONSIDERATIONS                          │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  1. SCALE: Millions or billions of items, terabytes of data.        │
│  2. CONCURRENCY: Thousands of simultaneous requests.                │
│  3. PERSISTENCE: Data must survive crashes (disk storage).          │
│  4. DISTRIBUTION: Data spread across multiple machines.             │
│  5. REAL‑TIME RESPONSES: Sub‑millisecond latency requirements.      │
│  6. FAULT TOLERANCE: System must handle failures gracefully.        │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘
```

In this chapter, we explore how classic data structures are adapted and combined to meet these demands in production systems.

---

## 36.2 Database Indexing Structures

Databases rely on indexing to provide fast access to data without scanning entire tables. Two dominant index structures are **B+ Trees** and **LSM Trees**.

### 36.2.1 B+ Trees

B+ trees are the standard index structure in relational databases (MySQL InnoDB, PostgreSQL, Oracle). They are a variant of B‑trees where all data resides in leaves, and internal nodes only store keys (routing information). Leaves are linked for efficient range scans.

**Why B+ Trees?**
- **High fanout:** Each node holds many keys (fanout ≈ page size / (key size + pointer size)), so the tree height is small (typically 3–4 levels even for billions of rows).
- **Disk‑optimized:** Nodes correspond to disk pages, minimizing I/O.
- **Range queries:** Leaf linkage allows sequential scans without going back up the tree.
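
To make the fanout claim concrete, here is a back-of-the-envelope sketch that counts levels for a given fanout. The helper name `btree_height` and the 16 KiB page / 16-byte entry figures are illustrative assumptions, not parameters of any particular database, and the calculation idealizes every node as completely full.

```python
def btree_height(num_keys, fanout):
    """Smallest number of levels whose capacity (fanout ** levels) covers num_keys."""
    levels, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels

# Assumed figures: 16 KiB pages and 16 bytes per (key, pointer) entry
fanout = (16 * 1024) // 16                      # 1024 entries per node
height_1b = btree_height(1_000_000_000, fanout)
print(fanout, height_1b)                        # prints: 1024 3
```

Real trees keep nodes only partly full, so one extra level is common, which is consistent with the 3–4 levels quoted above.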

**Operations:**
- **Search:** Follow path from root to leaf; one disk read per level.
- **Insert:** Find leaf, insert key; if leaf overflows, split and propagate.
- **Delete:** Find leaf, remove key; if underflow, borrow from sibling or merge.

**Example (conceptual):**
```
                    [50 | 100]                         (root: routing keys only)
                  /     |      \
   [10, 20, 30] <-> [50, 60, 70] <-> [100, 110, 120]   (leaves: keys + row data, linked)

    keys < 50      50 <= keys < 100     keys >= 100
```
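
The search and range-scan paths can be sketched on a small hand-built tree. This is a read-only illustration under stated assumptions: the class names `Leaf` and `Internal`, the helper functions, and the sample keys are all hypothetical, and insertion, splitting, and disk paging are left out.

```python
import bisect

class Leaf:
    def __init__(self, keys, values):
        self.keys, self.values = keys, values
        self.next = None  # link to the right sibling leaf

class Internal:
    def __init__(self, keys, children):
        self.keys, self.children = keys, children  # routing keys only

def find_leaf(node, key):
    # Descend one level per iteration; keys equal to a separator go right
    while isinstance(node, Internal):
        node = node.children[bisect.bisect_right(node.keys, key)]
    return node

def search(root, key):
    leaf = find_leaf(root, key)
    i = bisect.bisect_left(leaf.keys, key)
    if i < len(leaf.keys) and leaf.keys[i] == key:
        return leaf.values[i]
    return None

def range_scan(root, lo, hi):
    # Find the first leaf once, then follow sibling links without revisiting the root
    leaf, out = find_leaf(root, lo), []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > hi:
                return out
            if k >= lo:
                out.append((k, v))
        leaf = leaf.next
    return out

# Three linked leaves under a single root with separators 50 and 100
a = Leaf([10, 20, 30], ["r10", "r20", "r30"])
b = Leaf([50, 60, 70], ["r50", "r60", "r70"])
c = Leaf([100, 110, 120], ["r100", "r110", "r120"])
a.next, b.next = b, c
root = Internal([50, 100], [a, b, c])

print(search(root, 60))          # r60
print(range_scan(root, 25, 60))  # [(30, 'r30'), (50, 'r50'), (60, 'r60')]
```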

### 36.2.2 LSM Trees (Log‑Structured Merge Trees)

LSM trees are used in modern NoSQL databases (Cassandra, RocksDB, LevelDB) and are optimized for high write throughput.

**Structure:**
- **Memtable:** An in‑memory sorted structure (often a skip list or red‑black tree) that accepts writes. Each write is first appended to a write‑ahead log (WAL) for durability.
- **SSTables (Sorted String Tables):** When the memtable is full, it is flushed to disk as an immutable SSTable. Each SSTable is a sorted file.
- **Compaction:** Background process merges SSTables to keep the number of files manageable and to remove deleted or outdated entries.
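
Compaction can be pictured as a k-way merge in which the newest version of each key wins. The sketch below is a simplified in-memory illustration: the `compact` function and the `TOMBSTONE` deletion marker are hypothetical names, and dropping tombstones is only safe here because every table that could hold the key takes part in the merge.

```python
import heapq

TOMBSTONE = object()  # hypothetical marker written in place of a deleted value

def compact(sstables):
    """Merge sorted (key, value) lists, given oldest first, into one sorted list."""
    # Tag entries as (key, -age, value) so that, for equal keys, the newest sorts first
    streams = [
        [(key, -age, value) for key, value in table]
        for age, table in enumerate(sstables)
    ]
    merged = []
    last_key = object()  # sentinel that never equals a real key
    for key, _, value in heapq.merge(*streams, key=lambda entry: entry[:2]):
        if key == last_key:
            continue  # an older version of a key already handled
        last_key = key
        if value is not TOMBSTONE:
            merged.append((key, value))
    return merged

old = [("a", 1), ("b", 2), ("c", 3)]
new = [("b", 20), ("c", TOMBSTONE), ("d", 4)]
result = compact([old, new])
print(result)  # [('a', 1), ('b', 20), ('d', 4)]
```

Because `heapq.merge` consumes its inputs lazily, the same idea extends to files that are streamed from disk rather than held in lists.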

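**Characteristics:**
- **Write‑optimized:** A write costs an O(log n) in‑memory insert plus a sequential WAL append; flushes and compactions also write sequentially, avoiding random disk I/O.
- **Read path:** A lookup may need to check the memtable and several SSTables; **bloom filters** let it skip files that definitely do not contain the key.
- **Amplification trade‑offs:** Compaction rewrites data (write amplification) in exchange for fewer files to search (lower read amplification) and space reclaimed from deleted or overwritten entries (lower space amplification).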

**Trade‑offs:**
- Read amplification (checking multiple files) can be high, mitigated by caching and bloom filters.
- Compaction consumes I/O and CPU.

**Implementation sketch (simplified):**
```python
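# Not actual production code; illustrates the concept only.
# Keys and values are stored as text, so they must not contain commas or
# newlines, and values read back from disk are returned as strings.
THRESHOLD = 1000  # max memtable entries before a flush (illustrative value)

class LSMTree:
    def __init__(self):
        self.memtable = {}  # stands in for a sorted structure (e.g., skip list)
        self.sstables = []  # filenames of on-disk sorted files, oldest first
        self.wal = open("wal.log", "a")

    def put(self, key, value):
        # Log first so the write survives a crash, then update memory
        self.wal.write(f"{key},{value}\n")
        self.wal.flush()
        self.memtable[key] = value
        if len(self.memtable) >= THRESHOLD:
            self.flush()

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for filename in reversed(self.sstables):  # newest first
            value = self._search_sstable(filename, key)
            if value is not None:
                return value
        return None

    def _search_sstable(self, filename, key):
        # Linear scan for clarity; real SSTables use an index and binary search
        with open(filename) as f:
            for line in f:
                k, v = line.rstrip("\n").split(",", 1)
                if k == str(key):
                    return v
        return None

    def flush(self):
        # Write the memtable to a new SSTable in sorted key order
        filename = f"sstable_{len(self.sstables)}.data"
        with open(filename, "w") as f:
            for k, v in sorted(self.memtable.items()):
                f.write(f"{k},{v}\n")
        self.sstables.append(filename)
        self.memtable.clear()
        self.wal.truncate(0)  # discard logged entries (in practice, rotate)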
```

**In production**, LSM trees use skip lists for memtables, bloom filters for fast negative lookups, and sophisticated compaction strategies (size‑tiered, leveled).

---

## 36.3 Caching Strategies

Caches store frequently accessed data in fast memory (RAM) to reduce latency and load on backend systems. Two classic eviction policies are **LRU** and **LFU**.

### 36.3.1 LRU Cache (Least Recently Used)

LRU evicts the item that has not been used for the longest time. It requires:
- Fast lookup (hash map)
- Fast way to move an item to the front on access (doubly linked list)

**Implementation using `OrderedDict` (Python) or `LinkedHashMap` (Java):**

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.cache = OrderedDict()
        self.capacity = capacity

    def get(self, key):
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)  # mark as recently used
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # remove least recently used
```

**Manual implementation with dict + doubly linked list** (common interview question):

```python
class Node:
    def __init__(self, key, val):
        self.key = key
        self.val = val
        self.prev = None
        self.next = None

class LRUCache:
    def __init__(self, capacity):
        self.cap = capacity
        self.cache = {}
        self.head = Node(0, 0)
        self.tail = Node(0, 0)
        self.head.next = self.tail
        self.tail.prev = self.head

    def _remove(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def _add(self, node):
        node.prev = self.head
        node.next = self.head.next
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        if key in self.cache:
            node = self.cache[key]
            self._remove(node)
            self._add(node)
            return node.val
        return -1

    def put(self, key, value):
        if key in self.cache:
            self._remove(self.cache[key])
        node = Node(key, value)
        self._add(node)
        self.cache[key] = node
        if len(self.cache) > self.cap:
            lru = self.tail.prev
            self._remove(lru)
            del self.cache[lru.key]
```

### 36.3.2 LFU Cache (Least Frequently Used)

LFU evicts the item with the lowest access frequency. If multiple items have the same frequency, evict the least recently used among them (LRU tie‑breaker).

Implementation is more complex; often uses a hash map from frequency to a doubly linked list of nodes, plus another hash map from key to node. Frequency counts are updated on access.

**High‑level design:**
- `key_node` map: key → (value, freq, node in freq list)
- `freq_list` map: freq → doubly linked list of keys with that frequency
- On access, move key from current freq list to freq+1 list (creating list if needed).

**Time:** O(1) per operation.

**Example code (simplified, not full production):**

```python
from collections import OrderedDict, defaultdict

class LFUCache:
    def __init__(self, capacity):
        self.cap = capacity
        self.key_node = {}  # key -> (value, freq)
        # freq -> keys with that frequency, oldest first; an OrderedDict
        # stands in for the doubly linked list, so no node handle is stored
        self.freq_nodes = defaultdict(OrderedDict)
        self.min_freq = 0

    def _add_to_freq(self, key, freq):
        # add key to freq list (as most recent)
        self.freq_nodes[freq][key] = None

    def _remove_from_freq(self, key, freq):
        # remove key from freq list, dropping the list once it is empty
        del self.freq_nodes[freq][key]
        if not self.freq_nodes[freq]:
            del self.freq_nodes[freq]

    def _pop_lru(self, freq):
        # remove and return the least recently used key at this frequency
        key, _ = self.freq_nodes[freq].popitem(last=False)
        if not self.freq_nodes[freq]:
            del self.freq_nodes[freq]
        return key

    def get(self, key):
        if key not in self.key_node:
            return -1
        value, freq = self.key_node[key]
        self._remove_from_freq(key, freq)
        self._add_to_freq(key, freq + 1)
        # if the lowest-frequency list just emptied, the minimum moves up
        if freq == self.min_freq and freq not in self.freq_nodes:
            self.min_freq += 1
        self.key_node[key] = (value, freq + 1)
        return value

    def put(self, key, value):
        if self.cap == 0:
            return
        if key in self.key_node:
            # update value, then reuse get() to increase freq
            self.key_node[key] = (value, self.key_node[key][1])
            self.get(key)
            return
        if len(self.key_node) >= self.cap:
            # evict from min_freq list (LRU among min freq)
            evict_key = self._pop_lru(self.min_freq)
            del self.key_node[evict_key]
        self.key_node[key] = (value, 1)
        self._add_to_freq(key, 1)
        self.min_freq = 1
```

**Note:** In production, caches like Redis use approximations (e.g., **approximate LRU**) to reduce overhead.

---

## 36.4 Rate Limiting

Rate limiting controls the rate of requests to an API or service to prevent abuse and ensure fair usage. Common algorithms:

### 36.4.1 Token Bucket

- A bucket holds tokens (capacity). Tokens are added at a fixed rate (e.g., 10 per second).
- When a request arrives, it consumes one token if available; otherwise, the request is denied.
- Simple, allows bursts up to capacity.

**Implementation:**
```python
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start with a full bucket
        self.last_refill = time.monotonic()

    def allow(self):
        now = time.monotonic()  # monotonic clock ignores wall-clock adjustments
        # refill based on elapsed time
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

### 36.4.2 Leaky Bucket

- Requests are processed at a constant rate. Think of a bucket with a leak; requests drip out at a fixed rate. If the bucket overflows, requests are discarded.
- Smooths bursts but does not allow bursts beyond capacity.

**Implementation:** Use a queue; when a request arrives, if the queue is full, drop it; otherwise, enqueue and process at a fixed rate from a background thread.
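
The queue-based description above can be sketched without a real background thread by letting the caller pass in the current time. This is a minimal illustration under assumptions: the class name `LeakyBucket` and the methods `offer` and `drain` are hypothetical, and `drain` plays the role of the periodic worker.

```python
from collections import deque

class LeakyBucket:
    def __init__(self, leak_rate, capacity, start=0.0):
        self.leak_rate = leak_rate  # requests released per second
        self.capacity = capacity    # maximum number of queued requests
        self.queue = deque()
        self.last_leak = start      # time up to which leaking is accounted for

    def offer(self, request):
        """Enqueue a request, or reject it when the bucket is full."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(request)
        return True

    def drain(self, now):
        """Return the requests that are due for processing by time `now`."""
        due = int((now - self.last_leak) * self.leak_rate)
        if due <= 0:
            return []
        # Advance the clock even if the queue is short, so idle time
        # does not accumulate into a later burst
        self.last_leak += due / self.leak_rate
        return [self.queue.popleft() for _ in range(min(due, len(self.queue)))]

bucket = LeakyBucket(leak_rate=2, capacity=3)
accepted = [bucket.offer(r) for r in ["a", "b", "c", "d"]]
first = bucket.drain(now=1.0)
print(accepted, first)  # [True, True, True, False] ['a', 'b']
```

In a live service, a timer or worker thread would call `drain(time.monotonic())` and hand the returned requests to the backend.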

### 36.4.3 Sliding Window Log

- Keep a log of request timestamps. For each request, count how many timestamps fall within the last window (e.g., last minute). If count < limit, allow.
- Memory‑intensive if many requests.

**Implementation with a deque:**
```python
from collections import deque
import time

class SlidingWindowLog:
    def __init__(self, limit, window_sec):
        self.limit = limit
        self.window = window_sec
        self.requests = deque()

    def allow(self):
        now = time.time()
        # remove old requests
        while self.requests and self.requests[0] < now - self.window:
            self.requests.popleft()
        if len(self.requests) < self.limit:
            self.requests.append(now)
            return True
        return False
```

### 36.4.4 Sliding Window Counter (e.g., Redis)

- Hybrid approach: track counts in small buckets (e.g., per second) and sum over the sliding window. Less memory than full log, more accurate than fixed window.
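
A single-process sketch of the bucketed counter follows. The class name and the `bucket_sec` parameter are assumptions for illustration, the window is assumed to be a whole multiple of the bucket size, and the current time is passed in explicitly to keep the example deterministic. A shared deployment would hold the per-bucket counts in an external store instead of a local deque.

```python
from collections import deque

class SlidingWindowCounter:
    def __init__(self, limit, window_sec, bucket_sec=1):
        self.limit = limit
        self.bucket_sec = bucket_sec
        self.num_buckets = window_sec // bucket_sec
        self.buckets = deque()  # [bucket_index, count] pairs, oldest first
        self.total = 0          # running sum of counts inside the window

    def allow(self, now):
        current = int(now // self.bucket_sec)
        oldest_allowed = current - self.num_buckets + 1
        # Expire whole buckets that have slid out of the window
        while self.buckets and self.buckets[0][0] < oldest_allowed:
            self.total -= self.buckets.popleft()[1]
        if self.total >= self.limit:
            return False
        if self.buckets and self.buckets[-1][0] == current:
            self.buckets[-1][1] += 1
        else:
            self.buckets.append([current, 1])
        self.total += 1
        return True

limiter = SlidingWindowCounter(limit=3, window_sec=3)
decisions = [limiter.allow(t) for t in (0.1, 0.5, 1.2, 2.9, 3.0)]
print(decisions)  # [True, True, True, False, True]
```

Memory is bounded by the number of buckets rather than the number of requests, at the cost of expiring requests a whole bucket at a time.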

---

## 36.5 Consistent Hashing and Load Balancing

In distributed systems, we need to distribute data or requests across multiple servers. Simple modulo hashing (`hash(key) % N`) fails when servers are added or removed, causing massive redistribution. **Consistent hashing** solves this.
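
The cost of modulo hashing is easy to measure. The sketch below counts how many of 10,000 synthetic keys change servers when a cluster grows from 4 to 5 machines; the `stable_hash` helper and the key names are assumptions made for the example.

```python
import hashlib

def stable_hash(key):
    # md5 gives the same value on every run, unlike Python's salted hash() for str
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

keys = [f"user:{i}" for i in range(10_000)]
moved = sum(1 for k in keys if stable_hash(k) % 4 != stable_hash(k) % 5)
moved_fraction = moved / len(keys)
print(f"{moved_fraction:.0%} of keys change servers when N goes from 4 to 5")
```

A key keeps its server only when `h % 4 == h % 5`, which holds for 4 of every 20 consecutive hash values, so roughly 80% of keys are expected to move.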

### 36.5.1 How It Works

- Each server is assigned one or more points on a hash ring (using a hash function like MD5).
- Each key is hashed to a point on the ring, and the key is assigned to the nearest server clockwise.
- When a server is added, only keys between its predecessor and itself are remapped; others stay put.
- To handle imbalance, **virtual nodes** (multiple points per server) are used.

**Implementation sketch:**

```python
import hashlib
import bisect

class ConsistentHash:
    def __init__(self, nodes=None, replicas=100):
        self.replicas = replicas
        self.ring = {}
        self.sorted_keys = []
        if nodes:
            for node in nodes:
                self.add_node(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            h = self._hash(f"{node}:{i}")
            self.ring[h] = node
            bisect.insort(self.sorted_keys, h)

    def remove_node(self, node):
        for i in range(self.replicas):
            h = self._hash(f"{node}:{i}")
            del self.ring[h]
            self.sorted_keys.remove(h)

    def get_node(self, key):
        if not self.ring:
            return None
        h = self._hash(key)
        idx = bisect.bisect_right(self.sorted_keys, h) % len(self.sorted_keys)
        return self.ring[self.sorted_keys[idx]]
```
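
To check the claim that adding a server disturbs only a small share of keys, the following sketch builds two rings (4 and 5 servers) and compares key ownership. It is a compact, function-based restatement for measurement purposes; `build_ring`, `owner`, and the server names are hypothetical.

```python
import bisect
import hashlib

def _h(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def build_ring(nodes, replicas=100):
    # Each server contributes `replicas` virtual points on the ring
    points = sorted((_h(f"{n}:{i}"), n) for n in nodes for i in range(replicas))
    return [p[0] for p in points], [p[1] for p in points]

def owner(ring, key):
    hashes, owners = ring
    idx = bisect.bisect_right(hashes, _h(key)) % len(hashes)  # wrap around
    return owners[idx]

keys = [f"user:{i}" for i in range(10_000)]
before = build_ring(["A", "B", "C", "D"])
after = build_ring(["A", "B", "C", "D", "E"])

moved = [k for k in keys if owner(before, k) != owner(after, k)]
moved_fraction = len(moved) / len(keys)
all_moved_to_new = all(owner(after, k) == "E" for k in moved)
print(f"{moved_fraction:.0%} of keys moved; all to the new server: {all_moved_to_new}")
```

With five equally weighted servers, about one fifth of the keys are expected to move, and every key that moves lands on the new server.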

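**Use cases:** Distributed caches (for example, Memcached clients that place servers on a hash ring), load balancers, and partitioners in databases (Cassandra uses consistent hashing with virtual nodes). Redis Cluster takes a related but different approach, mapping keys to a fixed set of 16,384 hash slots that are assigned to nodes.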

---

## 36.6 Probabilistic Data Structures

When exact answers are too expensive (time or memory), probabilistic structures provide approximations with bounded error. They are widely used in big data and streaming.

### 36.6.1 Bloom Filters

A Bloom filter is a space‑efficient probabilistic data structure for set membership. It may return **false positives** but never false negatives.

**How it works:**
- A bit array of size m.
- k independent hash functions, each mapping an element to a bit position.
- **Insert:** Set all k bits to 1.
- **Query:** Check if all k bits are 1; if any is 0, element is definitely not present; if all are 1, element *may* be present.

**Choosing parameters:**
- For desired false positive rate p and expected number of elements n:
  - Optimal m = - (n ln p) / (ln 2)²
  - Optimal k = (m/n) ln 2
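
As a worked example of these formulas, the sketch below sizes a filter for one million elements at a 1% false positive rate. The helper name `bloom_parameters` is an assumption, and rounding m up and k to the nearest integer is one reasonable convention among several.

```python
import math

def bloom_parameters(n, p):
    """Return (m bits, k hash functions) for n elements and false positive rate p."""
    m = math.ceil(-(n * math.log(p)) / (math.log(2) ** 2))
    k = max(1, round((m / n) * math.log(2)))
    return m, k

m, k = bloom_parameters(1_000_000, 0.01)
print(m, k, f"{m / 8 / 1024 / 1024:.2f} MiB")
```

This works out to roughly 9.6 bits per element (a little over 1 MiB in total) and 7 hash functions, regardless of how large the elements themselves are.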

**Implementation:**
```python
import math
import mmh3  # non‑cryptographic hash

class BloomFilter:
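    def __init__(self, n, p):
        # round m up and keep k >= 1 so small inputs still yield a usable filter
        self.m = math.ceil(-(n * math.log(p)) / (math.log(2) ** 2))
        self.k = max(1, round((self.m / n) * math.log(2)))
        self.bits = [False] * self.m  # a real filter would pack bits (e.g., bytearray)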

    def add(self, item):
        for i in range(self.k):
            idx = mmh3.hash(item, i) % self.m
            self.bits[idx] = True

    def contains(self, item):
        for i in range(self.k):
            idx = mmh3.hash(item, i) % self.m
            if not self.bits[idx]:
                return False
        return True
```

**Use cases:** Cache filtering (avoid disk lookups for non‑existent keys), web URL deduplication, databases (LSM trees use bloom filters to reduce reads).

### 36.6.2 Count‑Min Sketch

A Count‑Min Sketch is a probabilistic data structure for estimating frequencies of events in a stream. It uses a matrix of counters and multiple hash functions.

**How it works:**
- 2D array of width w and depth d (counters).
- d hash functions, each mapping an element to a column in its row.
- **Update:** For each hash function, increment the corresponding counter.
- **Query:** Take the minimum of the counters across all rows (since collisions can only overestimate).

**Parameters:**
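- Given desired error ε and failure probability δ, the estimate exceeds the true count by at most εN (where N is the total count of all updates) with probability at least 1 − δ:
  - w = ceil(e/ε)
  - d = ceil(ln(1/δ))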
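
A quick calculation shows how small the sketch stays even for tight bounds. The helper name `cms_dimensions` and the chosen ε and δ are illustrative assumptions.

```python
import math

def cms_dimensions(epsilon, delta):
    """Return (width, depth) of the counter matrix for the given error bounds."""
    width = math.ceil(math.e / epsilon)
    depth = math.ceil(math.log(1 / delta))
    return width, depth

width, depth = cms_dimensions(epsilon=0.001, delta=0.01)
print(width, depth, width * depth)  # 2719 5 13595
```

About 13,600 counters suffice here, independent of how many distinct items appear in the stream.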

**Implementation:**
```python
import math
import mmh3

class CountMinSketch:
    def __init__(self, epsilon, delta):
        self.width = int(math.ceil(math.e / epsilon))
        self.depth = int(math.ceil(math.log(1.0 / delta)))
        self.counters = [[0] * self.width for _ in range(self.depth)]

    def update(self, item, count=1):
        for i in range(self.depth):
            idx = mmh3.hash(item, i) % self.width
            self.counters[i][idx] += count

    def estimate(self, item):
        res = float('inf')
        for i in range(self.depth):
            idx = mmh3.hash(item, i) % self.width
            res = min(res, self.counters[i][idx])
        return res
```

**Use cases:** Heavy hitters, frequency estimation in streams, network traffic monitoring.

### 36.6.3 HyperLogLog

HyperLogLog estimates the **cardinality** (number of distinct elements) of a multiset using very little memory (e.g., about 1.5 KB to count beyond a billion distinct items with a typical error of roughly 2%).

**How it works:**
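- Hash each element to a fixed‑length binary string.
- Use the first few bits of the hash to select one of m registers.
- In the remaining bits, find the position of the leftmost 1‑bit (the rank); each register keeps the maximum rank it has seen.
- Combine the registers with a harmonic mean and apply a bias correction to obtain the estimate.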

**Implementation** is more involved; libraries like `hyperloglog` exist.
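
For intuition only, the steps above fit in a short toy version. This sketch makes several assumptions: it uses SHA-1 from `hashlib` as the hash, 2^10 registers, the standard bias constant for m ≥ 128, and only the small-range (linear counting) correction. Production libraries add further corrections and compact register storage.

```python
import hashlib
import math

class HyperLogLog:
    def __init__(self, b=10):
        self.b = b                    # number of hash bits used to pick a register
        self.m = 1 << b               # number of registers
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant, valid for m >= 128

    def add(self, item):
        # Take 64 bits of the hash: the first b select the register, the rest give the rank
        x = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        width = 64 - self.b
        idx = x >> width
        rest = x & ((1 << width) - 1)
        rank = width - rest.bit_length() + 1  # position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        # Harmonic mean of 2**register across all registers, scaled by alpha * m**2
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)  # small-range correction
        return estimate

hll = HyperLogLog()
for i in range(20_000):
    hll.add(f"visitor-{i}")
estimate = hll.count()
print(round(estimate))  # close to 20000; expected relative error is about 3%
```

The expected relative error is about 1.04 / sqrt(m), so 1,024 registers give roughly 3%, and re-adding an item already seen never changes the registers.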

**Use cases:** Counting unique visitors, distinct IP addresses, etc.

---

## 36.7 Summary

```
┌─────────────────────────────────────────────────────────────────────┐
│                    DSA IN PRODUCTION SYSTEMS SUMMARY                  │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  Indexing: B+ Trees (disk‑optimized, range queries)                │
│            LSM Trees (write‑optimized, compaction)                  │
│                                                                      │
│  Caching: LRU (dict + linked list), LFU (frequency lists)           │
│                                                                      │
│  Rate Limiting: Token Bucket, Leaky Bucket, Sliding Window Log      │
│                                                                      │
│  Load Balancing: Consistent Hashing (ring, virtual nodes)           │
│                                                                      │
│  Probabilistic: Bloom Filters (membership, false positives)         │
│                 Count‑Min Sketch (frequency)                         │
│                 HyperLogLog (cardinality)                            │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘
```

---

## 36.8 Practice Problems

1. **Implement an LRU cache** (LeetCode 146).
2. **Implement an LFU cache** (LeetCode 460).
3. **Design a rate limiter** for an API (system design problem).
4. **Implement a Bloom filter** with given capacity and false positive rate.
5. **Use Count‑Min Sketch** to find top‑k frequent items in a stream (with heap).
6. **Design a distributed key‑value store** (like Cassandra) – discuss partitioning using consistent hashing.
7. **Explain why a B+ tree is used for database indexing** rather than a hash table.
8. **Compare LSM trees and B+ trees** in terms of read/write performance.
9. **Design a web crawler** that avoids revisiting URLs using a Bloom filter.
10. **Estimate the number of unique IPs** visiting a website using HyperLogLog (conceptually).

---

## 36.9 Further Reading

1. **"Designing Data‑Intensive Applications"** by Martin Kleppmann – Covers LSM trees, consistent hashing, and more.
2. **"Database Internals"** by Alex Petrov – Deep dive into B‑trees, LSM trees.
3. **"High Performance MySQL"** – Indexing chapter.
4. **"Redis in Action"** – Caching strategies.
5. **Original Papers**:
   - Bloom, B. H. (1970) – "Space/time trade‑offs in hash coding with allowable errors"
   - Cormode, G., & Muthukrishnan, S. (2005) – "An improved data stream summary: the count‑min sketch and its applications"
   - Flajolet, P., Fusy, É., Gandouet, O., & Meunier, F. (2007) – "HyperLogLog: the analysis of a near‑optimal cardinality estimation algorithm"
   - Karger, D., et al. (1997) – "Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web"

---

> **Coming in Chapter 37**: **Concurrency and Parallel Algorithms** – We'll explore lock‑free data structures, concurrent hash maps, parallel sorting, and the Map‑Reduce paradigm.

---

**End of Chapter 36**

