
# FOLDER: 04-scalability-and-performance
**Generated:** 2025-12-15 12:32

**Contains:** 4 files | **Total Size:** 0.02 MB

## 📂 `04-scalability-and-performance/`

#### 📄 `04-scalability-and-performance/17-sharding-partitioning.md`

# 17\. Sharding (Database Partitioning)

## 1\. The Concept

Sharding is a method of splitting and storing a single logical dataset (like a "Users" table) across multiple databases or machines. By distributing the data, you distribute the load. Instead of one massive server handling 100% of the traffic, you might have 10 servers, each handling 10% of the traffic.

## 2\. The Problem

  * **Scenario:** Your application has hit 100 million users.
  * **The Vertical Limit:** You have already upgraded your database server to the largest instance available (128 cores, 2TB RAM). It's still hitting 100% CPU during peak hours. You physically cannot buy a bigger computer (Vertical Scaling limit reached).
  * **The Bottleneck:** Writes are slow because of lock contention. Indexes are too big to fit in RAM, causing disk thrashing. Backups take 48 hours to run.

## 3\. The Solution

Break the database into smaller chunks called **Shards**.
Each shard holds a subset of the data. The application uses a **Shard Key** to determine which server to talk to.

  * **Shard A:** Users ID 1 - 1,000,000
  * **Shard B:** Users ID 1,000,001 - 2,000,000
  * ...

### Junior vs. Senior View

| Perspective | Approach | Outcome |
| :--- | :--- | :--- |
| **Junior** | "The database is slow. Let's just add a Read Replica." | **Write Bottleneck.** Replicas help with reads, but every write still has to go to the single Master. The Master eventually dies. |
| **Senior** | "We are write-bound. We need to Shard. Let's partition by `RegionID` so users in Europe hit the EU Shard and users in the US hit the US Shard." | **Near-Linear Write Scalability.** With N shards and a key that spreads load evenly, write throughput grows roughly N-fold, and you add capacity by adding servers. |

## 4\. Visual Diagram

## 5\. Sharding Strategies

Choosing the right **Shard Key** is the most critical decision.

### A. Range Based (e.g., by User ID)

  * *Method:* IDs 1-100 go to DB1, 101-200 go to DB2.
  * *Pro:* Easy to implement.
  * *Con:* **Hotspots.** If the newest users (IDs 801-900, on DB9) are the most active and the oldest users (IDs 1-100, on DB1) are inactive, DB1 is idle while DB9 is melting down.
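
A range-based router can be sketched as a sorted list of upper bounds plus a binary search. The boundaries and shard names below are illustrative assumptions that mirror the "IDs 1-100 go to DB1" example, not values from a real deployment.

```python
import bisect

# Hypothetical layout: each entry is the highest user ID stored on that shard.
RANGE_UPPER_BOUNDS = [100, 200, 300]
RANGE_SHARDS = ["DB1", "DB2", "DB3"]

def shard_for_range(user_id: int) -> str:
    """Return the shard whose ID range contains user_id."""
    # bisect_left finds the first upper bound that is >= user_id.
    index = bisect.bisect_left(RANGE_UPPER_BOUNDS, user_id)
    if user_id < 1 or index == len(RANGE_SHARDS):
        raise KeyError(f"no shard configured for user_id={user_id}")
    return RANGE_SHARDS[index]

print(shard_for_range(100))  # DB1
print(shard_for_range(101))  # DB2
```

Adding capacity here means appending a new upper bound and shard name, which is why the newest (and often busiest) IDs tend to pile up on the last shard.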

### B. Hash Based (e.g., `hash(UserID) % 4`)

  * *Method:* Apply a hash function to the ID to assign it to a server.
  * *Pro:* Even distribution of data. Hotspots from sequential IDs disappear (a single extremely active key can still overload its shard).
  * *Con:* **Resharding is painful.** If you add a 5th server, the formula changes (`% 5`), and roughly 80% of the data has to move to a new location.
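
The resharding cost can be estimated with a few lines of code. This is a sketch under assumptions: `stable_hash` is a hypothetical helper built on SHA-256, and the user IDs are simply the integers 0 to 19,999. A key stays put only when `h % 4 == h % 5`, which a uniform hash satisfies about one time in five.

```python
import hashlib

def stable_hash(user_id: int) -> int:
    """Hash that gives the same result in every process (unlike built-in hash() for strings)."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def fraction_moved(user_ids, old_count: int, new_count: int) -> float:
    """Share of users whose shard changes when the shard count changes."""
    moved = sum(
        1 for uid in user_ids
        if stable_hash(uid) % old_count != stable_hash(uid) % new_count
    )
    return moved / len(user_ids)

moved = fraction_moved(range(20_000), old_count=4, new_count=5)
print(f"{moved:.0%} of users change shards")
```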

### C. Directory Based (Lookup Table)

  * *Method:* A separate "Lookup Service" tells you where "User A" lives.
  * *Pro:* Total flexibility. You can move individual users without changing code.
  * *Con:* **Single Point of Failure.** If the Lookup Service goes down, nobody can find their data.
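
A minimal in-memory sketch of the directory idea follows. `ShardDirectory` and the shard names are hypothetical; a production lookup service would be a replicated, cached store rather than a Python dict, precisely because of the single-point-of-failure risk noted above.

```python
class ShardDirectory:
    """Maps each user to a shard; unknown users fall back to a default shard."""

    def __init__(self, default_shard: str):
        self._default_shard = default_shard
        self._locations: dict[int, str] = {}

    def locate(self, user_id: int) -> str:
        return self._locations.get(user_id, self._default_shard)

    def move(self, user_id: int, new_shard: str) -> None:
        # A real migration would copy the user's rows first and
        # flip this pointer only after the copy is verified.
        self._locations[user_id] = new_shard

directory = ShardDirectory(default_shard="shard-a")
directory.move(42, "shard-b")  # relocate one heavy user, no code change needed
print(directory.locate(42), directory.locate(7))
```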

## 6\. When to Use It (and When NOT to)

  * ✅ **Use when:**
      * **Massive Data:** TBs or PBs of data.
      * **Write Heavy:** You have more write traffic than a single node can handle.
      * **Geographic Needs:** You want EU user data to physically stay in EU servers (GDPR).
  * ❌ **Avoid when:**
      * **You haven't optimized queries:** Bad SQL is usually the problem, not the server size. Fix the code first.
      * **You need complex Joins:** You cannot easily JOIN tables across two different servers. You have to do it in application code (slow).
      * **Small Teams:** The operational complexity of managing 10 databases instead of 1 is huge.

## 7\. Implementation Example (Pseudo-code)

**Scenario:** A library wrapper that routes queries to the correct shard based on `user_id`.

```python
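import hashlib

# Configuration: Map shards to connection strings
SHARD_MAP = {
    0: "postgres://db-shard-alpha...",
    1: "postgres://db-shard-beta...",
    2: "postgres://db-shard-gamma..."
}

def get_shard_id(user_id):
    # 1. Determine Shard ID (Hash Strategy)
    # Python's built-in hash() is randomized per process for strings, so two
    # app servers could disagree on the shard. Use a stable digest instead,
    # then modulo to distribute users evenly across the shards.
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % len(SHARD_MAP)

def get_shard_connection(user_id):
    # 2. Connect to the specific database
    # connect_to_db is a placeholder for your database driver's connect call.
    connection_string = SHARD_MAP[get_shard_id(user_id)]
    return connect_to_db(connection_string)

def save_user(user):
    # The application logic doesn't know about the physical servers.
    # It just asks for "the right connection".
    conn = get_shard_connection(user.id)
    try:
        conn.execute("INSERT INTO users ...", user)
    finally:
        # Always release the connection, even if the INSERT fails.
        conn.close()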
```

## 8\. The "Resharding" Nightmare

Eventually, Shard A will get full. You need to split it into Shard A and Shard B.

  * **The Senior Reality:** This is terrifying.
  * **The Strategy:** Consistent Hashing or Virtual Buckets.
      * Instead of mapping `User -> Server`, map `User -> Bucket` (e.g., 1024 buckets).
      * Then map `Bucket -> Server`.
      * When you add a server, you just move a few buckets over, rather than calculating new hashes for every user.
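
The two-level mapping can be sketched as follows. The server names, the SHA-256 based `bucket_for`, and the "take an equal slice from every existing server" policy in `add_server` are assumptions chosen for illustration; real systems pick buckets to move based on size and load.

```python
import hashlib

NUM_BUCKETS = 1024  # fixed forever; only the bucket -> server map changes

def bucket_for(user_id: int) -> int:
    """User -> Bucket. Never changes, no matter how many servers exist."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS

def initial_assignment(servers: list[str]) -> dict[int, str]:
    """Bucket -> Server, spread round-robin."""
    return {b: servers[b % len(servers)] for b in range(NUM_BUCKETS)}

def add_server(assignment: dict[int, str], new_server: str) -> dict[int, str]:
    """Return a new map in which each existing server donates an equal slice."""
    donors = sorted(set(assignment.values()))
    per_donor = NUM_BUCKETS // (len(donors) + 1) // len(donors)
    updated = dict(assignment)
    for donor in donors:
        owned = [b for b in range(NUM_BUCKETS) if assignment[b] == donor]
        for bucket in owned[:per_donor]:
            updated[bucket] = new_server
    return updated

before = initial_assignment(["s1", "s2", "s3", "s4"])
after = add_server(before, "s5")
moved_buckets = [b for b in range(NUM_BUCKETS) if before[b] != after[b]]
print(f"{len(moved_buckets)} of {NUM_BUCKETS} buckets moved")
```

In this sketch about 20% of the buckets move, and every one of them moves to the new server; users in the other buckets are untouched.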

## 9\. Limitations (The Trade-offs)

1.  **No Cross-Shard Transactions:** A single database transaction cannot atomically update User A (Shard 1) and User B (Shard 2). You must coordinate the two writes yourself, typically with **Sagas (Pattern \#14)**.
2.  **No Cross-Shard Joins:** You cannot `SELECT * FROM Orders JOIN Users`. You must fetch User, then fetch Orders, and combine them in Python/Java.
3.  **Unique Constraints:** You cannot enforce "Unique Email" across the whole system easily, because Shard 1 doesn't know what emails Shard 2 has.
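
The application-side join from limitation 2 might look like the sketch below. `USERS_SHARD` and `ORDERS_SHARD` are hypothetical in-memory stand-ins for two separate databases; each lookup represents one network round trip.

```python
# Hypothetical stand-ins for two physical shards.
USERS_SHARD = {
    1: {"id": 1, "name": "Ada"},
    2: {"id": 2, "name": "Bo"},
}
ORDERS_SHARD = [
    {"order_id": 10, "user_id": 1, "total": 25.0},
    {"order_id": 11, "user_id": 2, "total": 40.0},
    {"order_id": 12, "user_id": 1, "total": 15.0},
]

def orders_with_user(user_id: int) -> list[dict]:
    """Emulate `Orders JOIN Users` when the tables live on different servers."""
    user = USERS_SHARD.get(user_id)  # query 1: the users shard
    if user is None:
        return []
    orders = [o for o in ORDERS_SHARD if o["user_id"] == user_id]  # query 2
    # Combine the two result sets in application memory.
    return [{**order, "user_name": user["name"]} for order in orders]

print(orders_with_user(1))
```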

#### 📄 `04-scalability-and-performance/18-cache-aside-lazy-loading.md`

# 18\. Cache-Aside (Lazy Loading)

## 1\. The Concept

Cache-Aside (also known as Lazy Loading) is the most common caching strategy. The cache sits "aside" from the database, and the application logic serves as the coordinator between the data store (Database) and the cache (e.g., Redis/Memcached). The cache does not talk to the database directly. Instead, the application lazily loads data into the cache only when it is actually requested.

## 2\. The Problem

  * **Scenario:** You have a high-traffic e-commerce site. The "Product Details" page executes complex SQL queries (joins across Pricing, Inventory, and Specs tables).
  * **The Reality:** 95% of users are looking at the same 5 popular products (e.g., the latest iPhone).
  * **The Performance Hit:** Your database is hammering the disk to calculate the exact same result thousands of times per second. Latency spikes, and the database CPU hits 100%.

## 3\. The Solution

Treat the Cache as a temporary key-value storage for the result of those expensive queries.

1.  **Read:** When the app needs data, it checks the Cache first.
      * **Hit:** Return data immediately (typically around a millisecond, with no database work).
      * **Miss:** Query the Database, write the result to the Cache, then return data.
2.  **Write:** When the app updates data, it updates the Database and **deletes (invalidates)** the Cache entry so the next read forces a fresh fetch.

### Junior vs. Senior View

| Perspective | Approach | Outcome |
| :--- | :--- | :--- |
| **Junior** | "I'll write a script to load *all* our products into Redis when the server starts." | **Cold Start & Waste.** Startup takes forever. You fill RAM with data nobody wants (products from 2012). If Redis restarts, the app crashes because the cache is empty. |
| **Senior** | "Load nothing on startup. Let the traffic dictate what gets cached. Set a Time-To-Live (TTL) so unused data naturally drops out of RAM." | **Efficiency.** The cache only contains the 'Working Set' (currently popular items). Memory is used efficiently. The system handles empty caches gracefully. |

## 4\. Visual Diagram

## 5\. When to Use It (and When NOT to)

  * ✅ **Use when:**
      * **Read-Heavy Workloads:** News sites, blogs, catalogs, social media feeds.
      * **General Purpose:** This is the default caching strategy for 80% of web apps.
      * **Resilience:** If the Cache goes down, the system still works (just slower) because it falls back to the DB.
  * ❌ **Avoid when:**
      * **Write-Heavy Workloads:** If data changes every second, you are constantly invalidating the cache. You spend more time writing to Redis than reading from it.
      * **Critical Consistency:** If the user *must* see the absolute latest version (e.g., Bank Balance), caching introduces the risk of stale data.
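
The resilience point above (the system keeps working when the cache is down) only holds if cache errors are caught. The sketch below uses a hypothetical `FlakyCache` stub and the built-in `ConnectionError`; with a real client you would catch that client's own connection exception instead.

```python
class FlakyCache:
    """Hypothetical cache client that can be switched offline for the demo."""

    def __init__(self):
        self.online = True
        self._data = {}

    def get(self, key):
        if not self.online:
            raise ConnectionError("cache unavailable")
        return self._data.get(key)

    def set(self, key, value):
        if not self.online:
            raise ConnectionError("cache unavailable")
        self._data[key] = value

def load_from_db(product_id):
    # Stand-in for the expensive SQL query.
    return {"id": product_id, "name": f"product-{product_id}"}

def get_product(cache, product_id):
    key = f"product:{product_id}"
    try:
        cached = cache.get(key)
        if cached is not None:
            return cached
    except ConnectionError:
        # Cache is down: degrade to the database instead of failing the request.
        return load_from_db(product_id)
    product = load_from_db(product_id)
    try:
        cache.set(key, product)
    except ConnectionError:
        pass  # failing to populate the cache must not fail the read
    return product

cache = FlakyCache()
first = get_product(cache, 7)    # miss, then populated
cache.online = False
second = get_product(cache, 7)   # cache outage, served from the database
```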

## 6\. Implementation Example (Pseudo-code)

**Scenario:** Fetching a User Profile.

```python
import redis
import json

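# Connection to Cache
# (`db` below is a placeholder for your database client; it is not defined here.)
cache = redis.Redis(host='localhost', port=6379)
TTL_SECONDS = 300 # 5 minutes

def get_user_profile(user_id):
    cache_key = f"user:{user_id}"

    # 1. Try Cache (The "Aside")
    cached_data = cache.get(cache_key)
    
    # redis-py returns None when the key does not exist
    if cached_data is not None:
        print("Cache Hit!")
        return json.loads(cached_data)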

    # 2. Cache Miss - Go to Source of Truth
    print("Cache Miss - Querying DB...")
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    
    if user:
        # 3. Populate Cache (Lazy Load)
        # We serialize to JSON because Redis stores strings/bytes
        cache.setex(
            name=cache_key, 
            time=TTL_SECONDS, 
            value=json.dumps(user)
        )
    
    return user

def update_user_email(user_id, new_email):
    # 1. Update Source of Truth
    db.execute("UPDATE users SET email = ? ...", new_email)
    
    # 2. Invalidate Cache
    # Next time someone asks for this user, it will be a "Miss"
    # and they will fetch the new email from DB.
    cache.delete(f"user:{user_id}")
```

## 7\. The "Thundering Herd" Problem (Senior Nuance)

There is a specific danger in Cache-Aside.

  * **Scenario:** The cache key for "Homepage\_News" expires at 12:00:00.
  * **The Spike:** At 12:00:01, you have 5,000 concurrent users hitting the homepage.
  * **The Herd:** All 5,000 requests check the cache. All 5,000 get a "Miss." All 5,000 hit the Database simultaneously to generate the same news feed.
  * **Result:** The database crashes.

**The Senior Fix:** **Locking** or a **Soft TTL** (early refresh; *Probabilistic Early Expiration* is a randomized variant of the same idea).

  * *Locking:* Only allow *one* thread to query the DB for "Homepage\_News." The other 4,999 wait for that thread to finish and populate the cache.
  * *Soft TTL:* Tell Redis the TTL is 60s, but tell the App the TTL is 50s. The first user to hit it between 50s and 60s re-generates the cache in the background while everyone else is still served the old (but valid) data.

## 8\. Cache Invalidation Strategies

"There are only two hard things in Computer Science: Cache Invalidation and naming things."

1.  **TTL (Time To Live):** The safety net. Even if your code fails to delete the key, it will disappear eventually (e.g., 10 minutes). Always set a TTL.
2.  **Write-Through (Alternative):** The application writes to the Cache *and* DB simultaneously. Good for read performance, but slower writes.
3.  **Delete vs. Update:** In Cache-Aside, prefer **Deleting** the key on update. If you try to **Update** the cache key, you risk race conditions (two threads updating the cache in the wrong order). Deleting is safer.

#### 📄 `04-scalability-and-performance/19-static-content-offloading-cdn.md`


# 19\. Static Content Offloading (CDN)

## 1\. The Concept

Static Content Offloading is the practice of moving non-changing files (images, CSS, JavaScript, Videos, Fonts) away from the primary application server and onto a Content Delivery Network (CDN). A CDN is a geographically distributed network of proxy servers. The goal is to serve content to end-users with high availability and high performance by serving it from a location closest to them.

## 2\. The Problem

  * **Scenario:** Your application server is hosted in **Virginia, USA (us-east-1)**.
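  * **The Latency Issue:** A user in **Singapore** visits your site. Every request for `logo.png` or `main.js` has to travel halfway around the world and back, at 250ms+ per round trip. If your site has 50 files and they were fetched one after another, that is 50 × 250ms = 12.5 seconds of waiting; browsers fetch several files in parallel, but the page still takes seconds to load.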
  * **The Capacity Issue:** Your expensive App Server (optimized for CPU and Logic) is busy streaming a 50MB video file to a user. During that time, it cannot process login requests or checkout transactions. You are wasting expensive CPU cycles on "dumb" file transfer tasks.
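
The latency arithmetic can be checked with a rough model. It assumes one network round trip per file and ignores bandwidth, TLS handshakes, and HTTP/2 multiplexing; the limit of six parallel connections is an assumption based on common HTTP/1.1 browser behavior.

```python
import math

def page_load_seconds(num_files: int, round_trip_ms: float, parallel_connections: int = 1) -> float:
    """Rough load time: files are fetched in 'rounds' of parallel requests."""
    rounds = math.ceil(num_files / parallel_connections)
    return rounds * round_trip_ms / 1000

sequential = page_load_seconds(50, 250)                              # 12.5
six_parallel = page_load_seconds(50, 250, parallel_connections=6)    # 2.25
from_edge = page_load_seconds(50, 10, parallel_connections=6)        # 0.09
print(sequential, six_parallel, from_edge)
```

Even with parallel connections, the far-away origin costs seconds, while the same page served from a nearby edge (10ms round trips) finishes in well under a second in this model.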

## 3\. The Solution

Separate the roles:

1.  **The App Server:** Handles **Dynamic** content only (JSON, Business Logic, Database interactions).
2.  **The CDN:** Handles **Static** content.
      * You upload files to "Object Storage" (e.g., AWS S3, Google Cloud Storage).
      * The CDN (e.g., CloudFront, Cloudflare, Akamai) caches these files at hundreds of "Edge Locations" worldwide.
      * The user in Singapore downloads the logo from a Singapore Edge Server (10ms latency).

### Junior vs. Senior View

| Perspective | Approach | Outcome |
| :--- | :--- | :--- |
| **Junior** | "I'll put the images in the `/public/images` folder of my Express/Django app and serve them directly." | **Server Suffocation.** A viral traffic spike hits. The server runs out of I/O threads serving JPEGs. The API stops responding. The site goes down. |
| **Senior** | "The application server should never serve a file. Push assets to S3 during the build pipeline. Put CloudFront in front. The app server only speaks JSON." | **Global Scale.** The static assets load instantly worldwide. The app server is bored and ready to handle business logic. Bandwidth costs drop significantly. |

## 4\. Visual Diagram

## 5\. When to Use It (and When NOT to)

  * ✅ **Use when:**
      * **Global Audience:** Users are not physically near your data center.
      * **Media Heavy:** The site has large images, videos, or PDFs.
      * **High Traffic:** You expect spikes that would crush a single server.
      * **Security:** CDNs often provide DDoS protection and a Web Application Firewall (WAF) at the edge, shielding your origin server.
  * ❌ **Avoid when:**
      * **Internal Tools:** An admin panel used by 5 people in the same office as the server.
      * **Strictly Dynamic:** An API-only service that serves zero HTML/CSS/Images.

## 6\. Implementation Strategy

### Step 1: The Build Pipeline

Don't commit binary files to Git if possible. During the deployment process (CI/CD):

1.  Build the React/Vue/Angular app.
2.  Upload the `./dist` or `./build` folder to an S3 Bucket.
3.  Deploy the Backend Code to the App Server.

### Step 2: The URL Rewrite

In your HTML/Code, you point to the CDN domain, not the relative path.

**Before (Junior):**

```html
<img src="/static/logo.png" />
```

**After (Senior):**

```html
<img src="https://d12345.cloudfront.net/assets/logo.png" />
```
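
Rather than hard-coding the CDN domain in every template, the prefix can live in one place. `asset_url` below is a hypothetical helper, and `CDN_BASE_URL` reuses the example domain from the snippet above; in practice the value would come from configuration.

```python
CDN_BASE_URL = "https://d12345.cloudfront.net/assets"  # assumed to come from config

def asset_url(path: str) -> str:
    """Turn a relative asset path into an absolute CDN URL."""
    return f"{CDN_BASE_URL}/{path.lstrip('/')}"

print(asset_url("logo.png"))
```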

### Step 3: Cache Control (The Critical Header)

You must tell the CDN how long to keep the file.

  * **Mutable Files (e.g., `index.html`):** Short cache.
      * `Cache-Control: public, max-age=60` (1 minute).
      * *Reason:* If you deploy a new release, you want users to see it quickly.
  * **Immutable Files (e.g., `main.a1b2c3.js`):** Long-lived cache.
      * `Cache-Control: public, max-age=31536000, immutable` (1 year).
      * *Reason:* This file will *never* change. If the code changes, the filename changes (see below).
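
The two rules above can be expressed as a small function. The regular expression is an assumption: it treats a run of six or more lowercase hex characters before the extension as a content hash, which matches `main.a1b2c3.js` but may need adjusting to your build tool's naming scheme.

```python
import re

# Assumed naming scheme: name.<6+ hex chars>.ext
HASHED_NAME = re.compile(r"\.[0-9a-f]{6,}\.[A-Za-z0-9]+$")

def cache_control_for(filename: str) -> str:
    """Pick a Cache-Control header based on whether the name carries a content hash."""
    if HASHED_NAME.search(filename):
        return "public, max-age=31536000, immutable"  # 1 year
    return "public, max-age=60"                       # 1 minute

print(cache_control_for("index.html"))
print(cache_control_for("main.a1b2c3.js"))
```

Defaulting unhashed files to the short cache is the safer failure mode: a misclassified file is re-fetched more often instead of being stuck for a year.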

## 7\. The "Cache Busting" Pattern

How do we update a file if the CDN has cached it for 1 year?
**We don't.** We change the name.

  * **Bad:** `style.css`. If you change the CSS and upload it, the CDN might still serve the old one for days.
  * **Good (Versioning):** `style.v1.css`, `style.v2.css`.
  * **Best (Content Hashing):** `style.8f4a2c.css`.
      * Webpack/Vite does this automatically.
      * If the file content changes, the hash changes.
      * If the hash changes, it's a "new" file to the CDN.
      * This guarantees that users **never** see a mix of old HTML and new CSS (which breaks layouts).
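
Content hashing itself is simple to illustrate. The sketch below is not how Webpack or Vite compute their hashes; `hashed_filename` is a hypothetical helper that uses a truncated SHA-256 digest to show the principle.

```python
import hashlib
from pathlib import PurePosixPath

def hashed_filename(name: str, content: bytes, length: int = 8) -> str:
    """Insert a digest of the file content between the stem and the extension."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    path = PurePosixPath(name)
    return f"{path.stem}.{digest}{path.suffix}"

old_name = hashed_filename("style.css", b"body { color: black; }")
new_name = hashed_filename("style.css", b"body { color: navy; }")
print(old_name, new_name)  # two different names, so the CDN treats them as two files
```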

## 8\. Pseudo-Code Example (S3 Upload Script)

```python
import boto3
import mimetypes
import os

def deploy_assets_to_cdn(build_folder, bucket_name):
    s3 = boto3.client('s3')
    
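    for root, dirs, files in os.walk(build_folder):
        for file in files:
            file_path = os.path.join(root, file)

            # Keep the folder structure in the object key (e.g. "static/js/main.a1b2c3.js").
            # Using only the bare filename would let same-named files overwrite each other.
            key = os.path.relpath(file_path, build_folder).replace(os.sep, "/")
            
            # Determine Content Type. guess_type returns None for unknown
            # extensions, so fall back to a generic binary type.
            content_type, _ = mimetypes.guess_type(file_path)
            content_type = content_type or "application/octet-stream"
            
            # Determine Cache Strategy
            if file.endswith(".html"):
                # HTML changes frequently (entry point)
                cache_control = "public, max-age=60"
            else:
                # Assumes every non-HTML asset has a content hash in its name
                cache_control = "public, max-age=31536000, immutable"

            print(f"Uploading {key} with {cache_control}...")
            
            s3.upload_file(
                file_path, 
                bucket_name, 
                key, 
                ExtraArgs={
                    'ContentType': content_type,
                    'CacheControl': cache_control
                }
            )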

# Run during CI/CD
deploy_assets_to_cdn("./build", "my-production-assets")
```

#### 📄 `04-scalability-and-performance/README.md`


# 🚀 Group 4: Scalability & Performance

## Overview

**"Scalability is the property of a system to handle a growing amount of work by adding resources to the system."**

In the early days of a startup, you survive on a single server. But as you grow from 1,000 to 1,000,000 users, "Vertical Scaling" (buying a bigger CPU) hits a physical wall. You must switch to "Horizontal Scaling" (adding more machines).

This module covers the strategies Senior Architects use to handle massive traffic and data volume without degrading performance. It focuses on removing bottlenecks at the Database layer, the Application layer, and the Network layer.

## 📜 Pattern Index

| Pattern | Goal | Senior "Soundbite" |
| :--- | :--- | :--- |
| **[17. Sharding (Partitioning)](./17-sharding-partitioning.md)** | **Horizontal Data Scaling** | "We can't buy a bigger database server. We must split the users based on Region ID." |
| **[18. Cache-Aside (Lazy Loading)](./18-cache-aside-lazy-loading.md)** | **Read Optimization** | "The fastest query is the one you don't make. Check Redis first." |
| **[19. Static Content Offloading](./19-static-content-offloading-cdn.md)** | **Network Optimization** | "The application server is for business logic, not for serving 5MB JPEGs. Use a CDN." |

## 🧠 The Scalability Checklist

Before launching a marketing campaign or a new feature, a Senior Architect asks:

1.  **The "One Million" Test:** If we suddenly get 1,000,000 users tomorrow, which component breaks first? (Usually the Database).
2.  **The "Cache Miss" Test:** If Redis goes down and empties the cache, will the database survive the "Thundering Herd" of requests trying to repopulate it?
3.  **The "Physics" Test:** Are we asking a user in Australia to download a 10MB file from a server in New York? (CDN required).
4.  **The "Hotspot" Test:** In our sharded database, are 90% of the writes going to Shard A because we chose a bad Shard Key?
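
The "Hotspot" test can be run against a sample of routed writes. The sketch below assumes you can export, for each write, the shard it was sent to; `sample_writes` is made-up data that reproduces the 90% case from the checklist.

```python
from collections import Counter

def write_share_by_shard(shard_ids) -> dict:
    """Fraction of all writes that landed on each shard."""
    counts = Counter(shard_ids)
    total = sum(counts.values())
    return {shard: count / total for shard, count in counts.items()}

def hottest_shard(shard_ids):
    """Return (shard, share) for the shard receiving the most writes."""
    shares = write_share_by_shard(shard_ids)
    shard = max(shares, key=shares.get)
    return shard, shares[shard]

# Hypothetical write log: the shard each of 10 writes was routed to.
sample_writes = ["A"] * 9 + ["B"]
shard, share = hottest_shard(sample_writes)
print(shard, share)  # A 0.9
```

With an evenly distributing shard key, the hottest shard's share should sit close to `1 / number_of_shards`.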

## ⚠️ Common Pitfalls in This Module

  * **Caching Everything:** Caching data that changes frequently or is rarely read. You just waste RAM and CPU for serialization.
  * **Premature Sharding:** Sharding adds massive operational complexity (backups, resharding, cross-shard joins). Don't do it until you have exhausted Indexing, Read Replicas, and Caching.
  * **Ignoring Cache Invalidation:** Showing a user their old bank balance because the cache wasn't cleared after a deposit. This destroys trust.

