# Lesson 9: Load Testing (Locust)

**Module 5: Model Deployment**  
**Estimated Time**: 1 hour  
**Difficulty**: Intermediate

---

## 🎯 Learning Objectives

By the end of this lesson, you will:

✅ Understand **RPS** (Requests Per Second) and **P99 Latency**  
✅ Write a **Locust** script to stress test your API  
✅ Identify performance bottlenecks (CPU vs I/O)  
✅ Answer interview questions on scalability  

---

## 📚 Table of Contents

1. [Why Load Test?](#1-why-load-test)
2. [The Tool: Locust](#2-the-tool-locust)
3. [Hands-On: Stress Testing an API](#3-hands-on-stress-testing-an-api)
4. [Interview Preparation](#4-interview-preparation)

---

## 1. Why Load Test?

It works on your machine. But what happens when 1,000 users hit it at once?
- **Latency**: Does response time climb to 10 seconds?
- **Errors**: Does it start returning HTTP 500s, or get killed for running out of memory (OOM)?

You must know your **breaking point** before going to production.

## 2. The Tool: Locust

Locust is a Python-based load testing tool.
- You write "User Behaviors" in Python.
- It spawns thousands of users to swarm your system.
- It provides a Web UI with graphs.

## 3. Hands-On: Stress Testing an API

We write a `locustfile.py`.

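```python
from pathlib import Path

# Source of the Locust script; it is written to locustfile.py below.
locust_code = """
from locust import HttpUser, task, between
import random

class MLUser(HttpUser):
    wait_time = between(1, 2)  # Each simulated user pauses 1-2s between tasks

    @task
    def predict_home(self):
        # Send a randomized payload so the requests are not all identical
        payload = {
            "sqft": random.randint(500, 3000),
            "bedrooms": random.randint(1, 5),
            "location": "NY"
        }
        self.client.post("/predict", json=payload)
"""

# Save the script so the locust command below can find it
Path("locustfile.py").write_text(locust_code)
print(locust_code)

print("\n--- Running Locust ---")
print("$ locust -f locustfile.py")
print("Then open http://localhost:8089 and enter your API's base URL as the host")
```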
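Before starting the swarm, it helps to estimate how much traffic a given number of users should generate. The sketch below uses a simple closed-loop model: each simulated user sends a request, waits for the response, pauses for its `wait_time`, and repeats. The function name `expected_rps` and the 0.5s latency are illustrative assumptions, not part of Locust.

```python
def expected_rps(users: int, avg_wait_s: float, avg_latency_s: float) -> float:
    """Approximate steady-state throughput of a closed-loop load test.

    One user cycle takes (avg_latency_s + avg_wait_s) seconds, so each
    user contributes 1 / cycle requests per second.
    """
    cycle_s = avg_wait_s + avg_latency_s
    if cycle_s <= 0:
        raise ValueError("wait time plus latency must be positive")
    return users / cycle_s


# between(1, 2) averages 1.5s of wait; 0.5s response time is an assumed value
print(expected_rps(users=1000, avg_wait_s=1.5, avg_latency_s=0.5))  # 500.0
```

If the RPS that Locust reports falls well below this estimate as you add users, response times are growing, which is usually the first sign that you are approaching the breaking point.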

## 4. Interview Preparation

### Common Questions

#### Q1: "Your API has high latency. How do you debug?"
**Answer**: "First, I check **CPU/Memory** usage: is the host saturated? Second, I check **I/O**: is the request waiting on a database or S3? Third, I profile the code, for example with a flame graph. If the service is CPU-bound, I add replicas or optimize the model (for example, by converting it to ONNX). If it is I/O-bound, I use async calls or caching."
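One rough way to tell CPU-bound from I/O-bound code is to compare CPU time with wall-clock time for the same call. The sketch below uses only the standard library; `cpu_fraction`, `simulated_io_call`, and `simulated_cpu_work` are hypothetical stand-ins for your own request handler, with `time.sleep` playing the role of a slow database or S3 call.

```python
import time


def cpu_fraction(fn) -> float:
    """Return the share of wall-clock time that fn spent on the CPU."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    cpu_s = time.process_time() - cpu_start
    wall_s = time.perf_counter() - wall_start
    return cpu_s / wall_s if wall_s > 0 else 0.0


def simulated_io_call():
    # Sleeping stands in for waiting on a network or disk response
    time.sleep(0.2)


def simulated_cpu_work():
    # A tight arithmetic loop stands in for model inference
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


io_ratio = cpu_fraction(simulated_io_call)
cpu_ratio = cpu_fraction(simulated_cpu_work)
print(f"I/O-like handler: {io_ratio:.0%} of wall time on CPU")
print(f"CPU-like handler: {cpu_ratio:.0%} of wall time on CPU")
```

A ratio near 0% suggests the code is mostly waiting (async or caching may help), while a ratio near 100% suggests compute is the limit (more replicas or a faster model may help). A profiler gives a more detailed picture than this single number.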

#### Q2: "What is P99?"
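**Answer**: "The 99th percentile latency: 99% of requests finish at or below this value. Being 'fast on average' isn't enough. If P99 is 2s, it means 1 in 100 requests takes 2s or longer. For a site serving 1M requests per day, that's 10,000 slow requests, and potentially 10,000 angry users, every day. We optimize for P99/P99.9, not the average."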
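A small worked example shows why the average hides tail latency. The latency samples below are made up for illustration (98 fast requests and 2 slow ones), and `percentile` is a minimal nearest-rank implementation; monitoring tools may interpolate slightly differently.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical data: 98 requests at 100 ms and 2 requests at 2000 ms
latencies_ms = [100] * 98 + [2000] * 2

average_ms = sum(latencies_ms) / len(latencies_ms)
print(average_ms)                    # 138.0
print(percentile(latencies_ms, 50))  # 100
print(percentile(latencies_ms, 99))  # 2000

# 1% of 1,000,000 daily requests land at or beyond P99
slow_requests_per_day = 1_000_000 // 100
print(slow_requests_per_day)         # 10000
```

The average (138 ms) and the median (100 ms) both look healthy, yet the P99 exposes the 2-second requests that real users experience.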