🔥 Now it’s time to stress-test that model like a warzone simulation.

You’ve tracked metrics. You’ve set alerts.  
But can your model handle **concurrent users hammering the endpoint**?  
Let’s unleash a **swarm of virtual users** and find the **breaking point**.

---

# 🧪 `11_lab_concurrent_traffic_with_locust.ipynb`  
### 📁 `06_mlops/05_model_monitoring`  
> Use **Locust** to simulate **multiple concurrent requests** to your model server.  
Measure **throughput, failure rates, latency under load**, and pinpoint system bottlenecks.

---

## 🎯 Learning Goals

- Use **Locust** to simulate load on the `/predict` endpoint  
- Monitor the model server under **stress conditions**  
- Track how **latency, QPS, and error rates change**  
- Practice **load testing ML inference APIs**

---

## 💻 Runtime Setup

| Component         | Spec                         |
|-------------------|------------------------------|
| Server            | Flask (running model API) ✅ |
| Load Generator    | Locust ✅                    |
| Metrics           | Latency, QPS, Failures ✅    |
| Deployment        | Localhost ✅                 |

---

## 🔧 Section 1: Install Locust

```bash
# In a notebook cell, prefix the command with "!"
pip install locust
```

---

## 🐜 Section 2: Define Locust Load Test File (`locustfile.py`)

```python
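from locust import HttpUser, task, between

# Read the sample image once so disk I/O on the load generator
# does not distort the measurements.
with open("digit_sample.png", "rb") as f:
    IMAGE_BYTES = f.read()


class ModelUser(HttpUser):
    # Each simulated user pauses 0.5-1 s between requests.
    wait_time = between(0.5, 1)

    @task
    def predict(self):
        # Send the image as multipart form data under the "file" field.
        files = {"file": ("digit_sample.png", IMAGE_BYTES, "image/png")}
        self.client.post("/predict", files=files)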
```

---

## 🧪 Section 3: Start Locust Dashboard

```bash
# In terminal
locust -f locustfile.py --host=http://localhost:5000
```

Then visit:
```
http://localhost:8089
```

Set:
- Users: 100  
- Spawn rate: 10/sec  

And click ➤ Start Swarming
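
If you would rather script the run than click through the web UI, the same settings can be passed on the command line. The sketch below only builds and launches the command; the one-minute `run_time` and the `load_test` CSV prefix are assumptions for illustration, not values from this lab.

```python
import subprocess


def build_locust_command(users=100, spawn_rate=10, run_time="1m",
                         host="http://localhost:5000",
                         locustfile="locustfile.py",
                         csv_prefix="load_test"):
    """Build a headless Locust command that mirrors the web UI settings."""
    return [
        "locust",
        "-f", locustfile,
        "--host", host,
        "--headless",                    # no web UI, start immediately
        "--users", str(users),           # peak number of simulated users
        "--spawn-rate", str(spawn_rate), # users started per second
        "--run-time", run_time,          # assumed duration, adjust as needed
        "--csv", csv_prefix,             # assumed prefix for the CSV reports
    ]


def run_headless(**kwargs):
    """Run the load test and return Locust's exit code."""
    return subprocess.run(build_locust_command(**kwargs), check=False).returncode
```

Calling `run_headless()` with the Flask server already running should leave CSV reports such as `load_test_stats.csv` next to the locustfile, which is handy for comparing runs.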

---

## 📊 Section 4: Monitor Metrics Live

- Requests/sec (QPS)  
- Median / 95th-percentile latency  
- Failures per second  
- CPU/memory usage (via htop or Prometheus)
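
To make these numbers concrete, here is a minimal sketch of how they can be derived from raw request samples. The sample data is made up, and the nearest-rank percentile used here is one common definition; Locust’s own figures may differ slightly because it rounds and buckets response times.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(samples, duration_s):
    """samples: list of (latency_ms, ok) tuples collected over duration_s seconds."""
    latencies = sorted(latency for latency, _ in samples)
    failures = sum(1 for _, ok in samples if not ok)
    return {
        "qps": len(samples) / duration_s,
        "median_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "failures_per_s": failures / duration_s,
        "error_rate": failures / len(samples),
    }


# Made-up data: 20 requests over 2 seconds, one of them a slow failure.
samples = [(10.0 + i, True) for i in range(19)] + [(500.0, False)]
summary = summarize(samples, duration_s=2.0)
print(summary)
```

Note how the single 500 ms outlier leaves the median and the 95th percentile untouched in this small sample; it would only show up in a higher percentile or the maximum.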

You’ll likely see:

| Load     | Latency | Error Rate  |
|----------|---------|-------------|
| 10 RPS   | Low     | 0%          |
| 100+ RPS | High    | ⚠️ Timeouts |
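
A rough way to sanity-check the request rate you observe: each simulated user sends a request, waits for the response, then sleeps for its wait time, so throughput is approximately users divided by (mean wait + mean latency). The sketch below applies that approximation to the 100-user setup; the latency values are hypothetical and only illustrate the trend.

```python
def estimated_rps(users, mean_wait_s, mean_latency_s):
    """Approximate steady-state throughput of a closed-loop load test.

    One user completes a request cycle every
    (mean_wait_s + mean_latency_s) seconds.
    """
    return users / (mean_wait_s + mean_latency_s)


# between(0.5, 1) draws a uniform wait, so the mean wait is 0.75 s.
MEAN_WAIT_S = (0.5 + 1.0) / 2

# Hypothetical server latencies, chosen only to show the trend.
for latency_s in (0.05, 0.5, 2.0):
    rps = estimated_rps(users=100, mean_wait_s=MEAN_WAIT_S, mean_latency_s=latency_s)
    print(f"latency={latency_s:.2f}s -> ~{rps:.1f} requests/s")
```

The takeaway is that as the server slows down, the same 100 users generate fewer requests per second, so a falling request rate in the dashboard is itself a sign of saturation.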

---

## 💥 Section 5: Analyze Bottlenecks

- Python’s GIL and Flask’s built-in development server?
- Model warm-up delays?
- Upload/Decode time for images?

Add `@latency_hist.time()` to the Flask `/predict` handler so you can correlate server-side latency spikes with the load shown in Locust.
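
A single end-to-end histogram shows that requests are slow, but not which step is slow. One way to narrow it down is to time each stage separately. The sketch below uses only the standard library; `handle_request`, `fake_decode`, and `fake_predict` are hypothetical stand-ins for the real handler, image decoding, and model call.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Per-stage durations in seconds, e.g. {"decode": [...], "inference": [...]}.
stage_seconds = defaultdict(list)


@contextmanager
def timed(stage):
    """Record how long the wrapped block takes under the given stage name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_seconds[stage].append(time.perf_counter() - start)


def handle_request(raw_bytes, decode, predict):
    """Hypothetical request handler split into two timed stages."""
    with timed("decode"):
        image = decode(raw_bytes)
    with timed("inference"):
        return predict(image)


def slowest_stage():
    """Return the stage with the largest total time so far."""
    totals = {stage: sum(values) for stage, values in stage_seconds.items()}
    return max(totals, key=totals.get)


# Stand-ins (hypothetical): replace with the real decode and model calls.
def fake_decode(raw_bytes):
    time.sleep(0.001)
    return list(raw_bytes)


def fake_predict(image):
    time.sleep(0.02)
    return len(image) % 10


for _ in range(3):
    handle_request(b"\x00\x01\x02", fake_decode, fake_predict)
print(slowest_stage())
```

In a real server the per-stage timings could feed separate Prometheus histograms instead of an in-memory dict, which keeps them visible alongside the other metrics.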

---

## ✅ Lab Summary

| Feature                              | ✅ |
|--------------------------------------|----|
| Locust simulated user traffic        | ✅ |
| Server hit with live concurrent load | ✅ |
| Bottlenecks exposed under pressure   | ✅ |
| Prometheus ready for correlation     | ✅ |

---

## 🧠 What You Learned

- Load testing is **crucial before production**  
- Locust = user simulator that hits hard and fast  
- Bottlenecks usually lie in **I/O, batching, and Flask limits**  
- You’re now **load-aware**, not just model-aware

---

Next up:  
> `12_lab_grafana_dashboard_for_live_model_metrics.ipynb`  
Let’s take **all the metrics we’ve collected** and turn them into a **beautiful live Grafana dashboard**.  
Latency, drift, accuracy — one screen to rule them all.

Wanna light up that dashboard, Professor?