🚨 Let’s flip the switch from passive metrics to **active alerting**.  
No more *“Oops, the model failed yesterday”*: now we get real-time **Prometheus-powered pings** when something goes off.  
Welcome to **ML observability that talks back.**

---

# 🧪 `09_lab_alerting_pipeline_with_prometheus.ipynb`  
### 📁 `06_mlops/05_model_monitoring`  
> Pipe your model’s performance metrics into **Prometheus**,  
set **alerting rules**, and simulate **real-time notification triggers** (Slack, email, etc.).

---

## 🎯 Learning Goals

- Understand how to **expose metrics** via HTTP  
- Use **Prometheus** to scrape model stats from a **Flask** app  
- Set **alerts on accuracy or latency thresholds**  
- Simulate alert firing + message dispatching  

---

## 💻 Runtime Spec

| Component      | Spec                           |
|----------------|--------------------------------|
| Monitoring     | Prometheus ✅                  |
| Model          | Simple Flask API ✅            |
| Alerts         | `alert_rules.yml` ✅           |
| Integration    | Slack/webhook mock ✅          |
| Runtime        | Colab + localhost or Docker ✅ |

---

## 🧠 Section 1: Flask App with Prometheus Metrics

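```python
import random
import time

from flask import Flask, jsonify
from prometheus_client import Gauge, start_http_server

app = Flask(__name__)

# Gauge: a metric that can go up or down, exposed as `model_accuracy`
accuracy_gauge = Gauge("model_accuracy", "Current accuracy of the ML model")

@app.route("/predict")
def predict():
    # Simulate varying model performance:
    # healthy during the second half of every 2-second window, degraded otherwise
    healthy = time.time() % 2 > 1
    acc = random.uniform(0.85, 0.99) if healthy else random.uniform(0.5, 0.7)
    accuracy_gauge.set(acc)
    return jsonify({"accuracy": acc})

if __name__ == "__main__":
    # Serve Prometheus metrics on :8000/metrics (runs in a background thread)
    start_http_server(8000)
    # Serve the Flask API on :5000 (Flask's default port);
    # the gauge only changes when /predict is called
    app.run(port=5000)
```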
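
To check what Prometheus will actually see, it can help to read the `/metrics` page yourself. The sketch below is a minimal, standard-library-only parser for an unlabeled metric in the Prometheus text format. `read_gauge` and `fetch_accuracy` are hypothetical helper names, and the sample text is hand-written for illustration, not captured output.

```python
from urllib.request import urlopen


def read_gauge(metrics_text, name):
    """Return the value of an unlabeled metric from Prometheus text output, or None."""
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == name:
            return float(parts[1])
    return None


def fetch_accuracy(url="http://localhost:8000/metrics"):
    """Fetch the metrics page and extract model_accuracy (needs the app running)."""
    with urlopen(url, timeout=5) as resp:
        return read_gauge(resp.read().decode("utf-8"), "model_accuracy")


# Hand-written sample of what the endpoint might return (illustrative values).
sample = """# HELP model_accuracy Current accuracy of the ML model
# TYPE model_accuracy gauge
model_accuracy 0.62
"""
print(read_gauge(sample, "model_accuracy"))
```

Once the Flask app is running and `/predict` has been hit at least once, `fetch_accuracy()` should return the most recently set value.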

---

## 📊 Section 2: Prometheus Config (`prometheus.yml`)

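```yaml
global:
  scrape_interval: 5s
  evaluation_interval: 5s   # how often alert rules are evaluated

# Load the alert rules defined in Section 3
rule_files:
  - alert_rules.yml

scrape_configs:
  - job_name: 'model-metrics'
    static_configs:
      - targets: ['localhost:8000']
```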

---

## 🚨 Section 3: Alert Rule File (`alert_rules.yml`)

```yaml
groups:
  - name: model_alerts
    rules:
      - alert: ModelAccuracyDrop
        expr: model_accuracy < 0.75
        for: 10s
        labels:
          severity: critical
        annotations:
          summary: "Model accuracy dropped below threshold"
```
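
The `for: 10s` clause means the alert does not fire on the first bad sample: it stays *pending* until the expression has been true continuously for 10 seconds, and resets as soon as the metric recovers. A rough simulation of that behaviour, assuming one evaluation per sample (`alert_states` is a hypothetical helper and the sample data is made up):

```python
def alert_states(samples, threshold=0.75, for_seconds=10):
    """Approximate the inactive -> pending -> firing lifecycle of an alert rule.

    samples: list of (timestamp_seconds, accuracy) tuples in time order.
    Returns the alert state after each sample.
    """
    states = []
    active_since = None  # time the expression first became true
    for ts, acc in samples:
        if acc < threshold:
            if active_since is None:
                active_since = ts
            # Fire only once the condition has held for the full `for` window.
            state = "firing" if ts - active_since >= for_seconds else "pending"
        else:
            # Any healthy sample resets the alert.
            active_since = None
            state = "inactive"
        states.append(state)
    return states


# Made-up samples, one every 5 seconds.
samples = [(0, 0.90), (5, 0.60), (10, 0.65), (15, 0.62), (20, 0.91)]
print(alert_states(samples))
# ['inactive', 'pending', 'pending', 'firing', 'inactive']
```

This is why a metric that flips between good and bad values every few seconds may never reach the firing state.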

---

## 🧪 Section 4: Run Prometheus with Alertmanager

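```bash
# (Linux/macOS only unless using Docker)
# Alert rules are loaded via a `rule_files:` entry in prometheus.yml
# (list alert_rules.yml there); Prometheus has no CLI flag for them.
prometheus --config.file=prometheus.yml --web.enable-lifecycle

# Optional: Alertmanager is a separate binary with its own config file, e.g.
# alertmanager --config.file=alertmanager.yml
```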
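
Once Prometheus is running, its HTTP API reports active alerts at `/api/v1/alerts`. The sketch below pulls out the names of firing alerts from that response. It assumes Prometheus listens on its default port 9090; `firing_alerts` and `fetch_alerts` are hypothetical helper names, and the sample payload is hand-written to mirror the documented response shape.

```python
import json
from urllib.request import urlopen


def firing_alerts(payload):
    """Return the names of alerts whose state is 'firing'."""
    alerts = payload.get("data", {}).get("alerts", [])
    return [a["labels"].get("alertname") for a in alerts if a.get("state") == "firing"]


def fetch_alerts(base_url="http://localhost:9090"):
    """Query a running Prometheus server (assumed default port) for active alerts."""
    with urlopen(f"{base_url}/api/v1/alerts", timeout=5) as resp:
        return json.load(resp)


# Hand-written example payload (illustrative values only).
sample_payload = {
    "status": "success",
    "data": {
        "alerts": [
            {
                "labels": {"alertname": "ModelAccuracyDrop", "severity": "critical"},
                "annotations": {"summary": "Model accuracy dropped below threshold"},
                "state": "firing",
            }
        ]
    },
}
print(firing_alerts(sample_payload))
```

With the stack running, `firing_alerts(fetch_alerts())` gives a quick check without opening the Prometheus UI.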

---

## 🔔 Section 5: AlertManager Mock (Optional Slack Integration)

You can simulate the Slack/email notification by printing the alert, or by posting to a webhook directly:
```bash
# Replace with your webhook
curl -X POST -H 'Content-type: application/json' \
--data '{"text":"🚨 Model accuracy below threshold!"}' \
https://hooks.slack.com/services/TXXXX/BXXXX/XXXXX
```
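
The same notification can be sent from the notebook in Python. This is a minimal sketch using only the standard library; `build_slack_request` and `notify` are hypothetical helpers, and `dry_run=True` prints the request instead of sending it, so it works without a real webhook.

```python
import json
from urllib.request import Request, urlopen


def build_slack_request(webhook_url, text):
    """Build a JSON POST request in the shape a Slack incoming webhook expects."""
    body = json.dumps({"text": text}).encode("utf-8")
    return Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify(webhook_url, text, dry_run=True):
    """Send the message, or just print it when dry_run is True."""
    req = build_slack_request(webhook_url, text)
    if dry_run:
        print(f"[dry run] POST {req.full_url} {req.data.decode('utf-8')}")
        return None
    with urlopen(req, timeout=5) as resp:
        return resp.status


# Placeholder URL from the curl example; replace with your own webhook.
WEBHOOK = "https://hooks.slack.com/services/TXXXX/BXXXX/XXXXX"
notify(WEBHOOK, "Model accuracy below threshold!", dry_run=True)
```

Set `dry_run=False` only after replacing the placeholder with a real webhook URL.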

---

## ✅ Lab Wrap-Up

| Task                                 | ✅ |
|--------------------------------------|----|
| Exposed live model metric            | ✅ |
| Scraped it with Prometheus           | ✅ |
| Fired alert when metric dropped      | ✅ |
| Simulated notification to Slack      | ✅ |

---

## 🧠 What You Learned

- Prometheus = **metric time-series engine**  
- **Gauge** and **Counter** metrics expose model health (this lab uses a Gauge)  
- Alerting rules are declarative, scalable, and **infra-ready**  
- Now your model isn’t just monitored: it **raises an alert** when accuracy drops 🔥

---

Next in line, boss:  
> `10_lab_monitor_model_latency_with_prometheus.ipynb`  
Same vibe — this time we’re tracking **response times** instead of accuracy.  
🕒 Slow model? You’ll know before the users do.  
Let’s roll?