# From Chaos to MLflow: A Guided Journey

## Workshop Objective

You just experienced the **messy notebook** (`01_messy_notebook.ipynb`). You probably noticed:

- "Wait, which model was the best again?" 
- Metrics scattered across cells, hard to compare
- No way to reproduce exact results
- Manual tracking in markdown tables (that you forgot to update)

**In this notebook, we'll fix all of that with MLflow.**

---

## How This Notebook Works

Parts 1-3 get progressively harder; Part 4 is a guided tour of serving:

| Part | Style | You will... |
|------|-------|-------------|
| **Part 1** | Fully Guided | Run code, observe, learn concepts |
| **Part 2** | Progressive Reveal | Predict what's needed, then see solution |
| **Part 3** | Fill-in-the-Blanks | Complete the code yourself |
| **Part 4** | Fully Guided | Register, load, and serve a model |

**Reference**: See `02_mlflow_organized.ipynb` for the complete solution.

**Cheatsheet**: See `../docs/mlflow_cheatsheet.md` for quick reference.

---

## Prerequisites

Before starting, ensure:
1. The MLflow server is running: start it with `docker-compose up -d` (from the project root)
2. http://localhost:5000 opens in your browser

## Setup: Load the Messy Notebook Data

First, let's reproduce the same data preparation from the messy notebook.

In [None]:
# Standard imports (same as messy notebook)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
import warnings
warnings.filterwarnings('ignore')

print("Imports done!")

In [None]:
# Load and prepare data (same as messy notebook)
df = pd.read_csv('../data/customer_data.csv')

# Feature engineering
df['recency_frequency_ratio'] = df['recency_days'] / (df['frequency'] + 1)
df['monetary_per_order'] = df['monetary_value'] / (df['total_orders'] + 1)
df['order_frequency'] = df['total_orders'] / (df['days_since_signup'] + 1)
df['support_per_order'] = df['support_tickets'] / (df['total_orders'] + 1)
df['r_score'] = pd.qcut(df['recency_days'], q=5, labels=[5, 4, 3, 2, 1]).astype(int)
df['f_score'] = pd.qcut(df['frequency'].rank(method='first'), q=5, labels=[1, 2, 3, 4, 5]).astype(int)
df['m_score'] = pd.qcut(df['monetary_value'].rank(method='first'), q=5, labels=[1, 2, 3, 4, 5]).astype(int)
df['rfm_score'] = df['r_score'] + df['f_score'] + df['m_score']

# Define features
FEATURE_COLS = [
    'recency_days', 'frequency', 'monetary_value', 'avg_order_value',
    'days_since_signup', 'total_orders', 'support_tickets', 'age',
    'recency_frequency_ratio', 'monetary_per_order', 'order_frequency',
    'support_per_order', 'rfm_score'
]

X = df[FEATURE_COLS]
y = df['churned']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"Data ready: {len(X_train)} train, {len(X_test)} test samples")
print(f"Features: {len(FEATURE_COLS)}")
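
Notice that `f_score` and `m_score` rank the values before binning, while `r_score` bins the raw column. Here is a small sketch of why ranking can matter, using made-up tied data (the series below is illustrative and not taken from `customer_data.csv`): `pd.qcut` needs distinct bin edges, and heavily tied values can collapse several quantiles onto the same number.

```python
import pandas as pd

# Hypothetical purchase counts with many ties at 1
frequency = pd.Series([1, 1, 1, 1, 2, 3, 4, 5, 6, 7])

# Direct quantile binning fails here: the 0% and 20% quantiles are both 1
try:
    pd.qcut(frequency, q=5, labels=[1, 2, 3, 4, 5])
    direct_ok = True
except ValueError as exc:
    direct_ok = False
    print(f"qcut on raw values failed: {exc}")

# Ranking first breaks ties by position, so every bin edge is distinct
f_score = pd.qcut(
    frequency.rank(method="first"), q=5, labels=[1, 2, 3, 4, 5]
).astype(int)
print(f_score.tolist())
```

The trade-off: customers with identical values can land in different bins, decided only by their row order.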

---

# PART 1: Fully Guided

## 1.1 What is MLflow?

MLflow is an open-source platform for managing the **full ML lifecycle**. It has 4 core components:

### 1. **Tracking** (our main focus today)
Log and query experiments: parameters, metrics, artifacts, and code versions.
- *"What hyperparameters did I use for my best model?"*
- *"Compare accuracy across 50 runs"*

### 2. **Models**
Package ML models in a standard format for deployment to various platforms (Docker, Kubernetes, cloud services, etc.).
- *"Deploy this sklearn model as a REST API"*
- *"Convert my model for batch inference"*

### 3. **Model Registry** (we'll use this too)
Centralized model store with versioning, stage transitions (Staging → Production; recent MLflow releases deprecate stages in favor of aliases and tags), and annotations.
- *"Promote model v3 to production"*
- *"Who approved this model version?"*

### 4. **Projects**
Package code and dependencies for reproducible runs on any platform.
- *"Run this experiment with the exact same environment"*
- *"Share my pipeline with the team"*

---

### MLflow for LLMs (Bonus knowledge)

MLflow now supports **LLM/GenAI applications**:
- **Tracing**: Debug and monitor LLM calls, RAG pipelines, and multi-step agents
- **Evaluation**: Benchmark LLM outputs with built-in metrics
- **Deployment**: Serve LLM models with the same model-serving infrastructure

*We won't cover LLM tracking today, but know that MLflow works beyond traditional ML!*

---

**Today we focus on Tracking and Registry** - the foundation for experiment management.

## 1.2 Connect to MLflow

First, we import MLflow and tell it where our tracking server is.

In [None]:
# NEW: Import MLflow
import mlflow
import mlflow.sklearn

print(f"MLflow version: {mlflow.__version__}")

In [None]:
# Connect to the MLflow server (running in Docker)
mlflow.set_tracking_uri("http://localhost:5000")

# Verify connection
import requests
try:
    # timeout keeps the cell from hanging if the server is unreachable
    response = requests.get("http://localhost:5000/health", timeout=5)
    if response.status_code == 200:
        print("MLflow server is running!")
    else:
        print(f"Server responded with: {response.status_code}")
except requests.exceptions.RequestException as e:
    print(f"Cannot connect to MLflow server: {e}")
    print("Run 'docker-compose up -d' from the project root!")

## 1.3 Create an Experiment

An **Experiment** groups related runs together. Think of it as a project folder.

In [None]:
# Create or get an experiment
experiment_name = "workshop-churn-learning"
mlflow.set_experiment(experiment_name)

print(f"Using experiment: {experiment_name}")
print("View it at: http://localhost:5000")

## 1.4 Your First MLflow Run

A **Run** is a single execution of your training code. Inside a run, you can log:

- **Parameters**: Inputs (hyperparameters, config)
- **Metrics**: Outputs (accuracy, loss, F1)
- **Artifacts**: Files (models, plots, data)
- **Tags**: Metadata (author, version)

In [None]:
# Let's train a simple model and LOG EVERYTHING

# Start a run
with mlflow.start_run(run_name="my-first-mlflow-run"):
    
    # 1. LOG PARAMETERS (inputs to your model)
    mlflow.log_param("model_type", "RandomForest")
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("max_depth", 10)
    
    # 2. Train the model (same code as before)
    model = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
    model.fit(X_train, y_train)
    
    # 3. Make predictions
    y_pred = model.predict(X_test)
    
    # 4. LOG METRICS (outputs/results)
    accuracy = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    
    mlflow.log_metric("accuracy", accuracy)
    mlflow.log_metric("f1", f1)
    
    print(f"Accuracy: {accuracy:.4f}")
    print(f"F1 Score: {f1:.4f}")
    print("\nRun logged to MLflow!")
    print("Go to http://localhost:5000 to see it!")

### Go check the MLflow UI!

1. Open http://localhost:5000
2. Click on "workshop-churn-learning" experiment
3. Click on "my-first-mlflow-run"
4. See your parameters and metrics!

**This is already better than the messy notebook** - your results are saved and organized!

## 1.5 Log Multiple Metrics at Once

Instead of calling `log_metric` once per value, you can pass a dictionary to `log_metrics` (and likewise `log_params` for parameters).

In [None]:
# More efficient: log multiple metrics at once
with mlflow.start_run(run_name="multiple-metrics-example"):
    
    # Log params as dict
    mlflow.log_params({
        "model_type": "RandomForest",
        "n_estimators": 150,
        "max_depth": 12
    })
    
    # Train
    model = RandomForestClassifier(n_estimators=150, max_depth=12, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    
    # Log metrics as dict
    mlflow.log_metrics({
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred)
    })
    
    print("All metrics logged!")
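
The same four-metric dictionary comes back in several cells below. One way to avoid retyping it is a small helper; `classification_metrics` is a hypothetical name introduced here, not part of MLflow or scikit-learn, and the labels are toy values.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score


def classification_metrics(y_true, y_pred):
    """Return binary-classification metrics as a plain dict of floats."""
    return {
        "accuracy": float(accuracy_score(y_true, y_pred)),
        # zero_division=0 yields 0.0 instead of a warning when a metric is undefined
        "precision": float(precision_score(y_true, y_pred, zero_division=0)),
        "recall": float(recall_score(y_true, y_pred, zero_division=0)),
        "f1": float(f1_score(y_true, y_pred, zero_division=0)),
    }


# Toy labels: 2 true positives, 1 false positive, 1 false negative, 2 true negatives
metrics = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(metrics)
```

Inside a run, the result can go straight into `mlflow.log_metrics(classification_metrics(y_test, y_pred))`.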

---

# PART 2: Progressive Reveal

Now you understand the basics. Let's add more MLflow features.

**Format**: Each section poses a problem. Think about what's needed, then run the next cell to reveal the solution.

## 2.1 Problem: Saving the Model

In the messy notebook, we had:
```python
# save model? idk where
# import pickle
# with open('model.pkl', 'wb') as f:
#     pickle.dump(best_rf, f)
```

**Question**: How can we save the model WITH the run, so it's always linked to its metrics?

*Think about it... then run the next cell for the solution.*

In [None]:
# SOLUTION: Use mlflow.sklearn.log_model()

with mlflow.start_run(run_name="with-model-artifact"):
    
    mlflow.log_params({"n_estimators": 100, "max_depth": 10})
    
    model = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    
    mlflow.log_metrics({
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred)
    })
    
    # LOG THE MODEL - it's now saved with this run!
    mlflow.sklearn.log_model(model, name="model")
    
    print("Model saved as artifact!")
    print("Check the 'Artifacts' tab in the MLflow UI")

## 2.2 Problem: Saving Plots

In the messy notebook, we created confusion matrices and feature importance plots.
But they were only displayed in the notebook, not saved anywhere!

**Question**: How can we save a matplotlib figure as an artifact?

*Think... then reveal.*

In [None]:
# SOLUTION: Use mlflow.log_figure()

with mlflow.start_run(run_name="with-plots"):
    
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    
    mlflow.log_metrics({"accuracy": accuracy_score(y_test, y_pred)})
    
    # Create confusion matrix plot
    cm = confusion_matrix(y_test, y_pred)
    fig, ax = plt.subplots(figsize=(6, 4))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=ax)
    ax.set_xlabel('Predicted')
    ax.set_ylabel('Actual')
    ax.set_title('Confusion Matrix')
    
    # LOG THE FIGURE - uploaded as an artifact, nothing left in your working directory
    mlflow.log_figure(fig, "confusion_matrix.png")
    plt.close(fig)
    
    print("Plot saved as artifact!")

## 2.3 Problem: The Scaler Issue

For Logistic Regression, we need to scale the features:
```python
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
```

**Problem**: At inference time, we need the SAME scaler. If we don't save it, we can't make correct predictions later!

**Question**: How do we save the scaler with the model?

*Think... then reveal.*

In [None]:
# SOLUTION: Save the scaler as an artifact
import joblib
import tempfile
import os

with mlflow.start_run(run_name="logistic-with-scaler"):
    
    # Scale the data
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    
    # Save scaler as artifact (using tempfile to avoid leaving files)
    with tempfile.TemporaryDirectory() as tmpdir:
        scaler_path = os.path.join(tmpdir, "scaler.pkl")
        joblib.dump(scaler, scaler_path)
        mlflow.log_artifact(scaler_path, artifact_path="preprocessing")
    
    # Train model
    model = LogisticRegression(random_state=42, max_iter=1000)
    model.fit(X_train_scaled, y_train)
    y_pred = model.predict(X_test_scaled)
    
    mlflow.log_metrics({"accuracy": accuracy_score(y_test, y_pred)})
    mlflow.sklearn.log_model(model, name="model")
    
    # Tag to remember this model needs scaling
    mlflow.set_tag("requires_scaling", "true")
    
    print("Model + scaler saved!")
    print("Check artifacts: preprocessing/scaler.pkl")
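
An alternative worth knowing, sketched here as an option rather than the workshop's approach: wrapping the scaler and the classifier in a scikit-learn `Pipeline` turns them into a single estimator, so they cannot drift apart. The data is synthetic (`make_classification`) and the `*_demo` names are placeholders, used only so the block runs on its own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the churn features
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=42)

# Manual approach: two objects that must always travel together
demo_scaler = StandardScaler().fit(X_demo)
demo_clf = LogisticRegression(random_state=42, max_iter=1000)
demo_clf.fit(demo_scaler.transform(X_demo), y_demo)
manual_pred = demo_clf.predict(demo_scaler.transform(X_demo))

# Pipeline approach: scaling happens inside fit() and predict()
pipe = make_pipeline(StandardScaler(), LogisticRegression(random_state=42, max_iter=1000))
pipe.fit(X_demo, y_demo)
pipeline_pred = pipe.predict(X_demo)  # raw, unscaled features go in

print("Same predictions:", np.array_equal(manual_pred, pipeline_pred))
```

Because a pipeline is a regular scikit-learn estimator, it can be logged with `mlflow.sklearn.log_model` like any other model, which would make the separate `scaler.pkl` artifact unnecessary.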

## 2.4 Problem: Comparing Runs

In the messy notebook:
```python
# ok I give up trying to track all these experiments
# which one was the best? I think it was the grid search one?
```

**Question**: How do we programmatically find the best run?

*Think... then reveal.*

In [None]:
# SOLUTION: Use MLflow Client to search runs
from mlflow.tracking import MlflowClient

client = MlflowClient()
experiment = client.get_experiment_by_name(experiment_name)

# Search runs, ordered by accuracy (descending)
runs = client.search_runs(
    experiment_ids=[experiment.experiment_id],
    order_by=["metrics.accuracy DESC"],
    max_results=5
)

print("Top 5 runs by accuracy:")
print("=" * 60)
for run in runs:
    acc = run.data.metrics.get('accuracy', 0)
    f1 = run.data.metrics.get('f1', 0)
    print(f"{run.info.run_name}: accuracy={acc:.4f}, f1={f1:.4f}")

---

# PART 3: Fill in the Blanks

Now it's your turn! Complete the code in the following cells.

**Hint**: Refer to the cheatsheet at `../docs/mlflow_cheatsheet.md`

## 3.1 Exercise: Log a Gradient Boosting Model

Complete the code to train and log a Gradient Boosting model with:
- Parameters: n_estimators, learning_rate, max_depth
- Metrics: accuracy, precision, recall, f1
- Artifact: the trained model

In [None]:
# EXERCISE: Complete this code

with mlflow.start_run(run_name="exercise-gradient-boosting"):
    
    # TODO: Log these parameters
    n_estimators = 100
    learning_rate = 0.1
    max_depth = 5
    
    # mlflow.log_params({...})  # <-- Complete this
    
    # Train model
    model = GradientBoostingClassifier(
        n_estimators=n_estimators,
        learning_rate=learning_rate,
        max_depth=max_depth,
        random_state=42
    )
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    
    # TODO: Calculate and log metrics
    # mlflow.log_metrics({...})  # <-- Complete this
    
    # TODO: Log the model
    # mlflow.sklearn.log_model(...)  # <-- Complete this
    
    print("Exercise completed! Check MLflow UI.")

<details>
<summary>Click to reveal solution</summary>

```python
with mlflow.start_run(run_name="exercise-gradient-boosting"):
    
    n_estimators = 100
    learning_rate = 0.1
    max_depth = 5
    
    mlflow.log_params({
        "n_estimators": n_estimators,
        "learning_rate": learning_rate,
        "max_depth": max_depth
    })
    
    model = GradientBoostingClassifier(
        n_estimators=n_estimators,
        learning_rate=learning_rate,
        max_depth=max_depth,
        random_state=42
    )
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    
    mlflow.log_metrics({
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred)
    })
    
    mlflow.sklearn.log_model(model, name="model")
```

</details>

## 3.2 Exercise: Add Tags and a Confusion Matrix

Enhance your run with:
- Tags: author (your name), model_type
- A confusion matrix plot as artifact

In [None]:
# EXERCISE: Complete this code

with mlflow.start_run(run_name="exercise-with-tags-and-plot"):
    
    # TODO: Add tags
    # mlflow.set_tag("author", "...")  # <-- Your name
    # mlflow.set_tag("model_type", "...")  # <-- Model type
    
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    
    mlflow.log_metric("accuracy", accuracy_score(y_test, y_pred))
    
    # TODO: Create and log confusion matrix
    # cm = confusion_matrix(y_test, y_pred)
    # fig, ax = plt.subplots(...)
    # sns.heatmap(...)
    # mlflow.log_figure(...)  # <-- Complete this
    # plt.close()
    
    print("Exercise completed!")

## 3.3 Challenge: Find and Load the Best Model

Use the MLflow Client to:
1. Find the run with the highest accuracy
2. Load the model from that run
3. Make predictions on new data

In [None]:
# CHALLENGE: Complete this code

from mlflow.tracking import MlflowClient

client = MlflowClient()
experiment = client.get_experiment_by_name(experiment_name)

# TODO: Search for the best run by accuracy
# best_runs = client.search_runs(
#     experiment_ids=[...],
#     order_by=[...],
#     max_results=1
# )

# if best_runs:
#     best_run = best_runs[0]
#     print(f"Best run: {best_run.info.run_name}")
#     print(f"Accuracy: {best_run.data.metrics.get('accuracy', 0):.4f}")
    
#     # TODO: Load the model
#     # model_uri = f"runs:/{best_run.info.run_id}/model"
#     # loaded_model = mlflow.sklearn.load_model(model_uri)
    
#     # TODO: Make predictions
#     # predictions = loaded_model.predict(X_test[:5])
#     # print(f"Sample predictions: {predictions}")

---

# PART 4: Serving & Inference

You already know how to serve models with joblib + FastAPI. Let's see how MLflow simplifies this.

## 4.1 Register a Model

Before serving, let's train a model and register it in the Model Registry.

In [None]:
# First, let's train a model we want to register
with mlflow.start_run(run_name="model-for-registry") as run:
    
    mlflow.log_params({"n_estimators": 200, "max_depth": 12})
    
    model = RandomForestClassifier(n_estimators=200, max_depth=12, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    
    mlflow.log_metrics({
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred)
    })
    
    # Log the model
    mlflow.sklearn.log_model(model, name="model")
    
    # Save run_id for registration
    run_id = run.info.run_id
    print(f"Model logged. Run ID: {run_id}")

In [None]:
# Register the model in the Model Registry
model_name = "churn-predictor"
model_uri = f"runs:/{run_id}/model"

# Register!
model_version = mlflow.register_model(model_uri, model_name)

print(f"Model registered: {model_name}")
print(f"Version: {model_version.version}")
print(f"\nView in UI: http://localhost:5000/#/models/{model_name}")

## 4.2 Load Model from Registry

Instead of `joblib.load('model.pkl')`, you can load by name and version.

In [None]:
# Load model from registry - compare with what you know!

# OLD WAY (what you already know):
# model = joblib.load('model.pkl')  # Where is it? Which version?

# MLFLOW WAY:
# Load latest version
loaded_model = mlflow.sklearn.load_model(f"models:/{model_name}/latest")

# Or load specific version
# loaded_model = mlflow.sklearn.load_model(f"models:/{model_name}/1")

# Make predictions - same as before!
sample_predictions = loaded_model.predict(X_test[:5])
print(f"Sample predictions: {sample_predictions}")
print(f"\nModel loaded from registry: {model_name}/latest")

## 4.3 Serve Model as REST API

You know how to serve models with FastAPI:

```python
# OLD WAY - FastAPI + joblib
from fastapi import FastAPI
import joblib

app = FastAPI()
model = joblib.load('model.pkl')

@app.post("/predict")
def predict(data: dict):
    X = pd.DataFrame([data])
    return {"prediction": model.predict(X).tolist()}
```

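**MLflow does this in one command:**

```bash
MLFLOW_TRACKING_URI=http://localhost:5000 mlflow models serve -m models:/churn-predictor/latest -p 5001 --env-manager local
```

That's it. No FastAPI code needed. The `MLFLOW_TRACKING_URI` prefix tells the CLI which server holds the registry, and `--env-manager local` serves from your current Python environment (it replaces the older `--no-conda` flag).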

In [None]:
# If you run the serve command above, you can call it like this:
# (This is the same pattern you know from FastAPI!)

import requests

# Sample data for prediction
sample_data = X_test.head(3).to_dict(orient='split')

# Uncomment when server is running:
# response = requests.post(
#     "http://localhost:5001/invocations",
#     json={"dataframe_split": sample_data}
# )
# print(response.json())

print("To test serving:")
print("1. Open a terminal")
print("2. Run: MLFLOW_TRACKING_URI=http://localhost:5000 mlflow models serve -m models:/churn-predictor/latest -p 5001 --env-manager local")
print("3. Uncomment the code above and run this cell")
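
To see what actually goes over the wire, here is a sketch of the `dataframe_split` request body built from a made-up two-row frame (real requests would carry all of `FEATURE_COLS`). Going through `to_json` is one way to make sure the values are JSON-native rather than NumPy scalars.

```python
import json

import pandas as pd

# Hypothetical two-row batch with three of the feature columns
batch = pd.DataFrame({
    "recency_days": [10, 45],
    "frequency": [3, 1],
    "monetary_value": [120.5, 20.0],
})

# orient="split" separates column names from row values;
# index=False leaves the row labels out of the payload
split = json.loads(batch.to_json(orient="split", index=False))
payload = {"dataframe_split": split}

print(json.dumps(payload, indent=2))
```

The resulting `payload` has the same shape as the `json=` argument in the commented-out `requests.post` call above.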

## 4.4 Batch Inference (Load from Registry)

For batch predictions in a pipeline, just load and predict:

In [None]:
# Batch inference - this is what orchestrators will do!

# Load model from registry
model = mlflow.sklearn.load_model(f"models:/{model_name}/latest")

# Load new data (simulate daily batch)
new_customers = df[FEATURE_COLS].head(100)

# Predict
predictions = model.predict(new_customers)
probabilities = model.predict_proba(new_customers)[:, 1]

# Create results
results = pd.DataFrame({
    'customer_id': df['customer_id'].head(100),
    'churn_probability': probabilities,
    'churn_predicted': predictions
})

print(f"Batch predictions for {len(results)} customers:")
print(results.head(10))
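
The `[:, 1]` in `predict_proba(...)[:, 1]` relies on the columns following the order of `model.classes_`; with 0/1 labels, column 1 is the churn class. A minimal sketch on synthetic data (the `*_demo` names and the 0.3 threshold are arbitrary choices for illustration) shows this and how a custom threshold changes who gets flagged:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary problem standing in for the churn data
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=0)
demo_model = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

proba = demo_model.predict_proba(X_demo)
print("Class order:", demo_model.classes_)  # columns of proba follow this order
churn_probability = proba[:, 1]             # probability of class 1

# A lower threshold flags more customers (higher recall, lower precision)
flag_default = churn_probability >= 0.5
flag_cautious = churn_probability >= 0.3
print(f"Flagged at 0.5: {flag_default.sum()}, at 0.3: {flag_cautious.sum()}")
```

Storing `churn_probability` alongside `churn_predicted`, as the results table does, lets downstream consumers pick their own threshold later.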

## 4.5 Comparison: What You Know vs MLflow

| Task | Manual (joblib + FastAPI) | MLflow |
|------|---------------------------|--------|
| Save model | `joblib.dump(model, 'model_v2.pkl')` | `mlflow.sklearn.log_model(model, name="model")` |
| Load model | `joblib.load('model_v2.pkl')` | `mlflow.sklearn.load_model("models:/name/1")` |
| Versioning | Manual file names (`model_v1.pkl`, `model_v2.pkl`) | Automatic versions (1, 2, 3...) |
| Serve API | Write FastAPI code, run uvicorn | `mlflow models serve -m models:/name/1` |
| Track which model | Hope you remember / check file dates | Registry shows version, metrics, who trained it |
| Rollback | Find old file, hope it works | `mlflow.sklearn.load_model("models:/name/1")` |

**MLflow doesn't replace your skills** - it builds on them with versioning, tracking, and automation.

---

# Summary

## What You Learned

| Concept | Code |
|---------|------|
| Connect to MLflow | `mlflow.set_tracking_uri("http://localhost:5000")` |
| Create experiment | `mlflow.set_experiment("name")` |
| Start a run | `with mlflow.start_run():` |
| Log parameters | `mlflow.log_params({"key": value})` |
| Log metrics | `mlflow.log_metrics({"accuracy": 0.95})` |
| Log model | `mlflow.sklearn.log_model(model, name="model")` |
| Log figure | `mlflow.log_figure(fig, "plot.png")` |
| Log artifact | `mlflow.log_artifact("file.pkl")` |
| Set tags | `mlflow.set_tag("author", "me")` |
| Search runs | `client.search_runs(...)` |
| Register model | `mlflow.register_model(uri, "name")` |
| Load from registry | `mlflow.sklearn.load_model("models:/name/latest")` |
| Serve model | `mlflow models serve -m models:/name/1 -p 5001` |

## The Full ML Pipeline

```
┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│   DEVELOP    │ →  │    TRAIN     │ →  │   REGISTER   │ →  │    SERVE     │
├──────────────┤    ├──────────────┤    ├──────────────┤    ├──────────────┤
│  Notebooks   │    │  Orchestrator│    │   Model      │    │  REST API    │
│  Experiments │    │  + Tracking  │    │   Registry   │    │  or Batch    │
└──────────────┘    └──────────────┘    └──────────────┘    └──────────────┘
     Part 1-3          Next Steps            Part 4            Part 4
```

## Next Steps

**Reference notebook**: See `02_mlflow_organized.ipynb` for the complete, production-ready version.

**Orchestrators**: Now that you understand MLflow, let's automate this pipeline!
- **Prefect**: See `pipelines/Prefect_ML_Pipeline.py` - trains AND runs inference
- **Dagster**: See `pipelines/Dagster_ML_Pipeline.py` - asset-centric approach
- **Airflow**: See `pipelines/Airflow_ML_Pipeline.py` - industry standard

The orchestrators will:
1. Schedule daily training runs
2. Log everything to MLflow automatically
3. Register new model versions
4. Run batch inference and save predictions

---