# Session 23-24 Machine Learning Operations (MLOps)

# Introduction to MLflow

## Learning Goals

By the end of this session, students will be able to:

* Understand why **MLflow is needed**

* Track **experiments, parameters, and metrics**

* Compare multiple model runs

* Save and load models reproducibly

## Step 0: Why MLflow?

**Problem:**
“I trained many models, but I forgot which hyperparameters gave the best result.”

**What MLflow provides:**

* Experiment tracking

* Model versioning

* Reproducibility

## Step 1: Installation & Imports

In [None]:
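# Install once if needed (in a terminal, or with a ! prefix in a notebook): pip install mlflow scikit-learn
import mlflow
import mlflow.sklearn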

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

### Explanation

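* MLflow is library-agnostic: you can log parameters and metrics from any Python code, and it has built-in model "flavors" for libraries such as scikit-learn, PyTorch, and TensorFlow/Keras

* We use scikit-learn and the Iris dataset for simplicity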

## Step 2: Load Dataset

In [None]:
# Load dataset
X, y = load_iris(return_X_y=True)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape)  # (120, 4): 120 training samples, 4 features
print(y_train.shape)  # (120,)
print(X_test.shape)   # (30, 4): 20% of the 150 samples
print(y_test.shape)   # (30,)

## Step 3: First MLflow Experiment

In [None]:
# This code trains a model, evaluates it, and records how it was built and how well it performed using MLflow.

# Starts one run in MLflow
# Only what you explicitly log inside this block (parameters, metrics, models) is recorded
# When the block ends, the run is marked as finished and saved
with mlflow.start_run():

    # Define model
    # Creates a Logistic Regression model
    # max_iter=200 sets the maximum number of training iterations
    # This value will later be logged and compared in MLflow
    model = LogisticRegression(max_iter=200)

    # Train model
    # Without autologging, MLflow does not record training automatically; we log results manually below
    model.fit(X_train, y_train)

    # Predict
    # Uses the trained model to make predictions on unseen test data
    y_pred = model.predict(X_test)

    # Evaluate
    # Computes accuracy, a performance metric
    accuracy = accuracy_score(y_test, y_pred)

    # Log parameters
    # Saves model configuration settings (parameters)
    # Parameters help explain why one run performs better than another
    mlflow.log_param("model_type", "LogisticRegression")
    mlflow.log_param("max_iter", 200)

    # Log metrics
    # Logs the evaluation result
    # Metrics are numeric values used for comparison
    mlflow.log_metric("accuracy", accuracy)

    print("Accuracy:", accuracy)

### Explanation

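One MLflow run records one complete training attempt: the model settings (parameters) and the performance results (metrics).

Runs are grouped into **experiments**. Since we did not choose one, this run is stored in the experiment named `Default`.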

## Step 4: View MLflow UI

In [None]:
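# Run this in a terminal (as `mlflow ui`) or in a notebook cell (with the ! prefix)
# This command starts a local web dashboard for MLflow
# MLflow reads the logged runs from the local tracking store (typically the mlruns/ folder in the current directory)
# In a notebook, the cell keeps running until you interrupt it
!mlflow ui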

### Open the Browser

Go to:
http://localhost:5000

This is a local web page served from your computer (port 5000 by default), so no internet connection is required.

### MLflow UI explanation

**Home** → **Experiments** → Click **Default** → **Default** page

#### Runs
* Each row = **one experiment run**
* A run corresponds to **one model training attempt**
* Multiple runs appear when you try different parameters

#### Parameters
* Shows **how the model was built**
* Examples:
    * `max_iter`
    * model type
* Used to **compare different configurations**

#### Metrics
* Shows **how well the model performed**
* Examples:
    * accuracy
    * loss
* Used to **choose the best model**

## Step 5: Multiple Runs

In [None]:
C_values = [0.01, 0.1, 1.0, 10.0]

for C in C_values:
    with mlflow.start_run():

        # C is the inverse of regularization strength
        # Small C → stronger regularization → simpler model
        # Large C → weaker regularization → more flexible model
        model = LogisticRegression(C=C, max_iter=200)
        model.fit(X_train, y_train)

        y_pred = model.predict(X_test)
        accuracy = accuracy_score(y_test, y_pred)

        mlflow.log_param("C", C)
        mlflow.log_metric("accuracy", accuracy)

        print(f"C={C}, Accuracy={accuracy}")

### Explanation

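* Each loop iteration = a new run; all four runs belong to the same experiment

* In the MLflow UI, you can select several runs and click **Compare** to see how `C` affects accuracy

This demonstrates hyperparameter tuning combined with experiment tracking.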

## Step 6: Log the Model

In Steps 3–5, we trained the model, tested it, and logged its settings and performance so we could compare runs and see which configuration works best.

In Step 6, we log the model itself, so the best model can be saved, reused, and deployed later instead of being retrained.

Suppose the comparison in Step 5 showed that `C=1.0` gives the **best model**; we now retrain and log that model.

In [None]:
with mlflow.start_run():

    model = LogisticRegression(C=1.0, max_iter=200)
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)

    mlflow.log_param("C", 1.0)
    mlflow.log_metric("accuracy", accuracy)

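    # Log model
    # This step saves the actual trained model under the name "model"
    # It stores the serialized (pickled) estimator, an MLmodel metadata file,
    # and environment files (e.g., requirements.txt, conda.yaml)
    # The model is linked to this run, and therefore to its parameters and metrics
    mlflow.sklearn.log_model(model, "model")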

    print("Model logged with accuracy:", accuracy)

### Explanation

* MLflow stores:

    * model file

    * environment

    * parameters

* This lets you reload and deploy the exact same model later without retraining it

## Where Is the Logged Model Stored?

### In the MLflow UI (Easiest way)

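Go to the **Default** page and click the **Model** tab in the left sidebar. Depending on your MLflow version, you can also open the run itself and look in its **Artifacts** section for the `model` folder.

For the logged model, MLflow stores: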

* The trained model

* Model metadata

* Environment information

### On Your Local File System

By default, MLflow saves everything in a folder called `mlruns/`, created in the directory where you run your code.

Typical structure (the exact layout can differ between MLflow versions):

* `mlruns/`
    * `<experiment_id>/`
        * `<run_id>/`
            * `artifacts/`
                * `model/`

## Step 7: Load Model from MLflow

In [None]:
# pyfunc is MLflow’s generic model interface
# It lets you load a model without caring which library trained it
# Works for any model logged with an MLflow flavor (scikit-learn, TensorFlow, PyTorch, etc.)
import mlflow.pyfunc

# model_uri = "runs:/<RUN_ID>/model"
# runs:/ tells MLflow to load a model logged in a specific run
# <RUN_ID> uniquely identifies one MLflow run (shown on the run's page in the UI)
# model is the name given to the artifact in log_model(model, "model")
# Replace the ID below with the run ID of your own Step 6 run
model_uri = "runs:/9c76479f59e24c1ab14a76572e6260f7/model"

# Loads the exact trained model saved in MLflow
# No retraining is needed
# The same model can be reused later, or on another machine with access to the same tracking store
loaded_model = mlflow.pyfunc.load_model(model_uri)

predictions = loaded_model.predict(X_test)
print(predictions[:5])

# Exercise: Regression with the Diabetes Dataset

## Dataset & Model

**Dataset**: Diabetes dataset (from scikit-learn)

**Task**: Regression (predict disease progression)

**Model**: Linear Regression / Ridge Regression

You will train regression models with different hyperparameters and use MLflow to track performance.

Your goal is to **identify the best-performing model**.

## Step 1: Load Dataset and Split Data

**Task**: Load the diabetes dataset and split it into training and test sets.

**Hints**:

* Use `load_diabetes()`

* Use `train_test_split()`

* Use `test_size=0.2`

In [3]:
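# Load the Diabetes dataset and split it 80% / 20%
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)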

## Step 2: Choose a Regression Model

**Task**: Use **Ridge Regression** and test different values of `alpha`.

**Hints**:

* Import `Ridge` from `sklearn.linear_model`

* Try `alpha` values of `[0.1, 1.0, 10.0, 100.0]`

## Step 3: Train the Model and Evaluate Performance

**Task**:

For each `alpha` value:

* Train the model

* Predict on test data

* Compute **Mean Squared Error (MSE)**

**Hints**:

* Use `mean_squared_error()`

* Lower MSE = better model
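For reference, MSE averages the squared prediction errors, $\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$. The sketch below computes it by hand on three made-up values and checks the result against `mean_squared_error`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Toy values, made up for illustration
y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.5, 5.0, 4.0])

# Errors are 0.5, 0.0, -2.0; squared: 0.25, 0.0, 4.0; mean: 4.25 / 3
manual_mse = np.mean((y_true - y_pred) ** 2)
print(manual_mse)                          # 1.4166...
print(mean_squared_error(y_true, y_pred))  # same value
```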

## Step 4: Track Experiments with MLflow

**Task**:

Inside an MLflow run, log:

* Parameter: `alpha`

* Metric: `mse`

Each `alpha` value should create **one MLflow run**.

## Step 5: View and Compare Results in MLflow UI

**Task**:

* Launch the MLflow UI

* Compare runs

* Identify the **best alpha**

## Step 6: Log the Best Model

**Task**:

* Retrain the best model

* Log it using MLflow