# Experiment Tracking with MLflow

In this notebook, we’ll explore **experiment tracking** — a key MLOps practice — using **MLflow**, one of the most popular open-source tools for managing machine learning workflows.

MLflow helps you track experiments, log parameters and metrics, version models, and deploy them.

## 🔍 Why Experiment Tracking?

When building ML models, we try different algorithms, hyperparameters, and preprocessing steps. Without a systematic way to record them, it becomes difficult to:
- Compare model versions
- Understand why one performed better than another
- Reproduce past results

**Experiment tracking** solves this problem by recording metadata for each run (parameters, metrics, artifacts, and source code) in one place, so any result can be traced back to the configuration that produced it.

In [ ]:
# Install MLflow and the other libraries used in this notebook if not already installed
# !pip install mlflow scikit-learn matplotlib seaborn

## ⚙️ Setting Up MLflow

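MLflow can log to the local filesystem or to a central tracking server shared by a team.

When no tracking URI is configured, MLflow writes runs to a local `mlruns/` directory under the current working directory; `mlflow.get_tracking_uri()` shows the location in use. Runs are grouped into experiments, and if you do not set one they go to the experiment named `Default`.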

In [ ]:
import mlflow
import mlflow.sklearn

# Set experiment name
mlflow.set_experiment("Iris_Classification_Experiment")

## 🧠 Example: Tracking a Simple Model

Let’s train a simple **Logistic Regression** classifier on the **Iris dataset** and log parameters, metrics, and the model with MLflow.

In [ ]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load data
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

# Define hyperparameters once so the logged values always match the model
C = 1.0  # inverse regularization strength (the scikit-learn default)
max_iter = 200

with mlflow.start_run(run_name="Logistic_Regression_Run"):

    # Train the model
    model = LogisticRegression(C=C, max_iter=max_iter)
    model.fit(X_train, y_train)

    # Evaluate on the held-out test set
    preds = model.predict(X_test)
    acc = accuracy_score(y_test, preds)

    # Log parameters and metrics
    mlflow.log_param("C", C)
    mlflow.log_param("max_iter", max_iter)
    mlflow.log_metric("accuracy", acc)
    
    # Log model
    mlflow.sklearn.log_model(model, "model")
    
    print(f"Logged run with accuracy: {acc:.4f}")

## 📂 Viewing Results

Once runs are logged, start the MLflow UI from the directory that contains `mlruns/`:

```bash
mlflow ui
```

Then open [http://localhost:5000](http://localhost:5000) to explore runs, metrics, and artifacts visually.

You’ll see all your experiment runs with their corresponding hyperparameters and performance metrics.

## 📊 Logging Multiple Runs

We can launch several runs with different hyperparameters and log them all under the same experiment for comparison.

In [ ]:
for C in [0.1, 1.0, 10.0]:
    with mlflow.start_run(run_name=f"LR_C={C}"):
        model = LogisticRegression(C=C, max_iter=200)
        model.fit(X_train, y_train)
        preds = model.predict(X_test)
        acc = accuracy_score(y_test, preds)
        
        mlflow.log_param("C", C)
        mlflow.log_param("max_iter", 200)
        mlflow.log_metric("accuracy", acc)
        
        mlflow.sklearn.log_model(model, "model")
        print(f"C={C}, Accuracy={acc:.4f}")

## 🧩 Logging Artifacts

You can also log **plots, confusion matrices, or datasets** as artifacts using `mlflow.log_artifact()`.

In [ ]:
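import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

# `preds` holds the predictions from the last run of the loop above (C=10.0)
conf_mat = confusion_matrix(y_test, preds)
plt.figure(figsize=(5, 4))
sns.heatmap(conf_mat, annot=True, cmap='Blues', fmt='g')
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.tight_layout()
plt.savefig('conf_matrix.png')

# Log the image inside an explicit run. Calling log_artifact with no active
# run would implicitly start a new, unnamed run and leave it open.
with mlflow.start_run(run_name="Confusion_Matrix_C=10.0"):
    mlflow.log_artifact('conf_matrix.png')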

## 🚀 MLflow Components Overview

MLflow consists of 4 key components:

1. **Tracking** – Log experiments, metrics, and artifacts.
2. **Projects** – Package code for reproducibility.
3. **Models** – Package models in a standard format that different serving and inference tools can load.
4. **Model Registry** – Central repository for managing model lifecycle (staging → production).

Together, they form a complete workflow from **experiment → deployment**.

## ✅ Summary

In this notebook, you learned how to:
- Track experiments using **MLflow**
- Log parameters, metrics, and artifacts
- Compare multiple runs visually in the MLflow UI
- Log trained models alongside the runs that produced them

Next, we’ll explore **03-Model_Registry_and_Packaging.ipynb**, where we manage the model lifecycle and deployment using MLflow’s Model Registry.