# 👩‍💻 Track and Compare Multiple Model Runs with MLflow

## 📋 Overview
In this lab, you will use MLflow, an open-source tool for experiment tracking in machine learning, to manage and compare multiple model runs. You will gain hands-on experience organizing, logging, and assessing runs trained with different hyperparameters, which makes your results easier to reproduce and compare.

## 🎯 Learning Outcomes
By the end of this lab, you will be able to:

- ✅ Perform multiple model runs with different hyperparameters
- ✅ Compare and analyze model performance by examining logged metrics and artifacts

## Task 1: Dataset Selection and Preprocessing

**Context:** Proper dataset selection and preprocessing ensure the data is clean and ready for modeling.

**Steps:**
1. Ensure the dataset is preprocessed: handle missing data, normalize features as needed, and split into training and testing sets.

In [None]:
# Import Libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Task 1: Dataset Selection and Preprocessing
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

💡 **Tip:** Use `train_test_split` from `sklearn.model_selection` for data splitting.

⚙️ **Test Your Work:**
- Inspecting the prepared data should show the feature matrix and its corresponding labels, with no missing values and with separate training and testing sets.
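
One way to check the preprocessing is to print the split shapes and the statistics of the scaled features. The sketch below is only an illustration, assuming the Iris dataset from the starter cell and `StandardScaler` for normalization; the `*_scaled` variable names are hypothetical, and the scaler is fit on the training split only so that no test-set statistics leak into training.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the data and split it the same way as the starter cell
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Quick sanity checks on the result
missing_count = int(np.isnan(iris.data).sum())
print("train shape:", X_train_scaled.shape, "test shape:", X_test_scaled.shape)
print("missing values in raw data:", missing_count)
print("train feature means:", X_train_scaled.mean(axis=0).round(3))
print("train feature stds:", X_train_scaled.std(axis=0).round(3))
```

After scaling, each training feature should have a mean close to 0 and a standard deviation close to 1; the test features will be close to these values but not exact.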

## Task 2: Create Multiple Experiment Runs

**Context:** Performing multiple experiment runs with varied hyperparameters lets you compare their results.

**Steps:**

1. Develop a simple model using Logistic Regression.
2. Conduct several runs by varying hyperparameters such as maximum number of iterations and regularization strength.
3. Log these experiments with unique identifiers in MLflow for easy comparison.

In [None]:
# Task 2: Create Multiple Experiment Runs

💡 **Tip:** Use `mlflow.start_run()` to encapsulate each experiment run.

⚙️ **Test Your Work:**
- Plots should clearly show the relationship between actual and predicted values for each run. 
- Legends should correctly identify each run based on hyperparameter configurations.

## Task 3: Logging Parameters, Metrics, and Artifacts

**Context:** Logging parameters, metrics, and artifacts helps to track and compare model runs in detail.

**Steps:**

1. For each run, log hyperparameters (e.g., regularization parameters), performance metrics (e.g., accuracy), and confusion matrices as artifacts.
2. Ensure that logs are sufficiently detailed to support revisiting and comparing results across all runs.

In [None]:
# Task 3: Logging Parameters, Metrics, and Artifacts

💡 **Tip:** Use `mlflow.log_param`, `mlflow.log_metric`, and `mlflow.log_artifact` to save details for each run.

⚙️ **Test Your Work:**
- Logs should clearly show hyperparameters, metrics, and artifacts for each run in MLflow. 
- Entries should be well-documented to facilitate easy comparison.

## Task 4: Analyze Model Performances

**Context:** Analyzing logged information helps determine patterns and the impact of hyperparameter changes on model performance.

**Steps:**

1. Review the logged parameters, metrics, and artifacts to identify patterns and the impact of hyperparameter changes on model performance.
2. Reflect on how experiment logging aids in understanding model behavior and in decision-making processes for future model iterations.

💡 **Tip:** Look for trends and correlations between hyperparameter settings and performance metrics.

⚙️ **Test Your Work:**
- Analysis should clearly show the relationship between hyperparameter changes and model performance. 
- Reflective documentation should provide insights into model behavior and recommendations for future experiments.

### ✅ Success Checklist

- Successfully selected and preprocessed the dataset
- Performed and logged multiple experiment runs with different hyperparameters
- Logged detailed parameters, metrics, and artifacts for each run
- Analyzed and documented model performance and insights

### 🔍 Common Issues & Solutions

**Problem:** MLflow installation errors.   
**Solution:** Install MLflow with `pip install mlflow`, then confirm the installation with `mlflow --version`.

**Problem:** Experiment logging issues.   
**Solution:** Confirm that each run is wrapped in `mlflow.start_run()` and that every call to `mlflow.log_param`, `mlflow.log_metric`, and `mlflow.log_artifact` is made inside that run's `with` block.


### 🔑 Key Points

- MLflow provides robust tools for tracking, comparing, and analyzing multiple model runs.
- Logging detailed parameters, metrics, and artifacts enhances reproducibility and predictability.
- Analyzing logged information helps in decision-making for future experiments.

## 💻 Exemplar Solution

<details>    
<summary><strong>Click HERE to see an exemplar solution</strong></summary>    

```python
# Import Libraries    
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
    
    
# Task 1: Dataset Selection and Preprocessing
# Load dataset
iris = load_iris()

# Split first so the imputer and scaler are fit on training data only (avoids test-set leakage)
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

# Impute missing values (if any) with the mean of each training column
imputer = SimpleImputer(strategy='mean')
X_train = imputer.fit_transform(X_train)
X_test = imputer.transform(X_test)

# Standardize features using statistics from the training set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Task 2: Create Multiple Experiment Runs
mlflow.set_experiment("Iris Classification")

hyperparameter_configs = [
    {"max_iter": 100, "C": 1.0},
    {"max_iter": 200, "C": 0.5},
    {"max_iter": 300, "C": 0.75}
]

for config in hyperparameter_configs:
    # Name each run after its configuration so runs are easy to tell apart
    run_name = f"max_iter={config['max_iter']}_C={config['C']}"
    with mlflow.start_run(run_name=run_name):
        model = LogisticRegression(max_iter=config["max_iter"], C=config["C"], solver='lbfgs')
        model.fit(X_train, y_train)

        # Task 3: Logging Parameters, Metrics, and Artifacts
        # Log model and parameters (inside the run so each run keeps its own records)
        mlflow.sklearn.log_model(model, "logistic_regression_model")
        mlflow.log_param("max_iter", config["max_iter"])
        mlflow.log_param("C", config["C"])

        # Evaluate and log metrics
        predictions = model.predict(X_test)
        accuracy = accuracy_score(y_test, predictions)
        mlflow.log_metric("accuracy", accuracy)

        # Log confusion matrix as an artifact
        confusion_mat = confusion_matrix(y_test, predictions)
        cm_df = pd.DataFrame(confusion_mat, index=iris.target_names, columns=iris.target_names)
        cm_df.to_csv("confusion_matrix.csv")
        mlflow.log_artifact("confusion_matrix.csv")
```                                                                                               