# MLflow Tracking Tutorial

This notebook provides an interactive introduction to MLflow tracking capabilities through both conceptual explanations and hands-on examples.

**Learning Objectives:**
- Understand MLflow's core tracking components
- Configure and customize MLflow for experiment tracking
- Track parameters, metrics, and artifacts for different model types
- Implement hyperparameter tuning with MLflow
- Access and interpret the MLflow UI
- Apply best practices for model management

## Table of Contents
1. [Introduction to MLflow](#intro)
2. [Setting Up the Environment](#setup)
3. [MLflow Tracking Basics](#basics)
4. [Tracking Scikit-learn Models](#sklearn)
5. [Tracking TensorFlow Models](#tensorflow)
6. [Hyperparameter Tuning](#tuning)
7. [Working with the MLflow UI](#ui)
8. [Best Practices](#practices)
9. [Exercises](#exercises)

## 1. Introduction to MLflow <a id="intro"></a>

MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It has four main components:

1. **MLflow Tracking**: Records parameters, code versions, metrics, and artifacts
2. **MLflow Projects**: Packages code in a reusable, reproducible form
3. **MLflow Models**: Manages and deploys models from different ML libraries
4. **MLflow Model Registry**: Centrally manages models through their lifecycle

In this tutorial, we'll focus primarily on MLflow Tracking, which helps you:
- Compare results between runs
- Reproduce experiments
- Share results with team members
- Keep a permanent record of your experiments

### MLflow Tracking Key Concepts

* **Experiment**: A group of runs for a specific task
* **Run**: A single execution of your code
* **Parameters**: Key-value inputs to your code (e.g., hyperparameters)
* **Metrics**: Key-value outputs from your code (e.g., accuracy, loss)
* **Artifacts**: Files generated during the run (e.g., models, plots)
* **Tags**: Additional metadata about the run

Pseudocode for a basic MLflow workflow:

```
import mlflow

# Set the experiment
mlflow.set_experiment("experiment_name")

# Start a run
with mlflow.start_run():
    # Log parameters
    mlflow.log_param("param_name", value)
    
    # Train model
    model.fit(X_train, y_train)
    
    # Log metrics
    mlflow.log_metric("metric_name", value)
    
    # Log the model
    mlflow.sklearn.log_model(model, "model")
```

## 2. Setting Up the Environment <a id="setup"></a>

First, let's import the necessary libraries and set up our environment. We'll use the existing utility functions from our project.

In [None]:
# Import required libraries
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import mlflow
import mlflow.sklearn
import mlflow.tensorflow

# Import utilities from our project
from data_utils import load_dataset
from model_utils import create_sklearn_model, create_tensorflow_model
from mlflow_utils import load_config, setup_mlflow

# Load configuration
config = load_config('config.yaml')
print("Configuration loaded:")
print(f"- Dataset: {config['dataset']}")
print(f"- Task: {config['task']}")
print(f"- Model type: {config['model_type']}")
print(f"- MLflow experiment: {config.get('mlflow', {}).get('experiment_name', 'default')}")

### Understanding the Configuration File

Our project uses a YAML configuration file to control experiments. The key sections are:

1. **General configuration**: Dataset, task type, and optimization settings
2. **MLflow configuration**: Tracking URI and experiment name
3. **Model configuration**: Hyperparameters and model types for both sklearn and TensorFlow
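
To make the three sections concrete, here is a hypothetical parsed form of such a file, written as the Python dictionary that `yaml.safe_load` would return. The top-level keys mirror the ones this notebook reads (`dataset`, `task`, `model_type`, `test_size`, `seed`, `mlflow`, `sklearn`). Every value, the `tracking_uri` key name, and the nested hyperparameter keys are assumptions for illustration and may not match the actual `config.yaml`:

```python
# Hypothetical parsed configuration; values and nested keys are illustrative only.
config = {
    # 1. General configuration
    "dataset": "iris",
    "task": "classification",
    "model_type": "sklearn",
    "test_size": 0.2,
    "seed": 42,
    # 2. MLflow configuration
    "mlflow": {
        "tracking_uri": "./mlruns",  # assumed key name
        "experiment_name": "mlflow_tutorial",
    },
    # 3. Model configuration
    "sklearn": {
        "random_forest": {"n_estimators": 100, "max_depth": 10},
    },
}

# Chained .get() calls tolerate a missing section and fall back to a default.
experiment_name = config.get("mlflow", {}).get("experiment_name", "default")
epochs = config.get("tensorflow", {}).get("epochs", 10)  # section absent here
print(experiment_name, epochs)
```

The chained `.get()` pattern is the same one the setup cell uses to read the experiment name, so a configuration without an `mlflow` section still loads.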

Let's examine a portion of this configuration:

In [None]:
# Display the MLflow section of the configuration
import yaml
with open('config.yaml', 'r') as f:
    config = yaml.safe_load(f)

print("MLflow Configuration:")
print(yaml.dump(config.get('mlflow', {}), default_flow_style=False))

print("\nScikit-learn Configuration:")
print(yaml.dump(config.get('sklearn', {}), default_flow_style=False))

### Setting Up MLflow Tracking

Now let's set up MLflow tracking using our utility function. By default, MLflow will store runs in the local `./mlruns` directory, but you can also configure it to use a remote tracking server or database.

In [None]:
# Set up MLflow (setup_mlflow was imported in the first code cell)
setup_mlflow(config)

# Check the tracking URI
print(f"MLflow Tracking URI: {mlflow.get_tracking_uri()}")

# List all experiments
experiments = mlflow.search_experiments()
print("\nAvailable experiments:")
for exp in experiments:
    print(f"- {exp.name} (ID: {exp.experiment_id})")

## 3. MLflow Tracking Basics <a id="basics"></a>

Let's start with a simple example to understand the core concepts of MLflow tracking.

In this example, we'll:
1. Load a dataset
2. Create a simple model
3. Track parameters, metrics, and the model itself

In [None]:
# Load dataset
X_train, X_test, y_train, y_test = load_dataset(
    config['dataset'],
    test_size=config.get('test_size', 0.2),
    random_state=config.get('seed', 42)
)

# First tracking example
with mlflow.start_run(run_name="basic_example"):
    # Log basic parameters
    mlflow.log_param("dataset", config['dataset'])
    mlflow.log_param("test_size", config.get('test_size', 0.2))
    mlflow.log_param("random_state", config.get('seed', 42))
    
    # Create a simple model (Decision Tree)
    from sklearn.tree import DecisionTreeClassifier
    model = DecisionTreeClassifier(max_depth=3, random_state=42)
    
    # Log model parameters
    mlflow.log_param("model_type", "DecisionTree")
    mlflow.log_param("max_depth", 3)
    
    # Train the model
    model.fit(X_train, y_train)
    
    # Make predictions
    y_pred = model.predict(X_test)
    
    # Calculate and log metrics
    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred, average='weighted')
    recall = recall_score(y_test, y_pred, average='weighted')
    f1 = f1_score(y_test, y_pred, average='weighted')
    
    mlflow.log_metric("accuracy", accuracy)
    mlflow.log_metric("precision", precision)
    mlflow.log_metric("recall", recall)
    mlflow.log_metric("f1_score", f1)
    
    # Log the model
    mlflow.sklearn.log_model(
        model, 
        "model",
        signature=mlflow.models.infer_signature(X_test, y_pred)
    )
    
    # Log an artifact (feature importance plot)
    plt.figure(figsize=(10, 6))
    plt.bar(range(X_train.shape[1]), model.feature_importances_)
    plt.title('Feature Importance')
    plt.xlabel('Feature Index')
    plt.ylabel('Importance')
    
    # Save the plot as a file
    plt.savefig('feature_importance.png')
    
    # Log the plot as an artifact
    mlflow.log_artifact('feature_importance.png')
    
    # Get the run ID for reference
    run_id = mlflow.active_run().info.run_id
    print(f"Run ID: {run_id}")
    print(f"Accuracy: {accuracy:.4f}")

# Clean up the plot file after logging
os.remove('feature_importance.png')

### Tracking Components Explained

Let's break down the key components of MLflow tracking:

1. **Run**: The `mlflow.start_run()` context manager creates a new run in the current experiment. Each run represents one execution of your code.

2. **Parameters**: We log parameters with `mlflow.log_param()`. Parameters are inputs that define how your model behaves (e.g., hyperparameters).

3. **Metrics**: We log metrics with `mlflow.log_metric()`. Metrics are outputs that measure how well your model performs (e.g., accuracy).

4. **Artifacts**: We log artifacts with `mlflow.log_artifact()`. Artifacts are files generated during the run (e.g., plots, datasets).

5. **Model**: We log the model with `mlflow.sklearn.log_model()`. This stores the model in a format that can be loaded later.

## 4. Tracking Scikit-learn Models <a id="sklearn"></a>

Let's create a more comprehensive example using scikit-learn models and our existing utility functions. We'll compare multiple runs with different hyperparameters.

In [None]:
# Create and track a scikit-learn Random Forest model
from mlflow_utils import log_sklearn_model
from model_utils import create_sklearn_model

# Define different parameter sets to try
rf_params = [
    {"n_estimators": 50, "max_depth": 10, "random_state": 42},
    {"n_estimators": 100, "max_depth": 20, "random_state": 42},
    {"n_estimators": 200, "max_depth": None, "random_state": 42}
]

# Compare multiple model configurations
for i, params in enumerate(rf_params):
    print(f"\nTraining Random Forest with params: {params}")
    
    # Create the model
    model = create_sklearn_model('random_forest', params, task=config['task'])
    
    # Log the model with MLflow
    metrics = log_sklearn_model(model, X_train, X_test, y_train, y_test, params, config)
    
    print(f"Results for run {i+1}:")
    for metric_name, value in metrics.items():
        print(f"- {metric_name}: {value:.4f}")

### Understanding Autologging

MLflow provides an autologging feature that automatically logs parameters, metrics, and models without requiring explicit logging statements. Let's enable autologging for scikit-learn:

In [None]:
# Enable autologging for scikit-learn
mlflow.sklearn.autolog()

# Create a new model with autologging enabled
with mlflow.start_run(run_name="sklearn_autolog_example"):
    # Log custom parameters not captured by autologging
    mlflow.log_param("example_note", "Using autologging")
    
    # Create and train the model
    from sklearn.ensemble import GradientBoostingClassifier
    gb_model = GradientBoostingClassifier(n_estimators=100, random_state=42)
    gb_model.fit(X_train, y_train)
    
    # Make predictions
    y_pred = gb_model.predict(X_test)
    
    # Autologging will log most metrics, but we can add custom ones
    from sklearn.metrics import roc_auc_score
    if len(np.unique(y_test)) == 2:  # Binary classification
        y_proba = gb_model.predict_proba(X_test)[:,1]
        roc_auc = roc_auc_score(y_test, y_proba)
        mlflow.log_metric("roc_auc", roc_auc)
        print(f"ROC AUC: {roc_auc:.4f}")

# Disable autologging after use (optional)
mlflow.sklearn.autolog(disable=True)

## 5. Tracking TensorFlow Models <a id="tensorflow"></a>

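Now let's track a TensorFlow model. Compared with scikit-learn estimators, neural networks add hyperparameters such as layer sizes, dropout rate, learning rate, batch size, and number of epochs, and they train iteratively, so metrics can be recorded per epoch.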

In [None]:
# Import TensorFlow
import tensorflow as tf
from mlflow_utils import log_tensorflow_model

# Define TensorFlow model parameters
tf_params = {
    "units": 64,
    "num_layers": 2,
    "dropout_rate": 0.2,
    "learning_rate": 0.001,
    "activation": "relu",
    "batch_size": 32,
    "epochs": 10,
    "patience": 3,
    "random_state": 42
}

# Log the TensorFlow model
print("Training TensorFlow model...")
metrics = log_tensorflow_model(tf_params, X_train, X_test, y_train, y_test, config)

print("\nTensorFlow model results:")
for metric_name, value in metrics.items():
    print(f"- {metric_name}: {value:.4f}")

### Autologging with TensorFlow

MLflow's TensorFlow autologging captures:
- Parameters passed to `fit()` and optimizer settings such as the learning rate
- Metrics at each epoch during training
- A text summary of the model architecture
- The trained model, saved as an MLflow model

In [None]:
# Enable autologging for TensorFlow
mlflow.tensorflow.autolog()

# Create a new model with autologging enabled
with mlflow.start_run(run_name="tensorflow_autolog_example"):
    # Choose the output layer and loss from the number of classes
    # (assumes integer-encoded labels 0..n_classes-1)
    n_classes = len(np.unique(y_train))
    is_binary = n_classes <= 2

    # Create a simple model
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation='relu', input_shape=(X_train.shape[1],)),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(16, activation='relu'),
        # One sigmoid unit for binary tasks; one softmax unit per class otherwise
        tf.keras.layers.Dense(1 if is_binary else n_classes,
                              activation='sigmoid' if is_binary else 'softmax')
    ])
    
    # Compile the model
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss='binary_crossentropy' if is_binary else 'sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    
    # Train the model
    model.fit(
        X_train, y_train,
        epochs=5,
        batch_size=32,
        validation_split=0.2,
        verbose=1
    )
    
    # Test the model
    test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
    print(f"Test accuracy: {test_acc:.4f}")

# Disable autologging after use (optional)
mlflow.tensorflow.autolog(disable=True)

## 6. Hyperparameter Tuning with MLflow <a id="tuning"></a>

MLflow is particularly useful for hyperparameter tuning: each trial is logged as its own run, so parameter combinations and their metrics can be compared afterwards. Let's implement a simple grid search:

In [None]:
# Import necessary libraries
from sklearn.model_selection import ParameterGrid
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Define a hyperparameter grid
param_grid = {
    'max_depth': [3, 5, 7],
    'min_samples_split': [2, 5, 10],
    'criterion': ['gini', 'entropy']
}

# Create parameter combinations
param_combinations = list(ParameterGrid(param_grid))
print(f"Testing {len(param_combinations)} hyperparameter combinations")

# Perform hyperparameter tuning with MLflow tracking
for i, params in enumerate(param_combinations):
    print(f"\nTrial {i+1}/{len(param_combinations)}")
    print(f"Parameters: {params}")
    
    # Start a run for this parameter combination
    with mlflow.start_run(run_name=f"dt_tuning_{i}"):
        # Log parameters
        for param_name, param_value in params.items():
            mlflow.log_param(param_name, param_value)
        
        # Create and train the model
        model = DecisionTreeClassifier(**params, random_state=42)
        model.fit(X_train, y_train)
        
        # Evaluate and log metrics
        y_pred = model.predict(X_test)
        accuracy = accuracy_score(y_test, y_pred)
        precision = precision_score(y_test, y_pred, average='weighted')
        recall = recall_score(y_test, y_pred, average='weighted')
        
        mlflow.log_metric("accuracy", accuracy)
        mlflow.log_metric("precision", precision)
        mlflow.log_metric("recall", recall)
        
        # Log the model
        mlflow.sklearn.log_model(model, "model")
        
        print(f"Accuracy: {accuracy:.4f}")

### Finding the Best Model

After hyperparameter tuning, we can use MLflow to find the best model based on a specific metric.

In [None]:
# Find best run using our utility function
from mlflow_utils import find_best_run

# Use the experiment name from config
experiment_name = config.get('mlflow', {}).get('experiment_name', 'default_experiment')

# Find the best run based on accuracy
best_run = find_best_run(experiment_name, "accuracy", mode="max")

if best_run:
    print("\n=== Best Model ===")
    print(f"Run ID: {best_run['run_id']}")
    print(f"Accuracy: {best_run['metrics.accuracy']:.4f}")
    print("Parameters:")
    for key in best_run.keys():
        if key.startswith('params.'):
            param_name = key.replace('params.', '')
            print(f"  {param_name}: {best_run[key]}")
else:
    print("No best model found.")

## 7. Working with the MLflow UI <a id="ui"></a>

MLflow provides a web-based user interface to visualize and compare experiments.

To access the MLflow UI, run this command in a terminal from the directory that contains your `mlruns` folder:

```bash
mlflow ui --port 5000
```

Then open http://localhost:5000 in your browser.

The MLflow UI allows you to:
1. View all experiments
2. Compare runs side-by-side
3. Sort and filter runs based on parameters and metrics
4. View model details and artifacts
5. Download logged models
6. Plot metrics over time

Let's create a visualization similar to what you'd see in the UI:

In [None]:
# Get runs for the current experiment to visualize
experiment = mlflow.get_experiment_by_name(experiment_name)
runs = mlflow.search_runs(experiment_ids=[experiment.experiment_id])

# Display a summary of the runs
if not runs.empty:
    print(f"Found {len(runs)} runs in experiment '{experiment_name}'")
    
    # Create a scatter plot of different runs
    plt.figure(figsize=(12, 6))
    
    # Find columns with metrics
    metric_cols = [col for col in runs.columns if col.startswith('metrics.')]
    
    if len(metric_cols) >= 2 and 'metrics.accuracy' in metric_cols:
        # Choose two metrics to plot
        metric_x = 'metrics.accuracy'
        metric_y = metric_cols[1] if metric_cols[0] == 'metrics.accuracy' else metric_cols[0]
        
        plt.scatter(runs[metric_x], runs[metric_y])
        plt.xlabel(metric_x.replace('metrics.', ''))
        plt.ylabel(metric_y.replace('metrics.', ''))
        plt.title('Comparison of Model Runs')
        
        # Add annotations for some points
        for i, row in runs.iterrows():
            if i % 3 == 0:  # Annotate every third point for clarity
                plt.annotate(f"Run {i}", (row[metric_x], row[metric_y]))
        
        plt.grid(True, linestyle='--', alpha=0.6)
        plt.tight_layout()
        plt.show()
        
    # Show metrics distribution
    if 'metrics.accuracy' in metric_cols:
        plt.figure(figsize=(10, 5))
        plt.hist(runs['metrics.accuracy'], bins=10, alpha=0.7)
        plt.title('Distribution of Accuracy Across Runs')
        plt.xlabel('Accuracy')
        plt.ylabel('Number of Runs')
        plt.grid(True, linestyle='--', alpha=0.6)
        plt.tight_layout()
        plt.show()
else:
    print(f"No runs found in experiment '{experiment_name}'")

## 8. Best Practices for MLflow Tracking <a id="practices"></a>

Here are some best practices for effectively using MLflow:

1. **Organize experiments logically**: Create separate experiments for different objectives or datasets.

2. **Log all relevant parameters**: Document everything needed to reproduce your experiment.

3. **Use descriptive run names**: Name your runs to easily identify them later.

4. **Log model signatures**: Include input/output signatures for better model serving.

5. **Version your data**: Track which version of the data was used for training.

6. **Use tags for additional metadata**: Add tags to categorize runs and add context.

7. **Set up a persistent backend**: Use a database backend for production environments.

8. **Log environment details**: Record package versions and environment configurations.

Let's implement some of these best practices:

In [None]:
# Example implementing best practices
with mlflow.start_run(run_name="best_practices_example"):
    # 1. Log all relevant parameters
    mlflow.log_param("dataset", config['dataset'])
    mlflow.log_param("test_size", config.get('test_size', 0.2))
    mlflow.log_param("model_type", "DecisionTree")
    mlflow.log_param("max_depth", 5)
    
    # 2. Add tags for additional metadata
    mlflow.set_tag("version", "v1.0.0")
    mlflow.set_tag("author", "MLflow Tutorial")
    mlflow.set_tag("purpose", "Educational")
    mlflow.set_tag("priority", "high")
    
    # 3. Log environment details
    import sys
    import sklearn
    mlflow.log_param("python_version", sys.version)
    mlflow.log_param("scikit_learn_version", sklearn.__version__)
    
    # 4. Create and train a model
    model = DecisionTreeClassifier(max_depth=5, random_state=42)
    model.fit(X_train, y_train)
    
    # 5. Log metrics
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    mlflow.log_metric("accuracy", accuracy)
    
    # 6. Log the model with signature and input example
    signature = mlflow.models.infer_signature(X_test, y_pred)
    input_example = X_test[:5]
    
    mlflow.sklearn.log_model(
        model,
        "model",
        signature=signature,
        input_example=input_example
    )
    
    # 7. Log a dataset info artifact
    dataset_info = {
        "name": config['dataset'],
        "n_samples": len(X_train) + len(X_test),
        "n_features": X_train.shape[1],
        "date_processed": pd.Timestamp.now().strftime("%Y-%m-%d %H:%M:%S")
    }
    
    # Write dataset info to a file
    import json
    with open("dataset_info.json", "w") as f:
        json.dump(dataset_info, f)
    
    # Log the file as an artifact
    mlflow.log_artifact("dataset_info.json")
    
    print(f"Run completed with accuracy: {accuracy:.4f}")
    print(f"Run ID: {mlflow.active_run().info.run_id}")

# Clean up
os.remove("dataset_info.json")

## 9. Exercises <a id="exercises"></a>

Now let's practice what we've learned with some exercises. Try to complete these on your own before looking at the solutions.

### Exercise 1: Compare Multiple Scikit-learn Classifiers

Create an experiment that compares different scikit-learn classifiers (Decision Tree, Random Forest, Logistic Regression) on the same dataset.

1. Log each model type with its default parameters
2. Record accuracy, precision, recall, and F1 score for each
3. Create a visual comparison of the results

In [None]:
# Exercise 1: Solution
# Import necessary libraries
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Define models to compare
models = {
    'DecisionTree': DecisionTreeClassifier(random_state=42),
    'RandomForest': RandomForestClassifier(random_state=42),
    'LogisticRegression': LogisticRegression(random_state=42, max_iter=1000)
}

# Track results for comparison
results = {}

# Compare models with MLflow
for name, model in models.items():
    print(f"\nTraining {name}...")
    
    with mlflow.start_run(run_name=f"compare_{name}"):
        # Log model type
        mlflow.log_param("model_type", name)
        
        # Train model
        model.fit(X_train, y_train)
        
        # Evaluate model
        y_pred = model.predict(X_test)
        
        # Calculate metrics
        metrics = {
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision_score(y_test, y_pred, average='weighted'),
            "recall": recall_score(y_test, y_pred, average='weighted'),
            "f1_score": f1_score(y_test, y_pred, average='weighted')
        }
        
        # Log metrics
        mlflow.log_metrics(metrics)
        
        # Log model
        mlflow.sklearn.log_model(model, "model")
        
        # Store results for visualization
        results[name] = metrics
        
        print(f"{name} - Accuracy: {metrics['accuracy']:.4f}")

# Create a visualization
metrics_df = pd.DataFrame(results).transpose()
metrics_df.plot(kind='bar', figsize=(12, 6))
plt.title('Classifier Comparison')
plt.ylabel('Score')
plt.xlabel('Model')
plt.xticks(rotation=0)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.legend(loc='lower right')
plt.tight_layout()
plt.show()

print("\nModel comparison:")
print(metrics_df)

### Exercise 2: Implement Cross-Validation with MLflow

Implement k-fold cross-validation while tracking each fold's performance with MLflow.

In [None]:
# Exercise 2: Solution
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score

# Define the model
model_type = 'RandomForest'
base_model = RandomForestClassifier(n_estimators=100, random_state=42)

# Define cross-validation
n_splits = 5
kf = KFold(n_splits=n_splits, shuffle=True, random_state=42)

# Start a parent run
with mlflow.start_run(run_name=f"cv_{model_type}"):
    # Log parent run parameters
    mlflow.log_param("model_type", model_type)
    mlflow.log_param("n_splits", n_splits)
    mlflow.log_param("dataset", config['dataset'])
    
    # Store fold results
    fold_accuracies = []
    
    # Combine X_train and X_test for full dataset CV
    X_full = np.vstack((X_train, X_test))
    y_full = np.concatenate((y_train, y_test))
    
    # Perform cross-validation
    for fold, (train_idx, val_idx) in enumerate(kf.split(X_full)):
        print(f"\nTraining fold {fold+1}/{n_splits}")
        
        # Get fold data
        X_fold_train, X_fold_val = X_full[train_idx], X_full[val_idx]
        y_fold_train, y_fold_val = y_full[train_idx], y_full[val_idx]
        
        # Create a nested run for this fold
        with mlflow.start_run(run_name=f"fold_{fold+1}", nested=True):
            # Log fold info
            mlflow.log_param("fold", fold+1)
            
            # Train model on fold
            model = clone(base_model)  # Clone to get a fresh model
            model.fit(X_fold_train, y_fold_train)
            
            # Evaluate on validation set
            y_pred = model.predict(X_fold_val)
            accuracy = accuracy_score(y_fold_val, y_pred)
            
            # Log metrics
            mlflow.log_metric("accuracy", accuracy)
            
            # Store for averaging
            fold_accuracies.append(accuracy)
            
            print(f"Fold {fold+1} accuracy: {accuracy:.4f}")
    
    # Log average accuracy in parent run
    avg_accuracy = np.mean(fold_accuracies)
    std_accuracy = np.std(fold_accuracies)
    mlflow.log_metric("mean_accuracy", avg_accuracy)
    mlflow.log_metric("std_accuracy", std_accuracy)
    
    print("\nCross-validation results:")
    print(f"Mean accuracy: {avg_accuracy:.4f} ± {std_accuracy:.4f}")

### Exercise 3: Create a Custom Artifact Visualization

Create a custom visualization of your model results and log it as an MLflow artifact.

In [None]:
# Exercise 3: Solution
from sklearn.metrics import confusion_matrix, roc_curve, auc
from sklearn.preprocessing import label_binarize

# Train a model for visualization
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)

# Start an MLflow run
with mlflow.start_run(run_name="visualization_example"):
    # Log basic metrics
    accuracy = accuracy_score(y_test, y_pred)
    mlflow.log_metric("accuracy", accuracy)
    
    # 1. Create and log confusion matrix
    plt.figure(figsize=(8, 6))
    cm = confusion_matrix(y_test, y_pred)
    
    # Check if binary or multiclass
    classes = np.unique(y_test)
    
    # Use a colormap
    plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
    plt.title('Confusion Matrix')
    plt.colorbar()
    
    # Add labels
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes)
    plt.yticks(tick_marks, classes)
    
    # Add text annotations
    thresh = cm.max() / 2
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            plt.text(j, i, format(cm[i, j], 'd'),
                    horizontalalignment="center",
                    color="white" if cm[i, j] > thresh else "black")
    
    plt.ylabel('True Label')
    plt.xlabel('Predicted Label')
    plt.tight_layout()
    
    # Save and log the figure
    plt.savefig("confusion_matrix.png")
    mlflow.log_artifact("confusion_matrix.png")
    
    # 2. Feature importance visualization
    if hasattr(model, 'feature_importances_'):
        plt.figure(figsize=(10, 6))
        
        # Sort features by importance
        importances = model.feature_importances_
        indices = np.argsort(importances)[::-1]
        
        # Plot the feature importances
        plt.bar(range(min(10, len(importances))), importances[indices][:10])
        plt.title('Top 10 Feature Importances')
        plt.xticks(range(min(10, len(importances))), indices[:10])
        plt.xlabel('Feature Index')
        plt.ylabel('Importance')
        plt.tight_layout()
        
        # Save and log the figure
        plt.savefig("feature_importance.png")
        mlflow.log_artifact("feature_importance.png")
    
    # 3. ROC curve for binary classification
    if len(classes) == 2:
        plt.figure(figsize=(8, 6))
        
        y_proba_positive = y_proba[:, 1]
        fpr, tpr, _ = roc_curve(y_test, y_proba_positive)
        roc_auc = auc(fpr, tpr)
        
        plt.plot(fpr, tpr, label=f'ROC curve (AUC = {roc_auc:.2f})')
        plt.plot([0, 1], [0, 1], 'k--')
        plt.xlim([0.0, 1.0])
        plt.ylim([0.0, 1.05])
        plt.xlabel('False Positive Rate')
        plt.ylabel('True Positive Rate')
        plt.title('Receiver Operating Characteristic (ROC)')
        plt.legend(loc='lower right')
        plt.grid(True, linestyle='--', alpha=0.7)
        
        # Save and log the figure
        plt.savefig("roc_curve.png")
        mlflow.log_artifact("roc_curve.png")
        
        # Also log the AUC as a metric
        mlflow.log_metric("roc_auc", roc_auc)
    
    # Log the model
    mlflow.sklearn.log_model(model, "visualization_model")
    
    print("Visualizations created and logged to MLflow")

# Clean up
os.remove("confusion_matrix.png")
if os.path.exists("feature_importance.png"):
    os.remove("feature_importance.png")
if os.path.exists("roc_curve.png"):
    os.remove("roc_curve.png")

## Conclusion

In this tutorial, we've explored MLflow's tracking capabilities for machine learning experiment management:

1. Setting up MLflow for experiment tracking
2. Logging parameters, metrics, and artifacts
3. Tracking different model types (scikit-learn and TensorFlow)
4. Performing hyperparameter tuning with MLflow
5. Finding the best model based on metrics
6. Creating visualizations and custom artifacts
7. Applying best practices for experiment management

MLflow provides a structured approach to experiment tracking that helps you:
- Compare results across multiple runs
- Ensure reproducibility of experiments
- Share results with team members
- Deploy models to production

To learn more, visit the [MLflow documentation](https://mlflow.org/docs/latest/index.html).