# MLflow Integration Demo

This notebook demonstrates how to use MLflow for experiment tracking, model comparison, and the model registry in the GitHub Activity Forecasting project.

## MLflow Components Covered:
1. **Experiment Tracking** - Query runs and compare metrics
2. **Parameter Analysis** - Compare hyperparameters across runs
3. **Model Registry** - Access registered models
4. **Artifact Management** - Load models and artifacts
5. **Visualization** - Compare model performance

## Prerequisites:
```bash
# Start MLflow UI (in a separate terminal)
mlflow ui --backend-store-uri mlruns
# Access at: http://localhost:5000
```

In [None]:
import mlflow
import mlflow.pytorch
import mlflow.sklearn
from mlflow.tracking import MlflowClient
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

# Set style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

# Initialize MLflow client
mlflow.set_tracking_uri('mlruns')
client = MlflowClient()

print("✓ MLflow client initialized")
print(f"Tracking URI: {mlflow.get_tracking_uri()}")

## 1. List All Experiments

MLflow organizes runs into experiments. Each experiment groups related model training runs.
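With the local file store that `mlflow ui --backend-store-uri mlruns` reads, each experiment is a directory under `mlruns/` (named by its ID and holding a `meta.yaml`), and each run is a subdirectory of that experiment with its own `meta.yaml`. The standard-library sketch below counts runs per experiment by walking that layout. It assumes the file-store layout (a database backend such as SQLite stores runs differently), and the function name is hypothetical; it illustrates the experiment → run hierarchy rather than replacing `client.search_experiments()`.

```python
from pathlib import Path


def count_runs_per_experiment(root: str = "mlruns") -> dict[str, int]:
    """Count run directories under each experiment directory of a local file store.

    Assumes <root>/<experiment_id>/meta.yaml for each experiment and
    <root>/<experiment_id>/<run_id>/meta.yaml for each run.
    """
    counts: dict[str, int] = {}
    root_path = Path(root)
    if not root_path.is_dir():
        return counts
    for exp_dir in sorted(root_path.iterdir()):
        # Skip hidden directories such as .trash and anything without an experiment meta.yaml
        if exp_dir.name.startswith(".") or not (exp_dir / "meta.yaml").is_file():
            continue
        # A run is any subdirectory that carries its own meta.yaml
        counts[exp_dir.name] = sum(
            1 for child in exp_dir.iterdir()
            if child.is_dir() and (child / "meta.yaml").is_file()
        )
    return counts
```

Calling `count_runs_per_experiment("mlruns")` should give the same per-experiment totals the UI shows for active runs, as long as the file store is in use.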

In [None]:
# List all experiments
experiments = client.search_experiments()

print(f"Total Experiments: {len(experiments)}\n")
for exp in experiments:
    print(f"📊 {exp.name}")
    print(f"   ID: {exp.experiment_id}")
    print(f"   Artifact Location: {exp.artifact_location}")
    print(f"   Lifecycle Stage: {exp.lifecycle_stage}")
    print()

## 2. Query Forecasting Experiment Runs

Retrieve all training runs from the forecasting experiment and compare their performance.
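MLflow stores every parameter as a string, so numeric hyperparameters such as `hidden_size` must be converted before they can be sorted or plotted, and a run that never logged a metric should sink to the bottom of a ranking instead of breaking it. The sketch below shows that conversion and ordering on plain dictionaries standing in for `run.data.params` and per-run rows; the helper names and sample values are hypothetical.

```python
import math
from typing import Optional


def parse_int_param(params: dict, key: str) -> Optional[int]:
    """Convert a string-valued MLflow param to int, or None if missing/non-numeric."""
    raw = params.get(key)
    if raw is None:
        return None
    try:
        return int(float(raw))  # accepts "64" as well as "64.0"
    except (ValueError, OverflowError, TypeError):
        return None  # e.g. "None" logged for a model without this setting


def rank_by_metric(rows: list, metric: str) -> list:
    """Sort rows ascending by `metric`, placing missing or NaN values last."""
    def key(row):
        value = row.get(metric)
        missing = value is None or (isinstance(value, float) and math.isnan(value))
        # False sorts before True, so present values come first
        return (missing, 0.0 if missing else value)
    return sorted(rows, key=key)
```

Because `sorted` is stable, runs that are all missing the metric keep their original relative order at the end of the list.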

In [None]:
# Get forecasting experiment
forecast_exp = client.get_experiment_by_name('forecasting-models')

if forecast_exp:
    # Search runs in this experiment
    runs = client.search_runs(
        experiment_ids=[forecast_exp.experiment_id],
        order_by=['metrics.best_dev_loss ASC'],
        max_results=20
    )
    
    print(f"Found {len(runs)} forecasting runs\n")
    
    # Create DataFrame for analysis
    run_data = []
    for run in runs:
        run_data.append({
            'run_id': run.info.run_id,
            'run_name': run.data.tags.get('mlflow.runName', 'N/A'),
            'model_type': run.data.params.get('model_type', 'N/A'),
            'hidden_size': int(run.data.params.get('hidden_size', 0)),
            'num_layers': int(run.data.params.get('num_layers', 0)),
            'best_dev_loss': run.data.metrics.get('best_dev_loss', np.nan),
            'final_train_loss': run.data.metrics.get('final_train_loss', np.nan),
            'training_time_min': run.data.metrics.get('training_time_minutes', np.nan),
            'start_time': pd.to_datetime(run.info.start_time, unit='ms')
        })
    
    df_runs = pd.DataFrame(run_data)
    df_runs = df_runs.sort_values('best_dev_loss')
    
    print("\n🏆 Top 5 Forecasting Models by Validation Loss:")
    print(df_runs.head()[['run_name', 'model_type', 'hidden_size', 'best_dev_loss', 'training_time_min']])
else:
    print("No forecasting experiment found. Run training first!")

## 3. Visualize Model Comparison

Compare the ten best runs by validation loss, and plot each run's validation loss against its training time.
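When many runs share a `model_type`, the bar chart can be dominated by a single architecture. One compact way to compare architectures is to keep only each type's best run. The sketch below does that on plain dictionaries shaped like the rows of `df_runs` (the sample values and the `gru` type are made up); with pandas, on the `dropna`-filtered `df_plot`, the same idea is `df_plot.loc[df_plot.groupby('model_type')['best_dev_loss'].idxmin()]`.

```python
import math


def best_run_per_type(rows: list, metric: str = "best_dev_loss") -> dict:
    """Return {model_type: row}, keeping the row with the lowest `metric` per type.

    Rows whose metric is missing or NaN are ignored.
    """
    best: dict = {}
    for row in rows:
        value = row.get(metric)
        if value is None or math.isnan(value):
            continue
        current = best.get(row["model_type"])
        # Replace the stored row only if this run has a strictly lower loss
        if current is None or value < current[metric]:
            best[row["model_type"]] = row
    return best
```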

In [None]:
if 'df_runs' in locals() and not df_runs.empty:
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # Plot 1: Best validation loss of the top 10 runs, colored by model type
    df_plot = df_runs.dropna(subset=['best_dev_loss'])
    if not df_plot.empty:
        sns.barplot(data=df_plot.head(10), x='run_name', y='best_dev_loss', 
                   hue='model_type', ax=axes[0])
        axes[0].set_title('Best Validation Loss by Model', fontsize=14, fontweight='bold')
        axes[0].set_xlabel('Run Name')
        axes[0].set_ylabel('Validation Loss (MSE)')
        axes[0].tick_params(axis='x', rotation=45)
        axes[0].legend(title='Model Type')
    
    # Plot 2: Training Time vs Performance
    df_plot2 = df_runs.dropna(subset=['best_dev_loss', 'training_time_min'])
    if not df_plot2.empty:
        for model_type in df_plot2['model_type'].unique():
            data = df_plot2[df_plot2['model_type'] == model_type]
            axes[1].scatter(data['training_time_min'], data['best_dev_loss'], 
                          label=model_type, s=100, alpha=0.6)
        axes[1].set_title('Training Time vs Performance', fontsize=14, fontweight='bold')
        axes[1].set_xlabel('Training Time (minutes)')
        axes[1].set_ylabel('Validation Loss (MSE)')
        axes[1].legend(title='Model Type')
        axes[1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
else:
    print("No data available for visualization")

## 4. Query Classification Experiment Runs

Analyze activity classification models.
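The runs below are ranked by `test_f1`. For a binary classifier, F1 is the harmonic mean of precision and recall, `F1 = 2PR / (P + R)`, so the logged values can be cross-checked against each other. The sketch assumes `test_f1` was computed for the positive class of a binary task; a macro- or weighted-average F1 generally does not equal the harmonic mean of averaged precision and recall, so a mismatch is a prompt to check how the metric was logged rather than proof of an error. The helper names are hypothetical.

```python
import math


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (binary F1); 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_is_consistent(precision: float, recall: float, f1: float, tol: float = 1e-3) -> bool:
    """True if a logged F1 matches the harmonic mean of the logged precision and recall."""
    return math.isclose(f1_from_precision_recall(precision, recall), f1, abs_tol=tol)
```

For example, precision 0.8 and recall 0.6 give F1 = 0.96 / 1.4 ≈ 0.6857.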

In [None]:
# Get classification experiment
class_exp = client.get_experiment_by_name('activity-classification')

if class_exp:
    runs = client.search_runs(
        experiment_ids=[class_exp.experiment_id],
        order_by=['metrics.test_f1 DESC'],
        max_results=20
    )
    
    print(f"Found {len(runs)} classification runs\n")
    
    # Create DataFrame
    class_data = []
    for run in runs:
        class_data.append({
            'run_id': run.info.run_id,
            'model_type': run.data.params.get('model_type', 'N/A'),
            'precision': run.data.metrics.get('test_precision', np.nan),
            'recall': run.data.metrics.get('test_recall', np.nan),
            'f1': run.data.metrics.get('test_f1', np.nan),
            'roc_auc': run.data.metrics.get('test_roc_auc', np.nan),
            'pr_auc': run.data.metrics.get('test_pr_auc', np.nan)
        })
    
    df_class = pd.DataFrame(class_data)
    
    print("\n🏆 Classification Models Performance:")
    print(df_class[['model_type', 'precision', 'recall', 'f1', 'roc_auc']].round(4))
else:
    print("No classification experiment found. Run training first!")

## 5. Visualize Classification Metrics

Compare classification models across multiple metrics.
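A grouped bar chart needs each model's bars shifted so that the group is centered on the metric's tick, whatever the number of models. With `n` models sharing a total group width `w`, each bar is `w / n` wide and the i-th bar sits `(i - (n - 1) / 2) * w / n` away from the tick. The helper below (a hypothetical name) computes those offsets; the default width of 0.8 is an arbitrary choice that leaves a gap between metrics. Each model's bars then go at `x + offset` with the returned width, and the tick stays at `x` itself.

```python
def grouped_bar_offsets(n_groups: int, total_width: float = 0.8) -> tuple:
    """Return (bar_width, offsets) that center n_groups bars on each x tick."""
    if n_groups < 1:
        raise ValueError("need at least one bar per group")
    bar_width = total_width / n_groups
    # Offsets are symmetric around 0, so the middle of the group lands on the tick
    offsets = [(i - (n_groups - 1) / 2) * bar_width for i in range(n_groups)]
    return bar_width, offsets
```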

In [None]:
if 'df_class' in locals() and not df_class.empty:
    # Prepare data for a grouped bar chart (models missing any metric are dropped)
    metrics = ['precision', 'recall', 'f1', 'roc_auc', 'pr_auc']
    df_plot = df_class[['model_type'] + metrics].dropna()
    
    if not df_plot.empty:
        fig, ax = plt.subplots(figsize=(10, 6))
        
        x = np.arange(len(metrics))
        n_models = len(df_plot)
        width = 0.8 / n_models  # share a fixed group width among all models
        
        for i, (_, row) in enumerate(df_plot.iterrows()):
            values = [row[m] for m in metrics]
            # Offset each model's bars so the whole group is centered on the tick
            offset = (i - (n_models - 1) / 2) * width
            ax.bar(x + offset, values, width, label=row['model_type'])
        
        ax.set_xlabel('Metrics', fontweight='bold')
        ax.set_ylabel('Score', fontweight='bold')
        ax.set_title('Classification Model Comparison', fontsize=14, fontweight='bold')
        ax.set_xticks(x)
        ax.set_xticklabels(metrics)
        ax.legend(title='Model Type')
        ax.set_ylim(0, 1)
        ax.grid(True, alpha=0.3)
        
        plt.tight_layout()
        plt.show()
else:
    print("No classification data available")

## 6. Access Model Registry

View registered models and their versions.
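The notebook should not rely on the order in which `search_model_versions` returns versions, so "the three latest" is best chosen by sorting on the version number. Depending on the backend, `version` may arrive as a string, and sorted as text `"10"` would come before `"9"`; converting with `int()` is safe either way. The sketch below uses a hypothetical stand-in class instead of MLflow's `ModelVersion` objects so it runs without a tracking server; only the `version` attribute is assumed.

```python
from dataclasses import dataclass


@dataclass
class VersionStub:
    """Hypothetical stand-in for an MLflow ModelVersion; only `version` is used."""
    version: str
    current_stage: str = "None"


def latest_versions(versions: list, k: int = 3) -> list:
    """Return the k highest-numbered versions, newest first."""
    return sorted(versions, key=lambda v: int(v.version), reverse=True)[:k]
```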

In [None]:
# List all registered models
registered_models = client.search_registered_models()

print(f"Total Registered Models: {len(registered_models)}\n")

for model in registered_models:
    print(f"📦 {model.name}")
    print(f"   Description: {model.description}")
    print(f"   Latest Versions:")
    
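    # Get model versions, newest first (version numbers compared as integers)
    versions = client.search_model_versions(f"name='{model.name}'")
    versions = sorted(versions, key=lambda v: int(v.version), reverse=True)
    for version in versions[:3]:  # Show the 3 most recent versions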
        print(f"     v{version.version} - Stage: {version.current_stage}")
        print(f"     Run ID: {version.run_id}")
        print(f"     Source: {version.source}")
    print()

## 7. Load a Model from Registry

Load the latest registered version of a model for inference.
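`mlflow.pytorch.load_model` accepts several URI forms for the same model: a registry reference by version number (`models:/<name>/<version>`), by the `latest` keyword, by a legacy stage name (`models:/<name>/Production`), by alias (`models:/<name>@<alias>`, supported in recent MLflow 2.x releases), or the artifact path of the run that logged it (`runs:/<run_id>/<artifact_path>`). The hypothetical helpers below only format those strings; the artifact path `model` and any alias such as `champion` are placeholder assumptions, not names this project is known to use.

```python
from typing import Optional


def registry_model_uri(name: str, version: Optional[int] = None,
                       stage: Optional[str] = None, alias: Optional[str] = None) -> str:
    """Format a models:/ URI; with no selector it points at the latest version."""
    if sum(x is not None for x in (version, stage, alias)) > 1:
        raise ValueError("choose at most one of version, stage or alias")
    if alias is not None:
        return f"models:/{name}@{alias}"
    if version is not None:
        return f"models:/{name}/{version}"
    if stage is not None:
        return f"models:/{name}/{stage}"
    return f"models:/{name}/latest"


def run_model_uri(run_id: str, artifact_path: str = "model") -> str:
    """Format a runs:/ URI for a model logged as an artifact of a specific run."""
    return f"runs:/{run_id}/{artifact_path}"
```

Pinning a version number makes a notebook reproducible, whereas `latest` silently changes whenever a new version is registered.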

In [None]:
# Example: Load a forecasting model
try:
    # Load latest version of LSTM forecaster
    model_name = "forecaster-lstm"
    model_uri = f"models:/{model_name}/latest"
    
    print(f"Loading model: {model_name}")
    loaded_model = mlflow.pytorch.load_model(model_uri)
    
    print("✓ Model loaded successfully!")
    print(f"Model type: {type(loaded_model)}")
    print("\nModel can now be used for predictions")
    
except Exception as e:
    # Any failure lands here, e.g. the model is not registered yet or the store is unreachable
    print(f"Could not load model '{model_name}': {e}")
    print("Run training with --model lstm first to register a model")

## 8. Compare Specific Runs

Print the full parameters and metrics of the two best forecasting runs for a closer comparison.
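Printing every parameter of two runs one after the other makes the differences easy to miss. The sketch below keeps only the keys whose values differ, or that were logged in just one of the runs. It takes plain dictionaries such as `run.data.params`; the function name and sample values are hypothetical.

```python
def diff_params(params_a: dict, params_b: dict) -> dict:
    """Return {key: (value_a, value_b)} for keys that differ between two runs.

    A key logged in only one run appears with None on the other side.
    """
    keys = sorted(set(params_a) | set(params_b))
    return {
        k: (params_a.get(k), params_b.get(k))
        for k in keys
        if params_a.get(k) != params_b.get(k)
    }
```

Since MLflow params are strings, `"0.10"` and `"0.1"` count as different here; convert values first if numeric equality matters.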

In [None]:
if 'df_runs' in locals() and len(df_runs) >= 2:
    # Get top 2 runs
    top_runs = df_runs.head(2)
    
    print("Comparing Top 2 Runs:\n")
    
    for idx, row in top_runs.iterrows():
        run_id = row['run_id']
        run = client.get_run(run_id)
        
        print(f"{'='*60}")
        print(f"Run: {row['run_name']}")
        print(f"Run ID: {run_id}")
        print(f"\nParameters:")
        for key, value in sorted(run.data.params.items()):
            print(f"  {key}: {value}")
        
        print(f"\nMetrics:")
        for key, value in sorted(run.data.metrics.items()):
            print(f"  {key}: {value:.6f}")
        print()
else:
    print("Not enough runs available for comparison")

## 9. Get Best Model Across All Experiments

Find the best performing model based on validation loss.
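`search_runs` takes a SQL-like `filter_string` whose clauses are combined with `AND` (run search does not support `OR`); metrics are compared as numbers, while params and tags are compared as single-quoted strings. The hypothetical helper below assembles such a string from `(key, operator, value)` triples, which keeps quoting consistent once filters grow beyond the single `metrics.best_dev_loss > 0` clause used here. Keys containing spaces or other special characters would additionally need backticks, which this sketch does not handle.

```python
def build_filter(clauses: list) -> str:
    """Join (key, op, value) triples into an MLflow-style filter string.

    Numbers are left bare (for metrics); other values are single-quoted (for params/tags).
    """
    parts = []
    for key, op, value in clauses:
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            rendered = repr(value)
        else:
            text = str(value)
            if "'" in text:
                raise ValueError(f"quotes in values are not handled: {text!r}")
            rendered = f"'{text}'"
        parts.append(f"{key} {op} {rendered}")
    return " AND ".join(parts)
```

For example, `build_filter([("metrics.best_dev_loss", "<", 0.5), ("params.model_type", "=", "lstm")])` yields `metrics.best_dev_loss < 0.5 AND params.model_type = 'lstm'`.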

In [None]:
# Search across all experiments
all_experiments = client.search_experiments()
all_exp_ids = [exp.experiment_id for exp in all_experiments]

# Find best forecasting run
best_runs = client.search_runs(
    experiment_ids=all_exp_ids,
    filter_string="metrics.best_dev_loss > 0",
    order_by=['metrics.best_dev_loss ASC'],
    max_results=1
)

if best_runs:
    best_run = best_runs[0]
    print("🏆 BEST FORECASTING MODEL:\n")
    print(f"Run ID: {best_run.info.run_id}")
    print(f"Run Name: {best_run.data.tags.get('mlflow.runName')}")
    print(f"Model Type: {best_run.data.params.get('model_type')}")
    print(f"Best Validation Loss: {best_run.data.metrics.get('best_dev_loss'):.6f}")
    print(f"\nKey Parameters:")
    key_params = ['hidden_size', 'num_layers', 'dropout', 'learning_rate']
    for param in key_params:
        if param in best_run.data.params:
            print(f"  {param}: {best_run.data.params[param]}")
else:
    print("No runs found")

## 10. Export Results for Presentation

Build a summary table for the project report and export it to CSV.
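Formatting a missing metric with an f-string such as `f"{x:.2f}"` prints `nan`, which can be mistaken for a result in an exported table. A small formatter that renders missing values as `N/A` keeps the CSV readable. The helper name is hypothetical; it treats `None` and NaN as missing and passes non-numeric values such as `'default'` through unchanged.

```python
import math


def fmt_metric(value, digits: int = 4, missing: str = "N/A") -> str:
    """Format a number with a fixed number of decimals, or `missing` for None/NaN."""
    if value is None:
        return missing
    try:
        if math.isnan(value):
            return missing
    except TypeError:
        return str(value)  # non-numeric values are passed through unchanged
    return f"{value:.{digits}f}"
```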

In [None]:
# Create comprehensive summary
summary = []

if 'df_runs' in locals() and not df_runs.empty:
    for _, row in df_runs.head(5).iterrows():
        summary.append({
            'Experiment': 'Forecasting',
            'Model': row['model_type'],
            'Config': f"h={row['hidden_size']}, l={row['num_layers']}",
            'Metric': 'Best Val Loss',
            'Value': f"{row['best_dev_loss']:.6f}",
            'Time (min)': f"{row['training_time_min']:.2f}"
        })

if 'df_class' in locals() and not df_class.empty:
    for _, row in df_class.iterrows():
        summary.append({
            'Experiment': 'Classification',
            'Model': row['model_type'],
            'Config': 'default',
            'Metric': 'F1-Score',
            'Value': f"{row['f1']:.4f}",
            'Time (min)': 'N/A'
        })

if summary:
    df_summary = pd.DataFrame(summary)
    print("\n📊 MODEL PERFORMANCE SUMMARY")
    print("="*70)
    print(df_summary.to_string(index=False))
    
    # Save to CSV
    output_path = 'mlflow_results_summary.csv'
    df_summary.to_csv(output_path, index=False)
    print(f"\n✓ Summary exported to: {output_path}")
else:
    print("No data available for summary")

## Summary

### MLflow Benefits Demonstrated:

1. **Experiment Tracking**: All training runs are automatically logged
2. **Model Comparison**: Easy comparison across hyperparameters and architectures
3. **Model Registry**: Centralized model versioning and stage management
4. **Reproducibility**: Complete parameter and metric tracking
5. **Visualization**: Built-in UI and programmatic access for analysis

### For Presentation:

- Show MLflow UI: `mlflow ui --backend-store-uri mlruns`
- Highlight experiment comparison charts
- Demonstrate model loading from registry
- Emphasize reproducibility and tracking benefits

### Next Steps:

1. Train multiple models with different hyperparameters
2. Compare results in MLflow UI
3. Promote the best model to the Production stage (on MLflow 2.9 and later, where stages are deprecated, assign it a registry alias instead)
4. Use registered models for deployment