# SEIR Model MLflow Demo

This notebook demonstrates how to use MLflow to track experiments, parameters, metrics, and artifacts for the SEIR compartmental model. It also shows how to register models in the MLflow Model Registry for versioning.

This demo includes:

- Loading COVID-19 vaccination data and creating a data model
- Setting up MLflow experiment tracking
- Running multiple SEIR model experiments with different parameters
- Tracking parameters, metrics, and artifacts
- Registering the best model in the MLflow Model Registry

## Setup and Imports

In [None]:
# Import required libraries
import os
import datetime
import pickle
import numpy as np
import pandas as pd
from io import BytesIO
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from scipy.interpolate import interp1d

# Import MLflow
import mlflow
import mlflow.pyfunc

# Import the SEIR model
from seir_model import SEIRModel

# Import the dataset
from cfa.scenarios.dataops.datasets import datasets

## Set up MLflow

Configure MLflow to track experiments locally. We'll create a new experiment for our SEIR model runs.

In [None]:
# Create directories for MLflow artifacts and data
os.makedirs('mlruns', exist_ok=True)
os.makedirs('data', exist_ok=True)
os.makedirs('models', exist_ok=True)

# Set the tracking URI to the local directory
mlflow.set_tracking_uri("file:./mlruns")

# Create or set the experiment
experiment_name = "SEIR-Model-Experiments"
mlflow.set_experiment(experiment_name)

# Print the experiment info
experiment = mlflow.get_experiment_by_name(experiment_name)
print(f"Experiment ID: {experiment.experiment_id}")
print(f"Artifact Location: {experiment.artifact_location}")

## Vaccination Data Model

Load the COVID-19 vaccination data from a local Parquet file and plot the vaccinated population fraction over time, as in the original demo.

In [None]:
# load the vaccination data
df = pd.read_parquet('data/covid19vax_trends_us.parquet')
df.head()

In [None]:
# Plot the vaccination data
plt.figure(dpi=100, figsize=(8, 4))
plt.plot(df.date, df.vax_frac, 'r.', alpha=0.25)
ax = plt.gca()
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=(1, 7)))
ax.xaxis.set_minor_locator(mdates.MonthLocator())
plt.xlabel('date')
plt.ylabel("population vaccination fraction [US]")
plt.grid(ls='--')
plt.show()

In [None]:
# Save the data to a parquet file
df.to_parquet('data/covid19vax_trends_us.parquet')

## Creating Data Interpolation Model

Create a callable model that linearly interpolates the vaccination fraction by days elapsed since a start date, as in the original demo.
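As a minimal, self-contained sketch of the interpolation idea, the example below applies `scipy.interpolate.interp1d` to made-up day offsets and vaccination fractions (hypothetical values, not the real dataset). It assumes that holding the first and last observed values outside the observed range is acceptable. `interp1d` supports this through `bounds_error=False` and a two-element `fill_value`; with its default settings it raises a `ValueError` for inputs outside the range.

```python
import numpy as np
from scipy.interpolate import interp1d

# Hypothetical observations: day offsets and vaccinated fractions (not real data)
days = np.array([0, 10, 20, 30])
vax_frac = np.array([0.00, 0.05, 0.15, 0.20])

# bounds_error=False with a (below, above) fill_value holds the end values
# instead of raising ValueError outside [days[0], days[-1]]
clamped = interp1d(
    days, vax_frac, kind="linear",
    bounds_error=False, fill_value=(vax_frac[0], vax_frac[-1]),
)

inside = float(clamped(15))   # halfway between the day-10 and day-20 values
before = float(clamped(-5))   # earlier than the first observation
after = float(clamped(100))   # later than the last observation
```

Clamping on both sides is one possible design choice; the class in this notebook only handles the upper end explicitly.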

In [None]:
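class VaxTrends:
    """Linear interpolation of the vaccination fraction, indexed by days since start_date."""

    def __init__(self, start_date: str):
        df = pd.read_parquet('data/covid19vax_trends_us.parquet')
        df.date = pd.to_datetime(df.date).dt.date
        self.start_date = datetime.datetime.strptime(start_date, '%Y-%m-%d').date()
        self.dates = df.date.to_list()
        self.vax_frax = df.vax_frac.to_list()
        # Day offset of each observation relative to start_date
        self.days = [i.days for i in (df.date - self.start_date).to_list()]
        self.function = interp1d(self.days, self.vax_frax, kind='linear')

    def __call__(self, t: float):
        # Interpolate up to the last observed day, then hold the last value.
        # Note: t earlier than the first observed day raises ValueError in interp1d.
        if t < max(self.days):
            return float(self.function(t))
        else:
            return float(self.vax_frax[-1])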
        
    def save(self, filename: str):
        with open(filename, 'wb') as f:
            pickle.dump(self, f)

In [None]:
# Create the VaxTrends model
v = VaxTrends('2021-01-01')
v.save('models/vax_trends.pkl')  # save the object to a pickle file

# Test the model
print(f"Vaccination fraction at day 50: {v(50):.4f}")

## Define a PyFunc MLflow Model for SEIR

Create a wrapper class for the SEIR model that can be logged as an MLflow model.
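The `SEIRModel` class comes from `seir_model`, whose source is not shown in this notebook. To illustrate the interface the wrapper relies on (a `simulate(t_span, initial_conditions, t_points)` method returning `(t, y)` with one row of `y` per compartment), here is a hypothetical stand-in. `SketchSEIR` is an invented name; it implements only the textbook SEIR equations on population fractions and omits the vaccination term, so it should not be read as the actual `SEIRModel` implementation.

```python
import numpy as np
from scipy.integrate import solve_ivp


class SketchSEIR:
    """Hypothetical stand-in for SEIRModel: plain SEIR, no vaccination term."""

    def __init__(self, beta, sigma, gamma):
        self.beta, self.sigma, self.gamma = beta, sigma, gamma

    def _rhs(self, t, y):
        S, E, I, R = y
        new_exposed = self.beta * S * I     # flow S -> E
        new_infectious = self.sigma * E     # flow E -> I
        new_recovered = self.gamma * I      # flow I -> R
        return [
            -new_exposed,
            new_exposed - new_infectious,
            new_infectious - new_recovered,
            new_recovered,
        ]

    def simulate(self, t_span, initial_conditions, t_points):
        # Evaluate the solution on an evenly spaced grid over t_span
        t_eval = np.linspace(t_span[0], t_span[1], t_points)
        sol = solve_ivp(self._rhs, t_span, initial_conditions, t_eval=t_eval)
        return sol.t, sol.y  # sol.y has shape (4, t_points)


t, y = SketchSEIR(beta=0.3, sigma=0.2, gamma=0.1).simulate(
    (0, 100), [0.90, 0.02, 0.04, 0.04], 1000
)
```

Because every flow leaves one compartment and enters another, the four fractions should keep summing to the initial total throughout the simulation, which makes a convenient sanity check.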

In [None]:
class SEIRModelWrapper(mlflow.pyfunc.PythonModel):
    def __init__(self, seir_model):
        self.seir_model = seir_model
        
    def predict(self, context, model_input):
        """
        Run a simulation with the SEIR model.
        
        Args:
            context: MLflow model context
            model_input: DataFrame with columns:
                - t_start: Start time
                - t_end: End time
                - t_points: Number of time points
                - S0, E0, I0, R0: Initial conditions
                
        Returns:
            DataFrame with simulation results
        """
        # Extract parameters from input
        row = model_input.iloc[0]
        t_span = (row['t_start'], row['t_end'])
        t_points = int(row['t_points'])
        initial_conditions = [row['S0'], row['E0'], row['I0'], row['R0']]
        
        # Run simulation
        t, y = self.seir_model.simulate(t_span, initial_conditions, t_points)
        
        # Create result DataFrame
        result = pd.DataFrame({
            'time': t,
            'S': y[0],
            'E': y[1],
            'I': y[2],
            'R': y[3]
        })
        
        return result

## Run SEIR Model Experiments with MLflow Tracking

Run multiple experiments with different parameter sets and track them with MLflow.
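The run loop in this section logs each parameter with a separate `mlflow.log_param` call. As an alternative sketch, a parameter set can first be flattened into a single dictionary of scalars, which `mlflow.log_params` accepts in one call. The helper name `flatten_params` is hypothetical, and the example deliberately stops short of calling MLflow so that it runs on its own.

```python
def flatten_params(params):
    """Flatten one parameter set into a flat dict of scalar values."""
    # Drop the run name and the list-valued initial conditions
    flat = {
        key: value
        for key, value in params.items()
        if key not in ("name", "initial_conditions")
    }
    # Name the four initial-condition entries after their compartments
    for label, value in zip(("S0", "E0", "I0", "R0"), params["initial_conditions"]):
        flat[label] = value
    return flat


baseline = {
    "name": "Baseline",
    "beta": 0.3,
    "sigma": 0.2,
    "gamma": 0.1,
    "vax_eff": 0.7,
    "initial_conditions": [0.90, 0.02, 0.04, 0.04],
}
flat = flatten_params(baseline)
# Inside an active run, this dict could be logged with: mlflow.log_params(flat)
```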

In [None]:
# Define a function to calculate summary metrics from simulation results
def calculate_metrics(t, y):
    S, E, I, R = y

    # Peak infectious fraction and the time at which it occurs
    peak_infectious = float(np.max(I))
    peak_time = float(t[np.argmax(I)])

    # Recovered fraction at the end of the simulation
    final_recovered = float(R[-1])

    # Use the final recovered fraction as a proxy for the total infected
    total_infected = final_recovered

    return {
        "peak_infectious": peak_infectious,
        "peak_time": peak_time,
        "final_recovered": final_recovered,
        "total_infected": total_infected
    }

In [None]:
# Define a function to plot and save the simulation results
def plot_simulation(t, y, vax_fraction, title, filename):
    S, E, I, R = y
    
    plt.figure(dpi=100, figsize=(10, 6))
    
    # Plot the S, E, I, R states
    plt.plot(t, [vax_fraction(i) for i in t], 'k--', label='Vaccination Fraction', alpha=0.5)
    plt.plot(t, S, 'b-', label='Susceptible')
    plt.plot(t, E, 'c-', label='Exposed')
    plt.plot(t, I, 'r-', label='Infectious')
    plt.plot(t, R, 'm-', label='Recovered')
    
    plt.grid(ls='--')
    plt.xlabel('Time (days)')
    plt.ylabel('Population Fraction')
    plt.title(title)
    plt.legend(fontsize=9)
    
    plt.savefig(filename)
    plt.close()
    
    return filename

In [None]:
# Define parameter sets for experiments
experiment_params = [
    {
        "name": "Baseline",
        "beta": 0.3,
        "sigma": 0.2,
        "gamma": 0.1,
        "vax_eff": 0.7,
        "initial_conditions": [0.90, 0.02, 0.04, 0.04]
    },
    {
        "name": "High Transmission",
        "beta": 0.5,
        "sigma": 0.2,
        "gamma": 0.1,
        "vax_eff": 0.7,
        "initial_conditions": [0.90, 0.02, 0.04, 0.04]
    },
    {
        "name": "Low Transmission",
        "beta": 0.2,
        "sigma": 0.2,
        "gamma": 0.1,
        "vax_eff": 0.7,
        "initial_conditions": [0.90, 0.02, 0.04, 0.04]
    },
    {
        "name": "High Recovery",
        "beta": 0.3,
        "sigma": 0.2,
        "gamma": 0.2,
        "vax_eff": 0.7,
        "initial_conditions": [0.90, 0.02, 0.04, 0.04]
    },
    {
        "name": "High Vaccine Efficacy",
        "beta": 0.3,
        "sigma": 0.2,
        "gamma": 0.1,
        "vax_eff": 0.9,
        "initial_conditions": [0.90, 0.02, 0.04, 0.04]
    }
]

In [None]:
# Run experiments and track with MLflow
best_model_run_id = None
lowest_peak_infectious = float('inf')

for params in experiment_params:
    # Start an MLflow run
    with mlflow.start_run(run_name=params["name"]) as run:
        run_id = run.info.run_id
        print(f"\nRunning experiment: {params['name']} (Run ID: {run_id})")
        
        # Log parameters
        mlflow.log_param("beta", params["beta"])
        mlflow.log_param("sigma", params["sigma"])
        mlflow.log_param("gamma", params["gamma"])
        mlflow.log_param("vax_eff", params["vax_eff"])
        mlflow.log_param("S0", params["initial_conditions"][0])
        mlflow.log_param("E0", params["initial_conditions"][1])
        mlflow.log_param("I0", params["initial_conditions"][2])
        mlflow.log_param("R0", params["initial_conditions"][3])
        
        # Create the SEIR model with the specified parameters
        model = SEIRModel(
            population_size=1.0,
            beta=params["beta"],
            sigma=params["sigma"],
            gamma=params["gamma"],
            vax_fraction=v,
            vax_eff=params["vax_eff"],
            version=f"0.1.0-{params['name']}"
        )
        
        # Run the simulation
        t_span = (0, 100)
        t_points = 1000
        t, y = model.simulate(t_span, params["initial_conditions"], t_points)
        
        # Calculate metrics
        metrics = calculate_metrics(t, y)
        
        # Log metrics
        for metric_name, metric_value in metrics.items():
            mlflow.log_metric(metric_name, metric_value)
            print(f"{metric_name}: {metric_value:.4f}")
        
        # Create and log the plot
        plot_file = plot_simulation(
            t, y, v, 
            f"SEIR Model - {params['name']}", 
            f"seir_plot_{params['name'].lower().replace(' ', '_')}.png"
        )
        mlflow.log_artifact(plot_file)
        
        # Save the model to a pickle file
        model_file = f"models/seir_model_{params['name'].lower().replace(' ', '_')}.pkl"
        model.save(model_file)
        mlflow.log_artifact(model_file)
        
        # Log the model as an MLflow model
        mlflow_model = SEIRModelWrapper(model)
        mlflow.pyfunc.log_model(
            "seir_model",
            python_model=mlflow_model,
            code_path=["seir_model.py"],
            artifacts={"model_pickle": model_file}
        )
        
        # Check if this is the best model (lowest peak infectious)
        if metrics["peak_infectious"] < lowest_peak_infectious:
            lowest_peak_infectious = metrics["peak_infectious"]
            best_model_run_id = run_id
            print(f"New best model found! Peak infectious: {lowest_peak_infectious:.4f}")

print(f"\nBest model run ID: {best_model_run_id}")
print(f"Lowest peak infectious: {lowest_peak_infectious:.4f}")

## Register the Best Model in the MLflow Model Registry

Register the model with the lowest peak infectious value in the MLflow Model Registry.

In [None]:
# Register the best model in the MLflow Model Registry
if best_model_run_id:
    model_uri = f"runs:/{best_model_run_id}/seir_model"
    model_name = "SEIR-Compartmental-Model"
    
    # Register the model
    model_details = mlflow.register_model(model_uri, model_name)
    
    print(f"Registered model: {model_details.name}")
    print(f"Version: {model_details.version}")
    
    # Add a description to the model version
    client = mlflow.tracking.MlflowClient()
    client.update_model_version(
        name=model_details.name,
        version=model_details.version,
        description="SEIR compartmental model with the lowest peak infectious value"
    )

## Load and Use a Registered Model

Demonstrate how to load a registered model from the MLflow Model Registry and use it for predictions.
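The wrapper's `predict` method reads a single DataFrame row with the columns `t_start`, `t_end`, `t_points`, `S0`, `E0`, `I0`, and `R0`. A small helper can build that row and catch obvious input mistakes before the model is called. The sketch below is one way to do this; `make_seir_input` is a hypothetical name, and the check that the compartments sum to 1 is an assumption based on the population fractions used elsewhere in this notebook.

```python
import math

import pandas as pd


def make_seir_input(t_start, t_end, t_points, S0, E0, I0, R0):
    """Build the one-row DataFrame that the wrapper's predict() reads."""
    if t_end <= t_start:
        raise ValueError("t_end must be greater than t_start")
    # Assumption: compartments are population fractions that sum to 1
    if not math.isclose(S0 + E0 + I0 + R0, 1.0, abs_tol=1e-9):
        raise ValueError("initial conditions must sum to 1")
    return pd.DataFrame([{
        "t_start": t_start, "t_end": t_end, "t_points": t_points,
        "S0": S0, "E0": E0, "I0": I0, "R0": R0,
    }])


input_data = make_seir_input(0, 150, 1500, S0=0.95, E0=0.01, I0=0.02, R0=0.02)
```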

In [None]:
# Load the registered model
model_name = "SEIR-Compartmental-Model"
loaded_model = mlflow.pyfunc.load_model(f"models:/{model_name}/latest")

# Create input for prediction
input_data = pd.DataFrame([
    {
        "t_start": 0,
        "t_end": 150,
        "t_points": 1500,
        "S0": 0.95,
        "E0": 0.01,
        "I0": 0.02,
        "R0": 0.02
    }
])

# Make prediction
result = loaded_model.predict(input_data)

# Plot the results
plt.figure(dpi=100, figsize=(10, 6))
plt.plot(result['time'], result['S'], 'b-', label='Susceptible')
plt.plot(result['time'], result['E'], 'c-', label='Exposed')
plt.plot(result['time'], result['I'], 'r-', label='Infectious')
plt.plot(result['time'], result['R'], 'm-', label='Recovered')

plt.grid(ls='--')
plt.xlabel('Time (days)')
plt.ylabel('Population Fraction')
plt.title('Prediction from Registered SEIR Model')
plt.legend(fontsize=9)
plt.show()

## Comparing Multiple Runs

Retrieve and compare metrics from multiple runs.
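One detail to keep in mind when comparing runs: MLflow stores logged parameters as strings, while metrics come back as floats. The sketch below uses made-up run records (hypothetical names and values, not results from this notebook) to show how parameter columns could be converted to numbers with `pd.to_numeric` before sorting or plotting.

```python
import pandas as pd

# Hypothetical run records shaped like the values MLflow returns:
# metrics are floats, parameters are strings (all values are made up)
records = [
    {"run_name": "A", "peak_infectious": 0.12, "beta": "0.3", "gamma": "0.1"},
    {"run_name": "B", "peak_infectious": 0.21, "beta": "0.5", "gamma": "0.1"},
    {"run_name": "C", "peak_infectious": 0.07, "beta": "0.2", "gamma": "0.1"},
]
runs_df = pd.DataFrame(records)

# Convert parameter columns to numbers so they sort and plot numerically
for col in ("beta", "gamma"):
    runs_df[col] = pd.to_numeric(runs_df[col])

# Rank runs from lowest to highest peak infectious fraction
ranked = runs_df.sort_values("peak_infectious").reset_index(drop=True)
```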

In [None]:
# Get all runs for the experiment
client = mlflow.tracking.MlflowClient()
experiment = mlflow.get_experiment_by_name(experiment_name)
runs = client.search_runs(experiment_ids=[experiment.experiment_id])

# Extract run names, metrics, and parameters
run_data = []
for run in runs:
    run_data.append({
        "run_id": run.info.run_id,
        "run_name": run.data.tags.get("mlflow.runName", "Unknown"),
        # Use NaN for missing metrics so incomplete runs do not sort as the best
        "peak_infectious": run.data.metrics.get("peak_infectious", float("nan")),
        "peak_time": run.data.metrics.get("peak_time", float("nan")),
        "total_infected": run.data.metrics.get("total_infected", float("nan")),
        # MLflow returns parameter values as strings
        "beta": run.data.params.get("beta"),
        "gamma": run.data.params.get("gamma"),
        "vax_eff": run.data.params.get("vax_eff")
    })

# Create a DataFrame for comparison
comparison_df = pd.DataFrame(run_data)
comparison_df.sort_values(by="peak_infectious", inplace=True)
comparison_df

In [None]:
# Plot comparison of peak infectious values
plt.figure(dpi=100, figsize=(10, 6))
plt.bar(comparison_df["run_name"], comparison_df["peak_infectious"])
plt.xlabel("Experiment")
plt.ylabel("Peak Infectious Fraction")
plt.title("Comparison of Peak Infectious Values Across Experiments")
plt.xticks(rotation=45, ha="right")
plt.tight_layout()
plt.show()

## Conclusion

In this notebook, we demonstrated how to use MLflow to track experiments, parameters, metrics, and artifacts for the SEIR compartmental model. We also showed how to register models in the MLflow Model Registry for versioning.

Key points:
1. MLflow provides a simple way to track experiments and compare results
2. The Model Registry enables versioning and management of models
3. Tracking parameters and metrics helps identify the best model configuration
4. Artifacts like plots and saved models can be stored and retrieved easily

To view the MLflow UI and explore the experiments in more detail, run the following command in the terminal:
```
mlflow ui --backend-store-uri file:./mlruns
```