# Python Assignment: MLflow Model Registry for Versioning and Lifecycle Management

This assignment guides you through managing machine learning model versions and their lifecycle with MLflow's Model Registry. You will register models, create new versions, transition them through stages (Staging, Production, Archived), and load specific versions or stages for deployment. These practices underpin traceability, reproducibility, and controlled deployment in MLOps.

## Part 1: Initial Model Training and Registration (30 points)

We'll start by training a simple model and registering its first version in the MLflow Model Registry.

In [None]:
# 1.1 Install MLflow and other necessary libraries
# Run this cell once to ensure you have the required packages.
# !pip install mlflow scikit-learn pandas

import mlflow
import mlflow.sklearn
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.datasets import make_regression
import warnings

warnings.filterwarnings("ignore") # Suppress warnings for cleaner output
np.random.seed(42)

# Set a tracking URI (e.g., to a local 'mlruns' directory)
# For this assignment, we'll use a local folder. You can also point to a remote MLflow server.
mlflow.set_tracking_uri("file:///tmp/mlruns_model_registry") # Using a unique temporary directory

# Define a consistent experiment name for all runs in this assignment
experiment_name = "Model_Registry_Assignment"
mlflow.set_experiment(experiment_name)

print(f"MLflow Tracking URI: {mlflow.get_tracking_uri()}")
print(f"Active Experiment: {experiment_name}")

# 1.2 Generate Dataset
#    Generate a synthetic regression dataset similar to the previous assignment.
#    - `n_samples`: 1000
#    - `n_features`: 5 (3 informative, 2 uninformative)
X, y = make_regression(n_samples=1000, n_features=5, n_informative=3, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Dataset X shape: {X.shape}, y shape: {y.shape}")

# 1.3 Train and Register the First Model Version
#    Train a `LinearRegression` model.
#    Log its parameters (`fit_intercept`), metrics (`rmse`, `r2_score`), and the model itself.
#    Crucially, **register this model** to the MLflow Model Registry with the name "RegressionModel".
#    Use `mlflow.sklearn.log_model(..., registered_model_name="RegressionModel")`.

model_name = "RegressionModel"

print(f"\n--- Training and Registering Initial Model ({model_name}) ---")
with mlflow.start_run(run_name="LR_V1_Initial") as run:
    lr_model_v1 = LinearRegression(fit_intercept=True)
    lr_model_v1.fit(X_train, y_train)

    # Log parameters
    mlflow.log_param("model_type", "LinearRegression")
    mlflow.log_param("fit_intercept", lr_model_v1.fit_intercept)

    # Evaluate and log metrics
    lr_predictions_v1 = lr_model_v1.predict(X_test)
    lr_rmse_v1 = np.sqrt(mean_squared_error(y_test, lr_predictions_v1))
    lr_r2_v1 = r2_score(y_test, lr_predictions_v1)
    mlflow.log_metric("rmse", lr_rmse_v1)
    mlflow.log_metric("r2_score", lr_r2_v1)

    # Register the model
    print(f"  Registering '{model_name}' Version 1...")
    run_id_v1 = run.info.run_id
    # Record the description as a run tag: `log_model(tags=...)` is not accepted by every MLflow release.
    mlflow.set_tag("description", "Initial Linear Regression model")
    mlflow.sklearn.log_model(
        sk_model=lr_model_v1,
        artifact_path="model",
        registered_model_name=model_name,
    )

    print(f"  RMSE (V1): {lr_rmse_v1:.4f}")
    print(f"  R2 Score (V1): {lr_r2_v1:.4f}")
    print(f"  Model '{model_name}' (Version 1) registered in run {run_id_v1}.")


## Part 2: Training a New Version and Updating Registration (25 points)

Now, simulate a scenario where you train an improved (or different) model and register it as a new version under the *same* model name. This demonstrates how MLflow automatically handles versioning.
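
As a rough mental model of that behaviour, the sketch below mimics version assignment with a plain dictionary. `ToyRegistry` and its `register` method are hypothetical names invented for illustration; this is not MLflow's implementation, only the observable rule that each registration under an existing name receives the next integer version.

```python
from collections import defaultdict


class ToyRegistry:
    """Hypothetical in-memory stand-in for a model registry (not MLflow code)."""

    def __init__(self):
        # Maps a registered model name to a list of descriptions, one per version.
        self._versions = defaultdict(list)

    def register(self, name, description):
        """Store a new entry under `name` and return its 1-based version number."""
        self._versions[name].append(description)
        return len(self._versions[name])


registry = ToyRegistry()
v1 = registry.register("RegressionModel", "LinearRegression")
v2 = registry.register("RegressionModel", "RandomForestRegressor")
other = registry.register("AnotherModel", "LinearRegression")
print(v1, v2, other)
```

Version numbers are tracked per model name, which is why reusing "RegressionModel" in the next cell is expected to produce Version 2 rather than a new registered model.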

In [None]:
from sklearn.ensemble import RandomForestRegressor

# 2.1 Train and Register a Second Model Version
#    Train a `RandomForestRegressor` model (a nonlinear alternative; on this linear synthetic dataset it may not beat Linear Regression, so compare the logged metrics).
#    Log its parameters (e.g., `n_estimators`, `max_depth`), metrics (`rmse`, `r2_score`).
#    **Register this model** under the *same name* "RegressionModel". MLflow will automatically create a new version.

print(f"\n--- Training and Registering New Model Version ({model_name}) ---")
with mlflow.start_run(run_name="RF_V2_Improved") as run:
    # Initialize and train a RandomForestRegressor model with chosen hyperparameters
    # Use different hyperparameters to ensure it's a distinct model
    rf_model_v2 = RandomForestRegressor(n_estimators=150, max_depth=10, random_state=42)
    rf_model_v2.fit(X_train, y_train)

    # Log parameters
    mlflow.log_param("model_type", "RandomForestRegressor")
    mlflow.log_param("n_estimators", rf_model_v2.n_estimators)
    mlflow.log_param("max_depth", rf_model_v2.max_depth)

    # Evaluate and log metrics
    rf_predictions_v2 = rf_model_v2.predict(X_test)
    rf_rmse_v2 = np.sqrt(mean_squared_error(y_test, rf_predictions_v2))
    rf_r2_v2 = r2_score(y_test, rf_predictions_v2)
    mlflow.log_metric("rmse", rf_rmse_v2)
    mlflow.log_metric("r2_score", rf_r2_v2)

    # Register the model - this will create a new version (Version 2)
    print(f"  Registering '{model_name}' Version 2...")
    run_id_v2 = run.info.run_id
    # Record the description as a run tag: `log_model(tags=...)` is not accepted by every MLflow release.
    mlflow.set_tag("description", "Improved RandomForest model")
    mlflow.sklearn.log_model(
        sk_model=rf_model_v2,
        artifact_path="model",
        registered_model_name=model_name,
    )

    print(f"  RMSE (V2): {rf_rmse_v2:.4f}")
    print(f"  R2 Score (V2): {rf_r2_v2:.4f}")
    print(f"  Model '{model_name}' (Version 2) registered in run {run_id_v2}.")


## Part 3: Model Staging and Transitioning (25 points)

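Model stages help manage the lifecycle of a model from development to production. You'll use `MlflowClient` to transition model versions through these stages. Keep the MLflow UI running in a separate terminal (`mlflow ui --backend-store-uri file:///tmp/mlruns_model_registry`) to observe the changes. Note that MLflow 2.9 and later mark stages as deprecated in favor of model version aliases, so the stage calls below may emit deprecation warnings on newer releases.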
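
The transition rules can be sketched with a toy function over a plain `{version: stage}` dictionary. This is only an illustration under stated assumptions: `transition` and `VALID_STAGES` are hypothetical names, nothing here calls MLflow, and the check that `archive_existing_versions` applies only to Staging or Production targets reflects the documented behaviour as I understand it. The starting state (Version 1 already in Production) is invented so the archiving effect is visible.

```python
VALID_STAGES = {"None", "Staging", "Production", "Archived"}


def transition(stages, version, stage, archive_existing_versions=False):
    """Return a new {version: stage} mapping after moving `version` to `stage`.

    Toy model only: `stages` is a plain dict, not an MLflow object.
    """
    if stage not in VALID_STAGES:
        raise ValueError(f"Unknown stage: {stage}")
    if archive_existing_versions and stage not in {"Staging", "Production"}:
        raise ValueError("archive_existing_versions only applies to Staging or Production")
    updated = dict(stages)
    if archive_existing_versions:
        # Move every other version currently occupying the target stage to Archived.
        for other, current in stages.items():
            if other != version and current == stage:
                updated[other] = "Archived"
    updated[version] = stage
    return updated


stages = {1: "Production", 2: "None"}
stages = transition(stages, 2, "Staging")
stages = transition(stages, 2, "Production", archive_existing_versions=True)
print(stages)
```

With the flag left at `False`, two versions can share a stage, which is why the assignment archives Version 1 explicitly before promoting Version 2.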

In [None]:
from mlflow.tracking import MlflowClient

client = MlflowClient()

# 3.1 Inspect Registered Models
#    Use `client.search_model_versions` to programmatically inspect the registered models and their initial stages.

print(f"\n--- Inspecting Registered Model '{model_name}' Current Stages ---")
for mv in client.search_model_versions(f"name='{model_name}'"):
    print(f"  Version: {mv.version}, Stage: {mv.current_stage}, Run ID: {mv.run_id[:8]}...")


# 3.2 Transition Model Stages
#    Transition Version 1 (Linear Regression) to 'Archived'.
#    Transition Version 2 (Random Forest) to 'Staging'.
#    Then, transition Version 2 from 'Staging' to 'Production'.
#    Confirm the stages in the MLflow UI after each transition.

print(f"\n--- Transitioning Stages for '{model_name}' ---")

print(f"  Transitioning Version 1 to Archived...")
client.transition_model_version_stage(
    name=model_name,
    version=1,
    stage="Archived",
    archive_existing_versions=False  # Keep False here: the flag is only meant for Staging or Production targets
)
print(f"  Model '{model_name}' Version 1 transitioned to Archived.")


print(f"  Transitioning Version 2 to Staging...")
client.transition_model_version_stage(
    name=model_name,
    version=2,
    stage="Staging",
    archive_existing_versions=False
)
print(f"  Model '{model_name}' Version 2 transitioned to Staging.")


print(f"  Transitioning Version 2 to Production...")
client.transition_model_version_stage(
    name=model_name,
    version=2,
    stage="Production",
    archive_existing_versions=False  # Set to True to also archive any other version currently in Production
)
print(f"  Model '{model_name}' Version 2 transitioned to Production.")

# Verify states after transitions
print(f"\n--- Final Stages for '{model_name}' ---")
for mv in client.search_model_versions(f"name='{model_name}'"):
    print(f"  Version: {mv.version}, Stage: {mv.current_stage}")


## Part 4: Loading Specific Model Versions/Stages (10 points)

Demonstrate how to load models based on their version number and their current stage, which is crucial for deployment pipelines.
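
The model URIs used in the next cell follow a small set of patterns, sketched here as a string helper. `model_uri` is a hypothetical convenience function, not part of MLflow. The alias form goes beyond this assignment and is an assumption: it only resolves on MLflow releases that support model version aliases, and `champion` is just an illustrative alias name.

```python
def model_uri(name, version=None, stage=None, alias=None):
    """Build a `models:/` URI from exactly one of version, stage, or alias."""
    selectors = [s for s in (version, stage, alias) if s is not None]
    if len(selectors) != 1:
        raise ValueError("Pass exactly one of version, stage, or alias")
    if alias is not None:
        return f"models:/{name}@{alias}"
    return f"models:/{name}/{version if version is not None else stage}"


print(model_uri("RegressionModel", version=1))
print(model_uri("RegressionModel", stage="Production"))
print(model_uri("RegressionModel", alias="champion"))
```

Pinning a version number gives a fixed artifact, while a stage or alias URI follows whatever version currently holds that label, which is the property deployment pipelines usually rely on.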

In [None]:
# Create a new data point for prediction (must have same number of features as training data)
new_data_point = np.array([[0.1, 0.2, 0.3, 0.4, 0.5]])

print("\n--- Loading Models by Version and Stage ---")

try:
    # 4.1 Load Model by Version Number
    #    Load "RegressionModel" Version 1 (the Linear Regression model).
    #    Make a prediction and print it.
    print(f"Loading '{model_name}' Version 1 (Archived)...")
    loaded_model_v1 = mlflow.pyfunc.load_model(f"models:/{model_name}/1")
    pred_v1 = loaded_model_v1.predict(new_data_point)
    print(f"  Prediction with Version 1 (Linear Regression): {pred_v1[0]:.2f}")

    # 4.2 Load Model by Stage
    #    Load "RegressionModel" from the 'Production' stage (which should be Version 2, the RandomForest model).
    #    Make a prediction and print it. Confirm it's the RandomForest model (V2).
    print(f"Loading '{model_name}' from Production stage (should be Version 2)...")
    loaded_model_prod = mlflow.pyfunc.load_model(f"models:/{model_name}/Production")
    pred_prod = loaded_model_prod.predict(new_data_point)
    print(f"  Prediction with Production model (RandomForest): {pred_prod[0]:.2f}")

    # Optional: Load the Staging model if any is set to staging
    # print(f"Loading '{model_name}' from Staging stage...")
    # loaded_model_staging = mlflow.pyfunc.load_model(f"models:/{model_name}/Staging")
    # pred_staging = loaded_model_staging.predict(new_data_point)
    # print(f"  Prediction with Staging model: {pred_staging[0]:.2f}")

except Exception as e:
    print(f"Error loading models: {e}")
    print("Please ensure you've run all previous cells and that the stages are set correctly.")


## Part 5: Reflection & Discussion (10 points)

Answer the following questions in a markdown cell below.

### Your Answers to Reflection Questions:

1.  **What are the key benefits of using MLflow's Model Registry for managing machine learning models?** (List at least 3 benefits)

    * **Benefit 1:** _(Your answer here)_
    * **Benefit 2:** _(Your answer here)_
    * **Benefit 3:** _(Your answer here)_

2.  **Describe a real-world scenario where transitioning models through 'Staging' and 'Production' stages would be crucial.**

    _(Your answer here)_

3.  **Imagine you have a bug in your 'Production' model (Version 2). How would you use the Model Registry to quickly revert to a previous, stable version (Version 1)?**

    _(Your answer here)_

4.  **What are some considerations or challenges when integrating the MLflow Model Registry into a CI/CD (Continuous Integration/Continuous Deployment) pipeline?**

    _(Your answer here)_


## Deliverables:

1.  This completed Jupyter Notebook (`mlflow_model_registry_assignment.ipynb`) with all code cells executed and reflection questions answered.
2.  Screenshots from the MLflow UI demonstrating:
    -   The registered models and their versions (showing both V1 and V2).
    -   The transitions of models through different stages (Archived for V1, Production for V2).
    (You can include these screenshots in separate markdown cells, or provide them as a separate file.)
