## MLflow Integration for Model Serving and Registry Management

In this notebook, we look at the operational side of MLflow: loading registered models for inference and managing model versions in the MLflow Model Registry. The goal is to show how MLflow supports the phase of the machine learning lifecycle that follows training, where models are served for predictions and multiple versions of the same model must be tracked and managed.

We work through these concepts with a text classification model for the AG News dataset. We load the model for inference, make predictions, register new versions, move versions through lifecycle stages, and delete versions we no longer need. These skills are central to managing models in real-world machine learning systems and tie directly into this course's focus on MLOps and experiment tracking.


### Objectives
* Loading and Serving Models
* Inference with the Model
* Managing Model Versions
* Deleting Models and Versions

### Environment Setup

Ensure all necessary libraries are installed and imported for our workflow.

In [None]:
#!pip install mlflow torch transformers

### Imports

Import necessary libraries focusing on MLflow for model retrieval, PyTorch for model operations, and Transformers for data processing.

In [None]:
import mlflow
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import os

### Connect to the MLflow Server

In [None]:
# Set MLflow tracking URI
mlflow.set_tracking_uri("http://localhost:5000")
client = mlflow.tracking.MlflowClient()

### Retrieve the Model from MLflow

In this step, we compare two ways to retrieve a trained model from MLflow. Method 1 uses MLflow's PyTorch flavor to rebuild the model object directly. Method 2 downloads the raw artifacts and loads them with Hugging Face Transformers. The right choice depends on how the model was logged and on the requirements of your deployment environment.

#### Method 1: Using the Built-in PyTorch Loader

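This method is the most direct. `mlflow.pytorch.load_model` resolves a `models:/` URI and returns the PyTorch model object, which fits naturally into a PyTorch-centric workflow. It requires the model to have been logged with the PyTorch flavor, and it returns only the model, so the tokenizer has to be loaded separately.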


In [None]:
# Load a specific model version
model_name = "agnews_pt_classifier"
model_version = "1"  # a version number, or a stage name such as "Production" or "Staging"


model_uri = f"models:/{model_name}/{model_version}"
model = mlflow.pytorch.load_model(model_uri)
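The `models:/` URI can address a registered model in several ways: by version number, by a stage name (stages are deprecated since MLflow 2.9), by the keyword `latest`, or by an alias after `@` (available since MLflow 2.3). The helper below is hypothetical and not part of MLflow. It simply makes these forms explicit, and the alias name `champion` is an illustrative assumption:

```python
def build_model_uri(name, version=None, stage=None, alias=None):
    """Hypothetical helper: build a models:/ URI from exactly one selector.

    version: a registry version number such as 1 or "1", or the keyword "latest"
    stage:   a legacy stage name such as "Staging" or "Production"
    alias:   a registry alias such as "champion"
    """
    selectors = [s for s in (version, stage, alias) if s is not None]
    if len(selectors) != 1:
        raise ValueError("pass exactly one of version, stage, or alias")
    if alias is not None:
        # Aliases are appended with '@' rather than as a path segment
        return f"models:/{name}@{alias}"
    # Versions, "latest", and stage names all go in the path segment
    return f"models:/{name}/{selectors[0]}"


print(build_model_uri("agnews_pt_classifier", version=1))
print(build_model_uri("agnews_pt_classifier", stage="Production"))
print(build_model_uri("agnews_pt_classifier", alias="champion"))
```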

## Performing Inference

Here we define a `predict` function that runs inference with the loaded model. It tokenizes a list of texts, feeds the batch through the model, picks the highest-scoring class for each text, and maps that class index to a label name using the model's `id2label` configuration. The same pattern works for any Hugging Face sequence-classification model, whether its labels are news topics, sentiments, or something else.


In [None]:

def predict(texts, model, tokenizer):
    """Return the predicted label name for each text in `texts`."""
    # Switch off dropout so predictions are deterministic
    model.eval()

    # Tokenize the batch and move the tensors to the model's device
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt").to(model.device)

    # Forward pass without building the autograd graph
    with torch.no_grad():
        outputs = model(**inputs)

    # Highest-scoring class index for each text
    predicted_ids = torch.argmax(outputs.logits, dim=-1)

    # Convert class indices to text labels and return them
    return [model.config.id2label[i] for i in predicted_ids.cpu().tolist()]
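You can check the last two steps of `predict` without a trained model. These are taking the arg-max over the logits and mapping indices to names with `id2label`. The sketch below uses made-up logits and an assumed AG News label mapping. The real mapping comes from `model.config.id2label` and may differ. The sketch also shows how `torch.softmax` turns the same logits into per-class probabilities, which helps when you want a confidence score alongside each label:

```python
import torch

# Assumed label mapping for illustration only; a fine-tuned model stores its own
# mapping in model.config.id2label
id2label = {0: "World", 1: "Sports", 2: "Business", 3: "Sci/Tech"}

# Made-up logits for a batch of two texts (shape: [batch_size, num_labels])
logits = torch.tensor([[0.1, 3.2, -0.5, 0.0],
                       [-1.0, 0.2, 2.5, 1.9]])

# Class index with the highest score in each row
predicted_ids = torch.argmax(logits, dim=-1)
labels = [id2label[i] for i in predicted_ids.tolist()]

# Softmax over the class dimension gives probabilities that sum to 1 per row
probs = torch.softmax(logits, dim=-1)
confidences = probs.max(dim=-1).values.tolist()

for label, conf in zip(labels, confidences):
    print(f"{label}: {conf:.2f}")
```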


In [None]:
# Sample text to predict
texts = [
    "The local high school soccer team triumphed in the state championship, securing victory with a last-second winning goal.",
    "DataCore is set to acquire startup InnovateAI for $2 billion, aiming to enhance its position in the artificial intelligence market.",
]


In [None]:
# mlflow.pytorch.load_model returns only the model, so load the tokenizer separately;
# it must match the checkpoint the model was fine-tuned from
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

print(predict(texts, model, tokenizer))


#### Method 2: Versatile Loading with Custom Handling

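This alternative does not depend on an MLflow flavor. It downloads the registered model's raw artifacts and loads them with Hugging Face's `from_pretrained`, which also restores the tokenizer saved next to the model. It is useful when models were logged as plain artifacts rather than through a flavor, or when the deployment environment needs direct control over the model files.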

In [None]:

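# Look up a specific version of the custom registered model
model_name = "agnews-transformer"
model_version = "1"  # get_model_version() takes a version number, not a stage name
model_version_details = client.get_model_version(name=model_name, version=model_version)

# `source` is the full artifact URI the version was registered from
# (e.g. runs:/<run_id>/custom_model), not a path relative to the run
source_uri = model_version_details.source
print(f"Run ID: {model_version_details.run_id}, source: {source_uri}")

# Download the artifacts; the last component of the source URI (assumed to be
# "custom_model") becomes a sub-directory of model_path
model_path = "models/agnews_transformer"
os.makedirs(model_path, exist_ok=True)
local_dir = mlflow.artifacts.download_artifacts(artifact_uri=source_uri, dst_path=model_path)
print(f"Downloaded to: {local_dir}")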
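Method 2 bypasses MLflow's flavor loaders, so nothing guarantees that the downloaded folder has the layout `from_pretrained` expects. A quick check before loading can spare you a confusing error. The helper below is hypothetical. It assumes the model was saved with `save_pretrained`, which writes `config.json` plus a weights file: `model.safetensors` in recent Transformers releases, `pytorch_model.bin` in older ones, or an `.index.json` file when the weights are sharded.

```python
import os

# Weight file names written by save_pretrained (single-file and sharded variants)
WEIGHT_FILES = (
    "model.safetensors",
    "pytorch_model.bin",
    "model.safetensors.index.json",
    "pytorch_model.bin.index.json",
)


def missing_hf_files(model_dir):
    """Hypothetical helper: list what a save_pretrained folder appears to be missing."""
    if not os.path.isdir(model_dir):
        return [model_dir]
    present = set(os.listdir(model_dir))
    missing = []
    if "config.json" not in present:
        missing.append("config.json")
    if not present.intersection(WEIGHT_FILES):
        missing.append("weights (" + " or ".join(WEIGHT_FILES) + ")")
    return missing


problems = missing_hf_files("models/agnews_transformer/custom_model")
print("Looks loadable" if not problems else f"Missing: {problems}")
```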

In [None]:
# Load the model and tokenizer
custom_model = AutoModelForSequenceClassification.from_pretrained("models/agnews_transformer/custom_model")
tokenizer = AutoTokenizer.from_pretrained("models/agnews_transformer/custom_model")


In [None]:
# Do the inference
print(predict(texts, custom_model, tokenizer))

## Demonstrating Model Versioning with MLflow

One of MLflow's most useful features is its ability to track multiple versions of a registered model. Here we log the PyTorch model from Method 1 again in two new runs, named "iteration2" and "iteration3", under the `sequence_classification` experiment. Logging a model in a run does not by itself create a registry version. Passing `registered_model_name` to `log_model` registers each logged model as a new version of the same registered model. The registry then records how the model evolves over time, lets you compare iterations, and gives each version its own lifecycle. The cells that follow manage this registered model.


In [None]:
# Log new iterations of the PyTorch model to create additional registry versions
mlflow.set_experiment("sequence_classification")

# Register under the same name as the model loaded in Method 1
model_name = "agnews_pt_classifier"

# Log iteration 2; registered_model_name adds it as a new version of model_name
with mlflow.start_run(run_name="iteration2"):
    mlflow.pytorch.log_model(model, "model", registered_model_name=model_name)

# Log iteration 3 the same way
with mlflow.start_run(run_name="iteration3"):
    mlflow.pytorch.log_model(model, "model", registered_model_name=model_name)


## Managing Model Versions and Stages

With several versions registered, we can list them and move a version between lifecycle stages. `search_model_versions` returns every version of the registered model together with its current stage. `transition_model_version_stage` moves a version to `Staging`, `Production`, or `Archived`. Stages are deprecated as of MLflow 2.9 in favor of model version aliases, so new projects should prefer aliases.


In [None]:
# Model version management
model_versions = client.search_model_versions(f"name='{model_name}'")
for version in model_versions:
    print(f"Version: {version.version}, Stage: {version.current_stage}")

# Change model stage
client.transition_model_version_stage(name=model_name, version=model_version, stage="Production")


## Cleaning Up: Deleting Models and Versions

Sometimes you need to delete specific model versions, or an entire registered model, from the MLflow registry. A typical reason is removing outdated or unused entries to keep the registry clean and manageable. Proceed carefully, because deletions cannot be undone, and deleting a registered model removes all of its versions.


In [None]:
# Delete a specific model version
client.delete_model_version(name=model_name, version=model_version)

# Delete the entire registered model
client.delete_registered_model(name=model_name)
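Deleting a whole registered model is rarely what routine cleanup calls for. A gentler pattern is to keep the newest few versions and delete the rest. Version numbers may be returned as strings, so they must be compared numerically, because in string order `"10"` sorts before `"9"`. The helper below is hypothetical, and the registry calls are left as commented usage:

```python
def versions_to_prune(versions, keep=2):
    """Hypothetical helper: return all but the `keep` highest version numbers.

    Versions are converted to int before sorting, because string order
    would put "10" before "9".
    """
    newest_first = sorted(versions, key=int, reverse=True)
    # Everything after the first `keep` entries is a deletion candidate, oldest first
    return sorted(newest_first[keep:], key=int)


print(versions_to_prune(["1", "2", "9", "10"], keep=2))

# Possible usage against the registry (not executed here):
# all_versions = [mv.version for mv in client.search_model_versions(f"name='{model_name}'")]
# for v in versions_to_prune(all_versions, keep=2):
#     client.delete_model_version(name=model_name, version=v)
```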
