<img src="./images/logo.png" alt="Drawing" style="width: 500px;"/>

# **Exercise 5:** Tracking, Registering and Inferencing Models in MLflow

You've trained a model. What's next? You could gather more produce data and keep training it. You could serve the model through an endpoint so that other users and frontend applications can use it. Or you could use it as a base model for training a new model on an entirely new dataset.

Doing any of these requires a deeper dive into MLflow, one of the most widely used open-source machine learning platforms, which comes natively installed with **HPE Ezmeral Unified Analytics**. In this exercise, we'll do just that.

In this exercise, you will learn how to perform the following on MLflow:

- Inspect the artifacts and metrics tracked in MLflow
- Register the model
- Manage models, including moving them into and out of the `Production` stage
- Run inference with a model loaded from the MLflow Model Registry

By the end of this exercise, you will have a firm understanding of the art of Machine Learning Operations (MLOps) with MLflow.

Let's dive in!

<div class="alert alert-block alert-danger">
<b>Important:</b> This exercise requires the completion of Exercise 4: Building an Image Classification Model with TensorFlow and MLflow.</div>

## **1. Declaring Variables and Importing Libraries**

Let's re-declare the variables related to our MLflow experiment so that we can access them in this exercise.

In [None]:
# Experiment variables for MLflow
experiment_name = "smart-retail"
model_name = "retail-recognition"
artifact_path = "model"

Next, we'll import the necessary libraries. To learn more about these libraries, check out Section 1 of [Exercise 4](./04.model_training.ipynb).

Ignore any warnings that appear.

In [2]:
import mlflow
from mlflow.tracking.client import MlflowClient
from mlflow.entities.model_registry.model_version_status import ModelVersionStatus
from IPython.display import display
from PIL import Image
import numpy as np
from tensorflow.keras.preprocessing.image import load_img,img_to_array
from io import BytesIO

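Then, run the `%update_token` magic command to refresh the access token that the notebook uses to authenticate with platform services such as MLflow.

In [None]: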
%update_token

## **2. Checking In**

In the last exercise, we trained a model and saved it as an MLflow artifact. Let's go inspect it in MLflow.

1. Navigate back to the Unified Analytics dashboard.
1. In the sidebar navigation menu, select `Data Science` > `Experiments`.
1. The **MLflow Experiments** page will open in a new tab.

Here, you will be able to see all of the experimental **runs** you have executed. 

4. Right-click the Run Name `smart-retail-...`, then click `Show more columns`.

Now, we can compare some of the other **metrics** stored alongside a model, like `accuracy` and `loss`. 

<img src="./images/exercise5/mlflow.PNG" alt="Drawing" style="width: 80%;"/>


5. Click on the Run Name to explore it further. 

In this pane, we can see the **parameters** we set up for this model, the **metrics**, all of the **artifacts** associated with the run (including the model file and data) and the **Full Path** to the model, which we can use to access the model from this specific run.

<img src="./images/exercise5/model.PNG" alt="Drawing" style="width: 50%;"/>


## **3. Registering a Model in the Model Registry**

Back in the notebook, we're going to add the model to the MLflow **Model Registry**. A Model Registry is a specialized store for tracking and managing all the different versions of your models.

Storing your models in a Model Registry has several major advantages, including:

**Model Storage**: Just like a library stores books, a model registry stores trained machine learning models. These models are like the recipes you create to solve specific problems.

**Version Control**: As you experiment and improve your models, you create new versions. A model registry keeps track of all these versions, allowing you to compare them, see which one performs best and roll back if later experiments give an undesirable result.

**Documentation**:  In addition to the models themselves, a registry can store important information about each model, like the data it was trained on, its performance metrics, and who created it. This documentation helps everyone understand what the model does and how it was built.

**Collaboration**:  A model registry acts as a central hub for data scientists and engineers working on the same project. They can all access and use the models stored there, making collaboration smoother.

**Deployment**:  Once you've chosen the best model version, the registry can help you deploy it into production, meaning you can use it to make real-world predictions.

Overall, a model registry helps organizations manage the lifecycle of their machine learning models, from creation to deployment. It ensures everyone's on the same page, models are well-documented, and the best versions are easily accessible.

First, let's bring up the runs associated with our `smart-retail` experiment and get the ID of the most recent run.

In [None]:
# Search for runs in the specified experiment, ordering by start time in descending order
runs = mlflow.search_runs(experiment_ids=[mlflow.get_experiment_by_name(experiment_name).experiment_id],
                          order_by=["start_time desc"],
                          filter_string="")

# Check if there are any runs
if not runs.empty:
    # Get the run ID of the last active run
    last_run_id = runs.iloc[0]["run_id"]
    print("Last active run ID for experiment '{}': {}".format(experiment_name, last_run_id))
else:
    print("No runs found for experiment '{}'.".format(experiment_name))
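
Note that `filter_string=""` returns every run, including runs that failed or are still in progress, so the most recent run is not necessarily the most recent *successful* one. The sketch below reproduces the selection logic in plain Python on hypothetical run records; the dictionaries and their `run_id`, `status` and `start_time` keys are illustrative stand-ins for the columns returned by `mlflow.search_runs`. Against a real tracking server, passing `filter_string="attributes.status = 'FINISHED'"` to `mlflow.search_runs` should have the same effect.

```python
from datetime import datetime

def latest_finished_run_id(runs):
    """Return the run_id of the most recently started FINISHED run, or None."""
    finished = [r for r in runs if r["status"] == "FINISHED"]
    if not finished:
        return None
    # Equivalent to order_by=["start_time desc"] followed by taking the first row
    newest = max(finished, key=lambda r: r["start_time"])
    return newest["run_id"]

# Hypothetical run records; real values come from mlflow.search_runs()
example_runs = [
    {"run_id": "a1", "status": "FINISHED", "start_time": datetime(2024, 4, 1, 9, 0)},
    {"run_id": "b2", "status": "FAILED", "start_time": datetime(2024, 4, 2, 9, 0)},
    {"run_id": "c3", "status": "FINISHED", "start_time": datetime(2024, 4, 1, 17, 30)},
]
print(latest_finished_run_id(example_runs))  # c3
```

Run `b2` started last but failed, so the newest finished run, `c3`, is selected instead.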

We'll then create a URI that tells MLflow where to find our model, built from the run ID and the artifact path (declared above as just `model`).

Using this URI, we can register the model in the MLflow Model Registry under a given model name (declared above as `retail-recognition`). Registering the model creates a new **model version** in the registry.

In [None]:
# Build the runs:/ URI for the model logged in the last run, then register it in the Model Registry
model_uri = "runs:/{run_id}/{artifact_path}".format(run_id=last_run_id, artifact_path=artifact_path)
model_details = mlflow.register_model(model_uri=model_uri, name=model_name)
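
MLflow can address a model in two ways: by the run that logged it (`runs:/<run_id>/<artifact_path>`, as used above) and, once it is registered, by name plus version or stage (`models:/<model_name>/<version>` or `models:/<model_name>/<stage>`). The helper functions below are illustrative string builders, not part of the MLflow API, and the run ID `abc123` is made up. A `models:/` URI can be passed to `mlflow.pyfunc.load_model`, for example `mlflow.pyfunc.load_model("models:/retail-recognition/Production")` once a version is in the `Production` stage.

```python
def runs_uri(run_id, artifact_path):
    """Build the runs:/ URI that points at an artifact logged in a specific run."""
    return f"runs:/{run_id}/{artifact_path}"

def models_uri(model_name, version_or_stage):
    """Build the models:/ URI for a registered model version or stage."""
    return f"models:/{model_name}/{version_or_stage}"

def parse_runs_uri(uri):
    """Split a runs:/ URI back into (run_id, artifact_path)."""
    prefix = "runs:/"
    if not uri.startswith(prefix):
        raise ValueError(f"Not a runs:/ URI: {uri}")
    run_id, _, artifact_path = uri[len(prefix):].partition("/")
    return run_id, artifact_path

# "abc123" is a hypothetical run ID used only for illustration
example_uri = runs_uri("abc123", "model")
print(example_uri)                                     # runs:/abc123/model
print(parse_runs_uri(example_uri))                     # ('abc123', 'model')
print(models_uri("retail-recognition", "Production"))  # models:/retail-recognition/Production
```

The `runs:/` form is tied to one training run, while the `models:/` form lets an application ask for "whatever is in Production" without knowing which run produced it.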

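Next, we'll define a function that repeatedly checks the status of the new model version in the Model Registry until it becomes `READY` (checking at most 10 times, one second apart) and then updates the registered model's description. Note that `mlflow.register_model` already waits up to 300 seconds by default for the new version to be created, so this check mostly confirms the final status.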

In [None]:
import time

# Define a function that waits until a model version is ready, then describes the registered model
def wait_until_ready(model_name, model_version):
    # Initialize MLflow client
    client = MlflowClient()

    # Check the status a maximum of 10 times, one second apart
    for _ in range(10):
        # Get details of the specified model version
        model_version_details = client.get_model_version(
            name=model_name,
            version=model_version,
        )

        # Convert the status to a readable string and print it
        status = ModelVersionStatus.from_string(model_version_details.status)
        print("Model status: %s" % ModelVersionStatus.to_string(status))

        # If the status is "READY", stop checking
        if status == ModelVersionStatus.READY:
            break

        # Wait for one second before checking again
        time.sleep(1)

    # Update the description of the registered model
    client.update_registered_model(
        name=model_name,
        description="Fruit & Vegetables Cashierless Store"
    )

# Call the function for the model version we just registered
wait_until_ready(model_details.name, model_details.version)
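
The loop above stops silently after 10 checks and treats a failed registration the same as a pending one. A more defensive pattern polls against a deadline and fails fast on an error status. The sketch below is a generic helper, not an MLflow function: `get_status` is a hypothetical callable that returns the current status string, and the status names are the `ModelVersionStatus` values (`PENDING_REGISTRATION`, `FAILED_REGISTRATION`, `READY`). In this notebook it could be called as `wait_for_status(lambda: client.get_model_version(model_name, version).status)`.

```python
import time

def wait_for_status(get_status, ready="READY", failed="FAILED_REGISTRATION",
                    timeout=30.0, interval=1.0):
    """Poll get_status() until it returns `ready`; fail fast or time out otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == ready:
            return status
        if status == failed:
            raise RuntimeError(f"Model version entered status {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Status still {status} after {timeout} seconds")
        time.sleep(interval)

# Stand-in for the registry: reports "pending" twice, then "ready"
fake_statuses = iter(["PENDING_REGISTRATION", "PENDING_REGISTRATION", "READY"])
print(wait_for_status(lambda: next(fake_statuses), interval=0.01))  # READY
```

Using `time.monotonic()` for the deadline keeps the timeout correct even if the system clock changes while polling.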


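Let's check the result in MLflow:

1. Return to the **MLflow** tab and select **Models** in the top navigation bar.
1. Click `retail-recognition` to open the registered model. Its description should read *Fruit & Vegetables Cashierless Store*, and the new version should be listed under **Versions**.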

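### Transition the model to the `Production` stage

Each version in the MLflow Model Registry has a stage: `None`, `Staging`, `Production` or `Archived`. A newly registered version starts in `None`, so let's promote the latest one to `Production`.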

In [None]:
# Create an instance of the MlflowClient
client = MlflowClient()

# Get the latest version of the registered model that is still in the "None" stage
latest_versions = client.get_latest_versions(name=model_name, stages=["None"])
latest_version = latest_versions[0]

# Transition that model version to the Production stage
client.transition_model_version_stage(
    name=model_name,
    version=latest_version.version,
    stage="Production",
)
print(f"Model: {model_name}, version: {latest_version.version} has been moved to Production")
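
By default, moving a version to `Production` leaves any version already in `Production` where it is, so two versions can share the stage. `MlflowClient.transition_model_version_stage` also accepts `archive_existing_versions=True`, which moves the versions already in the target stage to `Archived`. The toy function below is not MLflow code: it models that bookkeeping on a plain dictionary (version to stage, with made-up versions) to show which versions the flag affects.

```python
def promote(stages, version, target="Production", archive_existing_versions=False):
    """Toy model of stage bookkeeping: `stages` maps version -> stage name.

    Not MLflow code; it only mimics which versions a stage transition touches.
    """
    updated = dict(stages)
    if archive_existing_versions:
        # Versions already in the target stage are moved to Archived
        for other, stage in stages.items():
            if other != version and stage == target:
                updated[other] = "Archived"
    updated[version] = target
    return updated

registry = {"1": "Production", "2": "Staging", "3": "None"}
print(promote(registry, "3"))
# {'1': 'Production', '2': 'Staging', '3': 'Production'}
print(promote(registry, "3", archive_existing_versions=True))
# {'1': 'Archived', '2': 'Staging', '3': 'Production'}
```

Versions in `Staging` or `None` are untouched by the flag, which is why this exercise archives the other versions explicitly in the next step.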

To check the result, open **Models** > `retail-recognition` in MLflow and refresh the page: the latest version should now show `Production` as its stage.

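### Transition the other models to the `Archived` stage

To keep the registry tidy, we'll move every other version of `retail-recognition` to `Archived` and update its description.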

In [None]:
# Find every version of our registered model
model_versions = client.search_model_versions(f"name='{model_name}'")

# Archive every version except the one we just moved to Production
for mv in model_versions:
    if mv.version != latest_version.version:
        client.transition_model_version_stage(
            name=mv.name,
            version=mv.version,
            stage="Archived"
        )
        print(f"Model: {mv.name}, version: {mv.version} has been moved to Archived")

        # Update the model version description
        client.update_model_version(
            name=mv.name,
            version=mv.version,
            description="Model Moved to Archived"
        )
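
The cell above relies on the version we captured when promoting to `Production`. If you instead want to pick the highest version number straight from the versions that `search_model_versions` returns, compare them as numbers: the registry may report version numbers as strings (the MLflow REST API defines them as strings), and strings sort character by character. A small sketch, with a hypothetical helper name:

```python
def split_latest(versions):
    """Return (latest, others) from version identifiers, comparing them as integers."""
    ordered = sorted(versions, key=int)
    return ordered[-1], ordered[:-1]

# Sorting the strings directly would rank "9" above "10"
print(sorted(["9", "10", "2"]))        # ['10', '2', '9']
print(split_latest(["9", "10", "2"]))  # ('10', ['2', '9'])
```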

To check the result, refresh the `retail-recognition` page in MLflow: every other version should now show `Archived` as its stage, with the description *Model Moved to Archived*.

## **4. Model Testing**

Similar to the previous exercise, we will now test our model - but instead of calling the model from a saved variable in the notebook's memory, this time we will call it from the MLflow Model Registry.

In [None]:
# Get the source URI or location of the model version
logged_model = latest_version.source

# Load model as a PyFuncModel.
loaded_model = mlflow.pyfunc.load_model(logged_model)

In [None]:
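import requests

# NOTE: `labels` must hold the class names used to train the model in Exercise 4.
# It is not defined in this notebook, so re-declare it before calling predict().
def predict(url):
    # Download the image and convert it to 3-channel RGB
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    img = Image.open(BytesIO(response.content)).convert("RGB")

    # Preprocess the image the same way as during training
    img = img.resize((224, 224))
    img_array = img_to_array(img)
    img_array = img_array / 255.0
    img_array = np.expand_dims(img_array, axis=0)

    # Make a prediction with the model loaded from the Model Registry
    prediction = loaded_model.predict(img_array)
    predicted_class = np.argmax(prediction, axis=-1)
    result = labels[predicted_class[0]]

    # Display the image
    display(img)

    return result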
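
The preprocessing in `predict()` has to match what the model saw during training. The numpy-only sketch below isolates the two array steps: scaling pixels to `[0, 1]` and adding a batch axis, then mapping the highest score back to a label. It uses a synthetic white 224×224 image instead of a downloaded one, and the three-label list is hypothetical; the real list is the one the model was trained with.

```python
import numpy as np

def to_batch(pixels):
    """Scale uint8 RGB pixels to [0, 1] and add a leading batch axis."""
    arr = np.asarray(pixels, dtype=np.float32) / 255.0
    return np.expand_dims(arr, axis=0)

def decode(prediction, labels):
    """Map a (1, num_classes) score array to its highest-scoring label."""
    return labels[int(np.argmax(prediction, axis=-1)[0])]

# Synthetic all-white image in place of a downloaded photo
image = np.full((224, 224, 3), 255, dtype=np.uint8)
batch = to_batch(image)
print(batch.shape)  # (1, 224, 224, 3)

# Hypothetical scores and labels for illustration only
print(decode(np.array([[0.1, 0.7, 0.2]]), ["apple", "banana", "carrot"]))  # banana
```

The batch axis is needed because Keras models predict on batches, even when the batch holds a single image.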

Now, head over to Google Images and find any image of a fruit or vegetable. Set the `image_url` variable below to the direct address of the image itself (not the link to the search results page).

In [None]:
# Example usage with an image URL
image_url = ""

predicted_label = predict(image_url)
print("The model predicts: " + str(predicted_label))

Did the model correctly guess what was in your image? If so, great! If not... *still* also great!

**Incorrect predictions** provide vital feedback: they help you understand the model's behaviour and decide whether to improve the training dataset or the model itself. With the MLflow Model Registry, you can compare multiple versions of the same model and see how changes to the dataset and model parameters affect performance.

## **Conclusion**

In this exercise, you have learned how to manage a model's lifecycle with MLflow: inspecting the parameters, metrics and artifacts tracked for a training run, registering the trained model in the MLflow Model Registry, moving model versions into the `Production` and `Archived` stages, and loading the production model from the registry to classify new images.

We now have a registered production model that we can load and call `predict` on for any new image, detecting the fresh produce our retail customers are scanning at the checkout!

Training and fine-tuning models is an iterative process. Registering our models and tracking our training experiments not only gives us identifiable base models on which to train new data, but also ensures that, when our model is deployed to hundreds of retail stores across several countries, we can be sure we're using the right one!

In the next exercise, you will learn to **serve** the latest version of your model from the Model Registry using **KServe**, making it callable from your own retail application!