# MLflow Model Management 
After covering the basics of MLflow in the previous exercise on Experiment Tracking, we now take a look at MLflow's model management in a second, smaller exercise. 
## Exercise Overview
We will reuse the code and use case from the previous exercise. We will train a model again and add it to the registry via the web UI. We will then customize our code to automatically add a model to the registry when certain criteria are met. We add tags, metadata and an alias to a registered model and finally load a model from the registry.
## 0) - Execute code again
Execute the following cells so that the code we want to use is available again. 

In [None]:
import time
from typing import Tuple

import matplotlib.pyplot as plt
import mlflow
import numpy as np
import pandas as pd
from sklearn.metrics import (
    ConfusionMatrixDisplay,
    classification_report,
    confusion_matrix,
    precision_score,
)
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from xgboost.callback import TrainingCallback

In [None]:
mlflow_tracking_uri = "http://mlflow:5001"
mlflow.set_tracking_uri(mlflow_tracking_uri)

In [None]:
try:
    exp_id = mlflow.create_experiment(
        name="Spotify genre classification | Model Management"
    )

except mlflow.exceptions.RestException:
    exp_id = mlflow.get_experiment_by_name(
        name="Spotify genre classification | Model Management"    
    ).experiment_id

In [None]:
def split_data() -> Tuple[pd.DataFrame, pd.DataFrame, pd.Series, pd.Series, pd.Series]:
    data = pd.read_csv("./data/genres_standardized.csv", sep=";")
    columns = list(data.columns)
    columns.remove("genre")
    data["genre"] = data["genre"].astype("category")
    data["target"] = data["genre"].cat.codes
    test_size = 0.2
    mlflow.log_param(key="Test size", value=test_size)
    X_train, X_test, y_train, y_test = train_test_split(
        data[columns], data["target"], test_size=test_size
    )
    return X_train, X_test, y_train, y_test, data["genre"]

In [None]:
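class MlflowCallback(TrainingCallback):
    def after_iteration(self, model, epoch, evals_log) -> bool:
        # evals_log maps dataset name -> metric name -> list of values per iteration
        for dataset_name, metrics in evals_log.items():
            for metric_name, log in metrics.items():
                # Log the value of the current iteration, not the running mean
                mlflow.log_metric(
                    key=metric_name, value=log[-1], step=epoch
                )
        # Returning False tells XGBoost to continue training
        return False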

In [None]:
def train_classifier(
    input_train: pd.DataFrame, target_train: pd.Series
) -> XGBClassifier:
    number_of_estimators: int = 100
    learning_rate: float = 0.1
    max_depth: int = 8
    min_child_weight: float = 1.0
    gamma: float = 0
    number_of_jobs: int = 4

    model = XGBClassifier(
        learning_rate=learning_rate,
        n_estimators=number_of_estimators,
        max_depth=max_depth,
        min_child_weight=min_child_weight,
        gamma=gamma,
        n_jobs=number_of_jobs,
        callbacks=[MlflowCallback()],
    )
    model.fit(
        input_train, target_train, eval_set=[(input_train, target_train)], verbose=False
    )

    return model

In [None]:
def predict(classifier: XGBClassifier, input_test: pd.DataFrame) -> np.ndarray:
    predictions = classifier.predict(input_test)
    return predictions

In [None]:
def analyze(
    target_test: pd.Series,
    predictions: np.ndarray,
    target_names: pd.Series,
):
    category_labels = target_names.cat.categories
    fig, ax = plt.subplots(figsize=(10, 10))
    ConfusionMatrixDisplay.from_predictions(
        target_test, predictions, ax=ax, display_labels=category_labels
    )
    ax.tick_params(axis="x", labelrotation=70, labelbottom=True)
    fig.savefig("./data/confusion_matrix.png", pad_inches=20)
    report = classification_report(target_test, predictions, output_dict=True)
    df_classification_report = pd.DataFrame(report).transpose()
    df_classification_report.to_csv("./data/classification_report.csv")
    mlflow.log_artifact("./data/classification_report.csv")
    mlflow.log_artifact("./data/confusion_matrix.png")

## 1) - Start a new run and register model via Web UI
After we have run the various functions for training our classifier again, we can now start a new run and create a classifier. 

Please execute the following cell to start a new run.

In [None]:
with mlflow.start_run(experiment_id=exp_id):
    data = pd.read_csv("./data/genres_standardized.csv", sep=";")
    dataset = mlflow.data.from_pandas(data, targets="genre")
    mlflow.log_input(dataset)
    input_train, input_test, target_train, target_test, target_names = split_data()
    classifier = train_classifier(input_train, target_train)
    mlflow.xgboost.log_model(classifier, "spotify_genre_classifier")
    predictions = predict(classifier=classifier, input_test=input_test)
    analyze(target_test=target_test, predictions=predictions, target_names=target_names)

Open the web UI of the [MLflow Tracking Server](http://localhost:5001) and view the runs.

Open the experiment with the name “Spotify genre classification | Model Management” and look at the last successful run. It should look something like the following example. 

![](./data/mlflow/Assets/run_overview.png)

As you can see, there is a logged model with the name `xgboost`.

**Click on the logged model.**

The following image shows how MLflow stores a logged model. The directory structure of an MLflow model is always similar, but the serialized model itself is saved differently depending on the model flavour. An MLflow model is therefore a packaging convention around a model, not a single exchange format such as ONNX.

![](./data/mlflow/Assets/artifact_view.png) 

**Please click on “Register model”, create a new model with the name “Spotify Classifier” and register the model.**

**Then click on “Model” in the navigation bar and select the model you have just registered.**

You will see that “Version 1” has been created there.

![](./data/mlflow/Assets/model.png)

## 2) - Register a model using the Python API
Now we want to register a trained model through the Python API and thereby create a new version of the “Spotify Classifier”. 
To do this, the following example creates a signature (`infer_signature`) that describes which input data the model expects for a prediction and in which format the model returns its output. 

Please add the necessary argument in the following code example to enter the model in the registry under the name “Spotify Classifier”.


In [None]:
from mlflow.models import infer_signature

with mlflow.start_run(experiment_id=exp_id):
    data = pd.read_csv("./data/genres_standardized.csv", sep=";")
    dataset = mlflow.data.from_pandas(data, targets="genre")
    mlflow.log_input(dataset)
    input_train, input_test, target_train, target_test, target_names = split_data()
    classifier = train_classifier(input_train, target_train)

    predictions = predict(classifier=classifier, input_test=input_test)
    signature = infer_signature(input_test, predictions)
    mlflow.xgboost.log_model(
        xgb_model=classifier,
        artifact_path="spotify_genre_classifier",
        signature=signature,
        registered_model_name="Spotify Classifier",
    )
    analyze(target_test=target_test, predictions=predictions, target_names=target_names)
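
The overview mentions registering a model only when certain criteria are met. One way to do that is sketched below, assuming macro-averaged precision as the criterion and a threshold of 0.7; both are illustrative choices, not values prescribed by this exercise. The helper name `register_if_good_enough` and its `register_fn` parameter are hypothetical. By default the helper falls back to `mlflow.register_model`, which takes a `runs:/<run_id>/<artifact_path>` URI and the name of the registered model.

```python
from typing import Any, Callable, Optional


def register_if_good_enough(
    run_id: str,
    precision: float,
    threshold: float = 0.7,  # assumed cut-off, adjust to your use case
    model_name: str = "Spotify Classifier",
    artifact_path: str = "spotify_genre_classifier",
    register_fn: Optional[Callable[[str, str], Any]] = None,
) -> Optional[Any]:
    """Register the model logged in `run_id` only if precision reaches the threshold."""
    if precision < threshold:
        return None  # criterion not met, nothing is registered
    if register_fn is None:
        import mlflow  # imported lazily so the helper can be tested without MLflow

        register_fn = mlflow.register_model
    model_uri = f"runs:/{run_id}/{artifact_path}"
    return register_fn(model_uri, model_name)


# Possible usage inside a run. Log the model WITHOUT `registered_model_name`,
# otherwise it is registered unconditionally:
#
#     with mlflow.start_run(experiment_id=exp_id) as run:
#         ...
#         precision = precision_score(target_test, predictions, average="macro")
#         mlflow.log_metric("precision_macro", precision)
#         register_if_good_enough(run.info.run_id, precision)
```

Logging the precision as a metric in addition to using it as a gate keeps the decision traceable in the run view.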

## 3) - Use a registered model via an alias
Imagine that we have created a model that is good enough to be used in production. The code we run in production should always use our current best released model. This can be solved with an alias. Within a registered model, an alias is unique and points to exactly one model version at a time.

![](./data/mlflow/Assets/alias.png)

**Please create an alias with the name `production` for the last registered model.**
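
The alias can also be set from Python instead of the web UI, and tags and a description can be attached to the model version in the same step. The following is only a sketch, assuming an MLflow release with alias support. The helper name `promote_latest_version`, the tag key and value, and the description text are made up for illustration. It relies on the `MlflowClient` methods `search_model_versions`, `set_model_version_tag`, `update_model_version` and `set_registered_model_alias`.

```python
from typing import Any


def promote_latest_version(
    client: Any,  # expected to be an mlflow.MlflowClient instance
    model_name: str = "Spotify Classifier",
    alias: str = "production",
) -> str:
    """Tag, describe and alias the newest version of a registered model."""
    versions = client.search_model_versions(f"name='{model_name}'")
    if not versions:
        raise ValueError(f"No versions found for model {model_name!r}")
    # Version numbers are returned as strings, so compare them as integers
    latest = max(versions, key=lambda v: int(v.version))
    version = str(latest.version)

    # Hypothetical tag and description, purely for illustration
    client.set_model_version_tag(model_name, version, "validated_by", "exercise")
    client.update_model_version(
        name=model_name,
        version=version,
        description="XGBoost genre classifier registered in the exercise.",
    )
    # Setting an alias that already exists moves it to the given version
    client.set_registered_model_alias(model_name, alias, version)
    return version


# Possible usage against the tracking server configured above:
#
#     from mlflow import MlflowClient
#     promote_latest_version(MlflowClient())
```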

Once we have created the alias, we can use the alias to load the model reference from our registry. The model reference (`model_version`) has a source attribute that can be used to download the corresponding model.

Load the model version via the alias. Then use the source attribute to download the model.

In [None]:
from mlflow import MlflowClient

client = MlflowClient()
model_version = client.get_model_version_by_alias("Spotify Classifier", "production")
production_model = mlflow.xgboost.load_model(model_version.source)

Finally, the model can be used to create predictions.

In [None]:
predictions = production_model.predict(input_test)
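
As an alternative to resolving `model_version.source` by hand, MLflow also accepts registry URIs of the form `models:/<name>@<alias>`. A brief sketch of that variant, again assuming an MLflow release with alias support; the helper names `alias_uri` and `load_by_alias` are hypothetical.

```python
def alias_uri(model_name: str, alias: str) -> str:
    """Build a registry URI that points at the version behind an alias."""
    return f"models:/{model_name}@{alias}"


def load_by_alias(model_name: str = "Spotify Classifier", alias: str = "production"):
    """Load the XGBoost model that the alias currently points to."""
    import mlflow.xgboost  # imported lazily; requires a reachable tracking server

    return mlflow.xgboost.load_model(alias_uri(model_name, alias))


# Possible usage, equivalent to the cells above:
#
#     production_model = load_by_alias()
#     predictions = production_model.predict(input_test)
```

Because the URI is resolved at load time, production code written this way picks up a new model version as soon as the alias is moved, without a code change.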