
# Model Registry

MLflow Model Registry is a collaborative hub where teams can share ML models, work together from experimentation to online testing and production, integrate with approval and governance workflows, and monitor ML deployments and their performance.  This lesson explores how to manage models using the MLflow model registry.

## In this lesson you:<br>
 - Register a model using MLflow
 - Deploy that model into production
 - Update a production model to a new version, using a staging phase for testing
 - Archive and delete models


### Model Registry

The MLflow Model Registry component is a centralized model store, set of APIs, and UI, to collaboratively manage the full lifecycle of an MLflow Model. It provides model lineage (which MLflow Experiment and Run produced the model), model versioning, stage transitions (e.g. from staging to production), annotations (e.g. with comments, tags), and deployment management (e.g. which production jobs have requested a specific model version).

The Model Registry has the following features:<br><br>

* **Central Repository:** Register MLflow models with the MLflow Model Registry. A registered model has a unique name, version, stage, and other metadata.
* **Model Versioning:** Automatically keep track of versions for registered models when updated.
* **Model Stage:** Assign preset or custom stages to each model version, like “Staging” and “Production”, to represent the lifecycle of a model.
* **Model Stage Transitions:** Record new registration events or changes as activities that automatically log users, changes, and additional metadata such as comments.
* **CI/CD Workflow Integration:** Record stage transitions, request, review and approve changes as part of CI/CD pipelines for better control and governance.

<div><img src="https://files.training.databricks.com/images/eLearning/ML-Part-4/model-registry.png" style="height: 400px; margin: 20px"/></div>

See <a href="https://mlflow.org/docs/latest/registry.html" target="_blank">the MLflow docs</a> for more details on the model registry.
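To make the versioning idea concrete, here is a toy, in-memory sketch of how a registry maps a unique model name to auto-incrementing versions. It is not MLflow's implementation: the `ToyRegistry` and `ToyModelVersion` classes are hypothetical and only mimic the behavior described above, where each newly registered version starts in the **`None`** stage.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModelVersion:
    name: str
    version: int
    source: str          # where the model artifacts live, e.g. a runs:/ URI
    stage: str = "None"  # every new version starts in the "None" stage


@dataclass
class ToyRegistry:
    # Registered model name -> list of its versions, oldest first
    models: dict = field(default_factory=dict)

    def register(self, name: str, source: str) -> ToyModelVersion:
        """Add a new version under `name`, numbering versions 1, 2, 3, ..."""
        versions = self.models.setdefault(name, [])
        mv = ToyModelVersion(name=name, version=len(versions) + 1, source=source)
        versions.append(mv)
        return mv


registry = ToyRegistry()
v1 = registry.register("wine-rf-model", "runs:/run-a/model")
v2 = registry.register("wine-rf-model", "runs:/run-b/sklearn-model")
print(v1.version, v2.version, v2.stage)  # 1 2 None
```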


### Registering a Model

The following workflow works with either the UI or pure Python. This notebook uses pure Python.


Train a model and log it to MLflow.

```python
import mlflow
import mlflow.sklearn
import pandas as pd
from mlflow.models.signature import infer_signature
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Load the red and white wine datasets and flag each row with its color
white_wine = pd.read_csv("/dbfs/databricks-datasets/wine-quality/winequality-white.csv", sep=";")
red_wine = pd.read_csv("/dbfs/databricks-datasets/wine-quality/winequality-red.csv", sep=";")

red_wine["is_red"] = 1
white_wine["is_red"] = 0

data = pd.concat([red_wine, white_wine], axis=0)

# Remove spaces from column names
data.rename(columns=lambda x: x.replace(" ", "_"), inplace=True)

# Turn the quality score into a binary label: 1 = high quality (score >= 7)
data["quality"] = (data["quality"] >= 7).astype(int)
data.reset_index(drop=True, inplace=True)

# Split into training and test sets
train, test = train_test_split(data, random_state=123)
X_train = train.drop(["quality"], axis=1)
X_test = test.drop(["quality"], axis=1)
y_train = train["quality"]
y_test = test["quality"]

# Train a random forest classifier
n_estimators = 100
max_depth = 5

rf = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth)
rf.fit(X_train, y_train)

# Store an input example and the model signature alongside the model
input_example = X_train.head(3)
signature = infer_signature(X_train, pd.DataFrame(y_train))

with mlflow.start_run(run_name="RF Model") as run:
    mlflow.sklearn.log_model(rf, "model", input_example=input_example, signature=signature)
    # ROC AUC is computed from the predicted probability of the positive class
    mlflow.log_metric("auc", roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1]))
    mlflow.log_param("n_estimators", n_estimators)
    mlflow.log_param("max_depth", max_depth)
    run_id = run.info.run_id
```

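ROC AUC measures how well a model ranks positive examples above negative ones, so it is normally computed from a continuous score such as `rf.predict_proba(X_test)[:, 1]` rather than from the hard 0/1 labels returned by `rf.predict()`. The plain-Python sketch below (a quadratic-time rank formulation for illustration, not scikit-learn's implementation; the toy data is made up) shows how thresholding the scores first can lower the metric.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive scores higher than a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
probas = [0.1, 0.6, 0.4, 0.9]                      # like predict_proba(...)[:, 1]
labels = [1 if p >= 0.5 else 0 for p in probas]    # like predict(...)

print(roc_auc(y_true, probas))  # 0.75
print(roc_auc(y_true, labels))  # 0.5
```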



Create a unique model name so you don't clash with other workspace users.

```python
# Replace the suffix with something unique to you, such as your initials
suffix = "aml"
model_name = f"wine-rf-model_{suffix}"
```
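A hard-coded suffix only avoids clashes if everyone picks a different one. One option, sketched below under the assumption that you have your user name as a string (on Databricks, for example, the SQL function `current_user()` returns it), is to derive the suffix from it. The `user_suffix` helper and the example address are hypothetical.

```python
import re


def user_suffix(user_name: str) -> str:
    """Turn a user name such as an e-mail address into a model-name-friendly suffix."""
    local_part = user_name.split("@")[0]                      # drop the e-mail domain, if any
    return re.sub(r"[^A-Za-z0-9]", "_", local_part).lower()  # non-alphanumerics become "_"


# Hypothetical user name; replace it with your own
model_name = f"wine-rf-model_{user_suffix('jane.doe@example.com')}"
print(model_name)  # wine-rf-model_jane_doe
```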


Register the model.

```python
model_uri = f"runs:/{run_id}/model"

model_details = mlflow.register_model(model_uri=model_uri, name=model_name)
```

Successfully registered model 'wine-rf-model_aml'.
2024/04/04 11:40:48 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: wine-rf-model_aml, version 1
Created version '1' of model 'wine-rf-model_aml'.
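The registry works with two URI schemes used in this lesson: `runs:/<run_id>/<artifact_path>` points at a model logged under a specific run, and `models:/<name>/<version>` (or `models:/<name>/<stage>`) points at a registered model. The helpers below are hypothetical conveniences that only build those strings; they are not part of the MLflow API.

```python
def runs_uri(run_id: str, artifact_path: str) -> str:
    """URI of a model logged under a specific MLflow run."""
    return f"runs:/{run_id}/{artifact_path}"


def models_uri(name: str, version_or_stage) -> str:
    """URI of a registered model, addressed by version number or by stage name."""
    return f"models:/{name}/{version_or_stage}"


print(runs_uri("514e8b817daa41328e35c69d99833cd8", "model"))
print(models_uri("wine-rf-model_aml", 1))             # models:/wine-rf-model_aml/1
print(models_uri("wine-rf-model_aml", "Production"))  # models:/wine-rf-model_aml/Production
```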



**Open the *Models* tab on the left of the screen to explore the registered model.**  Note the following:<br><br>

* It logged who trained the model and what code was used
* It logged a history of actions taken on this model
* It logged this model as a first version


Check the status. A new model version is initially in **`PENDING_REGISTRATION`** status and moves to **`READY`** once registration completes.

```python
from mlflow.tracking.client import MlflowClient

client = MlflowClient()
model_version_details = client.get_model_version(name=model_name, version=1)

model_version_details.status
```

Out[8]: 'READY'




Now add a description to the registered model.

```python
client.update_registered_model(
    name=model_details.name,
    description="This model classifies wine quality based on various inputs."
)
```

Out[9]: <RegisteredModel: creation_timestamp=1712230847619, description='This model classifies wine quality based on various inputs.', last_updated_timestamp=1712231047884, latest_versions=[], name='wine-rf-model_aml', tags={}>


Add a version-specific description.

```python
client.update_model_version(
    name=model_details.name,
    version=model_details.version,
    description="This model version was built using sklearn."
)
```

Out[10]: <ModelVersion: creation_timestamp=1712230848094, current_stage='None', description='This model version was built using sklearn.', last_updated_timestamp=1712231071454, name='wine-rf-model_aml', run_id='514e8b817daa41328e35c69d99833cd8', run_link='', source='dbfs:/databricks/mlflow-tracking/4422168978451532/514e8b817daa41328e35c69d99833cd8/artifacts/model', status='READY', status_message='', tags={}, user_id='4955259050321055', version='1'>



### Deploying a Model

The MLflow Model Registry defines several model stages: **`None`**, **`Staging`**, **`Production`**, and **`Archived`**. Each stage has a unique meaning. For example, **`Staging`** is meant for model testing, while **`Production`** is for models that have completed the testing or review processes and have been deployed to applications. 

Users with appropriate permissions can transition model versions between stages. Administrators can control these permissions on a per-user and per-model basis.

If you have permission to transition a model to a particular stage, you can make the transition directly by using the **`MlflowClient.transition_model_version_stage()`** function. If you do not have permission, you can request a stage transition using the REST API; for example, from a `%sh` cell:

```sh
curl -i -X POST \
  -H "X-Databricks-Org-Id: <YOUR_ORG_ID>" \
  -H "Authorization: Bearer <YOUR_ACCESS_TOKEN>" \
  https://<YOUR_DATABRICKS_WORKSPACE_URL>/api/2.0/preview/mlflow/transition-requests/create \
  -d '{"comment": "Please move this model into production!", "model_version": {"version": 1, "registered_model": {"name": "power-forecasting-model"}}, "stage": "Production"}'
```
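If you prefer to stay in Python, the same request can be assembled with the standard library. This is a sketch that mirrors the endpoint, headers, and JSON body of the curl example; the `build_transition_request` helper is hypothetical, the `Content-Type` header is an added assumption, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_transition_request(workspace_url, token, org_id, name, version, stage, comment):
    """Build (but do not send) a POST request asking for a model version stage transition."""
    body = {
        "comment": comment,
        "model_version": {"version": version, "registered_model": {"name": name}},
        "stage": stage,
    }
    return urllib.request.Request(
        url=f"{workspace_url}/api/2.0/preview/mlflow/transition-requests/create",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-Databricks-Org-Id": org_id,
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumption: the endpoint accepts a JSON body
        },
        method="POST",
    )


req = build_transition_request(
    workspace_url="https://<YOUR_DATABRICKS_WORKSPACE_URL>",
    token="<YOUR_ACCESS_TOKEN>",
    org_id="<YOUR_ORG_ID>",
    name="power-forecasting-model",
    version=1,
    stage="Production",
    comment="Please move this model into production!",
)
print(req.get_method(), req.get_full_url())
```

With real values filled in, `urllib.request.urlopen(req)` would send the request.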

Now that you've learned about stage transitions, transition the model to the **`Production`** stage.

```python
import time

time.sleep(10)  # In case the registration is still pending
```
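A fixed sleep can be too short on a busy workspace and needlessly long otherwise. A sketch of polling the version's status instead is below; the `wait_until_ready` helper is hypothetical, and it is demonstrated with a fake status sequence. In this notebook you would pass something like `lambda: client.get_model_version(model_details.name, model_details.version).status`.

```python
import time


def wait_until_ready(get_status, timeout_s=300.0, poll_interval_s=5.0):
    """Poll get_status() until it returns "READY"; raise on failed registration or timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "READY":
            return status
        if status == "FAILED_REGISTRATION":
            raise RuntimeError("Model version registration failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Model version still {status!r} after {timeout_s} s")
        time.sleep(poll_interval_s)


# Fake status source standing in for the MLflow client
statuses = iter(["PENDING_REGISTRATION", "PENDING_REGISTRATION", "READY"])
print(wait_until_ready(lambda: next(statuses), poll_interval_s=0.01))  # READY
```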

```python
client.transition_model_version_stage(
    name=model_details.name,
    version=model_details.version,
    stage="Production"
)
```

Out[12]: <ModelVersion: creation_timestamp=1712230848094, current_stage='Production', description='This model version was built using sklearn.', last_updated_timestamp=1712231218340, name='wine-rf-model_aml', run_id='514e8b817daa41328e35c69d99833cd8', run_link='', source='dbfs:/databricks/mlflow-tracking/4422168978451532/514e8b817daa41328e35c69d99833cd8/artifacts/model', status='READY', status_message='', tags={}, user_id='4955259050321055', version='1'>

Fetch the model's current status.

```python
model_version_details = client.get_model_version(
    name=model_details.name,
    version=model_details.version,
)
print(f"The current model stage is: '{model_version_details.current_stage}'")
```

The current model stage is: 'Production'



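Load the model as a generic Python function (**`pyfunc`**). Loading a model this way lets you use it regardless of the package that was used to train it.

The URI below loads a specific version of the model. You can also reference a stage instead of a version number, e.g. **`models:/<model_name>/Production`**, to load whichever version is currently in that stage.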

```python
import mlflow.pyfunc

model_version_uri = f"models:/{model_name}/1"

print(f"Loading registered model version from URI: '{model_version_uri}'")
model_version_1 = mlflow.pyfunc.load_model(model_version_uri)
```

Loading registered model version from URI: 'models:/wine-rf-model_aml/1'



Apply the model.

```python
model_version_1.predict(X_test)
```

Out[15]: array([0, 0, 0, ..., 0, 0, 0])


### Deploying a New Model Version

The MLflow Model Registry enables you to create multiple model versions corresponding to a single registered model. By performing stage transitions, you can seamlessly integrate new model versions into your staging or production environments.


Train a new model and register it as a new version of the same registered model at the time it is logged.

```python
n_estimators = 300
max_depth = 10

rf = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth)
rf.fit(X_train, y_train)

input_example = X_train.head(3)
signature = infer_signature(X_train, pd.DataFrame(y_train))

with mlflow.start_run(run_name="RF Model") as run:
    # Passing `registered_model_name` to `mlflow.sklearn.log_model()` registers the
    # model with the MLflow Model Registry as part of logging. Because the registered
    # model already exists, this creates a new version of it.
    mlflow.sklearn.log_model(
        sk_model=rf,
        artifact_path="sklearn-model",
        registered_model_name=model_name,
        input_example=input_example,
        signature=signature,
    )
    # ROC AUC is computed from the predicted probability of the positive class
    mlflow.log_metric("auc", roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1]))
    mlflow.log_param("n_estimators", n_estimators)
    mlflow.log_param("max_depth", max_depth)

    run_id = run.info.run_id
```

Registered model 'wine-rf-model_aml' already exists. Creating a new version of this model...
2024/04/04 11:52:45 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: wine-rf-model_aml, version 2
Created version '2' of model 'wine-rf-model_aml'.



Check the UI to see the new model version.



Use the search functionality to find the latest model version.

```python
model_version_infos = client.search_model_versions(f"name = '{model_name}'")
# Versions are returned as strings, so compare them as integers
new_model_version = max(int(model_version_info.version) for model_version_info in model_version_infos)
print(f"New model version: {new_model_version}")
```

New model version: 2
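Converting versions to integers matters because MLflow reports `ModelVersion.version` as a string (note `version='2'` in the outputs above). Comparing strings is lexicographic, so once a model reaches version 10 a plain `max()` picks the wrong one:

```python
versions = ["1", "2", "9", "10"]        # version numbers as strings

print(max(versions))                    # 9  (lexicographic: "9" > "10")
print(max(int(v) for v in versions))    # 10 (numeric comparison)
```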



Add a description to this new version.

```python
client.update_model_version(
    name=model_name,
    version=new_model_version,
    description="This model version is a random classifier with 300 decision trees and a max depth of 10 that was trained in scikit-learn."
)
```

Out[22]: <ModelVersion: creation_timestamp=1712231565352, current_stage='None', description=('This model version is a random classifier with 300 decision trees and a max '
 'depth of 10 that was trained in scikit-learn.'), last_updated_timestamp=1712231668657, name='wine-rf-model_aml', run_id='a24a257f373a47b8b9cff9084a109a32', run_link='', source='dbfs:/databricks/mlflow-tracking/4422168978451532/a24a257f373a47b8b9cff9084a109a32/artifacts/sklearn-model', status='READY', status_message='', tags={}, user_id='4955259050321055', version='2'>




Put this new model version into **`Staging`**.

```python
time.sleep(10)  # In case the registration is still pending

client.transition_model_version_stage(
    name=model_name,
    version=new_model_version,
    stage="Staging"
)
```

Out[23]: <ModelVersion: creation_timestamp=1712231565352, current_stage='Staging', description=('This model version is a random classifier with 300 decision trees and a max '
 'depth of 10 that was trained in scikit-learn.'), last_updated_timestamp=1712231710721, name='wine-rf-model_aml', run_id='a24a257f373a47b8b9cff9084a109a32', run_link='', source='dbfs:/databricks/mlflow-tracking/4422168978451532/a24a257f373a47b8b9cff9084a109a32/artifacts/sklearn-model', status='READY', status_message='', tags={}, user_id='4955259050321055', version='2'>


Now that this model version is in staging, you can run an automated CI/CD pipeline against it to test it before it goes into production. Once the tests pass, promote it to production.

```python
client.transition_model_version_stage(
    name=model_name,
    version=new_model_version,
    stage="Production",
    archive_existing_versions=True  # Archive any other versions currently in Production
)
```

Out[24]: <ModelVersion: creation_timestamp=1712231565352, current_stage='Production', description=('This model version is a random classifier with 300 decision trees and a max '
 'depth of 10 that was trained in scikit-learn.'), last_updated_timestamp=1712231727424, name='wine-rf-model_aml', run_id='a24a257f373a47b8b9cff9084a109a32', run_link='', source='dbfs:/databricks/mlflow-tracking/4422168978451532/a24a257f373a47b8b9cff9084a109a32/artifacts/sklearn-model', status='READY', status_message='', tags={}, user_id='4955259050321055', version='2'>
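The effect of `archive_existing_versions=True` can be modeled in a few lines: versions already in the target stage are moved to **`Archived`**, and the flag only makes sense when the target is **`Staging`** or **`Production`**. The `transition` function below is a toy model of that behavior, written for illustration; it is not MLflow's code and operates on a plain `{version: stage}` dictionary.

```python
VALID_STAGES = {"None", "Staging", "Production", "Archived"}


def transition(stages, version, new_stage, archive_existing_versions=False):
    """Return a new {version: stage} mapping after moving `version` to `new_stage`."""
    if new_stage not in VALID_STAGES:
        raise ValueError(f"Unknown stage: {new_stage}")
    if archive_existing_versions and new_stage not in {"Staging", "Production"}:
        raise ValueError("archive_existing_versions only applies to Staging or Production")
    updated = dict(stages)  # leave the input mapping untouched
    if archive_existing_versions:
        # Archive every other version that currently occupies the target stage
        for v, stage in stages.items():
            if v != version and stage == new_stage:
                updated[v] = "Archived"
    updated[version] = new_stage
    return updated


before = {1: "Production", 2: "Staging"}
after = transition(before, 2, "Production", archive_existing_versions=True)
print(after)  # {1: 'Archived', 2: 'Production'}
```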


### Deleting

You can now delete old versions of the model.


Delete version 1, which was archived automatically when version 2 was promoted to production.

You cannot delete a model version that has not first been archived.

```python
client.delete_model_version(
    name=model_name,
    version=1
)
```
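The deletion rules used in the rest of this lesson can be summarized as a small check. This is a sketch under a stated assumption: that, as on Databricks, only versions in the **`None`** or **`Archived`** stage can be deleted, and that a registered model can only be deleted when none of its versions is in **`Staging`** or **`Production`**. The helper names are hypothetical.

```python
# Assumption: only versions in these stages may be deleted
DELETABLE_STAGES = {"None", "Archived"}


def can_delete_version(stage: str) -> bool:
    """True if a model version in `stage` may be deleted (under the assumption above)."""
    return stage in DELETABLE_STAGES


def can_delete_registered_model(version_stages) -> bool:
    """True if every version of the registered model is in a deletable stage."""
    return all(can_delete_version(stage) for stage in version_stages)


print(can_delete_version("Archived"), can_delete_version("Production"))  # True False
print(can_delete_registered_model(["Archived", "Production"]))           # False
```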


Archive version 2 of the model too, so that no version remains in **`Production`** and the registered model can be deleted.

```python
client.transition_model_version_stage(
    name=model_name,
    version=2,
    stage="Archived"
)
```

Out[26]: <ModelVersion: creation_timestamp=1712231565352, current_stage='Archived', description=('This model version is a random classifier with 300 decision trees and a max '
 'depth of 10 that was trained in scikit-learn.'), last_updated_timestamp=1712231791101, name='wine-rf-model_aml', run_id='a24a257f373a47b8b9cff9084a109a32', run_link='', source='dbfs:/databricks/mlflow-tracking/4422168978451532/a24a257f373a47b8b9cff9084a109a32/artifacts/sklearn-model', status='READY', status_message='', tags={}, user_id='4955259050321055', version='2'>


Now delete the entire registered model.

```python
client.delete_registered_model(model_name)
```