In [35]:
import pandas as pd
import mlflow
import sklearn 
import numpy as np 
import warnings
import os
warnings.simplefilter('ignore')

## What does the MLflow Model Registry do?

The MLflow Model Registry component is a centralized model store, set of APIs, and UI, to collaboratively manage the full lifecycle of an MLflow Model.

* An MLflow Model is created from an experiment or run that is logged. Once it is logged, the model can be registered with the Model Registry.

* A registered model has a unique name, contains versions, associated transitional stages, model lineage, and other metadata.

* We can also add a suitable description to the registered model.

**Model version vs Model stage**

* Model version - Each registered model can have one or many versions. When a new model is added to the Model Registry, it is added as version 1. Each new model registered under the same model name increments the version number.

* Model stage - Each distinct model version can be assigned one stage at any given time. MLflow provides predefined stages for common use cases, such as Staging, Production, and Archived.
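The version and stage rules above can be sketched with a toy in-memory registry. This is only an illustration of the bookkeeping, not MLflow code: `ToyRegistry`, `register`, `set_stage`, and the model name `demo-model` are hypothetical names made up for this example. The stage `"None"` is the one a freshly registered version shows in the outputs later in this notebook.

```python
from dataclasses import dataclass, field

# Stage names from the text above; "None" is the stage of a new version.
VALID_STAGES = ("None", "Staging", "Production", "Archived")


@dataclass
class ToyRegistry:
    """Hypothetical in-memory stand-in for a model registry (not MLflow)."""

    # Maps model name -> {version number: current stage}
    models: dict = field(default_factory=dict)

    def register(self, name: str) -> int:
        """Add a new version under `name`; the first one is version 1."""
        versions = self.models.setdefault(name, {})
        new_version = len(versions) + 1
        versions[new_version] = "None"
        return new_version

    def set_stage(self, name: str, version: int, stage: str) -> None:
        """Assign exactly one stage to a version, replacing the old one."""
        if stage not in VALID_STAGES:
            raise ValueError(f"unknown stage: {stage!r}")
        self.models[name][version] = stage


registry = ToyRegistry()
v1 = registry.register("demo-model")  # first registration under this name
v2 = registry.register("demo-model")  # same name, so the version increments
registry.set_stage("demo-model", v1, "Staging")
registry.set_stage("demo-model", v1, "Production")
print(registry.models)
```

Each version holds a single stage at a time, so moving version 1 to Production overwrites its earlier Staging assignment, while version 2 stays at `"None"`.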


Suppose we want to push our best model (the one with the highest R2) to the registry. How do we do it?

In [36]:
os.chdir('../MLFlow-Tracking')

In [37]:
remote_server_uri = "http://127.0.0.1:5000"
backend_store_uri = "sqlite:///mlflow.db"
mlflow.set_registry_uri(remote_server_uri)
mlflow.set_tracking_uri(backend_store_uri)
mlflow.tracking.get_registry_uri(),mlflow.tracking.get_tracking_uri()

('http://127.0.0.1:5000', 'sqlite:///mlflow.db')

In [5]:
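# List all experiments
from mlflow.tracking import MlflowClient
client = MlflowClient()
# Note: list_experiments() is the MLflow 1.x API; MLflow 2.x replaces it with client.search_experiments()
experiments = client.list_experiments()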
experiments


[<Experiment: artifact_location='mlruns/0', experiment_id='0', lifecycle_stage='active', name='Default', tags={}>,
 <Experiment: artifact_location='./mlruns/1', experiment_id='1', lifecycle_stage='active', name='Wine_quality_Random_Forest', tags={}>,
 <Experiment: artifact_location='./mlruns/2', experiment_id='2', lifecycle_stage='active', name='Wine_quality_Decision_Tree', tags={'sklearn.framework': 'LR'}>]

We will register the best model from the random forest experiment.

In [38]:
# Fetch experiment metadata information
experiment_id = str(1)
experiment = client.get_experiment(experiment_id)
print("Name: {}".format(experiment.name))
print("Experiment_id: {}".format(experiment.experiment_id))
print("Artifact Location: {}".format(experiment.artifact_location))
print("Tags: {}".format(experiment.tags))
print("Lifecycle_stage: {}".format(experiment.lifecycle_stage))

Name: Wine_quality_Random_Forest
Experiment_id: 1
Artifact Location: ./mlruns/1
Tags: {}
Lifecycle_stage: active


In [39]:
experiment_name = "Wine_quality_Random_Forest"
# get exp id by name
exp_details = dict(mlflow.get_experiment_by_name(experiment_name))
exp_id = exp_details['experiment_id']
df = mlflow.search_runs([exp_id])
df

Unnamed: 0,run_id,experiment_id,status,artifact_uri,start_time,end_time,metrics.train_rmse,metrics.r2,metrics.train_mae,metrics.test_mae,metrics.test_rmse,params.max_depth,params.n_estimators,tags.mlflow.runName,tags.mlflow.log-model.history,tags.mlflow.source.name,tags.mlflow.source.type,tags.mlflow.user
0,ae913b31a8304ae2800a34f48746097f,1,FINISHED,./mlruns/1/ae913b31a8304ae2800a34f48746097f/ar...,2022-05-09 10:38:23.221000+00:00,2022-05-09 10:38:31.985000+00:00,0.62519,0.336691,0.486478,0.515765,0.640561,3,300,RF_2,"[{""run_id"": ""ae913b31a8304ae2800a34f48746097f""...",c:\Users\Arun Mohan\.conda\envs\mlflowenv\lib\...,LOCAL,Arun Mohan
1,9f119d19982d473fa77783a8fa84dfa7,1,FINISHED,./mlruns/1/9f119d19982d473fa77783a8fa84dfa7/ar...,2022-05-09 10:38:08.469000+00:00,2022-05-09 10:38:23.168000+00:00,0.44736,0.439982,0.346903,0.465243,0.588578,7,300,RF_1,"[{""run_id"": ""9f119d19982d473fa77783a8fa84dfa7""...",c:\Users\Arun Mohan\.conda\envs\mlflowenv\lib\...,LOCAL,Arun Mohan
2,ed7620729b374017baf95b35a74a3180,1,FINISHED,./mlruns/1/ed7620729b374017baf95b35a74a3180/ar...,2022-05-09 10:37:56.759000+00:00,2022-05-09 10:38:08.365000+00:00,0.543539,0.396027,0.423144,0.487621,0.61124,5,100,RF_0,"[{""run_id"": ""ed7620729b374017baf95b35a74a3180""...",c:\Users\Arun Mohan\.conda\envs\mlflowenv\lib\...,LOCAL,Arun Mohan


In [40]:
# Keep the run(s) whose R2 equals the maximum R2 in the experiment
best = df[df['metrics.r2'] == df['metrics.r2'].max()]
print('Details:')
best.to_dict()

Details:


{'run_id': {1: '9f119d19982d473fa77783a8fa84dfa7'},
 'experiment_id': {1: '1'},
 'status': {1: 'FINISHED'},
 'artifact_uri': {1: './mlruns/1/9f119d19982d473fa77783a8fa84dfa7/artifacts'},
 'start_time': {1: Timestamp('2022-05-09 10:38:08.469000+0000', tz='UTC')},
 'end_time': {1: Timestamp('2022-05-09 10:38:23.168000+0000', tz='UTC')},
 'metrics.train_rmse': {1: 0.4473604176962957},
 'metrics.r2': {1: 0.43998224802577834},
 'metrics.train_mae': {1: 0.34690298225856253},
 'metrics.test_mae': {1: 0.465242675775956},
 'metrics.test_rmse': {1: 0.5885775065871136},
 'params.max_depth': {1: '7'},
 'params.n_estimators': {1: '300'},
 'tags.mlflow.runName': {1: 'RF_1'},
 'tags.mlflow.log-model.history': {1: '[{"run_id": "9f119d19982d473fa77783a8fa84dfa7", "artifact_path": "model", "utc_time_created": "2022-05-09 10:38:14.477193", "flavors": {"python_function": {"model_path": "model.pkl", "loader_module": "mlflow.sklearn", "python_version": "3.9.12", "env": "conda.yaml"}, "sklearn": {"pickled_mo
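The boolean mask above keeps every run that ties for the maximum R2. If we want exactly one run, picking the maximum directly is a small alternative. The sketch below uses plain Python with the run names, run IDs, and R2 values copied from the `search_runs` table above. The `runs` list and `best_run` variable are names introduced for this example only. On the DataFrame itself, `df.loc[df['metrics.r2'].idxmax()]` should give the same row.

```python
# Run names, run IDs, and R2 values copied from the search_runs output above.
runs = [
    {"run_name": "RF_2", "run_id": "ae913b31a8304ae2800a34f48746097f", "r2": 0.336691},
    {"run_name": "RF_1", "run_id": "9f119d19982d473fa77783a8fa84dfa7", "r2": 0.439982},
    {"run_name": "RF_0", "run_id": "ed7620729b374017baf95b35a74a3180", "r2": 0.396027},
]

# max() returns a single record: the first one with the highest R2.
best_run = max(runs, key=lambda run: run["r2"])
print(best_run["run_name"], best_run["run_id"])
```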

In [41]:
mlflow.set_tracking_uri("http://localhost:5000")

# If a registered model with the name doesn’t exist, the method registers a new model, 
# creates Version 1, and returns a ModelVersion MLflow object. If a registered model with the name exists, the method
# creates a new model version and returns the version object

result = mlflow.register_model(
    "mlruns/1/9f119d19982d473fa77783a8fa84dfa7/artifacts/model",
    "random-forest-wine-quality-model"
)
result

Successfully registered model 'random-forest-wine-quality-model'.
2022/05/09 17:02:59 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: random-forest-wine-quality-model, version 1
Created version '1' of model 'random-forest-wine-quality-model'.


<ModelVersion: creation_timestamp=1652095979619, current_stage='None', description='', last_updated_timestamp=1652095979619, name='random-forest-wine-quality-model', run_id='', run_link='', source='mlruns/1/9f119d19982d473fa77783a8fa84dfa7/artifacts/model', status='READY', status_message='', tags={}, user_id='', version='1'>
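Note that the `ModelVersion` above has `run_id=''`, because the model was registered from a filesystem path. MLflow model URIs can also take the form `runs:/<run_id>/<artifact_path>`, which should let the registry record which run produced the model. The helper below is a hypothetical convenience function (not part of MLflow) that builds such a URI. The artifact path `model` comes from the `log-model.history` tag shown earlier.

```python
def runs_uri(run_id: str, artifact_path: str = "model") -> str:
    """Build a 'runs:/<run_id>/<artifact_path>' model URI (hypothetical helper)."""
    if not run_id:
        raise ValueError("run_id must be a non-empty string")
    # Strip stray slashes so the parts join with exactly one separator.
    return f"runs:/{run_id}/{artifact_path.strip('/')}"


model_uri = runs_uri("9f119d19982d473fa77783a8fa84dfa7")
print(model_uri)

# With a reachable tracking server, the URI could then be registered
# (not executed here):
# result = mlflow.register_model(model_uri, "random-forest-wine-quality-model")
```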

Suppose we need to update the details of a registered model.

In [42]:
# Let us include details of the model
client = MlflowClient()
client.update_model_version(
    name="random-forest-wine-quality-model",
    version=1,
    description="This model version is a scikit-learn random forest for wine quality prediction with r2 0.439"
)

<ModelVersion: creation_timestamp=1652095979619, current_stage='None', description=('This model version is a scikit-learn random forest for wine quality '
 'prediction with r2 0.439'), last_updated_timestamp=1652095982886, name='random-forest-wine-quality-model', run_id='', run_link='', source='mlruns/1/9f119d19982d473fa77783a8fa84dfa7/artifacts/model', status='READY', status_message='', tags={}, user_id='', version='1'>

MLflow offers three stages for a model version: Staging, Production, and Archived.

Let us move the model to Staging and then to Production.

In [43]:
client = MlflowClient()
client.transition_model_version_stage(
    name="random-forest-wine-quality-model",
    version=1,
    stage="Staging"
)

<ModelVersion: creation_timestamp=1652095979619, current_stage='Staging', description=('This model version is a scikit-learn random forest for wine quality '
 'prediction with r2 0.439'), last_updated_timestamp=1652095994267, name='random-forest-wine-quality-model', run_id='', run_link='', source='mlruns/1/9f119d19982d473fa77783a8fa84dfa7/artifacts/model', status='READY', status_message='', tags={}, user_id='', version='1'>

In [44]:

client.transition_model_version_stage(
    name="random-forest-wine-quality-model",
    version=1,
    stage="Production"
)

<ModelVersion: creation_timestamp=1652095979619, current_stage='Production', description=('This model version is a scikit-learn random forest for wine quality '
 'prediction with r2 0.439'), last_updated_timestamp=1652095994876, name='random-forest-wine-quality-model', run_id='', run_link='', source='mlruns/1/9f119d19982d473fa77783a8fa84dfa7/artifacts/model', status='READY', status_message='', tags={}, user_id='', version='1'>

Similarly, let us register the best decision tree model and move it to Staging.

In [45]:
experiment_name = 'Wine_quality_Decision_Tree'
# get exp id by name
exp_details = dict(mlflow.get_experiment_by_name(experiment_name))
exp_id = exp_details['experiment_id']
df = mlflow.search_runs([exp_id])
best = df[df['metrics.test_score'] == df['metrics.test_score'].max()]
print(f'Details:')
best.to_dict()

Details:


{'run_id': {3: '62e669c2cd7e4e748d50f87253eb262d'},
 'experiment_id': {3: '2'},
 'status': {3: 'FINISHED'},
 'artifact_uri': {3: './mlruns/2/62e669c2cd7e4e748d50f87253eb262d/artifacts'},
 'start_time': {3: Timestamp('2022-05-09 10:39:40.526000+0000', tz='UTC')},
 'end_time': {3: Timestamp('2022-05-09 10:39:50.732000+0000', tz='UTC')},
 'metrics.training_r2_score': {3: 0.4847313759455386},
 'metrics.training_rmse': {3: 0.584270822848656},
 'metrics.test_score': {3: 0.31164898740117164},
 'metrics.test_mse': {3: 0.4258096341998065},
 'metrics.test_rmse': {3: 0.6525409061505696},
 'metrics.training_score': {3: 0.4847313759455386},
 'metrics.training_mae': {3: 0.4383337776650221},
 'metrics.training_mse': {3: 0.34137239443224554},
 'metrics.test_r2_score': {3: 0.31164898740117164},
 'metrics.test_mae': {3: 0.49453849729558486},
 'params.max_depth': {3: '5'},
 'params.random_state': {3: 'None'},
 'params.max_features': {3: 'None'},
 'params.min_weight_fraction_leaf': {3: '0.0'},
 'params.mi

In [46]:
mlflow.set_tracking_uri("http://localhost:5000")

# If a registered model with the name doesn’t exist, the method registers a new model, 
# creates Version 1, and returns a ModelVersion MLflow object. If a registered model with the name exists, the method
# creates a new model version and returns the version object

result = mlflow.register_model(
    "mlruns/2/62e669c2cd7e4e748d50f87253eb262d/artifacts/model",
    "decision-tree-wine-quality-model"
)
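# Let us include details of the model.
# Note: the description below was copied from the random forest cell; the best
# decision tree run above reports a test_r2_score of about 0.312.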
client = MlflowClient()
client.update_model_version(
    name="decision-tree-wine-quality-model",
    version=1,
    description="This model version is a scikit-learn random forest for wine quality prediction with r2 0.439"
)
client.transition_model_version_stage(
    name="decision-tree-wine-quality-model",
    version=1,
    stage="Staging"
)

Successfully registered model 'decision-tree-wine-quality-model'.
2022/05/09 17:03:47 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: decision-tree-wine-quality-model, version 1
Created version '1' of model 'decision-tree-wine-quality-model'.


<ModelVersion: creation_timestamp=1652096027662, current_stage='Staging', description=('This model version is a scikit-learn random forest for wine quality '
 'prediction with r2 0.439'), last_updated_timestamp=1652096027741, name='decision-tree-wine-quality-model', run_id='', run_link='', source='mlruns/2/62e669c2cd7e4e748d50f87253eb262d/artifacts/model', status='READY', status_message='', tags={}, user_id='', version='1'>

<h4>List all registered models</h4>

In [47]:

from pprint import pprint

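client = MlflowClient()
# Note: list_registered_models() is the MLflow 1.x API; MLflow 2.x replaces it with client.search_registered_models()
for rm in client.list_registered_models():
    pprint(dict(rm), indent=4)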


{   'creation_timestamp': 1652096027608,
    'description': '',
    'last_updated_timestamp': 1652096027741,
    'latest_versions': [   <ModelVersion: creation_timestamp=1652096027662, current_stage='Staging', description=('This model version is a scikit-learn random forest for wine quality '
 'prediction with r2 0.439'), last_updated_timestamp=1652096027741, name='decision-tree-wine-quality-model', run_id='', run_link='', source='mlruns/2/62e669c2cd7e4e748d50f87253eb262d/artifacts/model', status='READY', status_message='', tags={}, user_id='', version='1'>],
    'name': 'decision-tree-wine-quality-model',
    'tags': {}}
{   'creation_timestamp': 1652095979564,
    'description': '',
    'last_updated_timestamp': 1652095994876,
    'latest_versions': [   <ModelVersion: creation_timestamp=1652095979619, current_stage='Production', description=('This model version is a scikit-learn random forest for wine quality '
 'prediction with r2 0.439'), last_updated_timestamp=1652095994876, name=
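The `creation_timestamp` and `last_updated_timestamp` fields in these outputs are integers. They appear to be milliseconds since the Unix epoch, which matches the local log time printed when the model was registered. A small sketch for turning them into readable UTC datetimes follows. `ms_to_utc` is a hypothetical helper name, and the value is the random forest version's `creation_timestamp` from above.

```python
from datetime import datetime, timezone


def ms_to_utc(timestamp_ms: int) -> datetime:
    """Convert milliseconds since the Unix epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)


# creation_timestamp of version 1 of random-forest-wine-quality-model
created = ms_to_utc(1652095979619)
print(created.isoformat(timespec="seconds"))
```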

In [34]:
# deleting model
# client = MlflowClient()
# versions=[1, 2, 3]
# for version in versions:
#     client.delete_model_version(name="random-forest-wine-quality-model", version=version)

# # Delete a registered model along with all its versions
# client.delete_registered_model(name="random-forest-wine-quality-model")

<img src="../tmp/ui3.PNG">

<h4> Make predictions on registered model (random forest)</h4>

In [48]:
import mlflow.pyfunc

model_name = "random-forest-wine-quality-model"
stage = 'Production'

model = mlflow.pyfunc.load_model(
    model_uri=f"models:/{model_name}/{stage}"
)
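The `models:/` URI used above points at whichever version is currently in the given stage, so the loaded model changes when a different version is promoted. The same scheme also accepts a version number, which pins one specific version. The sketch below only builds the two URI strings. `models_uri` is a hypothetical helper name, not an MLflow function.

```python
from typing import Union


def models_uri(model_name: str, version_or_stage: Union[int, str]) -> str:
    """Build a 'models:/<name>/<version-or-stage>' URI (hypothetical helper)."""
    return f"models:/{model_name}/{version_or_stage}"


# Follows whatever version currently sits in the Production stage.
by_stage = models_uri("random-forest-wine-quality-model", "Production")
# Pins version 1, regardless of later stage transitions.
by_version = models_uri("random-forest-wine-quality-model", 1)
print(by_stage)
print(by_version)
```

Either string can be passed as `model_uri` to `mlflow.pyfunc.load_model`. Pinning a version is the safer choice when predictions need to be reproducible.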


In [49]:
from sklearn.model_selection import train_test_split

def load_data(data_path='data/winequality-red.csv'):
    # Read the wine quality data and split it into features and target
    data = pd.read_csv(data_path)
    X = data.drop(["quality"], axis=1)
    y = data['quality']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
    return X_train, y_train, X_test, y_test


X_train,y_train,X_test,y_test = load_data()
model.predict(X_test.iloc[0].values.reshape(1,-1))

array([5.32582927])