# Train a Ridge Regression Model on the Diabetes Dataset and Track the Experiment Using MLflow

This notebook loads the Diabetes dataset from scikit-learn, splits it into training and validation sets, trains a Ridge regression model, evaluates the model on the validation set, and uses MLflow to track the parameters, the model, and the results of the experiment.

In [74]:
import mlflow
from azureml.core import Workspace

To use the MLflow Tracking API, you need to connect MLflow to your Azure ML workspace.

In [75]:
ws = Workspace.from_config(path="../environment_setup/aml_workspace/config.json")
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())

Here we may get a warning recommending the ServicePrincipalAuthentication mode to connect to the workspace. I don't have access to Azure Active Directory in my subscription (restricted access in my company), so I did not use it here.

In [76]:
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import joblib
import pandas as pd

## Load Data

In [77]:
sample_data = load_diabetes()

df = pd.DataFrame(
    data=sample_data.data,
    columns=sample_data.feature_names)
df['Y'] = sample_data.target

In [78]:
print(df.shape)

(442, 11)


In [79]:
# All data in a single dataframe
df.describe()

| | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 442.0 | 442.0 | 442.0 | 442.0 | 442.0 | 442.0 | 442.0 | 442.0 | 442.0 | 442.0 | 442.0 |
| mean | -3.639623e-16 | 1.309912e-16 | -8.013951e-16 | 1.289818e-16 | -9.042540e-17 | 1.301121e-16 | -4.563971e-16 | 3.863174e-16 | -3.848103e-16 | -3.398488e-16 | 152.133484 |
| std | 0.04761905 | 0.04761905 | 0.04761905 | 0.04761905 | 0.04761905 | 0.04761905 | 0.04761905 | 0.04761905 | 0.04761905 | 0.04761905 | 77.093005 |
| min | -0.1072256 | -0.04464164 | -0.0902753 | -0.1123996 | -0.1267807 | -0.1156131 | -0.1023071 | -0.0763945 | -0.1260974 | -0.1377672 | 25.0 |
| 25% | -0.03729927 | -0.04464164 | -0.03422907 | -0.03665645 | -0.03424784 | -0.0303584 | -0.03511716 | -0.03949338 | -0.03324879 | -0.03317903 | 87.0 |
| 50% | 0.00538306 | -0.04464164 | -0.007283766 | -0.005670611 | -0.004320866 | -0.003819065 | -0.006584468 | -0.002592262 | -0.001947634 | -0.001077698 | 140.5 |
| 75% | 0.03807591 | 0.05068012 | 0.03124802 | 0.03564384 | 0.02835801 | 0.02984439 | 0.0293115 | 0.03430886 | 0.03243323 | 0.02791705 | 211.5 |
| max | 0.1107267 | 0.05068012 | 0.1705552 | 0.1320442 | 0.1539137 | 0.198788 | 0.1811791 | 0.1852344 | 0.133599 | 0.1356118 | 346.0 |
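
The identical `std` value for all ten features is not a coincidence. According to the scikit-learn documentation, `load_diabetes` returns features that are mean-centered and scaled so that each column's sum of squares equals 1. With the sample standard deviation that pandas reports (dividing by n - 1), this should give 1/sqrt(441) for every feature. A quick check, using only the row count printed above (the variable names are illustrative):

```python
import math

n_samples = 442  # number of rows reported by df.shape

# Each feature column is centered and has a sum of squares of 1, so the
# sample variance (ddof=1, the pandas default) is 1 / (n_samples - 1).
expected_std = math.sqrt(1.0 / (n_samples - 1))

print(round(expected_std, 8))
```

The printed value can be compared with the `std` row of the summary; the target `Y` is left on its original scale.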


## Split Data into Training and Validation Sets

In [80]:
X = df.drop('Y', axis=1).values
y = df['Y'].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
data = {"train": {"X": X_train, "y": y_train},
        "test": {"X": X_test, "y": y_test}}
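
To see how many rows end up in each set, here is a small sketch of the arithmetic. It assumes the rounding rule scikit-learn applies to a fractional `test_size` (the test count is rounded up and the remainder goes to training); the variable names are illustrative.

```python
import math

n_samples = 442   # rows in the Diabetes dataset
test_size = 0.2   # same fraction as passed to train_test_split

# The test share is rounded up; training gets whatever is left.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```

Checking `X_train.shape` and `X_test.shape` in the notebook is an easy way to confirm these counts.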

## Train Model on Training Set

Create the experiment and tell MLflow to use it:

In [81]:
experiment_name = 'experiment_with_mlflow'
mlflow.set_experiment(experiment_name=experiment_name)

Log the parameters to the experiment run:

In [82]:
# experiment parameters
args = {
    "alpha": 0.5
}
mlflow.log_params(args)

Train the model:

In [83]:
reg_model = Ridge(**args)
reg_model.fit(data["train"]["X"], data["train"]["y"])

Ridge(alpha=0.5)
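
Ridge regression minimizes the squared error plus `alpha` times the squared L2 norm of the coefficients, so a larger `alpha` pulls the coefficients toward zero. The sketch below illustrates that effect for the simplest case, a single feature with an unpenalized intercept, where the slope has a closed form. The helper `ridge_slope` and the toy data are hypothetical and are not part of scikit-learn or the Diabetes dataset.

```python
def ridge_slope(xs, ys, alpha):
    """Closed-form ridge slope for one feature with an unpenalized intercept."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    # Centered cross-product and sum of squares.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    # alpha is added to the denominator, which shrinks the slope.
    return s_xy / (s_xx + alpha)


# Toy data lying exactly on y = 2x + 1 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

for alpha in (0.0, 0.5, 5.0):
    print(alpha, ridge_slope(xs, ys, alpha))
```

With `alpha=0.0` the slope is the ordinary least-squares value; increasing `alpha` reduces it.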

## Validate Model on Validation Set

Log the model metrics to the experiment run:

In [84]:
preds = reg_model.predict(data["test"]["X"])
# scikit-learn expects the true values first, then the predictions
mse = mean_squared_error(y_test, preds)
metrics = {"mse": mse}
mlflow.log_metric("mse", mse)
print(metrics)

{'mse': 3298.9096058070622}
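
The MSE is expressed in squared units of the target, which makes it hard to read directly. A minimal sketch of the formula, plus the square root of the value reported above (the RMSE, in the same units as `Y`), is shown below. The function `mean_squared_error_py` is a hypothetical pure-Python stand-in for illustration, not the scikit-learn implementation.

```python
import math


def mean_squared_error_py(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values to show the formula: ((3 - 2)^2 + (5 - 7)^2) / 2
print(mean_squared_error_py([3.0, 5.0], [2.0, 7.0]))

# Square root of the MSE printed by the notebook, in units of Y.
reported_mse = 3298.9096058070622
rmse = math.sqrt(reported_mse)
print(round(rmse, 2))
```

The RMSE can be compared with the standard deviation of `Y` (about 77) from the summary statistics to get a rough sense of how much the model improves on predicting the mean.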


## Save Model

Optionally, save the model to the local filesystem:

In [85]:
model_name = "sklearn_regression_model.pkl"

joblib.dump(value=reg_model, filename=model_name)

['sklearn_regression_model.pkl']

Most importantly, log the model as an artifact of the run:

In [86]:
# artifact_path is relative to the run's artifact root, so no leading slash
mlflow.sklearn.log_model(reg_model, artifact_path="model")



Register the model in Azure ML using the MLflow API. (You can also do this when logging the model, by passing the registered_model_name parameter to the mlflow.sklearn.log_model function.)

In [87]:
run = mlflow.active_run()
model_uri = 'runs:/{}/model'.format(run.info.run_id)
mlflow.register_model(model_uri=model_uri, name="diabetes_reg")

Registered model 'diabetes_reg' already exists. Creating a new version of this model...
2021/07/29 12:15:18 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: diabetes_reg, version 3
Created version '3' of model 'diabetes_reg'.


<ModelVersion: creation_timestamp=1627553717806, current_stage='None', description='', last_updated_timestamp=1627553717806, name='diabetes_reg', run_id='a4939c90-dfcf-44c0-9063-291044ef74fe', run_link='', source='azureml://experiments/experiment_with_mlflow/runs/a4939c90-dfcf-44c0-9063-291044ef74fe/artifacts/model', status='READY', status_message='', tags={}, user_id='', version='3'>
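
MLflow refers to models by URI. A model logged in a run uses the `runs:/<run_id>/<artifact_path>` form, as in the cell above, while a registered model version can be addressed as `models:/<name>/<version>`. The sketch below only builds these strings; the helper names are hypothetical, and the run ID and version are copied from the output above.

```python
def run_model_uri(run_id, artifact_path="model"):
    """URI of a model logged as an artifact of a run."""
    return f"runs:/{run_id}/{artifact_path}"


def registry_model_uri(name, version):
    """URI of a specific version of a registered model."""
    return f"models:/{name}/{version}"


run_id = "a4939c90-dfcf-44c0-9063-291044ef74fe"  # from the output above
print(run_model_uri(run_id))
print(registry_model_uri("diabetes_reg", 3))
```

Either URI can then be passed to `mlflow.sklearn.load_model` to load the model back, provided the tracking URI points at the same workspace.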

Finish the experiment by ending the active run:

In [88]:
mlflow.end_run()

MLflow offers many other tracking features; see the [MLflow Python API documentation](https://mlflow.org/docs/latest/python_api/mlflow.html).