# MLflow Training Tutorial

This `wine_quality.ipynb` Jupyter notebook predicts the quality of wine using [sklearn.linear_model.ElasticNet](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html).

> This is the Jupyter notebook version of the `train.py` example

Attribution
* The data set used in this example is from http://archive.ics.uci.edu/ml/datasets/Wine+Quality
* P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

# Goals

- Read in data
- Split data for training and testing
- Train model
- Log parameters, metrics, and model
- Use the MLflow user interface
- Register the model and set it to production
- Use model for predictions

In [1]:
import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet
from mlflow.tracking import MlflowClient
from pprint import pprint

import mlflow
import mlflow.sklearn

## Define a function that computes the relevant metrics for ElasticNet

In [2]:
def eval_metrics(actual, pred):
    """Return RMSE, MAE, and R2 for the true values and the predictions."""
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2
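
As a quick sanity check on what these three metrics measure, the sketch below recomputes them by hand with NumPy on a tiny example. The arrays `actual` and `pred` are hypothetical values chosen for illustration, not data from the wine set.

```python
import numpy as np

# Hypothetical true and predicted scores, for illustration only
actual = np.array([5.0, 6.0, 7.0, 5.0])
pred = np.array([5.5, 6.0, 6.0, 5.5])

errors = actual - pred

# RMSE: square root of the mean squared error
rmse = np.sqrt(np.mean(errors ** 2))

# MAE: mean of the absolute errors
mae = np.mean(np.abs(errors))

# R2: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((actual - actual.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(rmse, mae, r2)
```

RMSE and MAE are in the same units as the target (quality points), while R2 is unitless and compares the model against always predicting the mean.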

## Read the wine-quality csv file from the URL

In [3]:
# Read the wine-quality csv file from the URL
csv_url =\
    'http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'

In [4]:
data = pd.read_csv(csv_url, sep=';')

In [5]:
data.head()

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.88 | 0.0 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.2 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.997 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.998 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |


## Split the data into training and test sets (the default 0.75/0.25 split)

In [6]:
train, test = train_test_split(data)
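
Because `train_test_split` shuffles the rows, the split above changes from run to run, and so do the metrics reported later. A minimal sketch of a reproducible variant follows, with the 0.75/0.25 proportions written out explicitly. The stand-in array `rows` and the seed value 42 are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in for the wine DataFrame: 100 rows, 2 columns (hypothetical data)
rows = np.arange(200).reshape(100, 2)

# test_size=0.25 matches the default; random_state fixes the shuffle
train_rows, test_rows = train_test_split(rows, test_size=0.25, random_state=42)

print(train_rows.shape, test_rows.shape)  # (75, 2) (25, 2)
```

Calling the function again with the same `random_state` returns the same partition.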

## The predicted column is "quality", an integer score in the range [3, 9]

In [7]:
train_x = train.drop(["quality"], axis=1)
test_x = test.drop(["quality"], axis=1)
train_y = train[["quality"]]
test_y = test[["quality"]]

## Define our hyperparameters for Elastic Net (Linear Regression)

In [8]:
alpha = 0.5
l1_ratio = 0.5
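
For reference, the scikit-learn documentation gives the objective that `ElasticNet` minimizes as shown below. The symbols are introduced here and do not appear in the notebook: $n$ is the number of training samples, $X$ the feature matrix, $y$ the target, $w$ the coefficient vector, $\alpha$ the `alpha` argument, and $\rho$ the `l1_ratio` argument.

```latex
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2
  + \alpha \, \rho \, \lVert w \rVert_1
  + \frac{\alpha \, (1 - \rho)}{2} \lVert w \rVert_2^2
```

With `alpha = 0.5` and `l1_ratio = 0.5`, the L1 penalty is weighted by 0.25 and the squared L2 penalty by 0.125. Setting `l1_ratio = 1` leaves only the L1 (Lasso) penalty.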

## Set up a small SQLite database to store our runs and registered models

In [11]:
mlflow.set_tracking_uri('sqlite:///mlflow.db')

## Train model while MLflow tracks and logs the parameters, metrics, and models

In [12]:
# MLflow Tracking is based on the concept of 'runs'
# A run is a single execution of some piece of data science code

with mlflow.start_run():
    # Execute ElasticNet
    lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
    lr.fit(train_x, train_y)

    # Evaluate Metrics
    predicted_qualities = lr.predict(test_x)
    (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

    # Print out metrics
    print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
    print("  RMSE: %s" % rmse)
    print("  MAE: %s" % mae)
    print("  R2: %s" % r2)

    # Log parameters to MLflow
    mlflow.log_param("alpha", alpha)
    mlflow.log_param("l1_ratio", l1_ratio)
    
    # Log metrics to MLflow
    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("r2", r2)
    mlflow.log_metric("mae", mae)

    # Log model to MLflow
    mlflow.sklearn.log_model(lr, "Test")

Elasticnet model (alpha=0.500000, l1_ratio=0.500000):
  RMSE: 0.6848992553610684
  MAE: 0.5634820824576144
  R2: 0.13523385605577354


## Now, using the CLI, `cd` into the current directory and enter the command: `mlflow ui --backend-store-uri sqlite:///mlflow.db`

In [13]:
# Explore the MLflow user interface, register the model, and set its stage to Production

## Create Client object to retrieve data from our MLflow server

In [14]:
client = MlflowClient()

## Get a list of our registered models 

In [24]:
client.list_registered_models()

[<RegisteredModel: creation_timestamp=1592858539803, description=None, last_updated_timestamp=1592900643182, latest_versions=[<ModelVersion: creation_timestamp=1592900596410, current_stage='Production', description=None, last_updated_timestamp=1592900643182, name='test', run_id='e7a7bdb38ad748e599d4a11176c3505e', source='./mlruns/0/e7a7bdb38ad748e599d4a11176c3505e/artifacts/Test', status='READY', status_message=None, user_id=None, version=2>], name='test'>,
 <RegisteredModel: creation_timestamp=1592900833773, description=None, last_updated_timestamp=1592900841386, latest_versions=[<ModelVersion: creation_timestamp=1592900834011, current_stage='Production', description=None, last_updated_timestamp=1592900841386, name='test_2', run_id='f240f0773b89464a9b52bedf6fd1d985', source='./mlruns/0/f240f0773b89464a9b52bedf6fd1d985/artifacts/Test', status='READY', status_message=None, user_id=None, version=1>], name='test_2'>]

In [25]:
for rm in client.list_registered_models():
    pprint(dict(rm), indent=4)

{   'creation_timestamp': 1592858539803,
    'description': None,
    'last_updated_timestamp': 1592900643182,
    'latest_versions': [   <ModelVersion: creation_timestamp=1592900596410, current_stage='Production', description=None, last_updated_timestamp=1592900643182, name='test', run_id='e7a7bdb38ad748e599d4a11176c3505e', source='./mlruns/0/e7a7bdb38ad748e599d4a11176c3505e/artifacts/Test', status='READY', status_message=None, user_id=None, version=2>],
    'name': 'test'}
{   'creation_timestamp': 1592900833773,
    'description': None,
    'last_updated_timestamp': 1592900841386,
    'latest_versions': [   <ModelVersion: creation_timestamp=1592900834011, current_stage='Production', description=None, last_updated_timestamp=1592900841386, name='test_2', run_id='f240f0773b89464a9b52bedf6fd1d985', source='./mlruns/0/f240f0773b89464a9b52bedf6fd1d985/artifacts/Test', status='READY', status_message=None, user_id=None, version=1>],
    'name': 'test_2'}


## Check to see if we have any models in production

In [22]:
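# Collect every latest model version whose stage is 'Production'.
# Iterating over latest_versions also handles models with no versions yet.
production_models = [v for m in client.list_registered_models()
                     for v in m.latest_versions
                     if v.current_stage == 'Production']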

In [23]:
model_path = None
if len(production_models) == 0:
    print('No models flagged as production')
else: 
    model_path = production_models[0].source
    print(f'Model Path = {model_path}')

Model Path = ./mlruns/0/e7a7bdb38ad748e599d4a11176c3505e/artifacts/Test


## Load model for use

In [19]:
predict_func = mlflow.pyfunc.load_model(model_path)

## Use a small batch of features to make predictions

In [20]:
small_batch = test_x[0:3]

In [21]:
predictions = predict_func.predict(small_batch)
print(f'\n----------------------\nPredictions: {predictions}')


----------------------
Predictions: [5.49879772 5.79322156 6.66189577]
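
The model is a regressor, so its predictions are continuous even though `quality` is recorded as an integer. If whole-number scores are wanted, one possible post-processing step (an assumption of this sketch, not part of the original notebook) is to round to the nearest integer and clip to the [3, 9] range mentioned earlier:

```python
import numpy as np

# The three predictions printed above
predictions = np.array([5.49879772, 5.79322156, 6.66189577])

# Round to the nearest integer, then keep scores inside [3, 9]
rounded = np.clip(np.rint(predictions), 3, 9).astype(int)
print(rounded)  # [5 6 7]
```

Rounding discards information, so the error metrics are best computed on the raw predictions.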
