# MLflow Training Tutorial

This `wine_quality.ipynb` Jupyter notebook predicts the quality of wine using [sklearn.linear_model.ElasticNet](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html).

> This is the Jupyter notebook version of the `train.py` example

Attribution
* The data set used in this example is from http://archive.ics.uci.edu/ml/datasets/Wine+Quality
* P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

# Goals

- Read in data
- Split data for training and testing
- Train model
- Log parameters, metrics, and model
- Use the MLflow user interface
- Register the model and promote it to production
- Use model for predictions

In [1]:
import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet
from mlflow.tracking import MlflowClient
from pprint import pprint

import mlflow
import mlflow.sklearn

## Define a function that computes the relevant metrics for ElasticNet

In [2]:
def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2
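
To make the three metrics concrete, here is a minimal dependency-free sketch of what `eval_metrics` computes. The name `eval_metrics_plain` and the toy numbers are illustrative assumptions, not part of the original notebook, and the sketch assumes the actual values are not all identical.

```python
import math


def eval_metrics_plain(actual, pred):
    """Compute RMSE, MAE and R2 for two equal-length lists of numbers."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, pred)]

    # Root mean squared error: square root of the average squared error
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Mean absolute error: average magnitude of the errors
    mae = sum(abs(e) for e in errors) / n

    # R2: one minus (residual sum of squares / total sum of squares)
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot

    return rmse, mae, r2


# Toy example: two of the three predictions are off by one quality point
rmse, mae, r2 = eval_metrics_plain([3, 5, 7], [2, 5, 8])
print(rmse, mae, r2)
```

Lower RMSE and MAE are better, while R2 approaches 1 as the predictions explain more of the variance in the actual values.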

## Read the wine-quality csv file from the URL

In [3]:
# Read the wine-quality csv file from the URL
csv_url =\
    'http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'

In [4]:
data = pd.read_csv(csv_url, sep=';')

In [53]:
data.head()

## Split the data into training and test sets (a 0.75/0.25 split, the `train_test_split` default)

In [6]:
train, test = train_test_split(data)
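
Because `train_test_split(data)` is called without `random_state`, the rows that land in each set change from run to run; passing `random_state` (and optionally `test_size`) makes the split reproducible. The following is a rough pure-Python sketch of what a seeded 0.75/0.25 shuffled split does. It is an illustration only, not scikit-learn's implementation, and `shuffled_split` is a hypothetical helper.

```python
import math
import random


def shuffled_split(rows, test_size=0.25, seed=42):
    """Shuffle row indices with a fixed seed and cut off a test fraction."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)

    # Assumption: the test set size is rounded up, the rest goes to training
    n_test = math.ceil(test_size * len(rows))
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]

    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


rows = list(range(10))
train_rows, test_rows = shuffled_split(rows)
print(len(train_rows), len(test_rows))
```

Calling the helper twice with the same seed returns the same partition, which is the property `random_state` provides in scikit-learn.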

## The predicted column is "quality", an integer score in the range [3, 9]

In [7]:
train_x = train.drop(["quality"], axis=1)
test_x = test.drop(["quality"], axis=1)
train_y = train[["quality"]]
test_y = test[["quality"]]

## Define our hyperparameters for Elastic Net (regularized linear regression)

In [35]:
alpha = 0.333
l1_ratio = 0.666
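
As a rough illustration of what these two hyperparameters control: in scikit-learn's documented ElasticNet objective, `alpha` scales the overall penalty and `l1_ratio` mixes the L1 and L2 terms. The sketch below computes only that penalty term for a given coefficient vector; `elastic_net_penalty` and the example coefficients are hypothetical and are not part of the notebook.

```python
def elastic_net_penalty(coefs, alpha, l1_ratio):
    """Penalty term: alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2."""
    l1 = sum(abs(w) for w in coefs)      # L1 norm, encourages sparse coefficients
    l2_sq = sum(w * w for w in coefs)    # squared L2 norm, shrinks coefficients smoothly
    return alpha * l1_ratio * l1 + 0.5 * alpha * (1.0 - l1_ratio) * l2_sq


# Made-up coefficients, with the hyperparameters used in this notebook
penalty = elastic_net_penalty([1.0, -2.0], alpha=0.333, l1_ratio=0.666)
print(penalty)
```

With `l1_ratio=1` the penalty reduces to a pure L1 (lasso) term, and with `l1_ratio=0` only the L2 (ridge-style) term remains.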

## Connect to a database in the cloud to track our runs and models

In [46]:
mlflow.set_tracking_uri('postgresql://postgres:postgres@postgres.cycau0z5fcoi.us-east-1.rds.amazonaws.com:5432/mlflow')
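
Hard-coding database credentials in a notebook makes them easy to leak. One possible alternative, sketched below, is to assemble the URI from environment variables. `MLFLOW_TRACKING_URI` is an environment variable MLflow itself reads; the `MLFLOW_DB_*` names, the helper `build_tracking_uri`, and the example host are assumptions made for illustration.

```python
import os
from urllib.parse import quote


def build_tracking_uri(env=os.environ):
    """Return a PostgreSQL tracking URI built from environment variables."""
    # Prefer a complete URI if one is already configured
    if env.get("MLFLOW_TRACKING_URI"):
        return env["MLFLOW_TRACKING_URI"]

    # Hypothetical variable names; percent-encode so special characters are safe
    user = quote(env["MLFLOW_DB_USER"], safe="")
    password = quote(env["MLFLOW_DB_PASSWORD"], safe="")
    host = env["MLFLOW_DB_HOST"]
    port = env.get("MLFLOW_DB_PORT", "5432")
    name = env.get("MLFLOW_DB_NAME", "mlflow")
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"


# Example with a stand-in environment instead of real credentials
example_env = {
    "MLFLOW_DB_USER": "u",
    "MLFLOW_DB_PASSWORD": "p@ss",
    "MLFLOW_DB_HOST": "db.example.com",
}
example_uri = build_tracking_uri(example_env)
print(example_uri)

# In the notebook this would be used as:
# mlflow.set_tracking_uri(build_tracking_uri())
```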

## Train model while MLflow tracks and logs the parameters, metrics, and models

In [57]:
# MLflow Tracking is organized around the concept of 'runs'.
# A run is a single execution of a piece of data science code.

with mlflow.start_run():
    # Execute ElasticNet
    lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
    lr.fit(train_x, train_y)

    # Evaluate Metrics
    predicted_qualities = lr.predict(test_x)
    (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

    # Print out metrics
    print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
    print("  RMSE: %s" % rmse)
    print("  MAE: %s" % mae)
    print("  R2: %s" % r2)

    # Log parameters to MLflow
    mlflow.log_param("alpha", alpha)
    mlflow.log_param("l1_ratio", l1_ratio)
    
    # Log metrics to MLflow
    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("r2", r2)
    mlflow.log_metric("mae", mae)

    # Log model to MLflow
    mlflow.sklearn.log_model(lr, "Test")

## Now, from the command line, cd into the current directory and run:

`mlflow ui --backend-store-uri postgresql://postgres:postgres@postgres.cycau0z5fcoi.us-east-1.rds.amazonaws.com:5432/mlflow`

In [11]:
# explore user interface and register the model

## Create Client object to retrieve data from our MLflow server

In [58]:
client = MlflowClient()

## Get a list of our registered models 

In [59]:
# list_registered_models() was removed in MLflow 2.0; search_registered_models() replaces it
client.search_registered_models()

In [50]:
for rm in client.search_registered_models():
    pprint(dict(rm), indent=4)

## Check to see if we have any models in production

In [60]:
# Check every latest version of every model; latest_versions can be empty or span several stages
production_models = [mv for m in client.search_registered_models()
                     for mv in m.latest_versions
                     if mv.current_stage == 'Production']

In [61]:
model_path = None
if len(production_models) == 0:
    print('No models flagged as production')
else: 
    model_path = production_models[0].source
    print(f'Model Path = {model_path}')
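
Each registered model carries a `latest_versions` list, which can be empty for a model with no versions yet and can hold entries for more than one stage. The sketch below runs a Production filter over stand-in objects so the logic can be tried without an MLflow server. The `SimpleNamespace` objects, their names, and their paths are hypothetical placeholders for what the client returns.

```python
from types import SimpleNamespace


def production_versions(registered_models):
    """Collect every latest version whose stage is 'Production'."""
    return [
        version
        for model in registered_models
        for version in model.latest_versions  # an empty list is simply skipped
        if version.current_stage == "Production"
    ]


# Stand-ins for registry entries (not real MLflow objects)
fake_models = [
    SimpleNamespace(name="model-a", latest_versions=[]),
    SimpleNamespace(name="model-b", latest_versions=[
        SimpleNamespace(version="1", current_stage="Staging", source="path/to/b/1"),
        SimpleNamespace(version="2", current_stage="Production", source="path/to/b/2"),
    ]),
]

found = production_versions(fake_models)
print([v.source for v in found])
```

Iterating over all entries avoids both an `IndexError` on models without versions and missing a Production version that is not first in the list.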

## Load model for use

In [43]:
predict_func = mlflow.pyfunc.load_model(model_path)
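
Besides a stored source path, `mlflow.pyfunc.load_model` also accepts registry URIs of the form `models:/<name>/<stage>`. A small sketch of building such a URI follows; the helper `registry_model_uri` and the model name `wine-quality` are hypothetical, so substitute the name under which the model was registered in the UI.

```python
VALID_STAGES = {"None", "Staging", "Production", "Archived"}


def registry_model_uri(model_name, stage="Production"):
    """Build a 'models:/<name>/<stage>' URI for a registered model."""
    if stage not in VALID_STAGES:
        raise ValueError(f"Unknown stage: {stage!r}")
    return f"models:/{model_name}/{stage}"


uri = registry_model_uri("wine-quality")  # hypothetical registered model name
print(uri)

# In the notebook this would be used as:
# predict_func = mlflow.pyfunc.load_model(registry_model_uri("wine-quality"))
```

Loading by registry URI means the notebook always picks up whichever version currently holds the given stage, without looking up its source path first.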

## Use a small batch of features to make predictions

In [44]:
small_batch = test_x[0:3]

In [62]:
predictions = predict_func.predict(small_batch)
print(f'\n----------------------\nPredictions: {predictions}')