# MLflow Training Tutorial

This Jupyter notebook is adapted from MLflow's examples. It trains a [sklearn.linear_model.ElasticNet](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html) model to predict the quality of wine and logs each training run to MLflow.

Attribution
* The data set used in this example is from http://archive.ics.uci.edu/ml/datasets/Wine+Quality
* P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.


In [1]:
import os
import warnings
import sys

import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet

import mlflow
import mlflow.sklearn

In [2]:
warnings.filterwarnings("ignore")
np.random.seed(40)

In [3]:
mlflow.set_tracking_uri("http://localhost:5000")
print("Tracking URI: ", mlflow.tracking.get_tracking_uri())

Tracking URI:  http://localhost:5000


In [4]:
experiment_name = "sklearn_elasticnet_wine"
print("experiment_name: ", experiment_name)
mlflow.set_experiment(experiment_name)

client = mlflow.tracking.MlflowClient()
experiment_id = client.get_experiment_by_name(experiment_name).experiment_id
print("experiment_id: ", experiment_id)

experiment_name:  sklearn_elasticnet_wine
experiment_id:  2


In [5]:
# Wine Quality Sample
def run(data, in_alpha, in_l1_ratio, log_model=True):
    
    def eval_metrics(actual, pred):
        rmse = np.sqrt(mean_squared_error(actual, pred))
        mae = mean_absolute_error(actual, pred)
        r2 = r2_score(actual, pred)
        return rmse, mae, r2

    # Split the data into training and test sets. (0.75, 0.25) split.
    train, test = train_test_split(data)

    # The predicted column is "quality" which is a scalar from [3, 9]
    train_x = train.drop(["quality"], axis=1)
    test_x = test.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    test_y = test[["quality"]]

    # Fall back to the default value if no alpha is provided
    if in_alpha is None:
        alpha = 0.5
    else:
        alpha = float(in_alpha)

    # Fall back to the default value if no l1_ratio is provided
    if in_l1_ratio is None:
        l1_ratio = 0.5
    else:
        l1_ratio = float(in_l1_ratio)

    # Start a new MLflow run; every call to this function is tracked as a separate run
    with mlflow.start_run(run_name="jupyter") as run:
        
        # Execute ElasticNet
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)

        # Evaluate Metrics
        predicted_qualities = lr.predict(test_x)
        (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

        # Print out metrics
        print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)
        print("  R2: %s" % r2)
        
        print("runId: ", run.info.run_id)
        print("hyperparameters: ", lr.get_params())
        
        # Log parameter, metrics, and model to MLflow
        mlflow.log_params(lr.get_params())
        mlflow.log_metrics({
            "rmse": rmse, 
            "r2": r2, 
            "mae": mae
        })
        mlflow.set_tags({"log_model": log_model, "run_origin": "jupyter"})        
        
        if log_model:
            mlflow.sklearn.log_model(lr, "model")
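
The `eval_metrics` helper above returns RMSE, MAE, and R2. As a rough illustration of what those three numbers measure, the sketch below computes them by hand with only the standard library. The function name `eval_metrics_by_hand` and the sample values are illustrative assumptions; they are not part of the notebook or of scikit-learn.

```python
import math


def eval_metrics_by_hand(actual, pred):
    """Compute RMSE, MAE and R2 for two equally long lists of numbers."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, pred)]

    # Root mean squared error: square root of the average squared error
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # Mean absolute error: average magnitude of the errors
    mae = sum(abs(e) for e in errors) / n

    # R2: 1 minus the ratio of residual to total sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot

    return rmse, mae, r2


# Made-up quality scores, for illustration only
actual = [5.0, 6.0, 5.0, 7.0]
pred = [5.5, 5.5, 5.0, 6.0]
rmse, mae, r2 = eval_metrics_by_hand(actual, pred)
print("RMSE: %s, MAE: %s, R2: %s" % (rmse, mae, r2))
```

Lower RMSE and MAE and a higher R2 indicate a better fit, which is how the runs logged by `run` can be compared.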

In [6]:
# Read the wine-quality csv file from the URL
import logging

logger = logging.getLogger(__name__)

csv_url =\
    'http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'
try:
    data = pd.read_csv(csv_url, sep=';')
except Exception as e:
    logger.exception(
        "Unable to download training & test CSV, check your internet connection. Error: %s", e)

In [7]:
data.head()

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.88 | 0.0 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.2 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.997 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.998 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |


In [None]:
run(data, 0.5, 0.5)

In [None]:
run(data, 0.2, 0.2)

In [None]:
run(data, 0.1, 0.1)
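
The three runs above differ only in `alpha` and `l1_ratio`. For reference, scikit-learn documents the ElasticNet objective as shown below. The symbols are chosen here for illustration: $n$ is the number of training samples, $X$ the feature matrix, $y$ the target, $w$ the coefficient vector, $\alpha$ stands for `alpha`, and $\rho$ stands for `l1_ratio`.

$$
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \, \rho \, \lVert w \rVert_1 + \frac{1}{2} \, \alpha \, (1 - \rho) \, \lVert w \rVert_2^2
$$

So `alpha` scales the overall penalty strength, while `l1_ratio` sets the mix between the two penalties: a value of 1 gives a pure L1 (lasso) penalty and a value of 0 a pure L2 (ridge) penalty.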

**Loading the `sklearn` model from the MLflow server**

In [None]:
# Tag values are stored as strings, so the boolean log_model tag is matched as 'True'
run_list = client.search_runs([experiment_id], "tags.log_model='True'")
run_id = run_list[0].info.run_id
print(run_id)

sk_model = mlflow.sklearn.load_model("runs:/{}/model".format(run_id))
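
The search above filters on `tags.log_model='True'` even though `run` passed the Python boolean `log_model` to `mlflow.set_tags`. MLflow stores tag values as strings, so the boolean is recorded in its string form. The sketch below shows, without contacting a tracking server, how the filter string and the `runs:/` model URI are put together. The helper names `tag_filter` and `model_uri` are hypothetical and not part of MLflow.

```python
def tag_filter(key, value):
    """Build a search filter for a tag; the value is compared as a string."""
    return "tags.{}='{}'".format(key, str(value))


def model_uri(run_id, artifact_path="model"):
    """Build the URI of an artifact logged under a run.

    "model" matches the artifact path passed to log_model in this notebook.
    """
    return "runs:/{}/{}".format(run_id, artifact_path)


# The boolean True becomes the string 'True' in the filter
print(tag_filter("log_model", True))

# "<run_id>" is a placeholder, not a real run id
print(model_uri("<run_id>"))
```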

In [None]:
test_data = data.drop(["quality"], axis=1)
# Draw 10 random rows (with replacement) instead of hard-coding the row count
test_data = test_data.iloc[np.random.choice(len(test_data), 10), :]

In [None]:
sk_model.predict(test_data)

**Creating a REST API**

In [None]:
import os
os.environ["MLFLOW_TRACKING_URI"] = "http://localhost:5000"

In [None]:
# IPython expands {run_id} to the run selected in the loading step above
! mlflow models serve -m runs:/{run_id}/model -p 43210

Windows

In [14]:
! curl http://localhost:43210/invocations \
        -H "Content-Type: application/json; format=pandas-split" \
        -d "{
        \"columns\":[\"fixed acidity\",\"volatile acidity\",\"citric acid\",\"residual sugar\", \
        \"chlorides\",\"free sulfur dioxide\",\"total sulfur dioxide\",\"density\",\"pH\",\"sulphates\",\"alcohol\"], \
        \"data\":[[9.9,0.25,0.46,1.7,0.062,26.0,42.0,0.9959,3.18,0.83,10.6], \
        [7.8,0.6,0.26,2.0,0.08,31.0,131.0,0.99622,3.21,0.52,9.9], \
        [8.4,0.39,0.1,1.7,0.075,6.0,25.0,0.99581,3.09,0.43,9.7], \
        [12.4,0.35,0.49,2.6,0.079,27.0,69.0,0.9994,3.12,0.75,10.4]]}"

[5.987824005295028, 5.182470026326579, 5.428737873458397, 5.915010072124568]


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   544  100    76  100   468   3619  22285 --:--:-- --:--:-- --:--:-- 27200


Linux

In [None]:
%%bash
# %%bash runs the whole cell in bash, so the single-quoted JSON body can span several lines
curl http://localhost:43210/invocations \
        -H 'Content-Type: application/json; format=pandas-split' \
        -d '{
        "columns":["fixed acidity","volatile acidity","citric acid","residual sugar",
        "chlorides","free sulfur dioxide","total sulfur dioxide","density","pH","sulphates","alcohol"],
        "data":[[9.9,0.25,0.46,1.7,0.062,26.0,42.0,0.9959,3.18,0.83,10.6],
        [7.8,0.6,0.26,2.0,0.08,31.0,131.0,0.99622,3.21,0.52,9.9],
        [8.4,0.39,0.1,1.7,0.075,6.0,25.0,0.99581,3.09,0.43,9.7],
        [12.4,0.35,0.49,2.6,0.079,27.0,69.0,0.9994,3.12,0.75,10.4]]}'
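
The curl commands above send the rows in pandas "split" orientation: one `columns` list plus a `data` list of rows. As a sketch of the same request built in Python with only the standard library, the helpers below serialize the body and prepare the POST request. The names `build_split_payload` and `make_request` are hypothetical, and the commented-out call assumes the model server started in this notebook is listening on port 43210.

```python
import json
import urllib.request

COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]


def build_split_payload(columns, rows):
    """Serialize rows as a JSON body in pandas split orientation."""
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("each row needs exactly one value per column")
    return json.dumps({"columns": list(columns), "data": [list(r) for r in rows]})


def make_request(payload, url="http://localhost:43210/invocations"):
    """Prepare (but do not send) the POST request used by the curl cells."""
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json; format=pandas-split"},
        method="POST",
    )


# Two of the rows sent by the curl cells
rows = [
    [9.9, 0.25, 0.46, 1.7, 0.062, 26.0, 42.0, 0.9959, 3.18, 0.83, 10.6],
    [7.8, 0.6, 0.26, 2.0, 0.08, 31.0, 131.0, 0.99622, 3.21, 0.52, 9.9],
]
payload = build_split_payload(COLUMNS, rows)
request = make_request(payload)

# Sending the request only works while the model server is running:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

Building the body with `json.dumps` avoids the shell-specific quoting that makes the Windows and Linux curl cells differ.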