# MLflow in Practice

> In this notebook, we'll get started with MLflow, learning how to create experiments, track our parameters and metrics, and eventually save our model and look at it in the MLflow user interface!

### Installing and importing packages

In [None]:
!pip install mlflow

In [None]:
import os
import warnings
import sys

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet
from urllib.parse import urlparse
import mlflow
import mlflow.sklearn

import logging

### Our first experiment!

The first thing we need to do is create an experiment in MLflow to store information about our models. We will want to do this for each new model we track so that MLflow keeps their runs separate.

In [None]:
mlflow.set_experiment('mlflow-workshop-1')

Now we're ready to start tracking parameters and metrics! We'll start off with some simple values to get acclimated to how it works.

In [None]:
with mlflow.start_run():
    mlflow.log_param('parameter',3)
    mlflow.log_param('string-parameter','string')
    mlflow.log_metric('accuracy',1)

### Getting some actual numbers involved

Again, we'll want to create another experiment. In the UI, this will create a new tab for these runs so that we can look at them separately.

In [None]:
mlflow.set_experiment('mlflow-workshop-2')

Now let's generate some points. We're going to create 100 points that fall roughly along a line, and plot that line in red to show what a model might produce if we asked it to learn these points.

In [None]:
def f(x):
    return 3*x + 1
N = 100
x = np.linspace(-2,2,N)
y = f(x) + np.random.randn(N)

plt.plot(x,f(x),color = 'red')
plt.scatter(x,y);

These are our line parameters, slope and intercept. Currently, they are set to the values of the line we generated the data from, but we can change them because these are our parameters! Let's try that:

In [None]:
slope = 3
intercept = 1

The code below will show the new line based on the parameters above. Feel free to change the parameters and re-run this cell to see what the lines look like!

In [None]:
plt.plot(x,slope*x + intercept,color = 'green')
plt.scatter(x,y);

Now we need an evaluation metric. Here we will calculate the mean absolute error, which is just the average vertical distance between each point and the green line. Re-run this cell whenever you change the input parameters and watch the error change in response.

In [None]:
error = mean_absolute_error(y,slope*x + intercept)
print(error)
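To make the "average distance" idea concrete, here is a tiny worked example with made-up numbers (the three values are hypothetical, chosen so the arithmetic is easy to follow). It computes the mean absolute error by hand and checks it against scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Three made-up observations and three made-up predictions
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([2.0, 2.0, 5.0])

# Absolute errors are 1, 0 and 2, so the average is (1 + 0 + 2) / 3
manual_mae = np.mean(np.abs(y_true - y_pred))
sklearn_mae = mean_absolute_error(y_true, y_pred)

print(manual_mae, sklearn_mae)
```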

Now we can have MLflow track these exact parameters and save the error too! Running this will log our run under the current experiment. Changing the parameters and running it again will add THAT run as well, and then the MLflow UI will allow us to compare runs, sort them by error if we want, and see exactly what parameters went into each one.

In [None]:
with mlflow.start_run():
    mlflow.log_param('slope',slope)
    mlflow.log_param('intercept',intercept)
    # `error` is the mean absolute error computed above, so name the metric accordingly
    mlflow.log_metric('mean-abs-error',error)

Now that we have a good understanding of what MLflow is actually doing, let's use it on a real model with some actual data.

### Modeling on real data

Once again, it's important for us to create a new experiment to track this new model. We ALWAYS need to do this, or else the runs for different models will all be logged under the same experiment!

In [None]:
mlflow.set_experiment('mlflow-workshop-3')

The data we will be using here is the red wine quality dataset from the UCI Machine Learning Repository, which contains some characteristics of various wines along with a 'quality' score for each one that we want to predict.

In [None]:
data = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv', sep=";")

Let's look at that data:

In [None]:
data.head()

Now let's get into some actual modeling. First, we need to split the data into training and test sets and make sure to separate the target column that we want to predict. That is done with the code below:

In [None]:
# Split the data into training and test sets
train, test = train_test_split(data)

# The target column is "quality"
train_x = train.drop(["quality"], axis=1)
test_x = test.drop(["quality"], axis=1)
train_y = train[["quality"]]
test_y = test[["quality"]]
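One thing to be aware of: `train_test_split` shuffles the rows randomly, so the split above changes on every run, and by default it holds out 25% of the rows for testing. If you want runs to be comparable, a possible approach is to fix `random_state`. The sketch below illustrates this on a small synthetic table (the column values are made up and merely stand in for the wine data).

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the wine data (hypothetical values)
toy = pd.DataFrame({
    'alcohol': np.linspace(8, 14, 100),
    'quality': np.arange(100) % 6 + 3,
})

# Same seed, same split: both calls select identical rows
train_a, test_a = train_test_split(toy, test_size=0.25, random_state=42)
train_b, test_b = train_test_split(toy, test_size=0.25, random_state=42)

print(len(train_a), len(test_a))
print(test_a.index.equals(test_b.index))
```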

The model we will use for this is an ElasticNet, which has two parameters we can adjust: alpha and l1_ratio.

In [None]:
#Model Parameters
alpha = 0.8
l1_ratio = 0.5
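For reference, this is the objective that scikit-learn's `ElasticNet` minimizes, written with the weight vector as $w$ and the number of training rows as $n$. It shows that alpha scales the overall penalty while l1_ratio sets the mix between the L1 and L2 terms.

```latex
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2
  + \alpha \, \text{l1\_ratio} \, \lVert w \rVert_1
  + \frac{1}{2} \, \alpha \, (1 - \text{l1\_ratio}) \, \lVert w \rVert_2^2
```

With l1_ratio = 1 the penalty is pure L1 (the lasso), and with l1_ratio = 0 it is pure L2.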

Let's train the model and get some predictions.

In [None]:
#Train Model
lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
lr.fit(train_x, train_y)

#Get Predictions
predicted_qualities = lr.predict(test_x)

It's as easy as that! Now let's see how we did. The metrics we will be looking at here are mean absolute error, just like before, as well as root mean squared error and R-squared.

In [None]:
rmse = np.sqrt(mean_squared_error(test_y, predicted_qualities))
mae = mean_absolute_error(test_y, predicted_qualities)
r2 = r2_score(test_y, predicted_qualities)

print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
print("  RMSE: %s" % rmse)
print("  MAE: %s" % mae)
print("  R2: %s" % r2)
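If the two new metrics are unfamiliar, here is a small worked example on made-up numbers (the four values are hypothetical) that computes each one by hand and compares it with scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up true values and predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 9.0])

# RMSE: square root of the average squared error
residuals = y_true - y_pred
manual_rmse = np.sqrt(np.mean(residuals ** 2))

# R-squared: 1 minus (residual sum of squares / total sum of squares)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
manual_r2 = 1 - ss_res / ss_tot

print(manual_rmse, np.sqrt(mean_squared_error(y_true, y_pred)))
print(manual_r2, r2_score(y_true, y_pred))
```

Here the squared errors are 1, 0, 1 and 0, so the RMSE is the square root of 0.5, and R-squared is 1 - 2/20 = 0.9.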

Now let's do all of that as an MLflow run and watch it appear in the UI!
We'll also add one final step so that MLflow logs the model itself, which lets us retrieve it later.

In [None]:
with mlflow.start_run():
    lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
    lr.fit(train_x, train_y)

    predicted_qualities = lr.predict(test_x)

    # Evaluation Metrics
    rmse = np.sqrt(mean_squared_error(test_y, predicted_qualities))
    mae = mean_absolute_error(test_y, predicted_qualities)
    r2 = r2_score(test_y, predicted_qualities)

    print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
    print("  RMSE: %s" % rmse)
    print("  MAE: %s" % mae)
    print("  R2: %s" % r2)

    mlflow.log_param("alpha", alpha)
    mlflow.log_param("l1_ratio", l1_ratio)
    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("r2", r2)
    mlflow.log_metric("mae", mae)
    
    mlflow.sklearn.log_model(lr, "model")

Now let's look at how we could get that model back and predict on our data!

In [None]:
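# This URI points at a run on one particular machine; replace it with the
# artifact location of your own run, which is shown in the MLflow UI
logged_model = 'file:///C:/Users/Preston/0workshop-mats/mlruns/4/9b86cfe1e5864bdbbea7e83514446aba/artifacts/model'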

loaded_model = mlflow.pyfunc.load_model(logged_model)

# Predict on a Pandas DataFrame.
predictions = pd.DataFrame(loaded_model.predict(test_x))

And here we are! We now have the predictions from that model, and this code could be run in any notebook in this repo, allowing quick and easy versioning of models across a project!

In [None]:
predictions.head(10)