# More cool stuff to do with MLFlow

## Using Python functions with MLFlow
MLFlow relies on built-in knowledge of how to wrap ML models developed in PyTorch, TensorFlow, Scikit-learn, etc. But sometimes we don't want to use a model that comes directly from these frameworks. Sometimes we want to wrap models that are written partially or entirely in plain Python. Yes Shukri, even you. MLFlow can do that. We will recreate the same model, but this time we will add some Python functions to it.

In [1]:
# Imports 
import pandas as pd
import numpy as np
import mlflow
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import train_test_split
# Data
csv_url = "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
data = pd.read_csv(csv_url, sep=";")
train, test = train_test_split(data)
train_x = train.drop(["quality"], axis=1)
test_x = test.drop(["quality"], axis=1)
train_y = train[["quality"]]
test_y = test[["quality"]]
# Parameters
alpha = 0.5
l1_ratio = 0.5
# Model (train)
with mlflow.start_run():
    # Train the model
    lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
    lr.fit(train_x, train_y)

Let's get some data to test our model, to be sure that it works once wrapped. I have a dataframe serialized as JSON for this purpose.

In [2]:
import pandas as pd
df_query = pd.read_json('test.json', orient='split')

Let's now save our model, in order to be able to open it from another python function. We are simply going to serialize it with pickle. 

In [3]:
import pickle
filename = 'model.sklrn'
outfile = open(filename,'wb')
pickle.dump(lr, outfile)
outfile.close()

In [4]:
infile = open(filename, 'rb')
model = pickle.load(infile)
infile.close()

In [5]:
import pandas as pd
df_query = pd.read_json('test.json', orient='split')
model.predict(df_query)

array([5.06405619])
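
The save/load round trip above can also be written with `with` blocks, which close the file even if pickling raises an error. This is only a sketch: a plain dictionary stands in for the fitted ElasticNet model, and the temporary directory is an assumption made to keep the example self-contained.

```python
import os
import pickle
import tempfile

# Stand-in for the fitted model (assumption: any picklable object round-trips the same way)
stand_in_model = {"alpha": 0.5, "l1_ratio": 0.5}

with tempfile.TemporaryDirectory() as tmp_dir:
    filename = os.path.join(tmp_dir, "model.sklrn")

    # Serialize: the file is closed automatically when the block exits
    with open(filename, "wb") as outfile:
        pickle.dump(stand_in_model, outfile)

    # Deserialize from the same path
    with open(filename, "rb") as infile:
        restored = pickle.load(infile)

# The restored object should be equal to the one we saved
print(restored == stand_in_model)
```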

## Create a new Python Function that complements our Model
Perfect! It works. This time, we saved our model without MLFlow, so we cannot use MLFlow's magic to serve it yet. But remember, we have more things to add to it. We will add a function to our ML pipeline that takes the output of the ElasticNet model and returns a verbal evaluation of the wine. To do so, I created a text file with verbal evaluations, and our function will map wine qualities to these verbal evaluations.

In [6]:
# Map evenly spaced quality levels (0, 2.5, 5.0, ...) to the lines of reaction.txt
values = {}
count = 0
with open('reaction.txt') as reaction_file:
    for line in reaction_file:
        values[str(count)] = line.rstrip('\n')
        count += 2.5

values

{'0': 'Rooooh! Very bad!',
 '2.5': 'Non non non! Not acceptable!',
 '5.0': 'Mouais, Not bad',
 '7.5': 'Ah! This is quite tasty!',
 '10.0': 'Scrogneugneux! Amazing!'}

In [7]:
def reaction(quality, values):
    index = min(values, key=lambda x:abs(float(x)-quality))
    return values[index]

Let's test it...

In [8]:
reaction(3.612627480115992, values)

'Non non non! Not acceptable!'
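
The nearest-level lookup can be tried without the text file. The sketch below is a variation, not the notebook's exact code: it keeps the quality levels as floats instead of strings, and the helper names `load_reactions` and `nearest_reaction` are hypothetical. The reaction strings are the ones shown in the output above.

```python
def load_reactions(lines, step=2.5):
    """Map evenly spaced quality levels (0, 2.5, 5, ...) to reaction strings."""
    return {i * step: line.strip() for i, line in enumerate(lines)}


def nearest_reaction(quality, reactions):
    """Return the reaction whose quality level is closest to `quality`."""
    closest = min(reactions, key=lambda level: abs(level - quality))
    return reactions[closest]


# Same content as reaction.txt, inlined to keep the sketch self-contained
lines = [
    "Rooooh! Very bad!\n",
    "Non non non! Not acceptable!\n",
    "Mouais, Not bad\n",
    "Ah! This is quite tasty!\n",
    "Scrogneugneux! Amazing!\n",
]
reactions = load_reactions(lines)

print(nearest_reaction(3.612627480115992, reactions))

# Several predicted qualities can be mapped one by one
print([nearest_reaction(q, reactions) for q in (1.0, 5.06, 9.2)])
```

Using float keys avoids converting every key with `float(x)` on each call, at the cost of a dictionary that no longer matches the string-keyed one printed above.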

## Wrap the pipeline
Perfect. Now we will wrap our own pipeline (our two functions) in a bigger function that takes a wine sample as input, loads the pickled sklearn model, uses the input to predict the wine quality (a double), passes it to the new function to get a verbal evaluation (a string), and returns this evaluation as output. This time, we will wrap everything with MLFlow in order to serve it.

In [9]:
# Create main class
import pickle
from mlflow.pyfunc import PythonModel

class wine_predict(PythonModel):

    def __init__(self):
        pass

    @staticmethod
    def reaction(quality, values):
        # Return the reaction whose key is numerically closest to the predicted quality
        index = min(values, key=lambda x: abs(float(x) - quality))
        return values[index]

    def predict(self, context, pd_input):
        # Without an MLFlow context (local test), read the files next to the notebook;
        # otherwise use the paths of the packaged artifacts
        model_path = 'model.sklrn' if context is None else context.artifacts["model"]
        reaction_path = 'reaction.txt' if context is None else context.artifacts["reaction"]
        # Load model
        with open(model_path, 'rb') as infile:
            model = pickle.load(infile)
        # Load reactions
        values = {}
        count = 0
        with open(reaction_path) as reaction_file:
            for line in reaction_file:
                values[str(count)] = line.rstrip('\n')
                count += 2.5
        # Get score (a one-element array for a single-row input)
        quality = model.predict(pd_input)
        # Get value
        return self.reaction(quality, values)

Let's test the whole pipeline...

In [10]:
test = wine_predict()
test.predict(None, df_query)

'Mouais, Not bad'
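
The `context is None` check in `predict` is what lets the same class run both in the notebook and under MLFlow. Here is a minimal sketch of that fallback in isolation. The helper name `resolve_artifact`, the `SimpleNamespace` stand-in for the context object, and the example path are all assumptions made for illustration. The only thing taken from the code above is that the context exposes an `artifacts` dictionary.

```python
from types import SimpleNamespace


def resolve_artifact(context, name, local_default):
    """Return the packaged artifact path, or the local file when there is no context."""
    if context is None:
        return local_default
    return context.artifacts[name]


# Local testing: no context, so the file next to the notebook is used
local_path = resolve_artifact(None, "model", "model.sklrn")
print(local_path)

# Stand-in for the context passed to predict when the model is served (hypothetical path)
fake_context = SimpleNamespace(artifacts={"model": "/tmp/my_model/artifacts/model.sklrn"})
served_path = resolve_artifact(fake_context, "model", "model.sklrn")
print(served_path)
```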

## Use artifacts and generate Conda environments
Remember how we had to create a YAML file with the conda environment? We can do it from the code instead, by passing the environment as a dictionary parameter. We will do this now to simplify the deployment. The second thing we need to do is handle artifacts. Our code needs the *reaction.txt* file to work properly. This file might not be present in the environment where we serve our model. For this reason, we need to declare it as a required artifact with MLFlow. The same reasoning applies to our pickled model.

In [11]:
import mlflow
import os
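## Create the conda environment of our model (Python plus the libraries needed for serving)
conda_env = {
    'name': 'mlflow-env',
    'channels': ['defaults'],
    'dependencies': [
        'python=3.8',
        'numpy',
        'pandas',
        'scikit-learn',
        'pip',
        # mlflow and cloudpickle are needed to load and serve the pyfunc model
        {'pip': ['mlflow', 'cloudpickle']},
    ]
}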

# Specify the artifacts of our model (the pickled model and the reactions file)
artifacts = {
    "model": 'model.sklrn',
    "reaction": 'reaction.txt'
}

## Remove previously saved models (no error if the folder does not exist yet)
import shutil
shutil.rmtree('my_model', ignore_errors=True)

## Save the model
mlflow.pyfunc.save_model("my_model", python_model=test, artifacts=artifacts, conda_env=conda_env)
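
Since the artifacts are referenced by local path, it can help to check that the files exist before saving the model. This is a sketch of such a check, not something MLFlow requires. The helper `missing_artifacts` is hypothetical, and the temporary files only exist to make the example self-contained.

```python
import os
import tempfile


def missing_artifacts(artifacts):
    """Return the names of artifacts whose local file does not exist."""
    return [name for name, path in artifacts.items() if not os.path.exists(path)]


with tempfile.TemporaryDirectory() as tmp_dir:
    # Create only the reactions file; the model file is deliberately left out
    reaction_path = os.path.join(tmp_dir, "reaction.txt")
    with open(reaction_path, "w") as reaction_file:
        reaction_file.write("Mouais, Not bad\n")

    demo_artifacts = {
        "model": os.path.join(tmp_dir, "model.sklrn"),  # never created
        "reaction": reaction_path,
    }
    missing = missing_artifacts(demo_artifacts)

print(missing)
```

With the real `artifacts` dictionary from the cell above, an empty list would mean both files are in place.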

## Test new Model
The model has been saved with its artifacts and conda environment. This model is a mix of Python code and a Scikit-learn model. We will test it the same way we tested our purely Scikit-learn model.

In [12]:
import socket
ip = socket.gethostbyname(socket.gethostname())
ip

'192.168.0.115'

In [13]:
import os
command = 'mlflow models serve -m my_model -h '+ip+' -p 1234'
command

'mlflow models serve -m my_model -h 192.168.0.115 -p 1234'

In [14]:
external_ip = '94.155.120.231'

In [15]:
query = '''curl -X POST -H "Content-Type:application/json; format=pandas-split" --data '{"columns":["alcohol", "chlorides", "citric acid", "density", "fixed acidity", "free sulfur dioxide", "pH", "residual sugar", "sulphates", "total sulfur dioxide", "volatile acidity"],"data":[[12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]]}' http://'''+external_ip+''':1234/invocations'''
print(query)

curl -X POST -H "Content-Type:application/json; format=pandas-split" --data '{"columns":["alcohol", "chlorides", "citric acid", "density", "fixed acidity", "free sulfur dioxide", "pH", "residual sugar", "sulphates", "total sulfur dioxide", "volatile acidity"],"data":[[12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]]}' http://94.155.120.231:1234/invocations
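
The same query can be prepared from Python with the standard library. The sketch below only builds the request, reusing the columns, values, content type, and address from the curl command above. The helper name `build_request` is hypothetical, and actually sending the request assumes the model is being served at that address.

```python
import json
import urllib.request


def build_request(host, port, columns, rows):
    """Build (but do not send) the POST request for the /invocations endpoint."""
    payload = json.dumps({"columns": columns, "data": rows}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/invocations",
        data=payload,
        headers={"Content-Type": "application/json; format=pandas-split"},
        method="POST",
    )


columns = ["alcohol", "chlorides", "citric acid", "density", "fixed acidity",
           "free sulfur dioxide", "pH", "residual sugar", "sulphates",
           "total sulfur dioxide", "volatile acidity"]
rows = [[12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]]

request = build_request("94.155.120.231", 1234, columns, rows)
print(request.get_method(), request.full_url)

# To query a running server, uncomment the lines below:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```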
