# MLFlow example

Note: This notebook runs on Python 3.6 and uses UbiOps Client Library 3.3.0.

In this notebook we will show you the following:

How to create a deployment that uses a model built and tracked with MLFlow. MLFlow helps us find the optimal parameters for the model, which uses several features to predict the quality of wine.

This example uses a sample dataset from one of the examples on the MLFlow website. [Link to the dataset](https://github.com/mlflow/mlflow-example)


If you run this entire notebook after filling in your access token, the MLFlow deployment will be deployed to your UbiOps environment, which you can then explore. You can also go through the individual steps in this notebook to see exactly what we did and how you can adapt it to your own use case.

We recommend running the cells step by step, as some cells can take a few minutes to finish. Running everything in one go also works, but allow a few minutes for the individual deployments to build.

### Installing the required packages
We will use several packages to create our model and deploy it.

In [None]:
!pip install pandas
!pip install numpy
!pip install scikit-learn
!pip install mlflow

## The model
The model looks at several features of a wine and tries to predict its quality from them. It is based on the [example from the MLFlow documentation](https://github.com/mlflow/mlflow-example).

In [None]:
%load wine-model/train.py
# The data set used in this example is from http://archive.ics.uci.edu/ml/datasets/Wine+Quality
# P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
# Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

# Testing for the optimal parameters
We can do this in one of two ways:
* Manually via the command line
* Programmatically using python

We will use the latter in this example because it can be automated and would take less time.

## Testing best parameters
We can also use the mlflow package to test a list of possible settings to see which performs the best.
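
If you prefer not to write the list of settings by hand, it can also be generated. The following is a minimal sketch that builds every combination of a few candidate values with `itertools.product`; the candidate values and the names `alphas`, `l1_ratios` and `parameter_grid` are illustrative assumptions, not part of the original example.

```python
from itertools import product

# Hypothetical candidate values; adjust to the ranges you want to explore.
alphas = [0.1, 0.3, 0.5]
l1_ratios = [0.2, 0.7]

# One dict per (alpha, l1_ratio) combination, in the same shape as the
# hand-written `parameters` list used in this notebook.
parameter_grid = [
    {"alpha": alpha, "l1_ratio": l1_ratio}
    for alpha, l1_ratio in product(alphas, l1_ratios)
]

for params in parameter_grid:
    print(params)
```

A full grid grows quickly with the number of candidate values, so for a real search it may be worth keeping the lists short.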

In [None]:
parameters = [
    {'alpha': 0.3, 'l1_ratio': 0.1},
    {'alpha': 0.2, 'l1_ratio': 0.7},
    {'alpha': 0.4, 'l1_ratio': 0.2},
    {'alpha': 0.5, 'l1_ratio': 0.7},
    {'alpha': 0.1, 'l1_ratio': 0.9},
    {'alpha': 0.2, 'l1_ratio': 0.2},
    {'alpha': 0.7},
]

model_location = 'wine-model'


In [None]:
import mlflow

for param in parameters:
    print(f'Running with param = {param}')
    res = mlflow.run(model_location, parameters=param, use_conda=False)
    print(f'status={res.get_status()}')


## Comparing the results
Start a terminal session and run this (in the mlflow-example folder). Then head over to [the MLFlow UI](http://localhost:5000)
```
mlflow ui
```

![Comparing runs](images/1.png)

## Selecting the optimal run

After the runs have finished, you can view each run of your model together with its metrics and compare them to find the best configuration for your use case.

In this example we want the model with the lowest root mean square error (RMSE). The code in the cell below finds that run id and copies the built model into our deployment folder.
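
To make the selection step concrete, here is a minimal sketch on a small, made-up runs table. The run ids and metric values are invented for illustration; only the column naming (`run_id`, `params.*`, `metrics.*`) follows what `mlflow.search_runs` returns.

```python
import pandas as pd

# Hypothetical runs table using the column naming of mlflow.search_runs.
runs = pd.DataFrame(
    {
        "run_id": ["run-a", "run-b", "run-c"],
        "params.alpha": ["0.3", "0.1", "0.5"],
        "metrics.rmse": [0.82, 0.74, 0.79],
    }
)

# Keep runs under the RMSE threshold, then pick the row with the smallest value.
candidates = runs[runs["metrics.rmse"] < 1]
if candidates.empty:
    raise ValueError("No run has an RMSE below 1")

best = candidates.loc[candidates["metrics.rmse"].idxmin()]
print(best["run_id"], best["metrics.rmse"])
```

The explicit check for an empty result is a design choice of this sketch: without it, `idxmin` on an empty selection fails with a less helpful error.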



In [None]:
from shutil import copyfile
import pandas as pd
import os

# Read the runs with an RMSE below 1 from MLFlow as a pandas DataFrame
df = mlflow.search_runs(filter_string="metrics.rmse < 1")

# Select the run with the lowest RMSE
run = df.loc[df['metrics.rmse'].idxmin()]
run_id = run['run_id']

print(f'The optimal run id is {run_id}')
print(f'It had the parameters: alpha={run["params.alpha"]}, l1_ratio={run["params.l1_ratio"]}')
print(f'And RMSE: {run["metrics.rmse"]}')


copyStatus = copyfile(f'mlruns/0/{run_id}/artifacts/model/model.pkl', 'mlflow_deployment_package/model.pkl')
print('Model copied to the deployment!')

# Deployment steps
Now that we have the optimal model copied into our deployment folder we will start the steps to deploy it to our UbiOps environment.
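
The API token used in the next cell has to start with the literal prefix `Token `. As a small safeguard, a helper along the following lines could add the prefix when it is missing. `normalize_token` is a hypothetical helper written for this sketch and is not part of the UbiOps client library.

```python
def normalize_token(raw_token):
    """Return the token in 'Token <code>' form, adding the prefix if it is missing."""
    token = raw_token.strip()
    if not token:
        raise ValueError("API token is empty")
    if token.startswith("Token "):
        return token
    return f"Token {token}"


# Both spellings end up in the same format.
print(normalize_token("abc123"))
print(normalize_token("Token abc123"))
```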

In [None]:
API_TOKEN = "<INSERT API_TOKEN WITH PROJECT EDITOR RIGHTS>" # Make sure this is in the format "Token token-code"
PROJECT_NAME = "<INSERT PROJECT NAME IN YOUR ACCOUNT>"
DEPLOYMENT_NAME = 'mlflow-deployment'
DEPLOYMENT_VERSION = 'v1'

# Import all necessary libraries
import shutil
import os
import ubiops

client = ubiops.ApiClient(ubiops.Configuration(api_key={'Authorization': API_TOKEN}, 
                                               host='https://api.ubiops.com/v2.1'))
api = ubiops.CoreApi(client)

In [None]:
# %load mlflow_deployment_package/deployment.py
"""
The file containing the deployment code is required to be called 'deployment.py' and should contain the 'Deployment'
class and 'request' method.
"""

import os
import pickle
import pandas as pd




class Deployment:

    def __init__(self, base_directory, context):
        """
        Initialisation method for the deployment. It can for example be used for loading modules that have to be kept in
        memory or setting up connections. Load your external model files (such as pickles or .h5 files) here.

        :param str base_directory: absolute path to the directory where the deployment.py file is located
        :param dict context: a dictionary containing details of the deployment that might be useful in your code.
            It contains the following keys:
                - deployment (str): name of the deployment
                - version (str): name of the version
                - input_type (str): deployment input type, either 'structured' or 'plain'
                - output_type (str): deployment output type, either 'structured' or 'plain'
                - language (str): programming language the deployment is running
                - environment_variables (str): the custom environment variables configured for the deployment.
                    You can also access those as normal environment variables via os.environ
        """

        print("Initialising the model")

        model_file = os.path.join(base_directory, "model.pkl")
        with open(model_file, 'rb') as f:
            self.model = pickle.load(f)


    def request(self, data):
        """
        Method for deployment requests, called separately for each individual request.

        :param dict/str data: request input data. In case of deployments with structured data, a Python dictionary
            with as keys the input fields as defined upon deployment creation via the platform. In case of a deployment
            with plain input, it is a string.
        :return dict/str: request output. In case of deployments with structured output data, a Python dictionary
            with as keys the output fields as defined upon deployment creation via the platform. In case of a deployment
            with plain output, it is a string. In this example, a dictionary with the key: output.
        """
        print('Loading data')
        input_data = pd.read_csv(data['data'])
        
        print("Prediction being made")
        prediction = self.model.predict(input_data)
        
        # Writing the prediction to a csv for further use
        print('Writing prediction to csv')
        pd.DataFrame(prediction).to_csv('prediction.csv', header=['quality'], index_label='index')
        
        return {
            "prediction": 'prediction.csv',
        }
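
Before uploading, it can be useful to check the CSV round trip of the `request` logic locally. The sketch below mirrors the body of `request` in a standalone function and feeds it a stub model, so it runs without a trained `model.pkl`. `StubModel`, `run_request`, the column names and the values are all hypothetical and exist only for this illustration.

```python
import os
import tempfile

import pandas as pd


class StubModel:
    """Hypothetical stand-in for the trained model: predicts the row-wise mean."""

    def predict(self, frame):
        return frame.mean(axis=1).to_numpy()


def run_request(model, data, output_path):
    """Mirror the request() logic: read the CSV, predict, write a CSV, return the output dict."""
    input_data = pd.read_csv(data["data"])
    prediction = model.predict(input_data)
    pd.DataFrame(prediction).to_csv(output_path, header=["prediction"], index_label="index")
    return {"prediction": output_path}


with tempfile.TemporaryDirectory() as workdir:
    input_path = os.path.join(workdir, "input.csv")
    output_path = os.path.join(workdir, "prediction.csv")

    # Two made-up rows with two made-up feature columns.
    pd.DataFrame({"alcohol": [9.0, 11.0], "pH": [3.0, 3.0]}).to_csv(input_path, index=False)

    result = run_request(StubModel(), {"data": input_path}, output_path)
    written = pd.read_csv(result["prediction"], index_col="index")

print(written)
```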


# Create Deployment

In [None]:
# Create the deployment
deployment_template = ubiops.DeploymentCreate(
    name=DEPLOYMENT_NAME,
    description='MLFlow deployment',
    input_type='structured',
    output_type='structured',
    input_fields=[
        ubiops.DeploymentInputFieldCreate(
            name='data',
            data_type='blob',
        ),
    ],
    output_fields=[
        ubiops.DeploymentOutputFieldCreate(
            name='prediction',
            data_type='blob'
        ),
    ],
    labels={'demo': 'mlflow-recipe'}
)

api.deployments_create(
    project_name=PROJECT_NAME,
    data=deployment_template
)

# Create the version
version_template = ubiops.DeploymentVersionCreate(
    version=DEPLOYMENT_VERSION,
    language='python3.6',
    memory_allocation=512,
    minimum_instances=0,
    maximum_instances=1,
    maximum_idle_time=1800, # = 30 minutes
    request_retention_mode='none' # We don't need to store the requests for this deployment    
)

api.deployment_versions_create(
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    data=version_template
)

# Zip the deployment package
shutil.make_archive('mlflow_deployment_package', 'zip', '.', 'mlflow_deployment_package')

# Upload the zipped deployment package
file_upload_result = api.revisions_file_upload(
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    version=DEPLOYMENT_VERSION,
    file='mlflow_deployment_package.zip'
)
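
The packaging step above passes both a root directory and a base directory to `shutil.make_archive`, which keeps the package folder as the top-level entry inside the zip. The following sketch shows the resulting archive layout in a temporary directory; the placeholder file content is an assumption made for illustration.

```python
import os
import shutil
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical package folder with a single placeholder file.
    package_dir = os.path.join(workdir, "mlflow_deployment_package")
    os.makedirs(package_dir)
    with open(os.path.join(package_dir, "deployment.py"), "w") as f:
        f.write("# placeholder\n")

    # root_dir is the directory the archive paths are relative to;
    # base_dir is the folder inside root_dir that gets archived.
    archive_path = shutil.make_archive(
        os.path.join(workdir, "mlflow_deployment_package"),
        "zip",
        root_dir=workdir,
        base_dir="mlflow_deployment_package",
    )

    with zipfile.ZipFile(archive_path) as archive:
        names = archive.namelist()

print(names)
```

Listing the archive contents like this is a quick way to confirm that `deployment.py` and `model.pkl` sit inside the package folder before uploading.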

## Making a request and exploring further
You can now go to the Web App and take a look in the user interface at what you have just built. If you want, you can create a request to the mlflow deployment using "dummy_data_to_predict.csv" as input. The input file must contain the same wine feature columns that the model was trained on.

So there we have it! We have created a deployment and used MLFlow to find the best parameters for its model. You can use this notebook as a basis for your own deployments. Just adapt the code in the deployment package and alter the input and output fields as you wish, and you should be good to go.

For any questions, feel free to reach out to us via the customer service portal: https://ubiops.atlassian.net/servicedesk/customer/portals