# MLflow to BentoML

[BentoML](http://bentoml.ai) is an open-source framework for machine learning **model serving**, aiming to **bridge the gap between Data Science and DevOps**.

[MLflow](https://mlflow.org/) is an open source platform for the machine learning lifecycle, including experimentation, reproducibility, deployment, and a central model registry.

You might want to use MLflow to keep track of your training runs but prefer BentoML for deploying your models to production. A comparison of the two is available [here](https://docs.bentoml.org/en/latest/faq.html?highlight=mlflow#how-does-bentoml-compare-to-mlflow).

This notebook demonstrates how to load a model logged with MLflow and package it with BentoML for deployment. It is broken down into the following parts:
1. Train a model on the Iris dataset and save it using MLflow
2. Load the model from MLflow and package it with BentoML
3. Containerize the model with Docker

BentoML requires Python 3.6 or above. Install the dependencies via `pip`:

In [1]:
# Install PyPI packages required in this guide, including BentoML
!pip install -q bentoml  # install preview version of BentoML for this guide
!pip install -q 'scikit-learn>=0.23.2' 'mlflow>=1.13.1' 'matplotlib'

## 1. Train a model and save it using MLflow
Like in the quick-start, let's train a classifier model on the [Iris data set](https://en.wikipedia.org/wiki/Iris_flower_data_set):

In [2]:
from sklearn import svm
from sklearn import datasets
import mlflow
from mlflow.models.signature import infer_signature

# Load training data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Model Training and saving in MLflow
clf = svm.SVC(gamma='scale')
with mlflow.start_run() as run:
    clf.fit(X, y)
    mlflow.sklearn.log_model(
        sk_model=clf,
        artifact_path="model",
        signature=infer_signature(X),
    )

The model has been trained and logged to MLflow. You can inspect it in the MLflow UI by running:

In [3]:
!mlflow ui --port=5001

[2021-02-13 23:26:33 +0100] [12078] [INFO] Starting gunicorn 20.0.4
[2021-02-13 23:26:33 +0100] [12078] [INFO] Listening at: http://127.0.0.1:5001 (12078)
[2021-02-13 23:26:33 +0100] [12078] [INFO] Using worker: sync
[2021-02-13 23:26:33 +0100] [12081] [INFO] Booting worker with pid: 12081
^C
[2021-02-13 23:27:18 +0100] [12078] [INFO] Handling signal: int
[2021-02-13 23:27:18 +0100] [12081] [INFO] Worker exiting (pid: 12081)


## 2. Load the model from MLflow and package it with BentoML

As in the quick-start, the first step is to create a
prediction service class, which defines the models required and the inference APIs that
contain the serving logic. Here is a minimal prediction service for serving
the iris classifier model trained above:

In [4]:
%%writefile iris_classifier.py
import pandas as pd

from bentoml import env, artifacts, api, BentoService
from bentoml.adapters import DataframeInput
from bentoml.frameworks.sklearn import SklearnModelArtifact

@env(infer_pip_packages=True)
@artifacts([SklearnModelArtifact('model')])
class IrisClassifier(BentoService):
    """
    A minimum prediction service exposing a Scikit-learn model
    """

    @api(input=DataframeInput(), batch=True)
    def predict(self, df: pd.DataFrame):
        """
        An inference API named `predict` with Dataframe input adapter, which codifies
        how HTTP requests or CSV files are converted to a pandas Dataframe object as the
        inference API function input
        """
        return self.artifacts.model.predict(df)

Overwriting iris_classifier.py


This code defines a prediction service that packages a scikit-learn model and provides
an inference API that expects a `pandas.DataFrame` object as its input.

We will now load the model that was logged to MLflow in step 1.

In [5]:
model_uri = f"runs:/{run.info.run_id}/model"
print(f"Retrieving model with uri={model_uri}")
mlflow_loaded_model = mlflow.sklearn.load_model(model_uri)

Retrieving model with uri=runs:/7f1e9a8ce364450596460d5ef0f2e35f/model
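
The URI printed above has the form `runs:/<run_id>/<artifact_path>`, where the artifact path matches the `artifact_path="model"` passed to `log_model` in step 1. As a minimal sketch, the helpers below build and split such a URI using only the standard library. `build_runs_uri` and `split_runs_uri` are hypothetical names introduced here for illustration and are not part of MLflow. They may be handy when the run ID is copied from the MLflow UI instead of read from a live `run` object:

```python
from typing import Tuple

RUNS_PREFIX = "runs:/"


def build_runs_uri(run_id: str, artifact_path: str = "model") -> str:
    """Return a URI of the form runs:/<run_id>/<artifact_path>."""
    if not run_id or "/" in run_id:
        raise ValueError(f"invalid run id: {run_id!r}")
    cleaned_path = artifact_path.strip("/")
    if not cleaned_path:
        raise ValueError("artifact_path must not be empty")
    return f"{RUNS_PREFIX}{run_id}/{cleaned_path}"


def split_runs_uri(uri: str) -> Tuple[str, str]:
    """Split runs:/<run_id>/<artifact_path> into (run_id, artifact_path)."""
    if not uri.startswith(RUNS_PREFIX):
        raise ValueError(f"not a runs:/ URI: {uri!r}")
    run_id, separator, artifact_path = uri[len(RUNS_PREFIX):].partition("/")
    if not run_id or not separator or not artifact_path:
        raise ValueError(f"expected runs:/<run_id>/<artifact_path>, got {uri!r}")
    return run_id, artifact_path
```

With the run ID from the output above, `build_runs_uri("7f1e9a8ce364450596460d5ef0f2e35f")` produces the same string that was passed to `mlflow.sklearn.load_model`.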


The following code packages the model loaded from MLflow with the prediction service class
`IrisClassifier` defined above, and then saves the IrisClassifier instance to disk 
in the BentoML format for distribution and deployment:

In [6]:
# import the IrisClassifier class defined above
from iris_classifier import IrisClassifier

# Create an iris classifier service instance
iris_classifier_service = IrisClassifier()

# Pack the model loaded from MLflow as the 'model' artifact
iris_classifier_service.pack('model', mlflow_loaded_model)

# Save the prediction service to disk for model serving
saved_path = iris_classifier_service.save()

[2021-02-13 23:27:20,667] INFO - BentoService bundle 'IrisClassifier:20210213232719_B5F4D1' saved to: /home/theodore/bentoml/repository/IrisClassifier/20210213232719_B5F4D1


BentoML stores all packaged model files under the
`~/bentoml/repository/{service_name}/{service_version}` directory by default,
as shown in the log output above.
The BentoML file format contains all the code, files, and configs required to
deploy the model for serving.
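
To see which bundles have been saved, the directory can be inspected directly. The sketch below assumes the layout shown in the save log above (`~/bentoml/repository/<service_name>/<version>`) and assumes that version names start with a timestamp, as in `20210213232719_B5F4D1`, so sorting by name orders them chronologically. The helper names are hypothetical, and this is only a way to look at the directory, not necessarily how BentoML itself resolves `latest`:

```python
from pathlib import Path
from typing import List, Optional


def default_repository_root() -> Path:
    # Assumption: the default location, matching the path printed in the save log.
    return Path.home() / "bentoml" / "repository"


def list_saved_versions(service_name: str, repo_root: Optional[Path] = None) -> List[str]:
    """Return the version directory names saved for a service, sorted by name."""
    root = repo_root if repo_root is not None else default_repository_root()
    service_dir = root / service_name
    if not service_dir.is_dir():
        return []
    return sorted(entry.name for entry in service_dir.iterdir() if entry.is_dir())


def newest_version(service_name: str, repo_root: Optional[Path] = None) -> Optional[str]:
    """Return the last version by name, or None if nothing has been saved."""
    versions = list_saved_versions(service_name, repo_root)
    return versions[-1] if versions else None
```

For example, `newest_version("IrisClassifier")` returns the most recent version name found under the default directory, or `None` if the service has not been saved yet.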


## REST API Model Serving



To start a REST API model server with the `IrisClassifier` saved above, use 
the `bentoml serve` command:

In [7]:
!bentoml serve IrisClassifier:latest

[2021-02-13 23:27:52,275] INFO - Getting latest version IrisClassifier:20210213232719_B5F4D1
[2021-02-13 23:27:52,276] INFO - Starting BentoML API server in development mode..
 * Serving Flask app "IrisClassifier" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
^C


If you are running this notebook from Google Colab, you can start the dev server with the `--run-with-ngrok` option to access the API through a public endpoint managed by [ngrok](https://ngrok.com/):

In [8]:
!bentoml serve IrisClassifier:latest --run-with-ngrok

[2021-02-13 23:28:07,942] INFO - Getting latest version IrisClassifier:20210213232719_B5F4D1
[2021-02-13 23:28:07,942] INFO - Starting BentoML API server in development mode..
^C


While the server is running, the `IrisClassifier` model is served at `localhost:5000`. Use the `curl` command to send
a prediction request:

```bash
curl -i \
--header "Content-Type: application/json" \
--request POST \
--data '[[5.1, 3.5, 1.4, 0.2]]' \
localhost:5000/predict
```

Or with Python and the [`requests` library](https://requests.readthedocs.io/):
```python
import requests
response = requests.post("http://127.0.0.1:5000/predict", json=[[5.1, 3.5, 1.4, 0.2]])
print(response.text)
```

Note that the BentoML API server automatically converts the JSON-encoded DataFrame in the request body into a
`pandas.DataFrame` object before passing it to the user-defined inference API function.
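
The same request can also be prepared with only the Python standard library. The sketch below assumes the server from the previous step is listening on `127.0.0.1:5000`. `build_predict_request` and `send` are hypothetical helper names, and the four-values-per-row check reflects the four Iris features, not a rule that BentoML enforces in this form:

```python
import json
import urllib.request
from typing import List, Sequence

N_FEATURES = 4  # sepal length, sepal width, petal length, petal width


def build_predict_request(
    rows: Sequence[Sequence[float]],
    base_url: str = "http://127.0.0.1:5000",
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /predict endpoint."""
    payload: List[List[float]] = []
    for row in rows:
        if len(row) != N_FEATURES:
            raise ValueError(f"expected {N_FEATURES} values per row, got {len(row)}")
        payload.append([float(value) for value in row])
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `send(build_predict_request([[5.1, 3.5, 1.4, 0.2]]))` while the server is up performs the same call as the `curl` command above. Keeping request construction separate from sending makes the payload easy to inspect without a running server.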

The BentoML API server also provides a simple web UI dashboard.
Go to http://localhost:5000 in the browser and use the web UI to send
a prediction request:

![BentoML API Server Web UI Screenshot](https://raw.githubusercontent.com/bentoml/BentoML/master/guides/quick-start/bento-api-server-web-ui.png)

## 3. Containerize the model with Docker

One common way of distributing this model API server for production deployment is via
Docker containers, and BentoML provides a convenient way to do that.

Note that `docker` is __not available in Google Colab__. You will need to download and run this notebook locally to try out the containerization step.

If you already have Docker configured, simply run the following command to build a
Docker image serving the `IrisClassifier` prediction service created above:

In [12]:
!bentoml containerize IrisClassifier:latest -t iris-classifier:latest

[2021-02-13 23:17:15,147] INFO - Getting latest version IrisClassifier:20210213230147_11DA63
Found Bento: /home/theodore/bentoml/repository/IrisClassifier/20210213230147_11DA63
Build container image: iris-classifier:latest
 

Start a container with the docker image built in the previous step:

In [13]:
!docker run -p 5000:5000 iris-classifier:latest --workers=1 --enable-microbatch

[2021-02-13 22:17:26,238] INFO - Starting BentoML API server in production mode..
[2021-02-13 22:17:26,262] INFO - Running micro batch service on :5000
[2021-02-13 22:17:26 +0000] [1] [INFO] Starting gunicorn 20.0.4
[2021-02-13 22:17:26 +0000] [10] [INFO] Starting gunicorn 20.0.4
[2021-02-13 22:17:26 +0000] [1] [INFO] Listening at: http://0.0.0.0:55989 (1)
[2021-02-13 22:17:26 +0000] [10] [INFO] Listening at: http://0.0.0.0:5000 (10)
[2021-02-13 22:17:26 +0000] [10] [INFO] Using worker: aiohttp.worker.GunicornWebWorker
[2021-02-13 22:17:26 +0000] [1] [INFO] Using worker: sync
[2021-02-13 22:17:26 +0000] [12] [INFO] Booting worker with pid: 12
[2021-02-13 22:17:26 +0000] [11] [INFO] Booting worker with pid: 11
[2021-02-13 22:17:26,330] INFO - Micro batch enabled for API `predict` max-latency: 10000 max-batch-size 2000
[2021-02-13 22:17:26,330] INFO - Your system nofile limit is 1048576, which means each instance of microbatch service is able to hold this number of connections at same ti

This makes it possible to deploy BentoML-bundled ML models on platforms such as
[Kubeflow](https://www.kubeflow.org/docs/components/serving/bentoml/),
[Knative](https://knative.dev/community/samples/serving/machinelearning-python-bentoml/), and
[Kubernetes](https://docs.bentoml.org/en/latest/deployment/kubernetes.html), which
provide advanced model deployment features such as auto-scaling, A/B testing,
scale-to-zero, canary rollouts, and multi-armed bandits.
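
Before handing the image to one of these platforms, it can help to confirm that the container started with `docker run -p 5000:5000` actually answers on port 5000. The following is a minimal sketch using only the standard library. `http_probe` and `wait_until_ready` are hypothetical helpers, and treating any HTTP response (even an error status) as "the server is up" is an assumption made for this sketch:

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, reset, timed out, or otherwise unreachable.
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() repeatedly until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A typical call would be `wait_until_ready(lambda: http_probe("http://127.0.0.1:5000/"))`. Passing the probe as a callable keeps the retry loop independent of the network, so it can be exercised without a running container.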

# Summary

This is a very short example of how you can load a model from MLflow and serve it using BentoML.

We recently looked into building a tighter integration. The idea was to make BentoML support and serve the model format created by MLflow directly. However, the team concluded that this is probably a really bad idea. The main difficulty is that MLflow's model format is not really designed for serving. When turning a trained model into a prediction service, a number of things require the user's attention that MLflow does not support. In particular: what is the input/output data schema of the prediction endpoint, what are the local code dependencies, and how should a batch of input data be preprocessed so that it can take advantage of the micro-batching mechanism provided by BentoML?

There might be other ways we can improve the integration with MLflow, but for now we decided to start with this documentation on how users can build a workflow that takes advantage of both frameworks.
