# The Easiest Way to Deploy Your ML/DL Models in 2022: Streamlit + BentoML + DagsHub
## Deploy models as lightweight APIs with a user-friendly interface
![](images/pexels.jpg)
<figcaption style="text-align: center;">
    <strong>
        Photo by 
        <a href='https://www.pexels.com/photo/blue-red-and-black-abstract-painting-2130475/'>Steve Johnson</a>
    </strong>
</figcaption>


### Introduction

You have a ready machine learning model. Now what? What do you do with it? Do you keep it inside your Jupyter notebook like a prized possession for no one to see? No, you make it as stupidly easy as possible for others to share and play around with your work. In other words, you *deploy* the model.

But how? Should you just share the model as a file? No, that would be the worst.

How about a Docker container? Yeah, that is better - the user will have it all to run your model locally. But... they would still need to do it inside a coding environment. Not very convenient. 

Then, what about an API? Well, good luck explaining what an API is to a non-programmer.

Hey, what if you built a web app with a minimal interface? Yes, this seems to be the best option. But how to build the app? Don't you need to learn a web framework for that? That's too much work!

### What problem are we solving?

The deployment method I will be showing today works for any model architecture from any ML/DL framework. But, like we've been doing for the past two articles, we will continue working on [the Pet Pawpularity dataset](https://www.kaggle.com/c/petfinder-pawpularity-score). 

PetFinder.my collected this image dataset of cats and dogs and gave each image a cuteness score with their in-house scoring algorithms. We've been trying to use this dataset to predict the cuteness score of a pet given its image.

The dataset was used to host the Pet Pawpularity competition on Kaggle with the hopes that top solutions would be used to improve pet adoption. 

In the first article, we performed EDA on the images and their metadata and outlined our approach to solve the problem and the tools we will be using on the way.

The second article was a comprehensive tutorial on tracking your machine learning experiments while we tried to find a good model to predict the cuteness score. 

This final article shows how to deploy one of the Keras image regression models from the second part as an API and a web app so that any type of user can get a cuteness score for their pets.

### What tools will we be using to deploy?

It is hard to agree on the best tools to serve models in production because each problem is unique and their solutions have different constraints. 

Therefore, I wanted to choose a solution or a set of tools that would benefit as many people as possible. I wanted the solution to be simple enough that it takes only a few minutes to whip up a working prototype and serve it online, yet able to scale to larger problems if needed.

The core component of this solution is the [BentoML package](https://docs.bentoml.org/en/latest/). It is one of the latest promising players in the MLOps landscape and has already amassed half a million downloads on GitHub.

Its purpose is to serve ML models as API endpoints with as few lines of code as possible and without the hassles of other frameworks like Flask. It integrates with almost any ML framework you can think of:

![image.png](attachment:d80638f5-61e0-4394-abea-3aa6c7af0ca1.png)
![image.png](attachment:502ad9ee-7989-4e75-9786-db626d72dfd7.png)

and if you can't find your favorite from the list, their support is probably under development. 

Jumping a few steps ahead, you can check out [this link](https://pet-pawpularity.herokuapp.com/) that contains the deployed API we are going to build in this article. 

![image.png](attachment:439682ad-ec4d-4d55-8b64-46d4fe5e552c.png)

The deployment shouldn't just stop at an API either. After all, APIs are only for programmers, and you need something that lets non-programmers interact with your model. That's where Streamlit comes in.

Streamlit probably isn't new to you. It is already established as a go-to library for creating minimalistic web apps for almost any type of ML application.

Since we are building the Streamlit UI on top of an API, the web app will be even more lightweight. You won't have dependency issues as you will only need the `requests` library to process requests to the BentoML API through the Streamlit app. 

Below, you can see the [app we will be building](https://share.streamlit.io/bextuychiev/pet_pawpularity/ui/src/ui.py) in this article:

![image.png](attachment:e19ad733-9fa9-4542-9951-c4152982eb99.png)

Last but not least, we will use [DagsHub](https://dagshub.com/) again to manage the project as a whole. DagsHub is the GitHub for data professionals and allows you to do 360-degree machine learning.

You can use a DagsHub repository for many tasks:
- hosting code: works just like GitHub
- hosting data: has dedicated storage for data versioned with DVC
- experiment tracking: has an experiments tab that makes it easy to find the best model for your problem
- hosting MLflow artifacts: has separate dedicated storage for versioning MLflow artifacts and experiments

![image.png](attachment:60e81e9c-448b-4e73-a66a-c0313c38de12.png)

We've used DagsHub most heavily in the second part during experimentation:

https://towardsdatascience.com/19-hidden-sklearn-features-you-were-supposed-to-learn-the-hard-way-5293e6ff149

The machine learning lifecycle isn't just about deployment. For a model to be successful in production, it needs a solid foundation in infrastructure. DagsHub allows you to build that foundation.

Now, let's jump into the main part of the article and start by explaining how to use BentoML to create an API endpoint for a prediction service.

### Step 1: Save the best model to BentoML local store

Let's start by importing the necessary libraries:

```python
import logging

import bentoml  # pip install bentoml --pre
import joblib
import tensorflow as tf
```

Make sure you install `bentoml` with the `--pre` flag since the version used here is still in preview.

Below, we will create a couple of helper functions to build and train a Keras convolutional model:

```python
def get_keras_conv2d():
    """A function to build an instance of a Keras conv2d model."""
    
    model = ...
    
    return model


def fit_keras_conv2d():
    """
    A function to train a Keras conv2d model.
    """
    model = get_keras_conv2d()
    
    #-- Fit model with early stopping and 30 epochs on the images --#
    
    return model
```

I've left out the body of the first function, but all it does is create a Conv2D model with three hidden layers, with dropout and MaxPool layers in between. We don't have to focus on model architecture too much.

`fit_keras_conv2d` uses the first function and trains the obtained model with early stopping and 30 epochs.

Next, we create a function to save the model to BentoML local store:

```python
def save(model, bentoml_name, path):
    """
    A function to save a given model to BentoML local store and with joblib.
    """
    bentoml.keras.save(bentoml_name, model, store_as_json_and_weights=True)
    
    joblib.dump(model, path)
```

The `keras.save` function specifically saves Keras models in a format suitable for other BentoML operations. 

So, let's run these functions to get a ready model:

```python
def main():
    
    model = fit_keras_conv2d()
    
    logging.log(logging.INFO, "Saving...")
    
    save(model, 
         "keras_conv2d_smaller", 
         "models/keras_conv2d_smaller.joblib")
    
    logging.log(logging.INFO, "Done!")
    
if __name__ == "__main__":
    main()
```

After the training and saving is done, you can run the below command to get a list of models in the BentoML store:

```bash
$ bentoml models list
```
![](images/models_list.gif)


Each saved model is identified by a *tag* in the BentoML docs. By default, all models are saved in the `bentoml/models` folder under your home directory, each with a randomly generated version in its tag, so that multiple models with the same name don't collide.

If you go into the given path, you will find files like these:

```
checkpoint
model.yaml
saved_model_json.json
saved_model_weights.data-00000-of-00001
saved_model_weights.index
```
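If you'd rather inspect the store from Python, here is a minimal sketch. It assumes the store is laid out as `<name>/<version>/` folders with a `latest` file next to the version folders; the `list_model_tags` helper is hypothetical and not part of BentoML, so treat `bentoml models list` as the source of truth.

```python
import tempfile
from pathlib import Path


def list_model_tags(store_dir):
    """Return 'name:version' strings for a store laid out as <name>/<version>/."""
    store = Path(store_dir)
    tags = []
    for name_dir in sorted(p for p in store.iterdir() if p.is_dir()):
        # Plain files such as `latest` are skipped; only version folders count
        for version_dir in sorted(p for p in name_dir.iterdir() if p.is_dir()):
            tags.append(f"{name_dir.name}:{version_dir.name}")
    return tags


# Demo on a throwaway directory that mimics the assumed layout
with tempfile.TemporaryDirectory() as tmp:
    version_dir = Path(tmp) / "keras_conv2d_smaller" / "b52h7x5xpk2bejcl"
    version_dir.mkdir(parents=True)
    (version_dir.parent / "latest").write_text("b52h7x5xpk2bejcl")
    demo_tags = list_model_tags(tmp)

print(demo_tags)
```

On a real machine, you would point it at the store instead, for example `list_model_tags(Path.home() / "bentoml" / "models")`.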

You can always load the model back using the `load_runner` function preceded by the relevant framework name:

```python
model = bentoml.keras.load_runner("keras_conv2d_smaller:latest")

# Load a sample image from memory
img = ...

print(model.run(img))
```

After loading, the model can be used for prediction through its `run` method, which calls the `predict` method of the Keras `Model` object under the hood.

### Step 2: Create the service

Now, we only need to write a few lines of code to convert the saved model into a functioning API we can send requests to. 

First, we write a function to create a BentoML `Service` object, which takes care of all the API logic without any effort on our part.

After loading the model back with the `load_runner` function, we pass it to the `Service` class along with an arbitrary name.

```python
def create_bento_service_keras(bento_name):
    """
    Create a Bento service for a Keras model.
    """
    # Load the model
    keras_model = bentoml.keras.load_runner(bento_name)

    # Create the service
    service = bentoml.Service(bento_name + "_service", runners=[keras_model])

    return keras_model, service

model, service = create_bento_service_keras("keras_conv2d_smaller")

```

After that, we create an API endpoint that handles our POST requests. You create endpoints in BentoML by defining a function decorated with the `api` method of the service object we just created:

```python
import numpy as np
import bentoml
from bentoml.io import Text, NumpyNdarray
from skimage.transform import resize
```

```python
# Create an API function
@service.api(input=Text(), output=NumpyNdarray())
def predict(image_str) -> np.ndarray:
    """
    Predict pet pawpularity from an image using the given Bento.
    """
    # Convert the image back to numpy array
    image = np.fromstring(image_str, np.uint8)
    image = resize(image, (224, 224, 3))
    image = image / 255.0

    result = model.run(image)

    return result
```

Before discussing the body, let's talk about the `service.api` decorator. It has two required parameters - `input` and `output`.

These parameters should be defined based on what type of data we will be sending and getting back from the API endpoint. 

The purpose of the above `predict` endpoint is that it returns a cuteness score when we send a request with an image. So, I've defined the input as `Text()` because we will be sending the NumPy image array as a string. The output should be `NumpyNdarray()` because when we call `model.run(image)`, the return data type will be a Numpy array.

Getting the right data type for the endpoint is important. You can read [this page](https://docs.bentoml.org/en/latest/api/api_io_descriptors.html?highlight=Image#api-io-descriptors) of the BentoML docs on other types of data you can process through endpoints.

As for the body, you should write all preprocessing logic to the image before you call `model.run`. While writing the training logic, I've resized the images to (224, 224, 3) and normalized them by dividing their pixel values by 255. So, I've performed those steps inside the endpoint function as well.

> Important: If you are using other frameworks like Sklearn for tabular data, make sure you run all your preprocessing steps inside the API endpoint as well. You can achieve this by pickling your fitted preprocessing objects and calling them inside the `predict` function, so that you avoid data leakage and never pass incorrectly formatted data to the model.
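To make the note above concrete, here is a minimal sketch of the idea using only the standard library. The `fit_scaler` and `apply_scaler` helpers are hypothetical stand-ins for a real preprocessing step; with Sklearn, you would typically persist the fitted transformer itself (for example with `joblib`, as we did for the model) and load it inside the endpoint.

```python
import pickle


def fit_scaler(values):
    """Learn min/max from the training data only."""
    return {"min": min(values), "max": max(values)}


def apply_scaler(params, values):
    """Scale new values with the parameters learned at training time."""
    span = (params["max"] - params["min"]) or 1.0  # guard against zero range
    return [(v - params["min"]) / span for v in values]


# Training time: fit on the training data and persist the parameters
train_values = [10.0, 20.0, 30.0]
payload = pickle.dumps(fit_scaler(train_values))  # normally written to a file

# Inside the endpoint: load the same parameters and reuse them as-is
params = pickle.loads(payload)
scaled = apply_scaler(params, [15.0, 30.0])
print(scaled)  # [0.25, 1.0]
```

The key point is that the endpoint never refits anything on incoming data; it only applies what was learned during training.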

Now, to start a debug server for our API, you only need to put all of the above into a single Python file, conventionally named `service.py`, in the root directory and run the command below:

```bash
$ bentoml serve service.py:service --reload
```

![](images/bentoml_serve.gif)

The `--reload` flag makes sure that the local server detects changes to `service.py` and updates the logic automatically.

From the GIF, you can see that the server is live at http://127.0.0.1:3000/ with a simple UI:
![](images/local_server.gif)

We can already send requests to the local server and get predictions for the images:

```python

import requests
from skimage.io import imread

endpoint = "http://127.0.0.1:3000/predict"

# Load a sample image
img = imread("data/raw/train/0a0da090aa9f0342444a7df4dc250c66.jpg")

response = requests.post(endpoint, headers={"content-type": "text/plain"},
                    data=str(img))
```

Make sure to set the right headers for your data type and send the image wrapped under the `str` function. Once again, you can find the examples of requests with the right content headers for each data type from [this page](https://docs.bentoml.org/en/latest/concepts/api_io_descriptors.html#) of the docs.

Let's look at the response text:

```python
>>> print(response.text)
[35.49753189086914]
```

And the image we sent was this:

![0a0da090aa9f0342444a7df4dc250c66.jpg](attachment:44297cfd-604b-4195-b579-4f216844690d.jpg)
<figcaption style="text-align: center;">
    <strong>
        Image from the <a href='https://www.kaggle.com/competitions/petfinder-pawpularity-score/data'>Pet Pawpularity dataset</a>.
    </strong>
</figcaption>
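One thing worth checking before relying on the `str(img)` payload: by default, NumPy abbreviates the string form of arrays with more than 1,000 elements using `...`, so the text body may not contain every pixel of a full-sized photo. A possible alternative, which is not what the service in this article does, is to send the image file's bytes as base64 text and decode them on the server. The helper names below are hypothetical, and the `predict` endpoint would need to be changed to match.

```python
import base64


def encode_image_bytes(raw_bytes: bytes) -> str:
    """Client side: turn raw image file bytes into plain ASCII text."""
    return base64.b64encode(raw_bytes).decode("ascii")


def decode_image_text(text: str) -> bytes:
    """Server side: recover the exact bytes that the client sent."""
    return base64.b64decode(text.encode("ascii"), validate=True)


# Stand-in for the contents of a JPEG file read with open(path, "rb")
fake_jpeg = bytes(range(256)) * 4

payload = encode_image_bytes(fake_jpeg)
restored = decode_image_text(payload)

print(restored == fake_jpeg)  # True
```

On the server, the restored bytes could then be opened with `PIL.Image.open(io.BytesIO(restored))` before resizing, so the model sees the same pixels the user uploaded.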

### Step 3: Build the Bento

Now, we are ready to create our very first Bento. 

A Bento is an archive that contains everything needed to run our service or API online, including all the code, models, dependency info, and setup configurations. Building it starts with creating a `bentofile.yaml` file in the same directory as the `service.py` file (preferably, both should be in the project root):

```YAML
service: "service.py:service"
include:
 - "service.py"
python:
  packages:
   - scikit_learn==1.0.2
   - numpy==1.22.3
   - tensorflow==2.8.0
   - scikit_image==0.18.3
```

The first line of the YAML file should contain the service file's name followed by `:service`, the name of the service object defined in that file. Next, you add all the files needed for the `service.py` file to work without errors (data, helper scripts, etc.). Here, I only included the service file itself as we didn't use any additional scripts inside it.

Then, under Python and packages, you specify the dependencies and their versions. If you are not sure about the versions, there is a helpful little package I always use called `pipreqs`:

```bash
$ pip install pipreqs

$ pipreqs .
```

Calling `pipreqs [path]` scans the Python files under the given path and writes a `requirements.txt` file listing the packages they import along with their versions, like below:

```
bentoml==1.0.0a7
catboost==0.26.1
dagshub==0.1.8
joblib==0.17.0
keras==2.8.0
lightgbm==2.3.1
matplotlib==3.3.1
mlflow==1.24.0
numpy==1.22.3
pandas==1.3.2
scikit_image==0.18.3
scikit_learn==1.0.2
seaborn==0.11.0
skimage==0.0
tensorflow==2.8.0
tqdm==4.50.0
xgboost==1.4.2

```
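Note that this list covers the whole project, while the `bentofile.yaml` only needs the packages that `service.py` actually uses. If you want to pin just those from your current environment, here is a small sketch built on the standard library's `importlib.metadata`. The `pin_versions` helper is hypothetical, and the demo uses a fake lookup so that it runs anywhere.

```python
from importlib import metadata


def pin_versions(packages, get_version=metadata.version):
    """Return (pinned, missing): 'name==version' strings and names not found."""
    pinned, missing = [], []
    for name in packages:
        try:
            pinned.append(f"{name}=={get_version(name)}")
        except metadata.PackageNotFoundError:
            missing.append(name)
    return pinned, missing


# Fake environment for the demo; omit `get_version` to query real installs
fake_env = {"numpy": "1.22.3", "tensorflow": "2.8.0"}


def fake_lookup(name):
    if name not in fake_env:
        raise metadata.PackageNotFoundError(name)
    return fake_env[name]


pinned, missing = pin_versions(["numpy", "tensorflow", "not-installed"], fake_lookup)
print(pinned)   # ['numpy==1.22.3', 'tensorflow==2.8.0']
print(missing)  # ['not-installed']
```

The `pinned` entries can be pasted under `python: packages:` in the `bentofile.yaml`.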

After listing the dependencies, you only need to call `bentoml build`:

```bash
$ bentoml build
```

![](images/bento_build.png)

To see a list of all your bentos, call `bentoml list`:

```bash
$ bentoml list
```
![](images/bento_list.png)

### Step 4: Deploy to Heroku

The `build` command saves a fresh bento inside the local store with the following tree structure:

```bash
    ├───apis
    │       openapi.yaml
    ├───env
    │   ├───conda
    │   ├───docker
    │   │       Dockerfile
    │   │       entrypoint.sh
    │   │       init.sh
    │   └───python
    │           requirements.lock.txt
    │           requirements.txt
    ├───models
    │   └───keras_conv2d
    │       │   latest
    │       │
    │       └───b52h7x5xpk2bejcl
    │               checkpoint
    │               model.yaml
    │               saved_model_json.json
    │               saved_model_weights.data-00000-of-00001
    │               saved_model_weights.index
    └───src
    │       service.py
    │   bento.yaml
    │   README.md
```

You can read what each of these sub-folders does on [this page](https://docs.bentoml.org/en/latest/concepts/building_bentos.html#building-bentos). The folder we are interested in is `env/docker`. It contains everything needed to build a fully functional Docker container. We will use it to deploy our API online.

There are many options for this, like Amazon EC2 or Google Cloud Platform, but the most hassle-free platform is Heroku.

Heroku is a popular cloud application platform that enables developers of any language to build and maintain cloud applications. If you haven't already, [create an account](https://signup.heroku.com/) and [download the CLI](https://devcenter.heroku.com/articles/heroku-cli#install-the-heroku-cli), which you can use to create and manage your Heroku apps. 

After installing, call `login` to authenticate your terminal session:

```bash
$ heroku login
```

This opens up a tab in the browser where you can log in with your credentials. Next, log in to the container registry:

```bash
$ heroku container:login
```

Now, let's create an app named `pet-pawpularity`:

```bash
$ heroku create pet-pawpularity
```

Afterwards, the app should be visible at https://dashboard.heroku.com/apps:

![image.png](attachment:091ea8e4-ac8c-4730-bf53-9524db5276a7.png)

Now, we need to push our Bento to this app so that it goes online. To do that, we need to `cd` into the bento directory (which you can find with `bentoml list`) and then into its `env/docker` folder:

```bash
$ cd ~/bentoml/bentos/keras_conv2d_smaller_service/uaaub3v3cku3ejcl
$ cd env/docker
```

From there, you call this command:

```bash
$ heroku container:push web --app pet-pawpularity --context-path=../..
```

Depending on the size of the archive, the command will take a few minutes to finish. 

Finally, you can release the app with the below command:

```bash
$ heroku container:release web --app pet-pawpularity
```

Now, you can go to https://pet-pawpularity.herokuapp.com/ to see the API online or go to the app's page on your dashboard to open it:

![image.png](attachment:264a94ca-26c2-4169-8f2f-b0cc0d081f74.png)

Now, anyone can send a request to this API. Let's try:

```python
import requests
from skimage.io import imread

endpoint = "https://pet-pawpularity.herokuapp.com/predict"

# Load a sample image
img = imread("data/raw/train/0a4f658ae77b7e4209e22b79fe1c28cb.jpg")

response = requests.post(
    endpoint, headers={"content-type": "text/plain"}, data=str(img)
)

>>> print(response.text)
[27.414047241210938]
```

### Step 5: Build a simple UI with Streamlit

Now, let's build a lightweight user interface around our API. We will start by writing a simple header section for our app with an arbitrary image:

```python
import io

import numpy as np
import requests
import streamlit as st
from PIL import Image

API_ENDPOINT = "https://pet-pawpularity.herokuapp.com/predict"

# Create the header page content
st.title("Pet Pawpularity Prediction App")
st.markdown(
    "### Predict the popularity of your cat or dog with machine learning",
    unsafe_allow_html=True,
)
# Upload a simple cover image
with open("data/app_image.jpg", "rb") as f:
    st.image(f.read(), use_column_width=True)

st.text("Grab a picture of your pet or upload an image to get a Pawpularity score.")
```

Next, we define the core functionality. We will create a function that gets a cuteness score by sending a request to our API:

```python
def predict(img):
    """
    A function that sends a prediction request to the API and returns a cuteness score.
    """
    # Convert the bytes image to a NumPy array
    bytes_image = img.getvalue()
    numpy_image_array = np.array(Image.open(io.BytesIO(bytes_image)))

    # Send the image to the API
    response = requests.post(
        API_ENDPOINT,
        headers={"content-type": "text/plain"},
        data=str(numpy_image_array),
    )

    if response.status_code == 200:
        return response.text
    else:
        raise Exception("Status: {}".format(response.status_code))
```

Images uploaded to Streamlit apps arrive as `BytesIO` objects, so we need to convert them to a NumPy array first. The first two statements of the function accomplish this. The rest is fairly self-explanatory.
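If you are wondering why the function calls `getvalue()` rather than `read()`, here is a small illustration with a plain `io.BytesIO` object standing in for a Streamlit upload (the byte string is made up). `read()` depends on the current position in the buffer, while `getvalue()` always returns the full contents.

```python
import io

# Stand-in for an uploaded file; the content is a made-up byte string
uploaded = io.BytesIO(b"\x89PNG fake image bytes")

first_read = uploaded.read()    # consumes the buffer up to the end
second_read = uploaded.read()   # nothing left at the current position
all_bytes = uploaded.getvalue() # full contents, regardless of position

print(len(first_read), len(second_read), len(all_bytes))
```

This makes `getvalue()` the safer choice when the same upload might be read more than once during a rerun of the app.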

Now, we create two image input components - one for file uploads and another for a web-cam input:

```python
def main():
    img_file = st.file_uploader("Upload an image", type=["jpg", "png"])
    if img_file is not None:

        with st.spinner("Predicting..."):
            prediction = float(predict(img_file).strip("[").strip("]"))
            st.success(f"Your pet's cuteness score is {prediction:.3f}")

    camera_input = st.camera_input("Or take a picture")
    if camera_input is not None:

        with st.spinner("Predicting..."):
            prediction = float(predict(camera_input).strip("[").strip("]"))
            st.success(f"Your pet's cuteness score is {prediction:.3f}")

if __name__ == "__main__":
    main()
```



When used, both of these components display a simple loading animation while waiting and then return a cuteness score. Here is our app in action:

![](images/app_demo.gif)
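The two `strip` calls in `main` work because the API returns text like `[27.414047241210938]`. If you prefer something a bit more defensive, here is a sketch that parses the same text as JSON instead. The `parse_score` helper is hypothetical, and it assumes the response is a JSON list (possibly nested) holding exactly one number.

```python
import json


def parse_score(response_text: str) -> float:
    """Extract a single float from text such as '[27.4]' or '[[27.4]]'."""
    value = json.loads(response_text)
    # Unwrap nested single-element lists until a scalar is left
    while isinstance(value, list):
        if len(value) != 1:
            raise ValueError(f"Expected one prediction, got {len(value)}")
        value = value[0]
    return float(value)


score = parse_score("[27.414047241210938]")
print(f"Your pet's cuteness score is {score:.3f}")
```

Inside `main`, this would replace `float(predict(img_file).strip("[").strip("]"))` with `parse_score(predict(img_file))`.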

Now, after you've pushed these changes to GitHub, you can deploy your app online by going to https://share.streamlit.io/deploy:

![image.png](attachment:e555f0b4-43ee-4a24-9776-85c299cedc31.png)

Here is my deployed app: https://share.streamlit.io/bextuychiev/pet_pawpularity/ui/src/ui.py

### Conclusion

Congratulations! You've just built an entire image application with an awesome UI for all users and its own API for your programmer friends or teammates. 