# Deploying Machine Learning Models as API Services With BentoML And AWS Lambda
## Get that model online!
![](images/pexels.jpg)
<figcaption style="text-align: center;">
    <strong>
        Photo by 
        <a href='https://www.pexels.com/photo/blue-and-red-galaxy-artwork-1629236/'>Suzy Hazelwood</a>
    </strong>
</figcaption>

## Introduction

According to ml-ops.org, the current MLOps stack looks like the following template:

![](https://ml-ops.org/img/mlops-full-stack.png)
<figcaption style="text-align: center;">
    <strong>
        Photo by 
        <a href='https://valohai.com/blog/the-mlops-stack/'>Henrik Skogström</a>
        on 
        <a href='https://ml-ops.org/content/state-of-mlops'>ml-ops.org</a>
    </strong>
</figcaption>

The industry is changing fast, so there are multiple candidate tools for each of the operations in the template.

BentoML is a new open-source library that handles the model serving part of the MLOps life cycle. It offers a Python API that allows users to serve their models from a simple script and get an HTTP server they can send POST requests to in order to generate predictions on unseen data.

This lightweight API can then be plugged into any machine learning use case, be it a Docker container or a web app.
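
To make "sending POST requests" a bit more concrete, here is a minimal sketch of how a client could prepare such a request using only the standard library. The URL, the `/classify` route, and the payload layout are assumptions made for illustration; the real ones depend on how the service is defined. The request is only built here, not sent.

```python
import json
import urllib.request

# Hypothetical endpoint: host, port and route depend on your own service.
URL = "http://localhost:3000/classify"


def build_prediction_request(rows, url=URL):
    """Build (but do not send) a JSON POST request holding feature rows."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# One sample with made-up feature values
request = build_prediction_request([[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]])
print(request.get_method(), request.full_url)
```

Once a server is listening at that address, passing the object to `urllib.request.urlopen(request)` would send it.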

In this post, we will go deep into how you can use BentoML and its Bentos API and how you can combine it with AWS Lambda to get your models up and running for anyone.

## What is BentoML and its purpose?

To maximize the business impact of machine learning, the hand-off between data scientists and engineers from model training to deployment should be fast and iterative. However, data scientists often don't have the skills to properly package trained models and hand them to the engineers, while engineers struggle to work with models that come from dozens of different ML frameworks.

BentoML was created to solve these issues and make the hand-off to production deployment as easy and fast as possible. In the coming sections, you will see how BentoML makes tedious operations stupidly easy. For example:
- Saving a model from any framework into a unified format
- Creating an HTTP API endpoint with a single Python function
- Containerizing everything the model needs using Docker with a single CLI command

So, without further ado, let's get started.

## Dataset preparation and model training

This article is about model deployment, so I want to keep all your attention on that area. For that purpose, I will assume you are reading this with your best trained model already in hand and want to deploy it as soon as possible.

To simulate that here, we will simply create a synthetic dataset, train an XGBoost model, and move forward as though you have already done all the previous steps of the MLOps life cycle (data cleaning, exploration, feature engineering, model experimentation, and hyperparameter tuning) and found the model that performs best on your problem.

```python
import warnings

warnings.filterwarnings("ignore")
```

```python
from pathlib import Path

import pandas as pd
from sklearn.datasets import make_classification

# Generate the data
n_samples, n_features = 10000, 7
X, y = make_classification(n_samples=n_samples, n_features=n_features, n_informative=5)

# Save it as a CSV
feature_names = [f"feature_{i}" for i in range(n_features)]

df = pd.DataFrame(X, columns=feature_names)
df["target"] = y

# Make sure the output folder exists before writing
Path("data").mkdir(exist_ok=True)
df.to_csv("data/data.csv", index=False)
```

We create a simple dataset with 7 features and 10k samples with a binary classification target. Now, we load it back into the environment, train a vanilla XGBoost classifier, and pretend that it is our best tuned model.

```python
import xgboost as xgb
from sklearn.model_selection import KFold, cross_validate, train_test_split

# Load and prep the data
data = pd.read_csv("data/data.csv")
X, y = data.drop("target", axis=1), data[["target"]]

# Initialize a classifier
# "gpu_hist" needs a GPU-enabled XGBoost build; use tree_method="hist" on CPU
clf = xgb.XGBClassifier(tree_method="gpu_hist")

# Cross-validate
cv = KFold(n_splits=10, shuffle=True, random_state=1)

scores = cross_validate(
    clf,
    X,
    y,
    cv=cv,
    n_jobs=-1,
    scoring="roc_auc",
    return_train_score=True,
    return_estimator=True,
)
```

After loading the data, we run 10-fold cross-validation with ROC AUC as the metric. For the sake of completeness, let's quickly log the train/validation scores:

```python
avg_train = scores["train_score"].mean()
avg_test = scores["test_score"].mean()

std_train = scores["train_score"].std()
std_test = scores["test_score"].std()

print(f"Average training ROC AUC: {avg_train:.3f} ± {std_train:.3f}")
print(f"Average test ROC AUC: {avg_test:.3f} ± {std_test:.3f}")
```

```
Average training ROC AUC: 0.999 ± 0.000
Average test ROC AUC: 0.971 ± 0.003
```
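
For readers who want a feel for what these ROC AUC numbers mean, the sketch below computes the metric by hand as the share of (positive, negative) pairs that the model ranks in the right order, counting ties as half. It is a plain-Python illustration on toy data, not what scikit-learn runs internally, and the function name `roc_auc` is made up for this example.

```python
def roc_auc(y_true, y_score):
    """ROC AUC as the fraction of correctly ranked (positive, negative) pairs."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("Both classes are needed to compute ROC AUC.")

    # A pair counts as 1 if the positive is ranked higher, 0.5 on a tie
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and predicted probabilities
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, probs))  # 3 of the 4 pairs are ordered correctly
```

A score of 1.0 means every positive sample is ranked above every negative one, while 0.5 is what random scores would give on average.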


We extract one of the models from the folds and save it as `clf`.

```python
clf = scores["estimator"][8]
```

Great! Now, we are ready for deployment.

## Saving trained models to BentoML format

Saving a trained model into a BentoML-compatible format is done by calling the framework-specific `save_model` function:

```python
import bentoml  # pip install bentoml

bento_xgb = bentoml.sklearn.save_model("xgb_initial", clf)
bento_xgb
```

```
Model(tag="xgb_initial:3bt3t6yqw6cnujcl", path="C:\Users\bex\bentoml\models\xgb_initial\3bt3t6yqw6cnujcl\")
```

Even though we trained an XGBoost classifier, we still use the `sklearn.save_model` command because we initialized the model through XGBoost's Sklearn API (`XGBClassifier`). The returned object is an instance of the BentoML `Model` class with a label called a *tag*.

The tag consists of two parts: a name given by the user and a version string that differentiates between models saved at different times. Even if an identical model is saved again, a new directory and a new version string are created for it.
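
The two-part structure can be illustrated with plain string handling. This is only a sketch of the `name:version` convention using the tag printed above; `split_tag` is a hypothetical helper, not part of BentoML, which has its own tag handling.

```python
def split_tag(tag):
    """Split a 'name:version' string into its two parts."""
    name, sep, version = tag.partition(":")
    if not name or not sep or not version:
        raise ValueError(f"Expected 'name:version', got {tag!r}")
    return name, version


name, version = split_tag("xgb_initial:3bt3t6yqw6cnujcl")
print(name)     # xgb_initial
print(version)  # 3bt3t6yqw6cnujcl
```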

BentoML supports almost all important ML frameworks:
- Classic: Sklearn, XGBoost, CatBoost, LightGBM
- Deep learning: TensorFlow, PyTorch, PyTorch Lightning, Keras, Transformers
- Others: ONNX, MLFlow, fast.ai, statsmodels, spaCy, h2o, Gluon, etc.

Each of the frameworks has a corresponding `framework.save_model` command.

When a model is saved, it goes into a local directory called the BentoML model store. From the last output, we saw that my model store resides in `C:\Users\bex\bentoml\models`. You can see the list of all your models by calling the `bentoml models list` command in the terminal:

```bash
bentoml models list
```

```
 Tag                          Module           Size        Creation Time       
 xgb_initial:3bt3t6yqw6cnuj…  bentoml.sklearn  441.21 KiB  2022-07-31 15:02:11 
 xgb_initial:2y6k6tyqw6i6kj…  bentoml.sklearn  441.21 KiB  2022-07-31 15:02:07 
 xgb_initial:y7ug7oaqw6kjaj…  bentoml.sklearn  441.21 KiB  2022-07-31 15:01:43 
 keras_conv2d_smaller:4zngb…  bentoml.keras    54.59 MiB   2022-04-13 15:03:00 
 conv2d_larger_dropout:rsl6…  bentoml.keras    128.58 MiB  2022-04-12 20:30:57 
 conv2d_larger_dropout:3ygl…  bentoml.keras    128.58 MiB  2022-04-12 20:18:55 
 conv2d_larger_dropout:szo4…  bentoml.keras    128.58 MiB  2022-04-09 13:53:55 
 keras_conv2d:b52h7x5xpk2be…  bentoml.keras    128.58 MiB  2022-04-09 01:25:41 
```

You can also see models from my other projects.
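
Judging from the path printed when we saved the model, the store keeps one folder per model name with one sub-folder per version. Assuming that layout holds, the sketch below walks a directory and rebuilds the `name:version` strings. It is an illustration of the folder structure only, not a replacement for the CLI; `list_store_tags` and the version names in the demo are made up.

```python
import tempfile
from pathlib import Path


def list_store_tags(store_dir):
    """Return sorted 'name:version' strings for every <name>/<version> folder."""
    tags = []
    for name_dir in Path(store_dir).iterdir():
        if not name_dir.is_dir():
            continue
        for version_dir in name_dir.iterdir():
            # Skip plain files; only folders are treated as versions here
            if version_dir.is_dir():
                tags.append(f"{name_dir.name}:{version_dir.name}")
    return sorted(tags)


# Demo on a throwaway directory that mimics the assumed layout
with tempfile.TemporaryDirectory() as tmp:
    for name, version in [("xgb_initial", "v1"), ("xgb_initial", "v2"), ("xgb_custom", "v1")]:
        (Path(tmp) / name / version).mkdir(parents=True)
    demo_tags = list_store_tags(tmp)

print(demo_tags)
```

To try it on a real store, you would pass the path of your own `bentoml/models` directory instead of the temporary one.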

> Note: in BentoML docs and this article, the names "model" and "tag" are used interchangeably to refer to saved models in the model store.

The `save_model` function has other parameters that allow you to pass extra information about the model, from metadata and labels to additional user-defined objects (e.g. weights of your model as a separate object):

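```python
bentoml.sklearn.save_model(
    "xgb_custom",
    clf,
    # Keep metadata to simple values: store the fold scores, not the fitted estimators
    metadata={"auc": avg_test, "cv_scores": scores["test_score"].tolist()},
    labels={"author": "Bex"},
)
```

```
Model(tag="xgb_custom:hgpjk2iqxk67yjcl", path="C:\Users\bex\bentoml\models\xgb_custom\hgpjk2iqxk67yjcl\")
```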

## Sharing models

Models in the BentoML model store can be shared as standalone archives using the `bentoml models export` command:

```bash
bentoml models export xgb_custom:latest ./models
```

```
Model(tag="xgb_custom:hgpjk2iqxk67yjcl") exported to C:\Users\bex\Desktop\articles\2022\7_july\3_bentoml_xgboost\models\xgb_custom-hgpjk2iqxk67yjcl.bentomodel
```

When you don't know the exact version string of your tag, you can use the `:latest` suffix to choose the most recent one. With the above command, we export the classifier as a `.bentomodel` archive to the `models` directory. When a teammate sends you a `.bentomodel` archive, you can use the `import` command to load it into your local BentoML model store:

```bash
bentoml models import ./models/xgb_custom-hgpjk2iqxk67yjcl.bentomodel
```

```
Error: [models] `import` failed: Item 'xgb_custom:hgpjk2iqxk67yjcl' already exists in the store <osfs 'C:\Users\bex\bentoml\models'>
```

The import fails here only because this exact model version is already in my model store.


## Retrieving saved models 

There are a few ways of loading saved models from the model store into your environment. The simplest one is the `load_model` function. Like `save_model`, `load_model` is also framework-specific:

```python
import bentoml

clf = bentoml.sklearn.load_model("xgb_custom:latest")
clf
```

```
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=0,
              importance_type='gain', interaction_constraints='',
              learning_rate=0.300000012, max_delta_step=0, max_depth=6,
              min_child_weight=1, missing=nan, monotone_constraints='()',
              n_estimators=100, n_jobs=24, num_parallel_tree=1, random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='gpu_hist', validate_parameters=1, verbosity=None)
```

The function loads the model in the same format it had before it was saved, meaning you can use its native methods like `predict`:

```python
import numpy as np

sample = np.random.random(size=(1, 7))

clf.predict(sample)
```

```
array([1], dtype=int64)
```

To load the model as a BentoML `Model` object, you can use the `models.get` function, which is not framework-specific:

```python
tag = bentoml.models.get("xgb_custom:latest")
```

The reason you might want to load the model in this format is that you can now access its add-ons like metadata and labels:

```python
tag.custom_objects
```

```
{}
```

```python
tag.info.labels
```

```
{'author': 'Bex'}
```

The final and most important way of retrieving models is by loading them as runners:

```python
import bentoml

xgb_runner = bentoml.sklearn.load_runner("xgb_custom:latest")
```

```
The "bentoml.sklearn.load_runner" method is being deprecated. Use `bentoml.sklearn.get("xgb_custom:latest").to_runner()` instead
```


Runners are special BentoML objects that are optimized to use system resources as efficiently as possible for their framework. Runners are the core components of the APIs we will build in the next section.

You can also load runners using the `models.get` and `to_runner` functions, which is preferred over the previous method, as the deprecation message above indicates:

```python
import bentoml

tag = bentoml.models.get("xgb_custom:latest")
xgb_runner = tag.to_runner()
```

Now, we are ready to start building the API!

## Organize into scripts

Up until now, we have been using notebooks. To start building an API service, we need to switch to Python scripts. Let's organize the code of the previous sections. In a `generate_data.py` file, create a function that saves the synthetic data from the "Dataset preparation and model training" section:

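```python
from pathlib import Path

import pandas as pd
from sklearn.datasets import make_classification


def generate_data(n_samples, n_features, n_informative, path):
    """
    A simple function to save a synthetic dataset to path.
    """
    # Generate the data, as in the dataset preparation section
    X, y = make_classification(
        n_samples=n_samples, n_features=n_features, n_informative=n_informative
    )

    feature_names = [f"feature_{i}" for i in range(n_features)]
    df = pd.DataFrame(X, columns=feature_names)
    df["target"] = y

    # Create the parent folder if needed, then save as CSV
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, index=False)


if __name__ == "__main__":
    n_samples, n_features = 10000, 7
    generate_data(n_samples, n_features, 5, "data/data.csv")
```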

In a `train.py` file, create a function that trains our XGBoost classifier and saves it to BentoML model store:

```python
import bentoml
import pandas as pd
import xgboost as xgb


def train_xgb_save(X, y, tag_name="xgb_final"):
    """
    A simple function to train a model and save it to BentoML model store.
    """
    # Initialize a classifier (use tree_method="hist" if no GPU is available)
    clf = xgb.XGBClassifier(tree_method="gpu_hist")

    # Train and save
    clf.fit(X, y)

    bentoml.sklearn.save_model(tag_name, clf)


if __name__ == "__main__":
    # Load and prep the data
    data = pd.read_csv("data/data.csv")
    X, y = data.drop("target", axis=1), data[["target"]]

    # Train and save
    train_xgb_save(X, y)
```

We don't have to cross-validate the final model. We can simply train it on the full data and save it to the model store.

In the next section, we create the final piece: the API service script.

## Creating an API service script

## Building a Bento

This section will explain how to use the `bentoml build` command and all the steps required before running it.

## Deploying the Bento to AWS Lambda