# Deploying Machine Learning Models as API Services With BentoML And AWS Lambda
## Get that model online!
![](images/pexels.jpg)
<figcaption style="text-align: center;">
    <strong>
        Photo by 
        <a href='https://www.pexels.com/photo/blue-and-red-galaxy-artwork-1629236/'>Suzy Hazelwood</a>
    </strong>
</figcaption>

## Introduction

According to ml-ops.org, the current state of the MLOps stack looks like the following template:

![](https://ml-ops.org/img/mlops-full-stack.png)
<figcaption style="text-align: center;">
    <strong>
        Photo by 
        <a href='https://valohai.com/blog/the-mlops-stack/'>Henrik Skogström</a>
        on 
        <a href='https://ml-ops.org/content/state-of-mlops'>ml-ops.org</a>
    </strong>
</figcaption>

The industry is changing fast, so there are now several candidate tools for each operation in the template.

BentoML is a new open-source library that handles the model-serving part of the MLOps life cycle. It offers a Python API that lets you serve your models as APIs from a simple script. The result is an HTTP server to which you can send POST requests to get predictions on unseen data.

This lightweight API can then be plugged into any machine learning use case, be it a Docker container or a web app.
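
To make the request/response idea concrete, here is a minimal client sketch that uses only Python's standard library. It assumes a service running locally on BentoML's default port 3000 and a hypothetical endpoint named `classify` that accepts a JSON list of feature rows; adjust both to match your own service.

```python
import json
import urllib.request

# Assumptions (hypothetical, for illustration): the service runs locally on
# BentoML's default port 3000 and exposes an endpoint named "classify".
BASE_URL = "http://127.0.0.1:3000"
ENDPOINT = "classify"


def build_prediction_request(rows, base_url=BASE_URL, endpoint=ENDPOINT):
    """Build a JSON POST request carrying a batch of feature rows."""
    payload = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{endpoint}",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(rows, timeout=5.0):
    """Send the request and decode the JSON response (needs a running server)."""
    request = build_prediction_request(rows)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Two unseen samples with 7 features each, matching the dataset used in this post
new_rows = [
    [0.1, -1.2, 0.5, 2.0, -0.3, 0.8, 1.1],
    [1.4, 0.2, -0.7, -1.5, 0.9, -0.2, 0.0],
]
request = build_prediction_request(new_rows)
print(request.get_method(), request.full_url)
```

Separating request construction from sending keeps the payload format easy to inspect; `predict(new_rows)` only works once a server is actually listening at that address.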

In this post, we will take a deep dive into how to use BentoML and its Bentos, and how to combine them with AWS Lambda to get your models up and running for anyone to use.

## What is BentoML and its purpose?

To maximize the business impact of machine learning, the hand-off from data scientists to engineers, from model training to deployment, should be fast and iterative. However, data scientists often lack the skills to properly package trained models and hand them off to engineers, while engineers struggle to work with models that come from dozens of different ML frameworks.

BentoML was created to solve these issues and make the hand-off to production as easy and fast as possible. In the coming sections, you will see how BentoML makes it stupidly easy to perform tedious operations such as:
- Saving any model from any framework in a unified format
- Creating an HTTP API endpoint with a single Python function
- Containerizing everything the model needs with Docker using a single CLI command

So, without further ado, let's get started.

## Dataset preparation and model training

The crux of this article is model deployment, so I want to focus all your attention on that area. For that purpose, I will assume you are reading this article with your best-trained model already in hand and want to deploy it as soon as possible.

To simulate that here, we will simply create a synthetic dataset, train an XGBoost model, and move forward as though you have already completed all the previous steps of the MLOps life cycle (data cleaning, exploration, feature engineering, model experimentation, and hyperparameter tuning) and found the model that performs best on your problem.

```python
import warnings

warnings.filterwarnings("ignore")
```

```python
import os

import pandas as pd
from sklearn.datasets import make_classification

# Generate a synthetic binary classification dataset
n_samples, n_features = 10000, 7
X, y = make_classification(n_samples=n_samples, n_features=n_features, n_informative=5)

# Save it as a CSV
feature_names = [f"feature_{i}" for i in range(n_features)]

df = pd.DataFrame(X, columns=feature_names)
df["target"] = y

# Create the output directory first so to_csv doesn't fail on a fresh checkout
os.makedirs("data", exist_ok=True)
df.to_csv("data/data.csv", index=False)
```

We create a simple dataset with 7 features, 10k samples, and a binary classification target. Next, we load it back into the environment, train a vanilla XGBoost classifier, and pretend that it is our best tuned model.

```python
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import KFold, cross_validate

# Load and prep the data (a 1-D Series target is what scikit-learn expects)
data = pd.read_csv("data/data.csv")
X, y = data.drop("target", axis=1), data["target"]

# Initialize a classifier. "hist" trains on CPU; on a CUDA GPU, use
# tree_method="gpu_hist" (XGBoost < 2.0) or device="cuda" (XGBoost >= 2.0)
clf = xgb.XGBClassifier(tree_method="hist")

# Cross-validate with 10 shuffled folds
cv = KFold(n_splits=10, shuffle=True, random_state=1)

scores = cross_validate(
    clf,
    X,
    y,
    cv=cv,
    n_jobs=-1,
    scoring="roc_auc",
    return_train_score=True,
    return_estimator=True,
)
```

After loading the data, we run 10-fold cross-validation with ROC AUC as the metric. For the sake of completeness, let's quickly log the train and validation scores:

```python
avg_train = scores["train_score"].mean()
avg_test = scores["test_score"].mean()

std_train = scores["train_score"].std()
std_test = scores["test_score"].std()

print(f"Average training ROC AUC: {avg_train:.3f} ± {std_train:.3f}")
print(f"Average test ROC AUC: {avg_test:.3f} ± {std_test:.3f}")
```

```
Average training ROC AUC: 0.999 ± 0.000
Average test ROC AUC: 0.971 ± 0.003
```

Great! Now, we can move on to deployment!

## Saving trained models to BentoML format

1. In this section, readers will get a brief overview of an XGBoost model already trained on a sample dataset, including its hyperparameters and the dataset used to train it.

2. Then, the main BentoML concepts, such as saving models to the local model store and retrieving them, will be explained, and the trained XGBoost model will be saved to that store.

## Creating an API service script
This section explains how to load a saved XGBoost model into a prediction script and how to create a service function with the `@service.api` decorator.

## Building a Bento

This section will explain how to use the `bentoml build` command and all the steps required before running it.
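
`bentoml build` reads a `bentofile.yaml` in the current directory that tells BentoML which service to package, which files to include, and which Python packages the Bento needs. The sketch below writes a minimal version from Python, assuming the service lives in a file named `service.py` as a `bentoml.Service` object called `svc` (both hypothetical names); the package list is an assumption for this XGBoost example.

```python
from pathlib import Path

# Minimal build config (assumed layout):
#   service         -> "module:object" pointing at the bentoml.Service instance
#   include         -> files copied into the Bento
#   python.packages -> pip requirements installed inside the Bento
BENTOFILE = """\
service: "service:svc"
include:
  - "service.py"
python:
  packages:
    - xgboost
    - scikit-learn
"""

Path("bentofile.yaml").write_text(BENTOFILE)
print(Path("bentofile.yaml").read_text().splitlines()[0])
```

Running `bentoml build` in the same directory then creates a Bento tagged with the service's name. If that service is named `xgb_classifier`, `bentoml containerize xgb_classifier:latest` turns the Bento into a Docker image.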

## Deploying the Bento to AWS Lambda