# Deploying Any Machine Learning Model as API Services With AWS Lambda
## Get that model online!
![](images/pexels.jpg)
<figcaption style="text-align: center;">
    <strong>
        Photo by 
        <a href='https://www.pexels.com/photo/blue-and-red-galaxy-artwork-1629236/'>Suzy Hazelwood</a>
    </strong>
</figcaption>

## Introduction

According to ml-ops.org, the current state of the MLOps stack looks like this:

![](https://ml-ops.org/img/mlops-full-stack.png)
<figcaption style="text-align: center;">
    <strong>
        Photo by 
        <a href='https://valohai.com/blog/the-mlops-stack/'>Henrik Skogström</a>
        on 
        <a href='https://ml-ops.org/content/state-of-mlops'>ml-ops.org</a>
    </strong>
</figcaption>

The field is changing fast, so each stage of this template has several candidate tools competing for it.

BentoML is a new open-source library that handles the model-serving part of the MLOps life cycle. Its Python API lets you serve a model as an API with a short script. The result is an HTTP server that accepts POST requests and returns predictions on unseen data.

This lightweight API can then be plugged into any machine learning use case, be it a Docker container or a web app.

In this post, we will take a deep dive into BentoML and its Bentos, and see how to combine them with AWS Lambda to get your models up and running for anyone to use.

## What is BentoML and its purpose?

To maximize the business impact of machine learning, the hand-off between data scientists and engineers, from model training to deployment, should be fast and iterative. However, data scientists often lack the skills to package trained models properly and hand them to engineers, while engineers struggle to work with models that come from dozens of different ML frameworks.

BentoML was created to solve these issues and make the hand-off to production deployment as easy and fast as possible. In the coming sections, you will see how BentoML makes it stupidly easy to perform tedious MLOps operations, such as:
- Saving any model from any framework into a unified format
- Creating an HTTP API endpoint with a single Python function
- Containerizing everything the model needs with Docker using a single CLI command

So, without further ado, let's get started.

## Dataset preparation and model training

The crux of this article is model deployment, so I want to focus all your attention on that area. For that purpose, I will assume you are reading this article with your best trained model already in hand and want to deploy it as soon as possible.

To simulate that here, we will simply create a synthetic dataset, train an XGBoost model, and move forward as though you had completed all the previous steps of the MLOps life cycle (data cleaning, exploration, feature engineering, model experimentation, hyperparameter tuning) and found the model that performs best on your problem.

```python
import warnings

warnings.filterwarnings("ignore")
```

```python
import os

import pandas as pd
from sklearn.datasets import make_classification

# Generate the data
n_samples, n_features = 10000, 7
X, y = make_classification(n_samples=n_samples, n_features=n_features, n_informative=5)

# Save it as a CSV (create the data/ directory first if it does not exist)
feature_names = [f"feature_{i}" for i in range(n_features)]
df = pd.DataFrame(X, columns=feature_names)
df["target"] = y

os.makedirs("data", exist_ok=True)
df.to_csv("data/data.csv", index=False)
```

We create a simple dataset with 7 features, 10k samples, and a binary classification target. Next, we load it back into the environment, train a vanilla XGBoost classifier, and pretend that it is our best-tuned model.

```python
import pandas as pd
import xgboost as xgb

# Load and prep the data
data = pd.read_csv("data/data.csv")
X, y = data.drop("target", axis=1), data[["target"]]

# Create a DMatrix
dtrain = xgb.DMatrix(X.values, label=y.values)

# Specify parameters for a binary classification problem
params = {
    "objective": "binary:logistic",
    "booster": "gbtree",
    "eval_metric": "auc",
}

# Train (num_boost_round defaults to 10)
booster = xgb.train(params=params, dtrain=dtrain)
```

After loading the data, we convert it into a `DMatrix` and train a booster that uses ROC AUC (`eval_metric="auc"`) as its evaluation metric. For the sake of completeness, you can also run 10-fold cross-validation and log the train/validation scores.

Great! Now, we are ready for deployment.

## Saving trained models to BentoML format

Saving a trained model in a BentoML-compatible format is done by calling the framework-specific `save_model` function:

```python
import bentoml  # pip install bentoml

bento_xgb = bentoml.xgboost.save_model("xgb_initial", booster)
bento_xgb
```

```
Model(tag="xgb_initial:m7juljbhp2bvsj77", path="/home/bexgboost/bentoml/models/xgb_initial/m7juljbhp2bvsj77/")
```

Because we trained a native XGBoost `Booster` with `xgb.train`, we save it with `bentoml.xgboost.save_model`. The returned object is an instance of BentoML's `Model` class, identified by a label called a *tag*.

The tag consists of two parts - a name given by the user and a version string to differentiate between models saved at different times. Even if an identical model is saved, a new directory and a version string will be created for it. 

BentoML supports most of the major ML frameworks:
- Classic: Sklearn, XGBoost, CatBoost, LightGBM
- Deep learning: TensorFlow, PyTorch, PyTorch Lightning, Keras, Transformers
- Others: ONNX, MLflow, fast.ai, statsmodels, spaCy, H2O, Gluon, etc.

Each framework has a corresponding `bentoml.<framework>.save_model` function.

When a model is saved, it goes into a local directory called the BentoML model store. The last output shows that my model store resides in `/home/bexgboost/bentoml/models`. You can list all your models by running the `bentoml models list` command in the terminal:

```bash
$ bentoml models list
```

```
 Tag                           Module           Size        Creation Time
 xgb_initial:m7juljbhp2bvsj77  bentoml.xgboost  41.79 KiB   2022-08-29 14:38:57
 xgb_custom:hz3mt7bhpwbvsj77   bentoml.xgboost  42.29 KiB   2022-08-29 14:30:39
 xgb_initial:5a35jarhpsbvsj77  bentoml.xgboost  42.02 KiB   2022-08-29 14:28:14
 xgb_booster:mkxbjbrge27gwaav  bentoml.xgboost  67.08 KiB   2022-08-27 21:36:22
 xgb_booster:m5vkj4bf66gfaaav  bentoml.xgboost  69.34 KiB   2022-08-27 16:00:04
 xgb_booster:onhegxrf62rksaav  bentoml.xgboost  59.84 KiB   2022-08-27 15:53:14
 xgb_custom:hgpjk2iqxk67yjcl   bentoml.sklearn  442.29 KiB  2022-07-31 15:19:13
```


You can also see models from my other projects.

> Note: in BentoML docs and this article, the names "model" and "tag" are used interchangeably to refer to saved models in the model store.
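
To make the `name:version` structure of tags concrete, here is a hypothetical, standard-library-only sketch that splits the tags from the listing above into their two parts and picks the most recently created version of a name, which is what the `:latest` suffix used later in this article refers to. The helpers `split_tag` and `resolve` are illustrative names, not BentoML functions, and the sketch assumes a bare name means `name:latest`; BentoML performs the real lookup inside its model store.

```python
from datetime import datetime

# (tag, creation time) pairs copied from the `bentoml models list` output above
MODELS = [
    ("xgb_initial:m7juljbhp2bvsj77", "2022-08-29 14:38:57"),
    ("xgb_custom:hz3mt7bhpwbvsj77", "2022-08-29 14:30:39"),
    ("xgb_initial:5a35jarhpsbvsj77", "2022-08-29 14:28:14"),
    ("xgb_booster:mkxbjbrge27gwaav", "2022-08-27 21:36:22"),
    ("xgb_booster:m5vkj4bf66gfaaav", "2022-08-27 16:00:04"),
    ("xgb_booster:onhegxrf62rksaav", "2022-08-27 15:53:14"),
    ("xgb_custom:hgpjk2iqxk67yjcl", "2022-07-31 15:19:13"),
]


def split_tag(tag: str) -> tuple[str, str]:
    """Split 'name:version' into (name, version); a bare name is treated as 'latest'."""
    name, _, version = tag.partition(":")
    return name, version or "latest"


def resolve(tag: str) -> str:
    """Hypothetical helper: turn 'name:latest' into the most recently created full tag."""
    name, version = split_tag(tag)
    if version != "latest":
        return tag  # already an exact version
    candidates = [
        (datetime.strptime(created, "%Y-%m-%d %H:%M:%S"), full_tag)
        for full_tag, created in MODELS
        if split_tag(full_tag)[0] == name
    ]
    if not candidates:
        raise LookupError(f"no model named {name!r}")
    return max(candidates)[1]  # newest creation time wins


print(resolve("xgb_initial:latest"))  # xgb_initial:m7juljbhp2bvsj77
print(resolve("xgb_booster"))         # xgb_booster:mkxbjbrge27gwaav
```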

`save_model` also accepts other parameters that let you attach extra information to the model, from metadata and labels to additional user-defined objects (e.g. the weights of your model as a separate object):

```python
bentoml.xgboost.save_model(
    "xgb_custom",
    booster,
    metadata={"auc": 0.99,
              "feature_importances": booster.get_score(importance_type="gain")},
    labels={"author": "Bex"},
)
```

```
Model(tag="xgb_custom:nk3qcarhp2bvsj77", path="/home/bexgboost/bentoml/models/xgb_custom/nk3qcarhp2bvsj77/")
```

## Sharing models

Models in the BentoML model store can be shared as standalone archives using the `bentoml models export` command:

```bash
$ bentoml models export xgb_custom:latest ./models
```

```
Model(tag="xgb_custom:nk3qcarhp2bvsj77") exported to /home/bexgboost/articles/2022/8_august/3_bentoml_xgboost/models/xgb_custom-nk3qcarhp2bvsj77.bentomodel
```


When you don't know the exact version string of a tag, you can use the `:latest` suffix to pick the most recent one. The command above exports the classifier as a `.bentomodel` archive into the `models` directory. When a teammate sends you a `.bentomodel` archive, you can use the `import` command to add it to your local BentoML model store:

```bash
$ bentoml models import ./models/xgb_custom-nk3qcarhp2bvsj77.bentomodel
```

Because this exact model is already in my local store, the import fails here:

```
Error: [models] `import` failed: Item 'xgb_custom:nk3qcarhp2bvsj77' already exists in the store <osfs '/home/bexgboost/bentoml/models'>
```


## Retrieving saved models 

There are a few ways of loading saved models from the model store into your environment. The simplest one is the `load_model` function. Like `save_model`, `load_model` is also framework-specific:

```python
import bentoml

booster = bentoml.xgboost.load_model("xgb_custom:latest")
booster
```

```
<xgboost.core.Booster at 0x7fd0959da4f0>
```

The function returns the model in exactly the same format it had before it was saved, so you can use its native methods, like `predict`:

```python
import numpy as np

sample = np.random.random(size=(1, 7))

booster.predict(xgb.DMatrix(sample))
```

```
array([0.4840052], dtype=float32)
```

To load the model as a BentoML `Model` object instead, use the `models.get` function, which is *not* framework-specific:

```python
tag = bentoml.models.get("xgb_custom:latest")
```

Loading the model in this format gives you access to its add-ons, such as metadata and labels:

```python
tag.info.labels
```

```
{'author': 'Bex'}
```

```python
tag.info.metadata
```

```
{'auc': 0.99,
 'feature_importances': {'f0': 134.2563934326172,
  'f1': 27.976463317871094,
  'f2': 56.519203186035156,
  'f3': 13.31026554107666,
  'f4': 13.665970802307129,
  'f5': 12.156410217285156,
  'f6': 22.775766372680664}}
```
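
Because the metadata comes back as a plain Python dictionary, you can work with it directly. As an illustration (the helper name `rank_importances` is hypothetical), this sketch ranks the gain importances logged above and expresses each one as a share of the total gain. The `f0`, `f1`, ... names appear because the `DMatrix` was built from `X.values`, which drops the DataFrame's column names, so the sketch maps them back to `feature_0`, `feature_1`, ...

```python
# Metadata as returned by `tag.info.metadata` above (values copied from that output)
metadata = {
    "auc": 0.99,
    "feature_importances": {
        "f0": 134.2563934326172,
        "f1": 27.976463317871094,
        "f2": 56.519203186035156,
        "f3": 13.31026554107666,
        "f4": 13.665970802307129,
        "f5": 12.156410217285156,
        "f6": 22.775766372680664,
    },
}


def rank_importances(importances: dict[str, float]) -> list[tuple[str, float]]:
    """Sort features by gain (highest first) and convert each gain to a share of the total."""
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [(name, gain / total) for name, gain in ranked]


# Map XGBoost's default names (f0, f1, ...) back to the CSV column names
for name, share in rank_importances(metadata["feature_importances"]):
    print(f"feature_{name[1:]}: {share:.1%}")
```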

The final and most important way of retrieving models is by loading them as runners:

```python
import bentoml

tag = bentoml.models.get("xgb_custom:latest")
xgb_runner = tag.to_runner()
```

Runners are special BentoML objects that are optimized to use system resources as efficiently as possible for their framework. They are the core components of the APIs we will build in the next section.

Now, we are ready to start building the API!

## Organize into scripts

Up until now, we have been using notebooks. To build an API service, we need to switch to Python scripts, so let's organize the code from the previous sections. In a `generate_data.py` file, create a function that saves the synthetic data from the "Dataset preparation and model training" section:

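```python
import os

import pandas as pd
from sklearn.datasets import make_classification


def generate_data(n_samples, n_features, n_informative, path):
    """
    A simple function to save a synthetic dataset to path.
    """
    # Generate the data (same code as in the earlier section)
    X, y = make_classification(
        n_samples=n_samples, n_features=n_features, n_informative=n_informative
    )

    # Save it as a CSV, creating the parent directory if needed
    feature_names = [f"feature_{i}" for i in range(n_features)]
    df = pd.DataFrame(X, columns=feature_names)
    df["target"] = y
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    df.to_csv(path, index=False)


if __name__ == "__main__":
    n_samples, n_features = 10000, 7
    generate_data(n_samples, n_features, 5, "data/data.csv")
```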

> The full `generate_data.py` script can be found [here](https://github.com/BexTuychiev/bentoml_sample_project/blob/main/src/generate_data.py).

In a `train.py` file, create a function that trains our XGBoost classifier and saves it to the BentoML model store:

```python
import bentoml
import pandas as pd
import xgboost as xgb


def train_xgb_save(X, y, tag_name="xgb_final"):
    """
    A simple function to train a model and save it to BentoML model store.
    """
    # Create DMatrix
    dtrain = xgb.DMatrix(X, label=y)
    # Specify parameters for a binary classification problem
    params = {"objective": "binary:logistic", "booster": "gbtree", "eval_metric": "auc"}

    # Train
    booster = xgb.train(params, dtrain, num_boost_round=20)

    # Save the trained booster to the local BentoML model store
    bentoml.xgboost.save_model(tag_name, booster)


if __name__ == "__main__":
    # Load and prep the data
    data = pd.read_csv("data/data.csv")
    X, y = data.drop("target", axis=1), data[["target"]]

    # Train and save
    train_xgb_save(X, y, "xgb_booster")
```

> The full `train.py` script can be found [here](https://github.com/BexTuychiev/bentoml_sample_project/blob/main/src/train.py).

For completeness, run both scripts in order (`generate_data.py` first, then `train.py`) to generate the dataset and save a new model to the model store.

## Creating an API service script

Now, it is time to create the local API. All we need is a simple script, `service.py`, that starts like this:

```python
import bentoml

# Get the runner
xgb_runner = bentoml.models.get("xgb_booster:latest").to_runner()

# Create a Service object
svc = bentoml.Service("xgb_classifier", runners=[xgb_runner])
```

After loading our model as a runner with `models.get(...).to_runner()`, we create an object called `svc`, an instance of BentoML's `Service` class. `Service` is a high-level class that represents our API as a whole.

Next, we add a single endpoint called `classify` to the service by decorating a function of the same name with `svc.api`:

```python
from bentoml.io import NumpyNdarray
import numpy as np

# Create an endpoint named `classify`
@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def classify(input_series) -> np.ndarray:
    # Run the model on the input array and return the predicted probabilities
    label = xgb_runner.predict.run(input_series)

    return label
```

Let's understand the above snippet line-by-line. 

First, we import a new class called `NumpyNdarray` from `bentoml.io`, BentoML's input/output module. To standardize inputs and outputs, BentoML offers several such classes besides `NumpyNdarray`, including `Text`, `File`, `Image`, and `PandasDataFrame`.

Adding these classes to the `input` and `output` arguments of the `svc.api` decorator ensures that correct datatypes are passed to our API endpoint. In our case, we are making sure that the data passed to our `classify` function is always a NumPy array. If we were working with image models, our input class could be a `File` or `Image` class, while the output would be `NumpyNdarray` again. 
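
To make the role of these input/output classes concrete, here is a hypothetical, numpy-only sketch of the two steps such a descriptor performs around `classify`: decoding and validating the request body, then encoding the result. It is not BentoML's implementation, and `parse_ndarray`/`serialize_ndarray` are illustrative names. For comparable validation in BentoML itself, `NumpyNdarray` accepts `dtype` and `shape` arguments together with `enforce_dtype`/`enforce_shape` flags; check the `bentoml.io` reference for your version.

```python
import json

import numpy as np


def parse_ndarray(body: str, dtype: str = "float32", n_features: int = 7) -> np.ndarray:
    """Hypothetical input step: decode JSON text and check it is an (n, n_features) array."""
    array = np.asarray(json.loads(body), dtype=dtype)
    if array.ndim != 2 or array.shape[1] != n_features:
        raise ValueError(f"expected shape (n, {n_features}), got {array.shape}")
    return array


def serialize_ndarray(array: np.ndarray) -> str:
    """Hypothetical output step: encode the model's output array as JSON text."""
    return json.dumps(np.asarray(array).tolist())


batch = parse_ndarray("[[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]]")
print(batch.shape, batch.dtype)  # (1, 7) float32
print(serialize_ndarray(np.array([0.5], dtype=np.float32)))  # [0.5]
```

Rejecting a malformed request at this boundary means the model code inside `classify` never has to deal with the wrong shape.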

Inside the function, we call the runner's `predict.run` method to get a prediction on the input. Here is the complete script:

```python
# service.py
import bentoml
import numpy as np
from bentoml.io import NumpyNdarray

# Get the runner
xgb_runner = bentoml.models.get("xgb_booster:latest").to_runner()

# Create a Service object
svc = bentoml.Service("xgb_classifier", runners=[xgb_runner])


# Create an endpoint named classify
@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def classify(input_series) -> np.ndarray:
    # Run the model on the input array and return the predicted probabilities
    label = xgb_runner.predict.run(input_series)

    return label
```

That's it! By using `bentoml serve`, we can create a local debug server:

```
$ bentoml serve service.py:svc --reload
```

> Important: The `service.py` and `svc` parts of the command above depend on your script name and the name of your service object. If you had a service object named `api` in a script called `api.py`, the command would be `bentoml serve api.py:api --reload`. The `--reload` flag makes BentoML pick up changes to your script without you having to restart the server.

Here is a sample output of the command (ignore the name mismatch):

![](images/debug_server.gif)

The GIF shows that the API is live locally at http://127.0.0.1:3000.


BentoML also serves interactive Swagger UI documentation for the API, and we can already send requests from it to get predictions.
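
To send requests from code instead of the Swagger UI, here is a minimal standard-library sketch. It assumes the debug server from `bentoml serve` is listening on 127.0.0.1:3000 and that the `classify` endpoint is exposed at the `/classify` route (BentoML routes an endpoint by its function name unless told otherwise). `to_json_payload` and `request_prediction` are hypothetical helper names.

```python
import json
import urllib.request

import numpy as np

# Assumption: local debug server address plus the default route for `classify`
CLASSIFY_URL = "http://127.0.0.1:3000/classify"


def to_json_payload(samples: np.ndarray) -> bytes:
    """Encode feature rows as a JSON nested list, one inner list per sample."""
    rows = np.atleast_2d(np.asarray(samples, dtype=float))
    return json.dumps(rows.tolist()).encode("utf-8")


def request_prediction(samples: np.ndarray, url: str = CLASSIFY_URL) -> np.ndarray:
    """POST the samples to the running service and decode its JSON response."""
    request = urllib.request.Request(
        url,
        data=to_json_payload(samples),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return np.asarray(json.loads(response.read()))
```

With the server running, `request_prediction(np.random.random(size=(1, 7)))` should return an array holding one predicted probability per row.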


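## Building a Bento

To ship the service beyond our own machine, we package it into a *Bento*, BentoML's distribution format, which bundles the service code, the models it uses, and its dependencies. The build is configured with a `bentofile.yaml` file placed in the project directory next to `service.py`: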

```yaml
service: "service.py:svc"  # Same as the argument passed to `bentoml serve`
labels:
   owner: Bex Tuychiev
include:
- "*.py"
python:
   packages:  # Additional pip packages required by the service
   - bentoml==1.0.0
   - numpy==1.22.4
   - pandas==1.4.2
   - scikit_learn==1.1.1
   - xgboost==1.4.2
```

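To fill in the `packages` list, you can use `pipreqs`, which writes a `requirements.txt` file based on the imports in your scripts (`--force` overwrites an existing one):

```bash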
$ pip install pipreqs
$ pipreqs --force .
```

Then, from the directory containing `bentofile.yaml`, build the Bento:

```bash
$ bentoml build .
```

![](images/bentoml_build.png)

The new Bento now shows up in the local Bento store:

```bash
$ bentoml list
```

```
 Tag                     Size       Creation Time        Path
 xgb_classifier:scwjjg…  81.50 KiB  2022-08-29 15:08:54  ~/bentoml/bentos/xgb_…
 xgb_classifier:7m4jyq…  81.50 KiB  2022-08-29 15:04:44  ~/bentoml/bentos/xgb_…
 sample_service:qpvhfk…  74.24 KiB  2022-08-27 15:54:00  ~/bentoml/bentos/samp…
```


## Setting up AWS credentials

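Both `bentoctl` and Terraform need AWS credentials to create resources in your account. Go to the [AWS console](https://console.aws.amazon.com/console/home) and create an access key (an access key ID plus a secret access key) from your account's security credentials page: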

![](images/aws_credentials.png)

Then, store the keys as environment variables. On Linux or macOS:

```bash
export AWS_ACCESS_KEY_ID=REPLACE_WITH_YOUR_ACCESS_KEY
export AWS_SECRET_ACCESS_KEY=REPLACE_WITH_YOUR_SECRET_KEY
```

On Windows, `setx` saves the variables for future sessions, so open a new terminal afterwards:

```
setx AWS_ACCESS_KEY_ID REPLACE_WITH_YOUR_ACCESS_KEY
setx AWS_SECRET_ACCESS_KEY REPLACE_WITH_YOUR_SECRET_KEY
```
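
Before moving on, a quick sanity check can save a failed build later. This is a minimal sketch, and the helper names `missing_aws_vars` and `caller_arn` are hypothetical. The first function only checks that the two variables set above are present; boto3 can also read credentials from other sources such as `~/.aws/credentials`, so an empty result is not the only valid setup. The second makes a real AWS STS call and assumes `boto3` is installed (it is installed alongside `bentoctl` in the next section).

```python
import os
from typing import Mapping

# The two variables set by the commands above
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required credential variables that are unset or empty."""
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


def caller_arn() -> str:
    """Ask AWS STS which identity the credentials belong to (needs boto3 and network)."""
    import boto3  # imported lazily so the env check above works without boto3

    return boto3.client("sts").get_caller_identity()["Arn"]
```

In a fresh terminal, `missing_aws_vars()` should return an empty list; `caller_arn()` then raises an error if the keys are missing or rejected by AWS.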

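## Deploying the Bento to AWS Lambda

We deploy with `bentoctl`, BentoML's deployment tool. Install it together with `boto3`, add the AWS Lambda operator, and initialize a deployment (`bentoctl init` asks a few interactive questions):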

```bash
$ pip install bentoctl boto3
$ bentoctl operator install aws-lambda
$ bentoctl init
```

![](images/bentoctl_tfvars.png)

![](images/main_tf.gif)

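`bentoctl init` generates a `deployment_config.yaml` file along with `bentoctl.tfvars` and `main.tf`, a Terraform configuration for the Lambda resources. To apply it, you need Terraform: installation instructions are [here](https://www.terraform.io/cli/install/apt) for Linux and [here](https://learn.hashicorp.com/tutorials/terraform/install-cli) for other systems. Check the installation with: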

```bash
$ terraform -h
```

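Next, build a Lambda-compatible Docker image from the Bento and push it to Amazon ECR, passing the Bento tag and the deployment config:

```bash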
$ bentoctl build -b xgb_classifier:latest -f deployment_config.yaml
```

![](images/bentoctl_build.gif)

If you receive a `botocore.exceptions.ClientError`, your AWS credentials are not set up properly.

![](images/build_output.png)

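Finally, initialize Terraform and create the resources, passing the variables file that `bentoctl` generated:

```bash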
$ terraform init
$ terraform apply -var-file=bentoctl.tfvars -auto-approve
```

![](images/terraform_apply.gif)

![](images/terraform_output.png)

If you go to https://console.aws.amazon.com/lambda, you should see: