# Model Deployment with BentoML and DigitalOcean

In this tutorial, we will be building a transaction fraud detection API using the [BentoML](https://bentoml.com) framework, and deploying it to DigitalOcean via Kubernetes.

Prerequisites:
- Install Docker
- Install Python 3.8+
- Install JupyterLab

By the end of this tutorial you will be able to:
- Set up a Kubernetes cluster on DigitalOcean
- Create Bentos with BentoML and containerize them
- Create an image repository on DigitalOcean and upload your images
- Deploy a Bento service to Kubernetes using the DigitalOcean image repository
  
You should download the data required for this tutorial from [here](https://drive.google.com/file/d/1MidRYkLdAV-i0qytvsflIcKitK4atiAd/view?usp=sharing). The data originally comes from a [Kaggle dataset](https://www.kaggle.com/competitions/ieee-fraud-detection/data) for fraud detection. Place this dataset in a `data` directory in the root of your project. You can run this notebook in either VS Code or Jupyter.

## Build a model

First, we need a model to deploy. Let's build a quick model to detect fraudulent transactions. We will need a number of libraries, so let's install them. Since the focus of this tutorial is deployment, don't worry about the feature selection or model training specifics. There are only two objects to be concerned with in the training code block:

- **enc**: The encoder object we use to preprocess the data with one-hot encoding.
- **model**: The model object we are going to deploy.

If you wish, create a virtual environment with conda or venv.

In [1]:
pip install numpy pandas xgboost scikit-learn

Note: you may need to restart the kernel to use updated packages.


In [2]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from xgboost import XGBClassifier

# Load the data, sample such that the target classes are equal size
df = pd.read_csv("data/train_transaction.csv")
df = pd.concat([df[df.isFraud == 0].sample(n=len(df[df.isFraud == 1])), df[df.isFraud == 1]], axis=0)

# Select the features and target
X = df[["ProductCD", "P_emaildomain", "R_emaildomain", "card4", "M1", "M2", "M3"]]
y = df.isFraud

# Use one-hot encoding to encode the categorical features
enc = OneHotEncoder(handle_unknown="ignore")
enc.fit(X)

X = pd.DataFrame(enc.transform(X).toarray(), columns=enc.get_feature_names_out().reshape(-1))
X["TransactionAmt"] = df[["TransactionAmt"]].to_numpy()

# Split the dataset and train the model
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
xgb = XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, min_child_weight=1, gamma=0, subsample=0.8, colsample_bytree=0.8, objective='binary:logistic', nthread=4, scale_pos_weight=1, seed=27)
model = xgb.fit(X_train, y_train)



## Set up your Kubernetes Cluster

So, you have built a model, and you want to deploy it so it's actually useful. How do you do that?

Let's start by setting up a Kubernetes cluster on DigitalOcean. Sign up using [this link](https://try.digitalocean.com/freetrialoffer/) to receive $100 in free credits. You will need to set up a payment method, but don't worry, it won't charge you just yet. If your credits have not appeared and you do not want to be charged, email support before proceeding with the tutorial.

Once you have created your account, you can go to the [DigitalOcean dashboard](https://cloud.digitalocean.com/dashboard) and click on the **Kubernetes** tab. You should be greeted with this screen. Click **Create a Kubernetes Cluster**.

![Kubernetes Landing Page](media/kubernetes_landing_page.png)

Follow the steps to create the cluster as you see fit. Since this is just a demonstration, select the $12 a month tier when you choose your capacity. Two nodes will be sufficient for this tutorial.

![Node $12 Tier](media/node_cheap_tier.png)

You should now be greeted with a screen like this:
![Kubernetes Setup](media/kube_setup.png)

Click **Get Started** and follow the instructions to connect to your cluster. You will need to install the **kubectl** command line tool for configuring the Kubernetes cluster, as well as the **doctl** command line tool for interacting with DigitalOcean. We have provided the commands for Homebrew on macOS, but you can install them however you wish.

In [6]:
!brew install doctl kubectl

To reinstall 1.78.0, run:
  brew reinstall doctl
To reinstall 1.24.3, run:
  brew reinstall kubernetes-cli


Once that's done, you need to create a DigitalOcean API token. You can do this by going to the [DigitalOcean dashboard](https://cloud.digitalocean.com/dashboard) and clicking on the **API** tab.

![API Tab](media/API.png)

Once there, click **Generate New Token**. You should be prompted to name your token. Enter **fraud-classifier** and click **Generate Token**.

![Token Prompt](media/token.png)

You should now see your token. Copy it and run the following **doctl** command to log in. Run it in a terminal outside of this notebook, because the command prompts for the token string interactively. Paste the token when prompted.

```bash
doctl auth init --context fraud-classifier
```

Validate that the authorization was successful by running the following command:

In [5]:
!doctl account get

User Email             Team       Droplet Limit    Email Verified    User UUID                               Status
elijah@cerebrium.ai    My Team    10               true              63644d72-4ffb-40e3-b3b8-aa8bc8fb1723    active


Next, copy the Cluster ID from your cluster page on the DigitalOcean dashboard and paste it into the notebook in the cell below. It should look something like this:

In [6]:
# !doctl kubernetes cluster kubeconfig save <your-cluster-id>

Notice: Adding cluster credentials to kubeconfig file found in "/Users/elijahrou/.kube/config"
Notice: Setting current-context to do-sfo3-test-k8s


Verify that the connection has worked by running kubectl:

In [7]:
!kubectl config get-contexts
!kubectl cluster-info

CURRENT   NAME                                                 CLUSTER                                              AUTHINFO                                             NAMESPACE
          arn:aws:eks:eu-west-1:288552132534:cluster/prefect   arn:aws:eks:eu-west-1:288552132534:cluster/prefect   arn:aws:eks:eu-west-1:288552132534:cluster/prefect   
*         do-sfo3-test-k8s                                     do-sfo3-test-k8s                                     do-sfo3-test-k8s-admin                               
          minikube                                             minikube                                             minikube                                             default
          yatai-comp                                           do-sfo3-test-k8s-admin                               do-sfo3-test-k8s-admin                               yatai-components
          yatai-ops                                            do-sfo3-test-k8s-admin                               do

Congratulations! You have now set up a Kubernetes cluster on DigitalOcean. Now that we have the infrastructure, we need to create an API service for our fraud detection model.

## Create a Bento Service

While there are a number of tools that ease the stress of deploying a model, one of the more straightforward ways is to use the open-source [BentoML](https://bentoml.com) framework. BentoML is a framework for deploying machine learning models that packages the model into a containerized service exposing a callable REST API. We chose BentoML because it is easy to use, installs the packages we need, and can deploy to multiple cloud services and infrastructures.

Install BentoML with the following command, which pins version 1.0:


In [10]:
pip install bentoml==1.0

Note: you may need to restart the kernel to use updated packages.


Now, we want to create our Bento service using the **model** and **enc** objects we created before. When you *save* a model, BentoML stores it locally and versions it. Import `bentoml` and, in your notebook, call the appropriate `save_model` function to save the model to the local **model** store. You may notice we used the **sklearn** `save_model` for the XGBoost model. This is because we created the model through XGBoost's scikit-learn API (`XGBClassifier`).

There are a number of different *optional* arguments you can include when saving a model. We included several of them to demonstrate how this works.

- **labels**: user-defined labels for managing models (e.g. team=nlp, stage=dev).
- **metadata**: user-defined metadata for storing model training context information or model evaluation metrics (e.g. dataset version, training parameters, confusion matrix, etc).
- **custom_objects**: user-defined additional python objects (e.g. a tokenizer instance, preprocessor functions, etc). Custom objects will be serialized with cloudpickle.
- **signatures**: model signatures for inference (e.g. input/output shapes, whether inference is batched, etc). For more information, see the [BentoML documentation](https://bentoml.com/docs/bento-ml/api/model-signatures/) on signatures.

In [11]:
import bentoml
saved_model = bentoml.sklearn.save_model(
    "fraud_classifier", 
    model,
    labels = {
        "owner": "Cerebrium",
        "stage": "prod"
    },
    metadata = {
        "version": "1.0.0"
    },
    custom_objects = {
        "ohe_encoder": enc
    },
    signatures={
      "predict": {
        "batchable": True,
        "batch_dim": 0,
      }
    },
)
print(f"{saved_model}")

Model(tag="fraud_classifier:47sbzaqj2oeaautt")


Next, you will need to create a Bento service. This abstraction tells Bento what model to use to run inference and handle any preprocessing. In this service, we are going to use the `fraud_classifier` model we saved to the local store.

Create a new file called `fraud_detection_service.py` in the project root directory, and paste the following code into it:

```python
import numpy as np
import pandas as pd

import bentoml
from bentoml.io import PandasDataFrame, JSON

ohe_encoder = bentoml.models.get("fraud_classifier:latest").custom_objects["ohe_encoder"]
fraud_classifier_runner = bentoml.sklearn.get("fraud_classifier:latest").to_runner()

svc = bentoml.Service("fraud_classifier", runners=[fraud_classifier_runner])

@svc.api(input=PandasDataFrame(), output=JSON(), route="/fraud-classifier")
def predict(df: pd.DataFrame) -> np.ndarray:
    X = df[["ProductCD", "P_emaildomain", "R_emaildomain", "card4", "M1", "M2", "M3"]]
    X = X.fillna(pd.NA) # ensure all missing values are pandas NA
    X = pd.DataFrame(ohe_encoder.transform(X).toarray(), columns=ohe_encoder.get_feature_names_out().reshape(-1))
    X["TransactionAmt"] = df[["TransactionAmt"]].to_numpy()
    return fraud_classifier_runner.predict.run(X)
```

There are a number of key details in this file to be aware of.

- `ohe_encoder`: You'll notice that we load in the encoder custom object from the *fraud_classifier* model we defined previously. This is because we need to transform the data before we can use it for inference.
- `fraud_classifier_runner`: Here, we load in the *fraud_classifier* model we defined previously and convert it into a **runner**. A runner in BentoML represents a unit of serving logic which wraps a model and can be scaled to maximize throughput and resource use.
- `svc`: This represents the Service object. It is the main entry point for the BentoML service.
- `svc.api`: This is a decorator that tells BentoML this is an API, what kind of input and output the API accepts, and the desired REST route.

As you can see, we instantiate a **Service** class and define an API with DataFrame inputs and JSON outputs. We run all necessary pre-processing with the **encoder** custom object, then call the **model** runner to make predictions.
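
To make the pre-processing step more concrete, the sketch below mimics in plain Python what a one-hot encoder fitted with `handle_unknown="ignore"` does at inference time: categories seen during fitting get a 1 in their column, while values never seen before produce an all-zero block instead of an error. It is an illustration only, not scikit-learn's implementation, and `fit_categories` and `one_hot` are hypothetical helper names.

```python
from typing import Dict, List, Sequence


def fit_categories(
    rows: Sequence[Dict[str, str]], columns: Sequence[str]
) -> Dict[str, List[str]]:
    """Collect the sorted set of categories seen for each column."""
    return {col: sorted({row[col] for row in rows}) for col in columns}


def one_hot(row: Dict[str, str], categories: Dict[str, List[str]]) -> List[int]:
    """Encode one row; values never seen during fitting yield all zeros."""
    encoded: List[int] = []
    for col, known in categories.items():
        encoded.extend(1 if row.get(col) == value else 0 for value in known)
    return encoded


# Toy training data using two of the tutorial's feature names.
train_rows = [
    {"ProductCD": "W", "card4": "visa"},
    {"ProductCD": "C", "card4": "mastercard"},
]
categories = fit_categories(train_rows, ["ProductCD", "card4"])

known_row = one_hot({"ProductCD": "W", "card4": "visa"}, categories)
unseen_row = one_hot({"ProductCD": "W", "card4": "amex"}, categories)
print(known_row)   # columns: ProductCD_C, ProductCD_W, card4_mastercard, card4_visa
print(unseen_row)  # the unseen card4 value leaves both card4 columns at 0
```

This is why a request containing, say, an email domain that was absent from the training data can still be scored by the service.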

We can now quickly test out our service. We can run the following command in the terminal:

```bash
bentoml serve fraud_detection_service:svc
```

Open the address printed by `bentoml serve` in your browser and send the following POST request body to the `/fraud-classifier` route (the output should be `[1]`):

```json
[{"TransactionID":3366167,"isFraud":0,"TransactionDT":9489613,"TransactionAmt":495.0,"ProductCD":"W","card1":11839,"card2":490.0,"card3":150.0,"card4":"visa","card5":226.0,"card6":"debit","addr1":123.0,"addr2":87.0,"dist1":1.0,"dist2":null,"P_emaildomain":"live.com","R_emaildomain":null,"C1":1.0,"C2":2.0,"C3":0.0,"C4":0.0,"C5":0.0,"C6":1.0,"C7":0.0,"C8":0.0,"C9":1.0,"C10":0.0,"C11":1.0,"C12":0.0,"C13":1.0,"C14":1.0,"D1":11.0,"D2":11.0,"D3":11.0,"D4":11.0,"D5":11.0,"D6":null,"D7":null,"D8":null,"D9":null,"D10":11.0,"D11":29.0,"D12":null,"D13":null,"D14":null,"D15":11.0,"M1":"T","M2":"T","M3":"T","M4":"M0","M5":"F","M6":null,"M7":"T","M8":"T","M9":"T"}]
```
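
If you prefer to call the endpoint from code rather than the browser, a minimal client could look like the sketch below. It uses only the standard library and assumes the service is listening on `http://localhost:3000` with the `/fraud-classifier` route defined above; `build_request` and `send` are hypothetical helper names, and the record is trimmed to the columns the service actually reads.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_request(base_url: str, rows: List[Dict[str, Any]]) -> urllib.request.Request:
    """Build a JSON POST request for the /fraud-classifier route."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/fraud-classifier",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> Any:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# A trimmed-down record with only the columns the service reads.
sample = [{
    "TransactionAmt": 495.0,
    "ProductCD": "W",
    "P_emaildomain": "live.com",
    "R_emaildomain": None,
    "card4": "visa",
    "M1": "T",
    "M2": "T",
    "M3": "T",
}]

request = build_request("http://localhost:3000", sample)
print(request.get_method(), request.full_url)
# With the service running, uncomment the next line to send the request:
# print(send(request))
```

Building the request separately from sending it makes it easy to inspect the payload before any network call is made.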

Our service is ready to be packaged into a **Bento**, which is essentially the service packaged with needed dependencies. We're now going to build and containerize the Bento and deploy it to our DigitalOcean K8s cluster.

## Bento Building & Containerization

Before we containerize and test our Bento, we should create a repository on the DigitalOcean image registry to push our service image to. If you don't have Docker already running, you should start it now.

Using `doctl`, create a new registry. The name must be *globally* unique, so pick something meaningful (e.g. *ml-models*).

In [13]:
# !doctl registry create <your-registry-name>
# !doctl registry login

Error: POST https://api.digitalocean.com/v2/registry: 409 (request "d7ee6ed9-b551-466a-807f-5d07a7aca2d9") name already exists
Logging Docker in to registry.digitalocean.com


Awesome! We have a registry. Now, we need to define a `bentofile.yaml` file in our project directory. We use this file to specify things such as Python packages, CUDA installations, the base Docker image, etc. You can read more about the various build options [here](https://docs.bentoml.org/en/latest/concepts/bento.html).

```yaml
service: "fraud_detection_service:svc"  # Same as the argument passed to `bentoml serve`
labels:
   owner: Cerebrium
   stage: prod
include:
- "*.py"  # A pattern for matching which files to include in the bento
python:
   packages:  # Additional pip packages required by the service
   - scikit-learn
   - pandas
   - numpy
   - xgboost
```

Now, we run the following command to build our Bento:

In [17]:
!bentoml build

Building BentoML service "fraud_classifier:nibfwjaj2wdqiutt" from build context "/Users/elijahrou/Cerebrium/deployment_tut"
Packing model "fraud_classifier:47sbzaqj2oeaautt"
Locking PyPI package versions..

██████╗░███████╗███╗░░██╗████████╗░█████╗░███╗░░░███╗██╗░░░░░
██╔══██╗██╔════╝████╗░██║╚══██╔══╝██╔══██╗████╗░████║██║░░░░░
██████╦╝█████╗░░██╔██╗██║░░░██║░░░██║░░██║██╔████╔██║██║░░░░░
██╔══██╗██╔══╝░░██║╚████║░░░██║░░░██║░░██║██║╚██╔╝██║██║░░░░░
██████╦╝███████╗██║░╚███║░░░██║░░░╚█████╔╝██║░╚═╝░██║███████╗
╚═════╝░╚══════╝╚═╝░░╚══╝░░░╚═╝░░░░╚════╝░╚═╝░░░░░╚═╝╚══════╝

Successfully built Bento(tag="fraud_classifier:nibfwjaj2wdqiutt")


'Grats! You created a Bento! You can serve it by running the following command in your terminal, which uses the **latest** tag, and then navigating to localhost:3000 as before:
```bash
bentoml serve fraud_classifier:latest --production
```

Now, we need to create the service container. Using the `bentoml containerize` command, we will containerize our Bento, tagging the image with your registry link and the name of the service. We are using the *latest* tag, but you could substitute the ID of the Bento (shown by `bentoml list`). If you're on Apple Silicon, include `--platform=linux/amd64` in the command to avoid compatibility issues.


In [29]:
# !bentoml containerize fraud_classifier:latest -t registry.digitalocean.com/<your-registry-name>/fraud-classifier:latest --platform=linux/amd64

Building docker image for Bento(tag="fraud_classifier:nibfwjaj2wdqiutt")...
[+] Building 0.1s (2/3)
 => [internal] load build definition from Dockerfile                       0.0s
 => => transferring dockerfile: 2.24kB                                     0.0s
 => [internal] load .dockerignore                                          0.0s
 => => transferring context: 2B                                            0.0s
 => resolve image config for docker.io/docker/dockerfile:1.4-labs          0.0s

Before we push our image to the registry, let's test that the service is working. Instantiate the service by running the following command in your terminal (if you have a rogue process running on port 3000 from previous commands, you should kill it first):

```bash
docker run -p 3000:5000 registry.digitalocean.com/<your-registry-name>/fraud-classifier:latest --workers=2
```

Navigate to `localhost:3000` in your browser and test the API with the previous POST request.

![Test the API locally](media/test_api.png)

Ensure the response output is as expected (either 1 or 0). If there are any errors, you likely made a mistake in `fraud_detection_service.py`.

Once that works, the rest is standard Docker. Let's push our image to the registry.

In [16]:
# !docker push registry.digitalocean.com/<your-registry-name>/fraud-classifier:latest

The push refers to repository [registry.digitalocean.com/ml-models/fraud-classifier]

a7f7463f: Preparing
36d63b4d: Preparing
dbae906f: Preparing
6e2812f1: Preparing
02c667f8: Preparing
f2ada3b7: Preparing
ba7358a0: Preparing
6610b4ce: Preparing
f9b1552a: Preparing
148d3e7c: Preparing
ef851fa5: Preparing
01b290f0: Preparing
cf49c90c: Preparing
cc4915ef: Preparing
5f184b49: Preparing
4afccd60: Preparing
09dcc974: Preparing
08ab7cf3: Preparing
5b992fc1: Preparing
8986f350: Preparing
ab4c463e: Preparing
02c667f8: Pushed   877.7MB/871.9MB

## Deploy your service

Well done! Now, there's only one thing left to do. We need to deploy our service image to Kubernetes!

First, let's authorize access to the DigitalOcean Container Registry and pipe the access secret to kubectl:

In [24]:
!doctl registry kubernetes-manifest | kubectl apply -f -

secret/registry-ml-models configured


Now, let's use this uploaded secret to authenticate our pulls from the registry:

In [23]:
# !kubectl patch serviceaccount default -p '{"imagePullSecrets": [{"name": "registry-<your-registry-name>"}]}'

serviceaccount/default patched


Finally, let's create the deployment:

In [25]:
# !kubectl create deployment fraud-classifier --image=registry.digitalocean.com/<your-registry-name>/fraud-classifier:latest

deployment.apps/fraud-classifier created
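
If you would rather keep the deployment in a file under version control, the imperative command can be approximated with a declarative manifest. The sketch below builds a minimal `apps/v1` Deployment as a Python dictionary and prints it as JSON, which `kubectl apply -f -` accepts. It is an approximation rather than the exact object `kubectl create deployment` generates: `deployment_manifest` is a hypothetical helper, and `ml-models` is a placeholder registry name.

```python
import json
from typing import Any, Dict


def deployment_manifest(name: str, image: str, replicas: int = 1) -> Dict[str, Any]:
    """Return a minimal apps/v1 Deployment for a single-container pod."""
    labels = {"app": name}  # the selector and the pod template must share these labels
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# "ml-models" is a placeholder registry name; substitute your own.
manifest = deployment_manifest(
    "fraud-classifier",
    "registry.digitalocean.com/ml-models/fraud-classifier:latest",
)
print(json.dumps(manifest, indent=2))
```

Saved as, for example, `deployment.py` (a hypothetical file name), it could be applied with `python deployment.py | kubectl apply -f -`.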


You can confirm it is now running by viewing all ReplicaSets and pods.

In [26]:
!kubectl get rs
!kubectl get pods

NAME                          DESIRED   CURRENT   READY   AGE
fraud-classifier-54b7874949   1         1         1       8s
NAME                                READY   STATUS    RESTARTS   AGE
fraud-classifier-54b7874949-6s69s   1/1     Running   0          10s


We need a load balancer to expose our Bento service and make our Fraud Classifier scalable to multiple pods. Let's create one quickly, forwarding port 80 to the target port 5000.

In [27]:
!kubectl expose deployment fraud-classifier --type=LoadBalancer --port=80 --target-port=5000

service/fraud-classifier exposed
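
For reference, the port mapping in the `kubectl expose` command can also be written as a Service manifest. The sketch below is a rough declarative equivalent, assuming the pods carry the label `app=fraud-classifier`; `load_balancer_service` is a hypothetical helper name, and the printed JSON can be piped to `kubectl apply -f -`.

```python
import json
from typing import Any, Dict


def load_balancer_service(name: str, port: int, target_port: int) -> Dict[str, Any]:
    """Return a v1 Service of type LoadBalancer routing `port` to `target_port`."""
    labels = {"app": name}  # assumed to match the labels on the deployment's pods
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "type": "LoadBalancer",
            "selector": labels,
            "ports": [
                {"protocol": "TCP", "port": port, "targetPort": target_port},
            ],
        },
    }


service = load_balancer_service("fraud-classifier", port=80, target_port=5000)
print(json.dumps(service, indent=2))
```

Here `port` is what clients connect to on the load balancer, and `targetPort` is the port the container listens on inside each pod.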


This will take a few minutes to create. Using the following `doctl` command, we can monitor the progress of the load balancer deployment:

In [30]:
!doctl compute load-balancer list --format Name,Created,IP,Status

Name                                Created At              IP                Status
affbe322705e74689ac5f06a1d557070    2022-07-20T08:35:28Z    146.190.13.199    active
a1fa8ce40a5234abebea701bcd4f24fb    2022-07-21T15:09:48Z    146.190.13.233    active


Now that the load balancer is live, let's make sure our service is live! Grab the IP of the load balancer and navigate to it in your browser.

![Live App](media/live_app.png)
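
Because the load balancer can take a few minutes to come up, it may be convenient to poll the address from a script instead of refreshing the browser. The sketch below is one way to do that with the standard library; `http_ok` and `wait_until` are hypothetical helper names, and the URL is a placeholder for your own load balancer IP.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx status, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        return False


def wait_until(check: Callable[[], bool], attempts: int = 30, delay: float = 10.0) -> bool:
    """Call check() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Replace the placeholder with the IP reported by `doctl compute load-balancer list`.
# LOAD_BALANCER_URL = "http://<load-balancer-ip>/"
# print(wait_until(lambda: http_ok(LOAD_BALANCER_URL)))
```

Passing the check in as a function keeps the retry loop independent of HTTP, so the same helper can wrap any readiness test.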

Congratulations, you've deployed your fraud classifier as an API!