# BentoML Kubeflow Notebook Example

In this example, we will use a Kubeflow notebook to train three fraud detection models on the [Kaggle IEEE-CIS Fraud Detection dataset](https://www.kaggle.com/c/ieee-fraud-detection) and create a BentoML service that invokes all three models simultaneously and reports fraud if any one of the models predicts that a transaction is fraudulent. We will build the BentoML service and push it to an S3 bucket. Next, we will containerize the BentoML service from the S3 bucket and deploy it to the Kubeflow cluster using BentoML custom resource definitions on Kubernetes. The service will be deployed in a microservice architecture, with each model running in a separate pod, deployed on the hardware best suited to that model, and scaled independently.

## Prerequisites

Install the required packages to run this example.

In [1]:
! pip install -r requirements.txt



## Download Kaggle Dataset

Set Kaggle username and key as environment variables. Accepting the [rules of the competition](https://www.kaggle.com/competitions/ieee-fraud-detection/rules) is required for downloading the dataset.

In [19]:
# Set Kaggle Credentials for downloading dataset
%env KAGGLE_USERNAME=<your-kaggle-username>
%env KAGGLE_KEY=<your-kaggle-api-key>

env: KAGGLE_USERNAME=<your-kaggle-username>
env: KAGGLE_KEY=<your-kaggle-api-key>


In [15]:
!kaggle competitions download -c ieee-fraud-detection
!rm -rf ./data/
!unzip -d ./data/ ieee-fraud-detection.zip && rm ieee-fraud-detection.zip

Downloading ieee-fraud-detection.zip to /Users/ssheng/github/BentoML/examples/kubeflow
100%|███████████████████████████████████████▉| 118M/118M [00:37<00:00, 3.29MB/s]
100%|████████████████████████████████████████| 118M/118M [00:37<00:00, 3.31MB/s]
Archive:  ieee-fraud-detection.zip
  inflating: ./data/sample_submission.csv  
  inflating: ./data/test_identity.csv  
  inflating: ./data/test_transaction.csv  
  inflating: ./data/train_identity.csv  
  inflating: ./data/train_transaction.csv  


## Train Models

In this demonstration, we'll train three fraud detection models using the Kaggle IEEE-CIS Fraud Detection dataset. We'll split the dataset into three equal-sized chunks and use each chunk to train a separate model. This split has no practical benefit; it only serves to illustrate how to save and serve multiple models with Kubeflow and BentoML.

In [16]:
import pandas as pd

df_transactions = pd.read_csv("./data/train_transaction.csv")

X = df_transactions.drop(columns=["isFraud"])
y = df_transactions.isFraud

In [17]:
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OrdinalEncoder

numeric_features = df_transactions.select_dtypes(include="float64").columns
categorical_features = df_transactions.select_dtypes(include="object").columns

preprocessor = ColumnTransformer(
    transformers=[
        ("num", SimpleImputer(strategy="median"), numeric_features),
        (
            "cat",
            OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
            categorical_features,
        ),
    ],
    verbose_feature_names_out=False,
    remainder="passthrough",
)

X = preprocessor.fit_transform(X)

Define the training function, parameterized by the number of boosting rounds and the maximum tree depth.

In [25]:
import xgboost as xgb

def train(n_estimators, max_depth):
    # X_train, y_train, X_test and y_test are globals set by the training loop below.
    return xgb.XGBClassifier(
        tree_method="hist",
        n_estimators=n_estimators,
        max_depth=max_depth,
        eval_metric="aucpr",
        objective="binary:logistic",
        enable_categorical=True,
    ).fit(X_train, y_train, eval_set=[(X_test, y_test)])

We will divide the training data into three equal-sized chunks and treat them as independent data sets, training a separate fraud detection model on each. Each trained model is saved to the local model store using the BentoML model saving API.
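The chunk boundaries in the next cell come from integer division, so any remainder rows at the end of the data set are not assigned to a chunk. A minimal sketch of that arithmetic, using a hypothetical helper `chunk_bounds` that is not part of the example code:

```python
def chunk_bounds(n_rows, chunks):
    """Return (start, end) index pairs for equal-sized, contiguous chunks.

    Hypothetical helper for illustration only; it mirrors the START/END
    arithmetic used in the training loop.
    """
    chunk_size = n_rows // chunks  # integer division drops any remainder
    return [(i * chunk_size, (i + 1) * chunk_size) for i in range(chunks)]


bounds = chunk_bounds(10, 3)
print(bounds)  # [(0, 3), (3, 6), (6, 9)]; the last row (index 9) is left out
```

For a demonstration this is harmless, but a real pipeline would likely want to assign the leftover rows to one of the chunks.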

In [29]:
import bentoml

from sklearn.model_selection import train_test_split

CHUNKS = 3
CHUNK_SIZE = len(X) // CHUNKS

for i in range(CHUNKS):
    START = i * CHUNK_SIZE
    END = (i + 1) * CHUNK_SIZE
    X_train, X_test, y_train, y_test = train_test_split(X[START:END], y[START:END])

    name = f"ieee-fraud-detection-{i}"
    model = train(10, 5)
    score = model.score(X_test, y_test)
    print(f"Successfully trained model {name} with score {score}.")

    bentoml.xgboost.save_model(
        name,
        model,
        signatures={
            "predict_proba": {"batchable": True},
        },
        custom_objects={"preprocessor": preprocessor},
    )
    print(f"Successfully saved model {name} to the local model store.")

[0]	validation_0-aucpr:0.33902
[1]	validation_0-aucpr:0.39499
[2]	validation_0-aucpr:0.42274
[3]	validation_0-aucpr:0.45763
[4]	validation_0-aucpr:0.47844
[5]	validation_0-aucpr:0.49879
[6]	validation_0-aucpr:0.50348
[7]	validation_0-aucpr:0.51956
[8]	validation_0-aucpr:0.53362
[9]	validation_0-aucpr:0.54557
Successfully trained model ieee-fraud-detection-0 with score 0.978358936844672.
Successfully saved model ieee-fraud-detection-0 to the local model store.
[0]	validation_0-aucpr:0.39694
[1]	validation_0-aucpr:0.44607
[2]	validation_0-aucpr:0.47242
[3]	validation_0-aucpr:0.48558
[4]	validation_0-aucpr:0.50051
[5]	validation_0-aucpr:0.51961
[6]	validation_0-aucpr:0.53643
[7]	validation_0-aucpr:0.54900
[8]	validation_0-aucpr:0.55198
[9]	validation_0-aucpr:0.55791
Successfully trained model ieee-fraud-detection-1 with score 0.9721409412338454.
Successfully saved model ieee-fraud-detection-1 to the local model store.
[0]	validation_0-aucpr:0.38554
[1]	validation_0-aucpr:0.43959
[2]	valid

Saved models can be loaded back into memory and debugged in the notebook.

In [30]:
import bentoml
import pandas as pd
import numpy as np

model_ref = bentoml.xgboost.get("ieee-fraud-detection-0:latest")
model_runner = model_ref.to_runner()
model_runner.init_local()
model_preprocessor = model_ref.custom_objects["preprocessor"]

test_transactions = pd.read_csv("./data/test_transaction.csv")[0:500]
test_transactions = model_preprocessor.transform(test_transactions)
result = model_runner.predict_proba.run(test_transactions)
np.argmax(result, axis=1)

'Runner.init_local' is for debugging and testing only. Make sure to remove it before deploying to production.


array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
       1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0,
       0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
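The `np.argmax(result, axis=1)` call above turns each row of class probabilities into a label by picking the column with the larger value, where column 0 is "not fraud" and column 1 is "fraud". The same step can be sketched in plain Python; the helper name `to_labels` and the sample probabilities are made up for illustration:

```python
def to_labels(probabilities):
    """Map rows of [p_not_fraud, p_fraud] to 0/1 labels.

    Picks the index of the largest entry in each row; ties go to the
    lower index, which matches the behaviour of np.argmax.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


# Made-up probabilities, not actual model output.
sample = [[0.97, 0.03], [0.40, 0.60], [0.55, 0.45]]
labels = to_labels(sample)
print(labels)  # [0, 1, 0]
```

For a binary classifier this is equivalent to flagging a transaction as fraud when its fraud probability exceeds 0.5.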

## Define Service API

After the models are built and scored, let's create the service definition. You can find the service definition in the `service.py` module in this example. Let's break down the `service.py` module and explain what each section does.

First, we will create a list of preprocessors and runners from the three models we saved earlier. Runners are abstractions of model inference that can be scaled independently. See [Using Runners](https://docs.bentoml.org/en/latest/concepts/runner.html) for more details.

```python
fraud_detection_preprocessors = []
fraud_detection_runners = []

for model_name in ["ieee-fraud-detection-0", "ieee-fraud-detection-1", "ieee-fraud-detection-2"]:
    model_ref = bentoml.xgboost.get(model_name)
    fraud_detection_preprocessors.append(model_ref.custom_objects["preprocessor"])
    fraud_detection_runners.append(model_ref.to_runner())
```

Next, we will create a service with the list of runners passed in.

```python
svc = bentoml.Service("fraud_detection", runners=fraud_detection_runners)
```

Finally, we will create the API function `is_fraud`. We use the `@api` decorator to declare that the function is an API and to specify the input and output types as `pandas.DataFrame` and JSON, respectively. The function is defined as `async` so that the inference calls to the runners can run concurrently, rather than each call waiting for the previous runner to return. The inner function `_is_fraud` defines the model inference logic for each runner. All runners are called concurrently through `asyncio.gather`, and the results are aggregated into a list. The function returns `True` if any of the models returns `True`.
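The fan-out pattern described here can be sketched without BentoML. The example below is not the code from `service.py`: `StubRunner` is a hypothetical stand-in for a real runner, preprocessing is omitted, and the probabilities are made up. It only shows how `asyncio.gather` runs the per-model calls concurrently and how `any` combines the results.

```python
import asyncio


class StubRunner:
    """Hypothetical stand-in for a model runner; returns a fixed probability."""

    def __init__(self, fraud_probability):
        self.fraud_probability = fraud_probability

    async def predict_proba(self, features):
        await asyncio.sleep(0.01)  # simulate inference latency
        return [[1.0 - self.fraud_probability, self.fraud_probability] for _ in features]


async def _is_fraud(runner, features):
    # Per-model logic: fraud if the fraud column wins for the first row.
    results = await runner.predict_proba(features)
    not_fraud, fraud = results[0]
    return fraud > not_fraud


async def is_fraud(runners, features):
    # Start all model calls at once and wait for every result.
    results = await asyncio.gather(*[_is_fraud(r, features) for r in runners])
    # Report fraud if at least one model predicts fraud.
    return any(results)


runners = [StubRunner(0.1), StubRunner(0.2), StubRunner(0.9)]
decision = asyncio.run(is_fraud(runners, [[0.0]]))
print(decision)  # True
```

Because the three calls overlap, the total latency is close to that of the slowest model rather than the sum of all three.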

For more about service definitions, please see [Service and APIs](https://docs.bentoml.org/en/latest/concepts/service.html).

## Build Service

Building the service and models into a bento allows it to be distributed among collaborators, containerized into an OCI image, and deployed in a Kubernetes cluster. To build a service into a bento, we first need to define the `bentofile.yaml` file. See [Building Bentos](https://docs.bentoml.org/en/latest/concepts/bento.html) for more options.

```yaml
service: "service:svc"
include:
- "service.py"
- "sample.py"
python:
  requirements_txt: ./requirements.txt
```

Running the following command builds the service into a bento and stores it in the local bento store.

In [3]:
! bentoml build

Building BentoML service "fraud_detection:o5smnagbncigycvj" from build context "/Users/ssheng/github/BentoML/examples/kubeflow".
Packing model "ieee-fraud-detection-1:mgp26vgbiwx3ycvj"
Packing model "ieee-fraud-detection-2:mliw6fwbiwx3ycvj"
Packing model "ieee-fraud-detection-0:mbpyb5wbiwx3ycvj"
<class 'pandas.core.series.Series'> is not yet supported.

██████╗░███████╗███╗░░██╗████████╗░█████╗░███╗░░░███╗██╗░░░░░
██╔══██╗██╔════╝████╗░██║╚══██╔══╝██╔══██╗████╗░████║██║░░░░░
██████╦╝█████╗░░██╔██╗██║░░░██║░░░██║░░██║██╔████╔██║██║░░░░░
██╔══██╗██╔══╝░░██║╚████║░░░██║░░░██║░░██║██║╚██╔╝██║██║░░░░░
██████╦╝███████╗██║░╚███║░░░██║░░░╚█████╔╝██║░╚═╝░██║███████╗
╚═════╝░╚══════╝╚═╝░░╚══╝░░░╚═╝░░░░╚════╝░╚═╝░░░░░╚═╝╚══════╝

Successfully built Bento(tag="fraud_detection:o5smnagbncigycvj").

Possible next steps:

 * Containerize your Bento with `bentoml containerize`:
    $ bentoml containerize fraud_detection:o5smnagbncigycvj

 * Push to BentoCloud with `bentoml push`:

## Serve Bento

Serving the bento will bring up a service endpoint in HTTP or gRPC for the service API we defined. Use `--help` to see more serving options.

In [5]:
! bentoml serve-http --production

2023-03-12T23:43:10-0700 [INFO] [cli] Environ for worker 0: set CPU thread count to 10
2023-03-12T23:43:10-0700 [INFO] [cli] Environ for worker 0: set CPU thread count to 10
2023-03-12T23:43:10-0700 [INFO] [cli] Environ for worker 0: set CPU thread count to 10
2023-03-12T23:43:10-0700 [INFO] [cli] Prometheus metrics for HTTP BentoServer from "." can be accessed at http://localhost:3000/metrics.
2023-03-12T23:43:11-0700 [INFO] [cli] Starting production HTTP BentoServer from "." listening on http://0.0.0.0:3000 (Press CTRL+C to quit)
2023-03-12T23:43:31-0700 [INFO] [api_server:fraud_detection:5] 127.0.0.1:55887 (scheme=http,method=GET,path=/,type=,length=) (status=200,type=text/html; charset=utf-8,length=2859) 0.220ms (trace=559de02545eac6369fbb9101f2cc13b2,span=3639a5d4b73b3b90,sampled=0)
2023-03-12T23:43:31-0700 [INFO] [api_server:fraud_detection:5] 127.0.0.1:55887 (scheme=http,method=GET,path=/docs.json,type=,length=) (status=200,type=application/json,length=12628) 7.431ms (trace=
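Once the server is running, the API can be called from any HTTP client. The sketch below uses only the Python standard library and rests on several assumptions: that the server is reachable over plain HTTP on port 3000, that the API is exposed at `/is_fraud` (following the function name), and that it accepts a JSON list of row records. The helper names `build_request` and `send` are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint: local server on port 3000, route named after the API function.
ENDPOINT = "http://localhost:3000/is_fraud"


def build_request(records):
    """Build a POST request carrying a list of row dicts as JSON."""
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Illustrative, truncated record: a real request needs every column that the
# preprocessor was fitted on, not just these two.
req = build_request([{"TransactionID": 1, "TransactionAmt": 49.0}])
print(req.get_method(), req.full_url)
```

Calling `send(req)` against a running server would return the decoded JSON response; it is left uncalled here so the snippet runs without a server.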

## Export Bento

Bentos can be serialized and distributed for collaboration and deployment. The command below exports the Bento we built earlier to an S3 bucket.

In [None]:
! bentoml export fraud_detection:latest s3://bentoml-bucket/fraud-detection.bento

## Deploy to Kubernetes Cluster

