# Using multi-armed bandits to choose the best model for predicting credit card default

## Dependencies

- [helm](https://github.com/helm/helm)
- [minikube](https://github.com/kubernetes/minikube) (version 0.25.2)
- [s2i](https://github.com/openshift/source-to-image)

- Kaggle account to download data.
- Python packages:

In [None]:
!pip install -r requirements.txt

## Getting data

Either head to https://www.kaggle.com/uciml/default-of-credit-card-clients-dataset or use the Kaggle API (instructions at https://github.com/Kaggle/kaggle-api) to download the dataset:

In [None]:
!kaggle datasets download -d uciml/default-of-credit-card-clients-dataset
!unzip default-of-credit-card-clients-dataset.zip

## Load and inspect data

In [None]:
import pandas as pd
data = pd.read_csv('UCI_Credit_Card.csv')

In [None]:
data.shape

In [None]:
data.columns

In [None]:
target = 'default.payment.next.month'

In [None]:
data[target].value_counts()

Note that we have a class imbalance, so if we use accuracy as the performance measure of a classifier, we need to be able to beat the "dummy" model that classifies every instance as 0 (no default):

In [None]:
data[target].value_counts().max()/data.shape[0]
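
scikit-learn provides this majority-class baseline as `DummyClassifier`. The sketch below uses a small hypothetical label array (not the real dataset) to show that the dummy model's accuracy equals the majority-class share computed in the cell above.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Hypothetical imbalanced labels: 78 non-defaults (0) and 22 defaults (1)
y_demo = np.array([0] * 78 + [1] * 22)
X_demo = np.zeros((len(y_demo), 1))  # features are ignored by the dummy model

# Always predict the most frequent class (0 = no default)
dummy = DummyClassifier(strategy='most_frequent')
dummy.fit(X_demo, y_demo)
baseline_acc = accuracy_score(y_demo, dummy.predict(X_demo))
print(baseline_acc)  # 0.78, the same as value_counts().max() / n
```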

## Case study for using multi-armed bandits

In deploying a new ML model, it is rarely the case that the existing (if any) model is decommissioned immediately in favour of the new one. More commonly the new model is deployed alongside the existing one(s) and the incoming traffic is shared between the models.

Typically, A/B testing is performed: traffic is routed randomly between the models during a so-called experiment stage. After a set period of time, performance statistics are calculated and the best-performing model is chosen to serve 100% of the requests, while the other model(s) are decommissioned.

An alternative is to route traffic dynamically to the best-performing model using multi-armed bandits. This avoids the opportunity cost of routing a large, fixed share of traffic to the worse-performing model(s) for the whole duration of the experiment, as happens in A/B testing.
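
The opportunity cost can be made concrete with a back-of-the-envelope calculation. Assuming two models with fixed accuracies $a$ and $b$ and a fixed A/B split that sends a share $s$ of $N$ requests to the first model, the expected number of correct predictions lost relative to always using the better model is $N\,(\max(a, b) - (s\,a + (1 - s)\,b))$. The function name and the accuracies below are hypothetical, chosen only for illustration.

```python
def ab_opportunity_cost(n_requests, acc_a, acc_b, share_a=0.5):
    """Expected correct predictions lost by a fixed A/B split (share_a of the
    traffic to model A) compared with sending every request to the better model."""
    best = max(acc_a, acc_b)
    expected_correct = n_requests * (share_a * acc_a + (1 - share_a) * acc_b)
    return n_requests * best - expected_correct

# Hypothetical accuracies, not results from this notebook
print(ab_opportunity_cost(10_000, acc_a=0.80, acc_b=0.82))  # roughly 100 predictions lost
```

A bandit router aims to shrink `share_a` towards 0 as evidence accumulates that model A is worse, instead of holding it at 0.5 for the whole experiment.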

This notebook is a case study in deploying two models in parallel and routing traffic between them dynamically using multi-armed bandits (Epsilon-greedy and Thompson sampling in particular).
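
As a rough illustration of the first of these algorithms, a minimal epsilon-greedy bandit with 0/1 rewards could look like the sketch below. It is not the router code under `../epsilon-greedy`; the class and method names are hypothetical. With probability `epsilon` it explores a random model, otherwise it exploits the model with the highest observed mean reward.

```python
import random


class EpsilonGreedyRouter:
    """Minimal epsilon-greedy bandit: explore with probability epsilon,
    otherwise exploit the arm with the highest observed mean reward."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.counts = [0] * n_arms      # times each arm received feedback
        self.values = [0.0] * n_arms    # running mean reward per arm
        self.rng = random.Random(seed)

    def route(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))  # explore
        best = max(self.values)
        # exploit, breaking ties at random so arm 0 is not always favoured
        return self.rng.choice([i for i, v in enumerate(self.values) if v == best])

    def update(self, arm, reward):
        self.counts[arm] += 1
        # incremental mean: v <- v + (r - v) / n
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


# Usage: route a request, then feed back 1 if the chosen model was right, else 0
router = EpsilonGreedyRouter(n_arms=2, epsilon=0.1, seed=0)
arm = router.route()
router.update(arm, reward=1)
```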

We will use the dataset to simulate a real-world scenario consisting of several steps:

1. Split the dataset in half (15K samples each) and treat the first half as the only data observed so far
2. Split the first half of the data in proportion 10K:5K samples to use as train:test sets for a first simple model (Random Forest)
3. After training the first model, simulate a "live" environment on the first 5K of data in the second half of the dataset
4. Use the 20K samples observed so far to train a second model (XGBoost)
5. Deploy the second model alongside the first together with a multi-armed bandit and simulate a "live" environment on the last 10K of the unobserved data, routing requests between the two models

The following diagram illustrates the proposed simulation design:

![data-split](assets/split.png)

## Data preparation

In [None]:
import numpy as np
from sklearn.model_selection import train_test_split

OBSERVED_DATA = 15000
TRAIN_1 = 10000
TEST_1 = 5000

REST_DATA = 15000

RUN_DATA = 5000
ROUTE_DATA = 10000

# get features and target
X = data.loc[:, data.columns!=target].values
y = data[target].values

# observed/unobserved split
X_obs, X_rest, y_obs, y_rest = train_test_split(X, y, random_state=1, test_size=REST_DATA)

# observed split into train1/test1
X_train1, X_test1, y_train1, y_test1 = train_test_split(X_obs, y_obs, random_state=1, test_size=TEST_1)

# unobserved split into run/route
X_run, X_route, y_run, y_route = train_test_split(X_rest, y_rest, random_state=1, test_size=ROUTE_DATA)

# recombine the unobserved half (run + route), in that order
X_rest = np.vstack((X_run, X_route))
y_rest = np.hstack((y_run, y_route))

# second model: train on all observed data (train1 + test1), test on the run data
X_train2 = np.vstack((X_train1, X_test1))
X_test2 = X_run

y_train2 = np.hstack((y_train1, y_test1))
y_test2 = y_run
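
A quick way to confirm that the split sizes match the design above is to run the same three `train_test_split` calls on a synthetic array with 30,000 rows (the size of `UCI_Credit_Card.csv`, consistent with the 15K + 15K split). The synthetic arrays and the trailing-underscore variable names are stand-ins for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: each "feature" is just the row id, so overlaps are easy to detect
X_fake = np.arange(30000).reshape(-1, 1)
y_fake = np.zeros(30000)

X_obs_, X_rest_, y_obs_, y_rest_ = train_test_split(X_fake, y_fake, random_state=1, test_size=15000)
X_tr1_, X_te1_, _, _ = train_test_split(X_obs_, y_obs_, random_state=1, test_size=5000)
X_run_, X_route_, _, _ = train_test_split(X_rest_, y_rest_, random_state=1, test_size=10000)

sizes = {name: arr.shape[0] for name, arr in
         [('train1', X_tr1_), ('test1', X_te1_), ('run', X_run_), ('route', X_route_)]}
print(sizes)  # {'train1': 10000, 'test1': 5000, 'run': 5000, 'route': 10000}

# The four splits are disjoint and together cover all 30,000 rows
all_ids = np.concatenate([X_tr1_, X_te1_, X_run_, X_route_]).ravel()
assert len(np.unique(all_ids)) == 30000
```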

## Model training

We will train both models at once, but defer evaluation of the second model until simulating the live environment.

In [None]:
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(random_state=1)
rf.fit(X_train1, y_train1)

Now let's see how good our first model is on the test1 set:

In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, \
f1_score, confusion_matrix, classification_report

In [None]:
y_preds1 = rf.predict(X_test1)

In [None]:
print(classification_report(y_test1, y_preds1,
                           target_names=['No default','Default']))

In [None]:
for score in [accuracy_score, precision_score, recall_score, f1_score, confusion_matrix]:
    print(score.__name__ + ':\n', score(y_test1, y_preds1))

In [None]:
%matplotlib inline
from utils import plot_confusion_matrix

cm = confusion_matrix(y_test1, y_preds1)
plot_confusion_matrix(cm, classes=['No default','Default'], normalize=True)

So a simple random forest with default hyperparameters is able to outperform the dummy baseline on accuracy and achieves a baseline F1 score of ~0.44. However, it is a poor predictor of default, as it only achieves a recall of ~0.34.
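
Precision, recall and F1 for the "Default" class all follow directly from the confusion matrix entries: precision $= TP/(TP+FP)$, recall $= TP/(TP+FN)$ and $F_1 = 2PR/(P+R)$. The sketch below recomputes them by hand from scikit-learn's `[[TN, FP], [FN, TP]]` layout and cross-checks against scikit-learn; the helper name and the labels are hypothetical.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score


def metrics_from_cm(cm):
    """Precision, recall and F1 for the positive class (1 = default) from a
    2x2 scikit-learn confusion matrix laid out as [[TN, FP], [FN, TP]]."""
    tn, fp, fn, tp = np.asarray(cm).ravel()
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels and predictions, just to cross-check against scikit-learn
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 0, 1, 0, 1, 0, 0, 0, 1, 0])
p, r, f = metrics_from_cm(confusion_matrix(y_true, y_pred))
print(p, r, f)  # TP=2, FP=1, FN=2 -> precision 2/3, recall 0.5, F1 4/7
```

A low recall with a moderate precision pulls F1 down, which is the pattern described above for the random forest.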

Train the second model in advance, but defer evaluation:

In [None]:
from xgboost import XGBClassifier
xgb = XGBClassifier(random_state=1)
xgb.fit(X_train2, y_train2)

Save trained models to disk:

In [None]:
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
joblib.dump(rf, 'models/rf_model/RFModel.sav')
joblib.dump(xgb, 'models/xgb_model/XGBModel.sav')
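
Before wrapping the models, it can be worth checking that a pickled model round-trips through `joblib` unchanged, assuming the wrapper classes under `models/` reload the `.sav` files with `joblib.load`. A minimal sketch on a tiny hypothetical dataset, written to a temporary directory so nothing in `models/` is touched:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Tiny hypothetical training set; the notebook uses X_train1/y_train1 instead
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = np.array([0, 0, 1, 1])
model = RandomForestClassifier(n_estimators=10, random_state=1).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'RFModel.sav')
    joblib.dump(model, path)        # same call as in the cell above
    restored = joblib.load(path)    # what a serving wrapper would do at start-up
    same = np.array_equal(model.predict_proba(X_demo), restored.predict_proba(X_demo))
print(same)  # True
```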

## Set up Kubernetes for live simulation

Choose whether to use a Kubernetes cluster on GCP or a local Minikube cluster (set `minikube = True` for the latter).

In [None]:
minikube = False

In [None]:
if minikube:
    !minikube start --vm-driver kvm2 --memory 4096 --cpus 6
else:
    !gcloud container clusters get-credentials standard-cluster-1 --zone europe-west1-b --project seldon-demos

Create a cluster-wide cluster-admin role assigned to a service account named “default” in the namespace “kube-system”.

In [None]:
!kubectl create clusterrolebinding kube-system-cluster-admin --clusterrole=cluster-admin \
--serviceaccount=kube-system:default

In [None]:
!kubectl create namespace seldon

Set `seldon` as the default namespace of the current kubectl context.

In [None]:
!kubectl config set-context $(kubectl config current-context) --namespace=seldon

Create a Tiller service account, give it a cluster-wide cluster-admin role and initialise Helm with it.

In [None]:
!kubectl -n kube-system create sa tiller
!kubectl create clusterrolebinding tiller --clusterrole cluster-admin --serviceaccount=kube-system:tiller
!helm init --service-account tiller

Wait for the Tiller deployment to roll out, then install the Seldon Core operator Helm chart (with Spartakus usage metrics enabled).

In [None]:
!kubectl rollout status deploy/tiller-deploy -n kube-system

In [None]:
!helm install ../../../helm-charts/seldon-core-operator --name seldon-core --set usageMetrics.enabled=true --namespace seldon-system

In [None]:
!kubectl rollout status deploy/seldon-controller-manager -n seldon-system

In [None]:
%%bash
helm repo add datawire https://www.getambassador.io
helm repo update
helm install datawire/ambassador --name ambassador \
    --set image.repository=quay.io/datawire/ambassador \
    --set enableAES=false \
    --set crds.keep=false

In [None]:
!kubectl rollout status deployment.apps/ambassador

Install analytics (Prometheus for metrics and Grafana for visualisation):

In [None]:
!helm install ../../../helm-charts/seldon-core-analytics --name seldon-core-analytics \
    --set grafana_prom_admin_password=password \
    --set persistence.enabled=false \
    --namespace seldon

Port-forward Ambassador (run this command in a terminal):

```sh
kubectl port-forward $(kubectl get pods -n seldon -l app.kubernetes.io/name=ambassador -o jsonpath='{.items[0].metadata.name}') -n seldon 8003:8080
```

Port-forward Grafana (run this command in a terminal):

```sh
kubectl port-forward $(kubectl get pods -n seldon -l app=grafana-prom-server -o jsonpath='{.items[0].metadata.name}') -n seldon 3000:3000
```

You can then view an analytics dashboard inside the cluster at http://localhost:3000/d/rs_zGKYiz/mab?refresh=1s&orgId=1&from=now-2m&to=now. Log in with:

- Username: admin
- Password: password (as set when installing seldon-core-analytics above)

**Import the mab dashboard from `assets/mab.json`.**

### Wrap model and router images with s2i

We have prepared the model classes under `models/rf_model/RFModel.py` and `models/xgb_model/XGBModel.py` for wrapping the trained models as Docker images using s2i. The structure of the files is as follows:

In [None]:
!pygmentize models/rf_model/RFModel.py

Note that we define our own custom metrics, namely the entries of the confusion matrix, which are exposed to Prometheus and visualized in Grafana as the model runs in the simulated live environment.
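
As a hedged illustration of how such confusion-matrix metrics can be produced, the sketch below follows the conventions of Seldon Core's Python wrapper as we understand them: a `send_feedback` method that receives the reward and the true label, and a `metrics` method returning a list of `{'type', 'key', 'value'}` dicts. The class name, metric keys and stub classifier are hypothetical; this is not the contents of `RFModel.py`, and it runs without Seldon installed.

```python
import numpy as np


class ConfusionMatrixModel:
    """Hypothetical Seldon-style model class that counts confusion-matrix
    entries from feedback and reports them as custom metrics."""

    def __init__(self, clf):
        self.clf = clf  # any fitted classifier exposing predict_proba
        self.tp = self.fp = self.tn = self.fn = 0

    def predict(self, X, feature_names=None):
        return self.clf.predict_proba(X)

    def send_feedback(self, features, feature_names, reward, truth, routing=None):
        # Recompute this sample's predicted class and compare it with the true label
        pred = int(np.argmax(self.predict(features)[0]))
        label = int(truth[0])
        if pred == 1 and label == 1:
            self.tp += 1
        elif pred == 1 and label == 0:
            self.fp += 1
        elif pred == 0 and label == 0:
            self.tn += 1
        else:
            self.fn += 1
        return []

    def metrics(self):
        # Cumulative counts reported as gauges so Prometheus/Grafana can plot them
        return [{'type': 'GAUGE', 'key': key, 'value': value} for key, value in
                [('true_pos', self.tp), ('false_pos', self.fp),
                 ('true_neg', self.tn), ('false_neg', self.fn)]]


# Usage with a stub classifier whose P(default) equals the first feature
class _StubClassifier:
    def predict_proba(self, X):
        p = np.asarray(X, dtype=float)[:, 0]
        return np.column_stack([1 - p, p])


model = ConfusionMatrixModel(_StubClassifier())
model.send_feedback([[0.9]], None, reward=1, truth=[1])  # predicted 1, truly 1 -> TP
model.send_feedback([[0.2]], None, reward=0, truth=[1])  # predicted 0, truly 1 -> FN
print(model.metrics())
```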

If using Minikube, build the Docker images for the trained models and the routers inside Minikube's Docker daemon using s2i.

In [None]:
if minikube:
    !eval $(minikube docker-env) && \
    make -C models/rf_model build && \
    make -C models/xgb_model build && \
    make -C ../epsilon-greedy build && \
    make -C ../thompson-sampling build

## Deploy the first model

In [None]:
!kubectl apply -f assets/rf_deployment.json -n seldon

In [None]:
!kubectl rollout status deploy/rf-model-rf-model-7c4643e 

## Simulate the first model in production for 5000 samples

In [None]:
from utils import rest_request_ambassador, send_feedback_rest

In [None]:
for i in range(X_run.shape[0]):
    if i%1000 == 0:
        print(f'Processed {i}/{X_run.shape[0]} samples', flush=True)
        
    # fetch sample and make a request payload
    x = X_run[i].reshape(1,-1).tolist()
    request = {'data':{'ndarray':x}}

    # send request to model
    response = rest_request_ambassador('rf-deployment', 'seldon', request)

    # extract prediction
    probs = response.get('data').get('ndarray')[0]
    pred = np.argmax(probs)

    # send feedback to the model informing it if it made the right decision
    truth_val = int(y_run[i])
    reward = int(pred==truth_val)
    truth = [truth_val]
    _ = send_feedback_rest('rf-deployment', 'seldon', request, response, reward, truth)
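
The helpers `rest_request_ambassador` and `send_feedback_rest` come from `utils.py`, which is not shown here. As a rough sketch of what such helpers might do, assuming Seldon Core's v1 REST protocol exposed through Ambassador on the port-forwarded `localhost:8003`, the URL and feedback payload could be built as below. All function names are hypothetical, and only the standard library is used.

```python
import json
import urllib.request

AMBASSADOR = 'http://localhost:8003'  # assumption: the Ambassador port-forward set up earlier


def seldon_url(deployment, namespace, endpoint='predictions', host=AMBASSADOR):
    # Assumed Ambassador route for a SeldonDeployment using the v1 REST protocol
    return f'{host}/seldon/{namespace}/{deployment}/api/v1.0/{endpoint}'


def feedback_payload(request, response, reward, truth):
    # Feedback message: original request, model response, scalar reward, true label(s)
    return {'request': request, 'response': response,
            'reward': reward, 'truth': {'data': {'ndarray': truth}}}


def post_json(url, payload, timeout=10):
    req = urllib.request.Request(url, data=json.dumps(payload).encode('utf-8'),
                                 headers={'Content-Type': 'application/json'})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode('utf-8'))


# Hypothetical equivalents of the two calls in the loop above (need a live cluster):
#   response = post_json(seldon_url('rf-deployment', 'seldon'), request)
#   post_json(seldon_url('rf-deployment', 'seldon', 'feedback'),
#             feedback_payload(request, response, reward, truth))
```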

We can see the model performance on the Grafana dashboard:
http://localhost:3000/d/rs_zGKYiz/mab?refresh=1s&orgId=1&from=now-2m&to=now (refresh to update)

## Deploy the original model and the new model with a router in front

Suppose now that we have come up with a new model and want to deploy it alongside the first one, with a multi-armed bandit router that decides which model should make each prediction. We will delete the original deployment and create a new one that has both models in parallel and a router (the multi-armed bandit) in front.

To make things interesting, we will actually deploy 2 parallel deployments with the same 2 models but a different router in front (Epsilon-greedy and Thompson sampling) to compare the performance of two very different multi-armed bandit algorithms. One can think of the first deployment as a production deployment and the second parallel one as a shadow deployment whose responses are used for testing only.
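
Unlike epsilon-greedy, which explores at a fixed rate, Thompson sampling keeps a posterior belief about each model's success rate and explores in proportion to its uncertainty. A minimal Beta-Bernoulli sketch for 0/1 rewards using only the standard library; the class name is hypothetical and this is not the router under `../thompson-sampling`.

```python
import random


class ThompsonSamplingRouter:
    """Minimal Beta-Bernoulli Thompson sampling: keep a Beta(alpha, beta)
    posterior over each arm's success rate, draw one sample per arm and
    route to the arm with the largest draw."""

    def __init__(self, n_arms, seed=None):
        self.alpha = [1.0] * n_arms  # 1 + number of rewards equal to 1
        self.beta = [1.0] * n_arms   # 1 + number of rewards equal to 0
        self.rng = random.Random(seed)

    def route(self):
        draws = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm, reward):
        # rewards are 0/1 (prediction wrong/right), as in the feedback loop above
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward


# Usage: model 0 has been right 20 times, model 1 wrong 20 times
router = ThompsonSamplingRouter(n_arms=2, seed=0)
for _ in range(20):
    router.update(0, 1)
    router.update(1, 0)
picks = [router.route() for _ in range(1000)]
print(picks.count(0) / len(picks))  # close to 1.0
```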

But first, let's see what the performance of the new XGBoost model is on its test2 data:

In [None]:
y_preds2 = xgb.predict(X_test2)

In [None]:
print(classification_report(y_test2, y_preds2,
                           target_names=['No default','Default']))

In [None]:
for score in [accuracy_score, precision_score, recall_score, f1_score, confusion_matrix]:
    print(score.__name__ + ':\n', score(y_test2, y_preds2))

In [None]:
cm = confusion_matrix(y_test2, y_preds2)
plot_confusion_matrix(cm, classes=['No default','Default'], normalize=True)

The XGBoost model is slightly better than the old RFModel, so we expect any decent multi-armed bandit router to pick this up on live data. Let's try this out.

First, delete the existing deployment of the old RFModel:

In [None]:
!kubectl delete sdep rf-deployment

Inspect and then apply the following two deployments:

In [None]:
from utils import get_graph
get_graph('assets/eg_deployment.json')

In [None]:
get_graph('assets/ts_deployment.json')

In [None]:
!kubectl apply -f assets/eg_deployment.json -n seldon
!kubectl apply -f assets/ts_deployment.json -n seldon

In [None]:
!kubectl rollout status deploy/poc-eg-eg-2-47fb8da

In [None]:
!kubectl rollout status deploy/poc-ts-ts-2-75f9d39

## Simulate both deployments in parallel with the remaining 10000 data samples

Here we send requests and feedback to both parallel deployments, thus assessing the performance of the Epsilon-greedy router versus Thompson sampling as a method of routing to the best-performing model.

In [None]:
for i in range(X_route.shape[0]):
    if i%1000 == 0:
        print(f'Processed {i}/{X_route.shape[0]} samples', flush=True)
        
    # fetch sample and make a request payload
    x = X_route[i].reshape(1,-1).tolist()
    request = {'data':{'ndarray':x}}

    # send request to both deployments
    eg_response = rest_request_ambassador('eg-deployment', 'seldon', request)
    ts_response = rest_request_ambassador('ts-deployment', 'seldon', request)

    # extract predictions
    eg_probs = eg_response.get('data').get('ndarray')[0]
    ts_probs = ts_response.get('data').get('ndarray')[0]
    eg_pred = np.argmax(eg_probs)
    ts_pred = np.argmax(ts_probs)

    # send feedback to the model informing it if it made the right decision
    truth_val = int(y_route[i])
    eg_reward = int(eg_pred==truth_val)
    ts_reward = int(ts_pred==truth_val)
    truth = [truth_val]
    _ = send_feedback_rest('eg-deployment', 'seldon', request, eg_response, eg_reward, truth)
    _ = send_feedback_rest('ts-deployment', 'seldon', request, ts_response, ts_reward, truth)

We can see the model performance on the Grafana dashboard:
http://localhost:3000/dashboard/db/mab?refresh=5s&orgId=1 (refresh to update)

We note that both the Epsilon-greedy and the Thompson sampling routers allocate more traffic to the better-performing model (XGBoost) over time, but Thompson sampling does so at a quicker rate, as evidenced by its superior metrics (F1 score in particular).
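
To get a feel for why the two routers can differ, the following offline simulation replaces the deployed models with hypothetical "models" that are right with fixed probabilities (0.6 and 0.8, chosen for illustration only) and reports the share of traffic each bandit sends to the better one. It is a self-contained sketch, not a reproduction of the Seldon routers or of the dashboard results. With two models, epsilon-greedy keeps sending roughly `eps / 2` of the traffic to the worse model indefinitely, while Thompson sampling's exploration shrinks as its posteriors tighten.

```python
import random


def simulate(policy, acc, n=10_000, eps=0.1, seed=0):
    """Route n requests between hypothetical models that are right with
    probability acc[i]; return the share of traffic sent to the best model.
    policy is 'thompson' or anything else for epsilon-greedy."""
    rng = random.Random(seed)
    k = len(acc)
    wins, losses = [0] * k, [0] * k  # observed 0/1 rewards per model
    best = max(range(k), key=acc.__getitem__)
    to_best = 0
    for _ in range(n):
        if policy == 'thompson':
            # one draw per model from its Beta(1 + wins, 1 + losses) posterior
            draws = [rng.betavariate(1 + w, 1 + l) for w, l in zip(wins, losses)]
            arm = max(range(k), key=draws.__getitem__)
        elif rng.random() < eps:
            arm = rng.randrange(k)  # epsilon-greedy: explore uniformly
        else:
            # epsilon-greedy: exploit the best observed mean (untried models count as 0)
            means = [w / (w + l) if w + l else 0.0 for w, l in zip(wins, losses)]
            arm = max(range(k), key=means.__getitem__)
        reward = int(rng.random() < acc[arm])  # 1 if the routed model was right
        wins[arm] += reward
        losses[arm] += 1 - reward
        to_best += int(arm == best)
    return to_best / n


# Hypothetical accuracies chosen for illustration, not measured in this notebook
eg_share = simulate('epsilon-greedy', acc=[0.6, 0.8])
ts_share = simulate('thompson', acc=[0.6, 0.8])
print(f'epsilon-greedy: {eg_share:.3f}, Thompson sampling: {ts_share:.3f}')
```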

## Clean-up

In [None]:
# delete data
!rm default-of-credit-card-clients-dataset.zip
!rm UCI_Credit_Card.csv

# delete trained models
!rm models/rf_model/RFModel.sav
!rm models/xgb_model/XGBModel.sav

# delete Seldon deployment from the cluster
!kubectl delete sdep --all