# An example of Multi-Armed Bandits in Seldon: Epsilon Greedy Algorithm

In this notebook we will use a [Multi-Armed Bandits](https://en.wikipedia.org/wiki/Multi-armed_bandit) algorithm to deploy 3 models in parallel. The algorithm will observe rewards and learn to route requests to the best model as time goes by.

Seldon's implementation of the Epsilon Greedy algorithm is open source and available in the [Seldon Core examples](https://github.com/SeldonIO/seldon-core/blob/master/examples/routers/epsilon_greedy/EpsilonGreedy.py).

## Setting the stage

What follows assumes you have a running Kubernetes cluster (with RBAC enabled) and kubectl pointing at it. First we will start Helm and Seldon.

In [None]:
!kubectl -n kube-system create sa tiller
!kubectl create clusterrolebinding tiller --clusterrole cluster-admin --serviceaccount=kube-system:tiller
!helm init --service-account tiller

In [None]:
!helm install ../helm-charts/seldon-core-crd --name seldon-core-crd --set usage_metrics.enabled=true

In [None]:
!kubectl create namespace mab

In [None]:
!helm install ../helm-charts/seldon-core --name seldon-core \
        --set apife_service_type=LoadBalancer \
        --namespace mab

In [None]:
!kubectl get svc seldon-apiserver -n mab

Set up the Python code to make REST and gRPC requests. **Only run this when the LoadBalancer created by GCP for the seldon-apife is running.**

In [None]:
from visualizer import get_graph
import requests
from requests.auth import HTTPBasicAuth
from proto import prediction_pb2
from proto import prediction_pb2_grpc
import grpc
try:
    from commands import getoutput # python 2
except ImportError:
    from subprocess import getoutput # python 3
import matplotlib.pyplot as plt
%matplotlib inline

NAMESPACE="mab"
SELDON_API_IP=getoutput("kubectl get svc -n "+NAMESPACE+" seldon-apiserver -o jsonpath='{.status.loadBalancer.ingress[0].ip}'")

def get_token():
    payload = {'grant_type': 'client_credentials'}
    response = requests.post(
                "http://{}:8080/oauth/token".format(SELDON_API_IP),
                auth=HTTPBasicAuth('oauth-key', 'oauth-secret'),
                data=payload)
    token =  response.json()["access_token"]
    return token

def rest_request(request):
    token = get_token()
    headers = {'Authorization': 'Bearer '+token}
    response = requests.post(
                "http://{}:8080/api/v0.1/predictions".format(SELDON_API_IP),
                headers=headers,
                json=request)
    return response.json()
    
def send_feedback_rest(request,response,reward):
    token = get_token()
    headers = {"Authorization": "Bearer "+token}
    feedback = {
        "request": request,
        "response": response,
        "reward": reward
    }
    ret = requests.post(
        "http://{}:8080/api/v0.1/feedback".format(SELDON_API_IP),
        headers=headers,
        json=feedback)
    return ret.text

## Deploying the prediction graph

The graph we will deploy is as follows:

In [None]:
get_graph("resources/epsilon_greedy.json")

For the router we will use the image seldonio/mab_epsilon_greedy:1.0, which was built from the Epsilon Greedy model available in the examples. For the classifiers we will use the image seldonio/mock_classifier:1.0.

The complete json for the graph is as follows:

In [None]:
!jq .spec.predictors[0].graph resources/epsilon_greedy.json

The epsilon greedy router takes the following parameters:
 - "n_branches": Required. Must match the number of children of the router (3 in this case).
 - "epsilon": Optional, defaults to 0.1. The exploration parameter of the algorithm.
 - "verbose": Optional, defaults to False. Verbose printout in the kubernetes logs.
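To make the role of `epsilon` concrete, here is a minimal sketch of an epsilon greedy routing rule. It is not Seldon's code: the function name `choose_branch` is hypothetical, and we assume that exploration picks uniformly among the branches other than the current best one, which matches the 70%/30% split described later in this notebook. Other variants of the algorithm explore over all branches, including the best one.

```python
import random

def choose_branch(best_branch, n_branches, epsilon, rng):
    """Exploit the current best branch with probability 1 - epsilon,
    otherwise explore one of the other branches uniformly at random.
    (Assumption: exploration excludes the current best branch.)"""
    if rng.random() > epsilon:
        return best_branch
    others = [b for b in range(n_branches) if b != best_branch]
    return rng.choice(others)

# Simulate 10,000 routing decisions with branch 0 as the current best.
rng = random.Random(0)
counts = [0, 0, 0]
for _ in range(10000):
    branch = choose_branch(best_branch=0, n_branches=3, epsilon=0.3, rng=rng)
    counts[branch] += 1
print(counts)
```

With `epsilon=0.3`, roughly 70% of the simulated requests should land on branch 0 and the rest should be shared between branches 1 and 2.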

Let's create the Seldon Deployment:

In [None]:
!kubectl apply -f resources/epsilon_greedy.json -n mab

In [None]:
!kubectl get seldondeployments seldon-deployment-example -o jsonpath="{.status}" -n mab

## Routing Requests

First let's build a prediction request that we will use throughout this tutorial:

In [None]:
request = {
    "data": {
        "ndarray":[[0,0]],
        "names":["feature_1","feature_2"]
    }
}

Let's send it to our deployed predictor:

In [None]:
rest_request(request)

The response metadata contains a routing dictionary that indicates which branch in the graph was selected by the router.
Before it has received any feedback, the epsilon greedy algorithm will send about 70% of the requests to branch 0 and spread the remaining 30% over the other branches (because we chose epsilon=0.3).

To test this, let's send 100 requests and observe the distribution of routings:

In [None]:
results = {0:0,1:0,2:0}
for i in range(100):
    response = rest_request(request)
    route = response.get("meta").get("routing").get("eg-router")
    results[route]+=1
for branch,n in results.items():
    print("{} requests were sent to branch {}".format(n,branch))

Now we will send feedback to the router about a particular prediction.
Feedback takes the following form:
```python
{
    request: # The original request
    response: # The response sent by seldon
    reward : # A float number representing a reward for the prediction
    truth : # Optional
}
```
To clarify, truth is the actual value of the variable you want to predict, observed after the fact. It is not used by the epsilon greedy router. All we need is the request, the response and a reward, which is binary (0 or 1) for a single prediction.
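As an illustration, the feedback fields can be assembled with a small helper like the one below. This is only a sketch: `build_feedback` is a hypothetical name, the check that the reward lies between 0 and 1 is our assumption based on the rewards used in this notebook, and the example response is a made-up stand-in for a real Seldon response.

```python
def build_feedback(request, response, reward, truth=None):
    """Assemble a feedback payload from the fields described above."""
    # Assumption: rewards lie in [0, 1], as in the examples of this notebook.
    if not 0.0 <= reward <= 1.0:
        raise ValueError("reward is expected to lie between 0 and 1")
    feedback = {
        "request": request,
        "response": response,
        "reward": float(reward),
    }
    # truth is optional and is passed through untouched when provided.
    if truth is not None:
        feedback["truth"] = truth
    return feedback

# Hypothetical request/response pair, for illustration only.
example_request = {"data": {"ndarray": [[0, 0]], "names": ["feature_1", "feature_2"]}}
example_response = {"meta": {"routing": {"eg-router": 0}}}
example_feedback = build_feedback(example_request, example_response, reward=0)
print(sorted(example_feedback))
```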

First, let's get a prediction and save the response:

In [None]:
response = rest_request(request)
response

Now let's send negative feedback about this prediction:

In [None]:
send_feedback_rest(request,response,reward=0)

The epsilon greedy router keeps track of the best branch according to the feedback it has observed.
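One simple way to keep track of the best branch is to store, for each branch, the total reward received and the number of feedback messages, and to pick the branch with the highest average reward. The sketch below illustrates that idea; the class `BranchStats` and its methods are hypothetical, and the router source linked at the top of this notebook remains the reference for what Seldon actually does.

```python
class BranchStats:
    """Per-branch reward bookkeeping (illustrative sketch, not Seldon's code)."""

    def __init__(self, n_branches):
        self.total_reward = [0.0] * n_branches
        self.n_feedback = [0] * n_branches

    def update(self, branch, reward):
        # Record one feedback message for the given branch.
        self.total_reward[branch] += reward
        self.n_feedback[branch] += 1

    def best_branch(self):
        # Average reward per branch; branches without feedback count as 0.
        rates = [
            total / n if n else 0.0
            for total, n in zip(self.total_reward, self.n_feedback)
        ]
        # Ties are resolved in favour of the lowest branch index.
        return max(range(len(rates)), key=rates.__getitem__)

stats = BranchStats(n_branches=3)
stats.update(branch=0, reward=0)
stats.update(branch=1, reward=0)
stats.update(branch=2, reward=1)
best = stats.best_branch()
print(best)
```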

To test the behaviour of the algorithm, we will run a little simulation. In what follows we will repeatedly request a prediction and send feedback on it, with a reward of 1 every time the request was routed to branch 2 and a reward of 0 otherwise. We should observe that the algorithm starts sending requests mainly to branch 2.

In [None]:
routes_history = []
for _ in range(100):
    response = rest_request(request)
    route = response.get("meta").get("routing").get("eg-router")
    if route == 2:
        send_feedback_rest(request,response,reward=1)
    else:
        send_feedback_rest(request,response,reward=0)
    routes_history.append(route)

plt.figure(figsize=(15,6))
plt.scatter(range(len(routes_history)),routes_history)
plt.xlabel("Incoming Requests over Time")
plt.ylabel("Selected Branch")
plt.yticks([0,1,2])
_ = plt.title("Branch Chosen for Incoming Requests")

You can see that the algorithm very quickly figured out that branch 2 was the best one.
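To put a number on this, we can compare the share of requests routed to branch 2 in the first and second halves of the history. The sketch below uses a small made-up history (`example_history`) so that it runs without a cluster; the helper `branch_share` is hypothetical, and in the notebook you would pass `routes_history` instead.

```python
from collections import Counter

def branch_share(routes, branch):
    """Fraction of the requests in `routes` that went to `branch`."""
    if not routes:
        return 0.0
    return Counter(routes)[branch] / len(routes)

# Made-up routing history, for illustration only.
example_history = [0, 0, 1, 0, 2, 2, 2, 1, 2, 2, 2, 2]
half = len(example_history) // 2
first_half_share = branch_share(example_history[:half], 2)
second_half_share = branch_share(example_history[half:], 2)
print(first_half_share, second_half_share)
```

A second-half share clearly above the first-half share indicates that the router has shifted traffic towards branch 2.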

## Working with predictions in batches

So far we have ignored the fact that a single request can contain a batch of predictions, while the feedback only carries a single number as a reward. What if the predictions returned deserve different rewards?

This is handled very simply: the reward you send in the feedback should correspond to the average reward of the batch.

For example, suppose I have a batch of 10 requests. The algorithm routes them to branch 0 and returns 10 predictions from model A.
I observe that 4 of these predictions were accurate and 6 were wrong.
The reward I should give to this batch of predictions is then 4/10 = 0.4.
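The average reward of a batch can be computed from per-prediction outcomes as in the short sketch below. The helper `batch_reward` and the list of outcomes are hypothetical and only reproduce the arithmetic of the example above.

```python
def batch_reward(outcomes):
    """Average reward of a batch, given one boolean outcome per prediction."""
    if not outcomes:
        raise ValueError("cannot compute the reward of an empty batch")
    n_correct = sum(1 for ok in outcomes if ok)
    return n_correct / len(outcomes)

# 4 accurate predictions and 6 wrong ones, as in the example above.
outcomes = [True] * 4 + [False] * 6
reward = batch_reward(outcomes)
print(reward)
```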

In [None]:
request = {
    "data": {
        "ndarray":[
            [0,0],
            [0,0],
            [0,0],
            [0,0],
            [0,0],
            [0,0],
            [0,0],
            [0,0],
            [0,0],
            [0,0]
        ],
        "names": ["feature_1","feature_2"]
    }
}

In [None]:
response = rest_request(request)
response

In [None]:
send_feedback_rest(request,response,reward=0.4)

When the epsilon greedy algorithm receives this reward associated with the 10 predictions, it will deduce that out of the 10 predictions, $0.4 \times 10 = 4$ were good and $0.6 \times 10 = 6$ were bad.
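The same deduction can be written in a couple of lines. This is only a sketch of the arithmetic: `batch_counts` is a hypothetical helper, and rounding to whole predictions is our assumption (an implementation could equally keep fractional counts).

```python
def batch_counts(reward, batch_size):
    """Split a batch into good and bad predictions from its average reward."""
    # Assumption: round to the nearest whole number of predictions.
    successes = round(reward * batch_size)
    failures = batch_size - successes
    return successes, failures

successes, failures = batch_counts(reward=0.4, batch_size=10)
print(successes, failures)
```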

## Tear down

In [None]:
!kubectl delete -f resources/epsilon_greedy.json -n mab

In [None]:
!helm delete seldon-core --purge

In [None]:
!helm delete seldon-core-crd --purge