# Space Bandits Model as a Classifier

RMSE validation of a contextual bandits model is covered [here](validation.ipynb).<br>
Sometimes, we want to compare our contextual bandits model "apples to apples" with a binary classifier. It turns out that the sigmoid function gives us a convenient way to do this.

## Toy Data
We use the same toy data as the [toy problem notebook](toy_problem.ipynb), on which we know the model converges.

In [1]:
import numpy as np
import pandas as pd
from random import random, randint
import matplotlib.pyplot as plt
import gc
%config InlineBackend.figure_format='retina'
##Generate Data

from space_bandits.toy_problem import generate_dataframe

df = generate_dataframe(10000)
df.head()

|   | age | ARPU | action | reward |
|---|-----|------|--------|--------|
| 0 | 41.0 | 31.705625 | 0 | 0 |
| 1 | 27.0 | 85.884541 | 0 | 10 |
| 2 | 36.0 | 17.053245 | 1 | 0 |
| 3 | 48.0 | 59.970738 | 1 | 25 |
| 4 | 47.0 | 98.454896 | 2 | 0 |


We produce a dataset of 10,000 rows with randomly selected actions.
## Train/Validation Split
We split the data into two equally-sized groups.

In [2]:
train = df.sample(frac=.5).copy()
val = df[~df.index.isin(train.index)].copy()
num_actions = len(train.action.unique())
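
The split above is random, so the scores reported below will shift slightly from run to run. A minimal sketch of a repeatable variant follows, assuming a unique index; the `toy` dataframe, the names `train_part` and `val_part`, and the seed value are made up for illustration and are not part of the original notebook.

```python
import pandas as pd

# Hypothetical stand-in for the toy dataframe (10 rows, unique index).
toy = pd.DataFrame({"age": range(20, 30), "reward": [0, 10] * 5})

# random_state fixes the shuffle, so the same rows are sampled on every run.
train_part = toy.sample(frac=0.5, random_state=42)

# drop() removes the sampled rows by index label, leaving the complement.
val_part = toy.drop(train_part.index)

print(len(train_part), len(val_part))
```

The two halves are disjoint and together cover every row, which is the same property the `isin` mask in the cell above provides.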

## Validation Metric
We'll use the ROC AUC score as a validation metric. We'll train a simple binary classifier, a logistic regression model, to "compete" with our bandits model. This model simply predicts convert/no convert.
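
ROC AUC depends only on how the scores rank positives against negatives, not on their scale. The sketch below illustrates this with a deliberately simple pairwise implementation (not scikit-learn's) on made-up labels and scores: passing one set of scores through a strictly increasing function such as the sigmoid leaves the AUC unchanged.

```python
import math


def pairwise_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win. This O(n^2) form is for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical labels and raw scores, made up for this example.
labels = [0, 0, 1, 0, 1, 1]
raw_scores = [-2.0, -0.5, -0.1, 0.3, 1.2, 2.5]

auc_raw = pairwise_auc(labels, raw_scores)
auc_squashed = pairwise_auc(labels, [sigmoid(s) for s in raw_scores])
print(auc_raw, auc_squashed)
```

Because of this invariance, any score that orders examples sensibly can be evaluated with ROC AUC, whether or not it is a calibrated probability.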

In [3]:
from sklearn.metrics import roc_auc_score
from sklearn.linear_model import LogisticRegression



In [4]:
train_fts = train[['age', 'ARPU']]
#give actions as features
campaign_fts = pd.get_dummies(train.action)
campaign_fts.index = train_fts.index
X_train = pd.concat([train_fts, campaign_fts], axis=1)
#Get labels: we are predicting conversion, so 1 if reward != 0
train['convert'] = np.where(train.reward > 0, 1, 0)
Y_train = train.convert

#prepare X_val for later
val_fts = val[['age', 'ARPU']]
campaign_fts_val = pd.get_dummies(val.action)
campaign_fts_val.index = val_fts.index
X_val = pd.concat([val_fts, campaign_fts_val], axis=1)
#get validation labels as well
val['convert'] = np.where(val.reward > 0, 1, 0)
Y_val = val.convert

In [5]:
classifier = LogisticRegression()
classifier.fit(X_train, Y_train)
pred = classifier.predict_proba(X_val)[:, 1]

classifier_auc_score = roc_auc_score(Y_val, pred)
print('Logistic regression auc score: ', round(classifier_auc_score, 3))

Logistic regression auc score:  0.786




## Bandits Model
We fit a bandits model on the same data.

In [6]:
from space_bandits import NeuralBandits

model = NeuralBandits(num_actions, num_features=2, layer_sizes=[50,12])

In [7]:
model.fit(train[['age', 'ARPU']], train['action'], train['reward'])

Training neural_model-bnn for 100 steps...


## Get Expected Rewards
We collect expected reward values and add them to the validation dataframe.

In [8]:
expected_values = model.expected_values(val[['age', 'ARPU']].values)
pred = pd.DataFrame()
for a, vals in enumerate(expected_values):
    pred[a] = vals
#expected reward values
pred.index = val.index
#add them to validation df
val = pd.concat([val, pred], axis=1)
val.head()

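|   | age | ARPU | action | reward | convert | 0 | 1 | 2 |
|---|-----|------|--------|--------|---------|---|---|---|
| 0 | 41.0 | 31.705625 | 0 | 0 | 0 | 41.371531 | 63.260073 | 182.308648 |
| 1 | 27.0 | 85.884541 | 0 | 10 | 1 | 185.118165 | 115.596413 | 143.854397 |
| 3 | 48.0 | 59.970738 | 1 | 25 | 1 | 139.067486 | 93.874866 | 136.292929 |
| 4 | 47.0 | 98.454896 | 2 | 0 | 0 | 227.190699 | 137.435247 | 158.761865 |
| 5 | 51.0 | 85.20313 | 1 | 0 | 0 | 207.16042 | 123.381055 | 136.992186 |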


## Applying the Sigmoid Function
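The bandits model estimates rewards for each campaign separately, so we apply the sigmoid function to each expected-reward column independently. The raw expected rewards are in the tens and hundreds, which would saturate the sigmoid, so we first mean-center each column and scale it to unit standard deviation.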
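
A minimal sketch of this transformation on a single column, using only the standard library. The reward values and the helper name `standardized_sigmoid` are made up for illustration; `statistics.stdev` is the sample standard deviation, which matches the default of pandas' `Series.std()`.

```python
import math
import statistics


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def standardized_sigmoid(values):
    """Mean-center, scale to unit sample std, then squash into (0, 1)."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample std (ddof=1)
    return [sigmoid((v - mean) / std) for v in values]


# Hypothetical expected rewards for one action; their mean is 160.
expected_rewards = [40.0, 140.0, 160.0, 205.0, 255.0]

# Without standardizing, every value saturates to 1.0 in floating point.
raw = [sigmoid(v) for v in expected_rewards]

# After standardizing, scores spread around 0.5 and keep their order.
scores = standardized_sigmoid(expected_rewards)
print(raw)
print([round(s, 3) for s in scores])
```

The value equal to the column mean maps to exactly 0.5, and larger expected rewards map to larger scores.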

In [9]:
val['pred'] = .5
for a in range(num_actions):
    #mean center and normalize expected rewards
    val['{}_centered'.format(a)] = (val[a] - val[a].mean())/val[a].std()

In [10]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

#Apply sigmoid to get p_pred
for a in range(num_actions):
    #get the rows for this action
    slc = val[val.action==a]
    #pass values through sigmoid
    vals = sigmoid(slc['{}_centered'.format(a)].values)
    #assign output to appropriate rows
    inds = slc.index
    val.loc[inds, 'pred'] = vals

In [11]:
pred = val.pred

bandits_auc_score = roc_auc_score(Y_val, pred)
print('Bandits auc score: ', round(bandits_auc_score, 3))

Bandits auc score:  0.625


## Result
The logistic regression model performs better by this metric. This shouldn't be a surprise: the bandits model has a much harder job. It has to perform a regression for all three campaigns, while the logistic regression model gets the full benefit of direct supervision on the conversion label and has only a single binary output.