See original notebook : https://drive.google.com/drive/folders/16qThuQ-I4_0Rg2LFGUkiD4PbEnmQX3UH?usp=sharing

# Imports

1) Mount Google Drive to access files

In [None]:
from google.colab import drive
drive.mount('/content/drive')

2) Add a shortcut from the shared folder _Contextual_Bandit_and_Thompson_Sampling_ into your Google Drive folders

3) Change the current directory to the _Contextual_Bandit_and_Thompson_Sampling_ folder (don't forget to change the file path if yours is different)

In [None]:
cd /content/drive/MyDrive/Contextual_Bandit_and_Thompson_Sampling

## Problem description

#### Veepee's business description

Veepee is an online flash sales website which offers a large number of new sales every day, with huge discounts of up to 70%. Sales are available for a very short period of time.

On Veepee's website, there are **about 200 flash sales on the home page** on a given day divided into several sectors like fashion, accessories, toys, watches, home appliances, sports equipment, technology, wines, travel, etc.

New sales open every day and old sales either continue or stop.

#### Homepage recommendation problem

Because the number of sales (also called operations) is large, users might not scroll to the end of the homepage to see all the banners, and might leave Veepee if no sale at the top of the page is relevant to them.

Thus, the main goal of the homepage customization will be to rank the banners so that the most relevant active sales for a customer appear on top of the page.

For that, we rely on the user's previous orders and preferences but also on sales popularity and other global information.

#### First connection issue

Because the ranking algorithm uses member features that are processed once a day, the homepage of a user who visits for the first time is not personalized until the next day.

**The goal of this notebook is to build a first ranking by showing the user some operations and asking whether he or she is interested.**

# Installation

- Python version 3.7.x
- `pip install -U pip`
- `pip install -r requirements.txt`

In [None]:
# In Google Colab, run this cell to install the library versions used

!pip install numpy==1.19.2
!pip install pandas==1.1.4
!pip install scikit-learn==0.23.2

# 1 - Random propositions

## Imports

In [None]:
import pandas as pd
import random

In [None]:
random.seed(84)

In [None]:
%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}

## Loading the data

The dataframe contains the features of each operation and the related banner. For the training set, we took all the operations displayed on the homepage on `2020-08-04` (these operations can be new or ongoing). For the test set, we only took the operations which started on `2020-10-30`.

In [None]:
train_ops = pd.read_pickle("train_2020-08-04.pickle")
test_ops = pd.read_pickle("test_2020-10-30.pickle")

### Operations features

In [None]:
train_ops.head()

- `operationcd`: code of the operation
- `secteur_principal`: first level of two of the internal taxonomy
- `sous_secteur_principal`: second level of two of the internal taxonomy
- `business_type`: 'Vente privée', 'Entertainment', 'MEDIA', 'Vin', 'One Day', 'Voyage'
- `brand`: name of the brand
- `operation_type`: 'Classique', 'VBI interne', 'Thématique', 'Vin classique', 'One Day', 'Hotel Planet', 'Séquence caviste'
- `front_secteur`: first level of two of the homepage taxonomy
- `front_sous_secteur`: second level of two of the homepage taxonomy
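
All of these features are categorical, so they have to be turned into numbers before a model can use them. The notebook relies on `pd.get_dummies` for that; the sketch below reproduces the idea with the standard library only, on made-up values, to show why the train and test sets are encoded together. The helper `one_hot` and the toy rows are assumptions for illustration, not part of the original notebook.

```python
def one_hot(rows, columns):
    """Encode a list of dicts as 0/1 vectors, one slot per (column, value) pair."""
    # Collect every (column, value) pair seen in the data, in a stable order.
    slots = sorted({(col, row[col]) for row in rows for col in columns})
    names = [f"{col}_{value}" for col, value in slots]
    vectors = [
        [1 if row[col] == value else 0 for col, value in slots]
        for row in rows
    ]
    return names, vectors


# Toy operations (values are made up for illustration).
train = [
    {"business_type": "Voyage", "front_secteur": "Travel"},
    {"business_type": "Vin", "front_secteur": "Food"},
]
test = [{"business_type": "Voyage", "front_secteur": "Food"}]

# Encode train and test together so that both share the same columns.
names, vectors = one_hot(train + test, ["business_type", "front_secteur"])
train_vectors, test_vectors = vectors[:len(train)], vectors[len(train):]
```

Encoding the two sets separately could produce different columns on each side, which a model trained on one cannot consume on the other.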

### Operations banners

In [None]:
from IPython import display
from base64 import b64decode

display.Image(train_ops.loc[56].banner)

So the homepage consists of 200 banners like this one, displayed in two columns. Each banner shows the brand name of the operation and a specific visual.

## Exercise Setup

In the next cells we will simulate a small survey given to a new user:

We will display operations randomly and ask the user for his/her interest. Each operation is selected completely randomly.
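
The notebook's `choose_row_number` redraws a random row until it finds one that has not been shown yet. As a possible alternative, the questions can be drawn without replacement up front. This is only a sketch: `survey_order` is a hypothetical helper, and it will not reproduce the exact sequence of offers obtained with the notebook's own code and seed.

```python
import random


def survey_order(n_rows, n_questions, seed=None):
    """Return n_questions distinct row positions in random order."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so no row is shown twice.
    return rng.sample(range(n_rows), min(n_questions, n_rows))


# Hypothetical sizes: 200 operations on the homepage, 20 survey questions.
order = survey_order(n_rows=200, n_questions=20, seed=84)
```

Unlike a redraw loop, this approach cannot spin forever once every operation has been shown.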

In [None]:
# Imports
from ipywidgets import AppLayout, Button, GridspecLayout, Image, Layout 

In [None]:
# Add empty column to record actions
train_ops['interested'] = None
test_ops['interested'] = None

In [None]:
def create_expanded_button(description, button_style):
    return Button(
        description=description,
        button_style=button_style,
        layout=Layout(height='auto', width='auto')
    )


def random_row():
    return random.randint(0, len(train_ops) - 1)


def get_banner(index, dataset):
    return Image(
        value=dataset.loc[index].banner,
        format='jpg',
        width=300,
        height=400,
    )


def plot_grid(df):
    n_rows = (len(df)//3) + (len(df) % 3 != 0)
    grid = GridspecLayout(n_rows , 3)

    cpt = 0
    for i in range(n_rows):
        for j in range(3):
            if cpt < len(df):
                grid[i, j] = get_banner(df.index[cpt], df)
                cpt += 1 
    return grid

In [None]:
# Setting the left button: Not Interested
left_button = create_expanded_button('Not Interested', 'danger')

# Setting the image in the center
current_row_number = random_row()
img = get_banner(current_row_number, train_ops)

# Setting the right button: Interested
right_button = create_expanded_button('Interested', 'success')


def on_button_clicked(b):
    """
    Update the values related to the user's choice
    Choose a new operation to display
    Update the banner displayed
    """
    update_values(b)
    row_number = choose_row_number()
    update_banner(b, row_number)
    

def update_values(b):
    """
    Update the dataframe column "interested" using user's action
    """
    global current_row_number

    interested = b.description == "Interested"
    train_ops.loc[current_row_number, 'interested'] = interested
    

def choose_row_number():
    """
    Randomly choose a new operation that has not been seen yet
    """
    row_number = random_row()
    # Redraw until the row has no recorded answer
    while train_ops.loc[row_number].interested is not None:
        row_number = random_row()

    return row_number


def update_banner(b, row_number):
    """
    Update the value of the image widget with the new banner's string
    """
    global current_row_number

    current_row_number = row_number
    banner = train_ops.loc[current_row_number].banner
    img.value = banner

# Set the on_click function to the button
left_button.on_click(on_button_clicked)
right_button.on_click(on_button_clicked)

# https://ipywidgets.readthedocs.io/en/stable/examples/Layout%20Templates.html#AppLayout
AppLayout(
    left_sidebar=left_button,
    center=img,
    right_sidebar=right_button
)

### Sum up operations seen

#### Liked

In [None]:
viewed_ops = train_ops[~train_ops['interested'].isna()]
plot_grid(viewed_ops[viewed_ops.interested])

#### Not Liked

In [None]:
plot_grid(viewed_ops[~viewed_ops.interested.astype(bool)])

## Learn and Predict

### Logistic Regression

Let us first implement a simple logistic regression and learn the user's preferences.

We will then be able to rank the test operations based on the previous feedback.

In [None]:
from sklearn.linear_model import LogisticRegression

In [None]:
###############################################################################
# - Train simple Logistic Regression model on the features and feedbacks      #
# - Make predictions on the test set                                          #
# - Add a new column to the test set with the predicted values                #
###############################################################################

## Prepare the data

feature_columns = ['secteur_principal', # Interesting columns
    'sous_secteur_principal', 'business_type', 'brand',
    'operation_type', 'front_secteur', 'front_sous_secteur']
all_data = pd.concat([train_ops, test_ops], ignore_index=True)
all_data = all_data.loc[:, feature_columns] # Select only columns of features
encoded_data = pd.get_dummies(all_data)  # Encode data

# Retrieve train and test set
train_encoded = encoded_data.iloc[viewed_ops.index,:]  # Only viewed ones
test_encoded = encoded_data.iloc[len(train_ops):,:]  # All test set

# Build binary labels for training set (0 for False and 1 for True)
train_label = [int(x) for x in viewed_ops.interested]

# Train the model
model = LogisticRegression(random_state=0).fit(train_encoded, train_label)

# Predict on the test set
prediction = model.predict_proba(test_encoded)[:,1]
test_ops['prediction'] = prediction
print(sorted(prediction, reverse=True))

###############################################################################

sorted_ops = test_ops.sort_values(by="prediction", ascending=False)

#### Display ranking on test

In [None]:
plot_grid(sorted_ops)

### Conclusions

###############################################################################

Let us act as a specific customer who is interested in food, drinks, sport, and any kind of travel. All corresponding offers are accepted; the others are rejected.

We follow the previous guidelines for the first 20 offers (see the Appendix to reproduce the experiment). In our use case, 8 of them are accepted. After training on this data, we can predict the interest on the test set: 4 offers among the 44 available have a score higher than 50% and would actually be accepted. The ranking clearly fits the needs of our test customer, favoring trips and sports (the first proposition, "Isola 2000", is both), and then food. Offers number 8 to 12 have the same score of 34% and match the interest in food-related products.

As with Google search results, the first results are the most important: we do not care much about the ordering of the less interesting offers. We can nevertheless notice that all satisfying offers were correctly picked; they can be found among the first 9 offers.
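
One way to put a number on "the first results matter most" is to measure precision and recall over the top $k$ positions of the ranking. The sketch below uses a made-up ranking rather than the notebook's actual predictions; `precision_recall_at_k` is a hypothetical helper.

```python
def precision_recall_at_k(ranked_relevance, k):
    """ranked_relevance: list of booleans, best-ranked offer first."""
    top_k = ranked_relevance[:k]
    hits = sum(top_k)
    total_relevant = sum(ranked_relevance)
    # Precision: share of the top k that is relevant.
    precision = hits / len(top_k) if top_k else 0.0
    # Recall: share of all relevant offers that made it into the top k.
    recall = hits / total_relevant if total_relevant else 0.0
    return precision, recall


# Hypothetical ranking of 12 offers; True marks an offer the user would accept.
ranking = [True, True, False, True, False, False, True, False, True,
           False, False, False]
p_at_9, r_at_9 = precision_recall_at_k(ranking, k=9)
```

A recall of 1 at $k = 9$ means that every offer the user would accept appears among the first 9 positions.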

The results of our example are convincing. This approach ensures that we collect data on a wide range of products because they are drawn randomly. On the other hand, the drawback is that the customer has to see many potentially uninteresting offers. In this example, the last 6 offers were not considered interesting, whereas we would expect a really efficient website to do better.

To be more realistic, it is hard to imagine that an online customer would spend much time going through random offers. Let us redo the experiment with less data. With the first 5 offers, we can already correctly predict 4 interesting offers from our test set. Even very little data can yield useful information. However, it gives an incomplete view of the customer's tastes: the interest in food has not been spotted with this small amount of data.

###############################################################################

### Appendix

To reproduce the experiment with 20 offers, set the random seed to 84. The following offers were accepted (in chronological order): Afrique, Pop Bottles, Sport-Elec, Carambar, Montagne Été, FitFiu, Sur les routes de France, Les Landes. The others were rejected.

For the 5-offer test, reset the random seed to 84 and accept only Afrique, Pop Bottles and Sport-Elec.

In [None]:
%reset -f

# 2 - Online Learning

## The Contextual Bandit

The Contextual Bandit is just like the Multi-Armed bandit problem but now the reward probability distribution depends on external variables. Therefore, we add the notion of **context** or **state** to support our decision.

We're going to suppose that the probability of reward is of the form

$$\theta(x) = \frac{1}{1 + \exp(-f(x))}$$

where

$$f(x) = \beta_0 + \sum_{i=1}^{d}{\beta_i \cdot x_i} + \epsilon$$

which simply assumes that the probability of reward depends on an external variable $x$ through a linear function with a logistic link.

- $x$: the context. Features of the operation.
- $d$: size of the context
- $\beta_i$: the learned parameters used to predict the probability of interest
- $\theta(x)$: The logistic normalization to compute the probability of reward
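
As a quick numerical illustration of the model above, the sketch below evaluates $\theta(x)$ for a one-hot context. The weights are made up and the noise term $\epsilon$ is left out; `reward_probability` is a hypothetical helper, not a function of the notebook.

```python
import math


def reward_probability(x, beta, beta_0=0.0):
    """theta(x) = 1 / (1 + exp(-f(x))) with f(x) = beta_0 + sum_i beta_i * x_i."""
    f = beta_0 + sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-f))


# Hypothetical weights for three one-hot features (travel, food, fashion).
beta = [2.0, 0.5, -1.5]
theta_travel = reward_probability([1, 0, 0], beta)
theta_fashion = reward_probability([0, 0, 1], beta)
```

A positive weight pushes the probability above 0.5 and a negative weight pushes it below, which is what lets the sign and size of each $\beta_i$ be read as a taste for the corresponding feature.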

### Logistic Regression

Let us implement a regular logistic regression, and use an $\epsilon$-greedy policy to choose which bandit to activate. We try to learn the logistic function behind each bandit:

$$\theta(x) = \frac{1}{1 + \exp(-f(x))}$$

where

$$f(x) = \beta_0 + \sum_{i=1}^{d}{\beta_i \cdot x_i} + \epsilon$$

We then select the operation which maximizes $\theta(x)$, except that, with probability $\epsilon$, we select a random operation instead. In the implementation below, this random draw is taken among all the operations that have not been shown yet.
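
The selection rule can be sketched independently of the widgets and of the model. In this minimal version, `scores` stands for the predicted probabilities and `seen` for the operations already shown; both names, and the helper `epsilon_greedy`, are assumptions for illustration. The exploratory draw is uniform over all unseen operations.

```python
import random


def epsilon_greedy(scores, seen, epsilon, rng=random):
    """Pick an unseen index: random with probability epsilon, best score otherwise."""
    candidates = [i for i in range(len(scores)) if i not in seen]
    if not candidates:
        raise ValueError("every operation has already been shown")
    if rng.random() < epsilon:
        # Explore: uniform draw among the unseen operations.
        return rng.choice(candidates)
    # Exploit: unseen operation with the highest predicted probability.
    return max(candidates, key=lambda i: scores[i])


scores = [0.1, 0.9, 0.4, 0.7]   # hypothetical predicted probabilities
seen = {1}                      # operation 1 was already shown
greedy_pick = epsilon_greedy(scores, seen, epsilon=0.0)
```

With `epsilon=0.0` the rule is purely greedy, and with `epsilon=1.0` it falls back to the random survey of the first part.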

In [None]:
import numpy as np  # added
import pandas as pd
import random

In [None]:
random.seed(84)

In [None]:
%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}

In [None]:
train_ops = pd.read_pickle("train_2020-08-04.pickle")
test_ops = pd.read_pickle("test_2020-10-30.pickle")

In [None]:
train_ops['interested'] = None
test_ops['interested'] = None

In [None]:
from ipywidgets import AppLayout, Button, GridspecLayout, Image, Layout 

In [None]:
def create_expanded_button(description, button_style):
    return Button(
        description=description,
        button_style=button_style,
        layout=Layout(height='auto', width='auto')
    )

def random_row():
    return random.randint(0, len(train_ops) - 1)

def get_banner(index, dataset):
    return Image(
        value=dataset.loc[index].banner,
        format='jpg',
        width=300,
        height=400,
    )

def plot_grid(df):
    n_rows = (len(df)//3) + (len(df) % 3 != 0)
    grid = GridspecLayout(n_rows , 3)

    cpt = 0
    for i in range(n_rows):
        for j in range(3):
            if cpt < len(df):
                grid[i, j] = get_banner(df.index[cpt], df)
                cpt += 1 
    return grid

In [None]:
from sklearn.linear_model import LogisticRegression

In [None]:
left_button = create_expanded_button('Not Interested', 'danger')

current_row_number = random_row()
img = get_banner(current_row_number, train_ops)

right_button = create_expanded_button('Interested', 'success')

EPSILON = 0.1

def on_button_clicked(b):
    update_values(b)
    row_number = choose_row_number()
    update_banner(b, row_number)

    
def update_values(b):
    global current_row_number
    
    interested = b.description == "Interested"
    train_ops.loc[current_row_number, 'interested'] = interested


def update_banner(b, row_number):
    global current_row_number

    current_row_number = row_number
    banner = train_ops.loc[current_row_number].banner
    img.value = banner

ready_to_train = False
def choose_row_number(epsilon=EPSILON):
    """Choose a sample number to display

    - Make a random prediction when
      - the training is not possible
      - a random value is below epsilon
    - Else display the most probable operation based on a trained model
      (using the previous algorithm)
    
    Parameters
    ----------
    epsilon: float
        proportion of exploration - instead of exploitation

    """
    # Check if train is possible
    # (If there is at least one interesting and one not interesting)
    viewed_ops = train_ops[~train_ops['interested'].isna()]
    nb_viewed = len(viewed_ops)
    nb_true = len(viewed_ops[viewed_ops.interested])
    nb_false = nb_viewed - nb_true
    ready_to_train = nb_true and nb_false
    
    if not ready_to_train or random.random() < epsilon:
        ## Explore
        row_number = random_row()
        # Ensure that the row has not been seen so far
        while(train_ops.loc[row_number].interested is not None):
            row_number = random_row()
        return row_number
    else:
        # Exploit
        ## Train a new model
        feature_columns = ['secteur_principal', # Interesting columns
            'sous_secteur_principal', 'business_type', 'brand',
            'operation_type', 'front_secteur', 'front_sous_secteur']
        # Encode data
        all_encoded_data = pd.get_dummies(train_ops.loc[:, feature_columns])
        # Select only viewed ones for training
        train_encoded = all_encoded_data.iloc[viewed_ops.index,:]
        # Build binary labels for training set (0 for False and 1 for True)
        train_label = [int(x) for x in viewed_ops.interested]

        model = LogisticRegression(random_state=0).fit(train_encoded, train_label)

        # Select the best row (not seen yet)
        prediction = model.predict_proba(all_encoded_data)[:,1]
        best_row = np.argmax(prediction)
        while best_row in viewed_ops.index:
            prediction[best_row] = 0
            best_row = np.argmax(prediction)

    return best_row

left_button.on_click(on_button_clicked)
right_button.on_click(on_button_clicked)

AppLayout(
    left_sidebar=left_button,
    center=img,
    right_sidebar=right_button
)

### Sum up operations seen

#### Liked

In [None]:
viewed_ops = train_ops[~train_ops['interested'].isna()]
plot_grid(viewed_ops[viewed_ops.interested])

#### Not Liked

In [None]:
plot_grid(viewed_ops[~viewed_ops.interested.astype(bool)])

### Redo Prediction

In [None]:
###############################################################################
# - Train simple Logistic Regression model on the features and feedbacks      #
# - Make predictions on the test set                                          #
# - Add a new column to the test set with the predicted values                #
###############################################################################

## Prepare the data
feature_columns = ['secteur_principal', # Interesting columns
    'sous_secteur_principal', 'business_type', 'brand',
    'operation_type', 'front_secteur', 'front_sous_secteur']
all_data = pd.concat([train_ops, test_ops], ignore_index=True)
all_data = all_data.loc[:, feature_columns]  # Select only columns of features
encoded_data = pd.get_dummies(all_data)  # Encode data

# Retrieve train and test set
train_encoded = encoded_data.iloc[viewed_ops.index,:]  # Only viewed ones
test_encoded = encoded_data.iloc[len(train_ops):,:]  # All test set

# Build binary labels for training set (0 for False and 1 for True)
train_label = [int(x) for x in viewed_ops.interested]

# Train the model
model = LogisticRegression(random_state=0).fit(train_encoded, train_label)

# Predict on the test set
prediction = model.predict_proba(test_encoded)[:,1]
test_ops['prediction'] = prediction
print(sorted(prediction, reverse=True))

###############################################################################

sorted_ops = test_ops.sort_values(by="prediction", ascending=False)

In [None]:
plot_grid(sorted_ops)

### Conclusions

###############################################################################

The previous experiment is redone (20 offers, accepting food, sport and trips). We obtain 18 interesting offers, which is many more than with the previous algorithm. However, they only deal with trips. Once an interesting sector has been detected, the algorithm really focuses on it. This is a good thing, but exploration is lacking here. The first two predictions are trips: they are very highly rated. We could also understand the link with the third one, "Verisure": if we go on a trip, we may need an alarm. But then, there was no focus on sport or food.

Hence, it is also important to collect some negative examples. A client can tolerate such offers if he still sees some that are attractive. Furthermore, it tends to highlight the more interesting ones. One way of achieving this is to increase the `EPSILON` parameter: it ensures more exploration and more diversity.

A pathological example can be found by extending the tastes of the test customer to all kitchen-related offers. If we stick to these guidelines, the second offer, "Moulinex Rowenta", is accepted; but instead of encouraging other kitchen-related objects, the algorithm repeatedly favors vacuum cleaners and electronics, which are not what we want. It quickly focuses on offers similar to the previous ones, but this can go totally wrong. A complementary explanation for this setback is that the interest in "kitchen-related" objects does not perfectly match the features described in our data: this category is far less obvious than "Trips" or "Clothes", for example.

In a nutshell, this method is efficient for short-term recommendations because it reacts quickly to interesting offers. However, the trade-off between exploitation (taking advantage of the data) and exploration (diversification) needs to be fine-tuned in order to avoid getting stuck in too narrow a domain.

###############################################################################

### Appendix

To reproduce the experiment with 20 offers, set the random seed to 84 again. Only the first two offers are rejected: "Sunny Life" and "Moulinex Rowenta". All the following ones are interesting for our customer (related to trips).

In [None]:
%reset -f

### Thompson Sampling

In 2011, Chapelle & Li published the paper "[An Empirical Evaluation of Thompson Sampling](https://papers.nips.cc/paper/4321-an-empirical-evaluation-of-thompson-sampling.pdf)", which helped revive interest in Thompson Sampling by showing favorable empirical results in comparison to other heuristics. We're going to borrow the Online Logistic Regression algorithm (Algorithm 3) from the paper. Basically, it's a Bayesian logistic regression where we define a prior distribution for our weights $\beta_i$, instead of learning just a single value for them (the expectation of the distribution).

So, our model, just like the greedy algorithm, is:

$$\theta(x) = \frac{1}{1 + \exp(-f(x))}$$

where

$$f(x) = \beta_0 + \sum_{i=1}^{d}{\beta_i \cdot x_i} + \epsilon$$

but the weights are actually assumed to be distributed as independent Gaussians:

$$\beta_i \sim \mathcal{N}(m_i,q_i^{-1})$$

We initialize all $q_i$'s with a hyperparameter $\lambda$, which is equivalent to the $\lambda$ used in L2 regularization. Then, at each new training example (or batch of examples) we make the following calculations:

1. Find $\textbf{w}$ as the minimizer of $\frac{1}{2}\sum_{i=1}^{d} q_i(w_i - m_i)^2 + \sum_{j=1}^{n} \textrm{log}(1 + \textrm{exp}(-y_jw^Tx_j))$
2. Update $m_i = w_i$ and perform $q_i = q_i + \sum_{j=1}^{n} x^2_{ij}p_j(1-p_j)$ where $p_j = (1 + \textrm{exp}( -w^Tx_j))^{-1}$ ([Laplace approximation](https://en.wikipedia.org/wiki/Laplace%27s_method))
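
The two steps can be checked by hand on a tiny example. The sketch below uses only the standard library and made-up data; the helper names are hypothetical. It assumes the $y_j \in \{-1, +1\}$ label convention that the $-y_j w^T x_j$ term of the loss implies, so 0/1 answers are converted first, and it skips the actual minimization by pretending that the minimizer returned $w = m$.

```python
import math


def to_signed(labels):
    """Map 0/1 (or False/True) labels to the -1/+1 convention used by the loss."""
    return [1 if label else -1 for label in labels]


def regularized_loss(w, m, q, X, y):
    """Step 1 objective: Gaussian prior penalty plus logistic loss, y in {-1, +1}."""
    prior = 0.5 * sum(qi * (wi - mi) ** 2 for qi, wi, mi in zip(q, w, m))
    data = sum(
        math.log(1.0 + math.exp(-yj * sum(wi * xi for wi, xi in zip(w, xj))))
        for xj, yj in zip(X, y)
    )
    return prior + data


def update_precision(q, w, X):
    """Step 2: q_i += sum_j x_ij^2 * p_j * (1 - p_j)."""
    p = [1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, xj)))) for xj in X]
    return [
        qi + sum(xj[i] ** 2 * pj * (1.0 - pj) for xj, pj in zip(X, p))
        for i, qi in enumerate(q)
    ]


X = [[1, 0], [1, 1], [0, 1]]          # three toy one-hot contexts
y = to_signed([True, False, True])    # -> [1, -1, 1]
m, q = [0.0, 0.0], [5.0, 5.0]         # prior mean and precision (lambda = 5)
w = [0.0, 0.0]                        # pretend the minimizer returned w = m
loss_at_prior = regularized_loss(w, m, q, X, y)
new_q = update_precision(q, w, X)
```

Each feature appears in two of the three contexts, and $p_j = 0.5$ when $w = 0$, so each $q_i$ grows by $2 \times 0.25 = 0.5$: the more often a feature has been observed, the larger its precision and the narrower the distribution of its weight.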

In essence, we altered the logistic regression fitting process to accommodate distributions for the weights. Our Normal priors on the weights are iteratively updated, and as the number of observations grows, our uncertainty over their means is reduced.

We can also increase incentives for exploration or exploitation by defining a hyperparameter $\alpha$, which multiplies the variance of the Normal priors:

$$\beta_i \sim \mathcal{N}(m_i,\alpha \cdot{} q_i^{-1})$$

With $0 < \alpha < 1$ we reduce the variance of the Normal priors, inducing the algorithm to be greedier, whereas with $\alpha > 1$ we prioritize exploration. Let us implement the algorithm.

- $x$: the context. Features of the operation.
- $\beta_i$: the learned parameters used to predict the probability of interest
- $\theta(x)$: The logistic normalization to compute the probability of reward


- $w$: weights vector
- $m$ and $q$: parameters of the normal priors
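
Before the full implementation, here is a minimal sketch of the sampling step alone, using `random.gauss` from the standard library instead of NumPy. It treats $\alpha \cdot q_i^{-1}$ as a variance, as in the formula above, and therefore passes its square root as the standard deviation. The helper `sample_weights` and the numbers are assumptions for illustration.

```python
import math
import random


def sample_weights(m, q, alpha=1.0, rng=random):
    """Draw beta_i ~ N(m_i, alpha / q_i), where alpha / q_i is a variance."""
    # random.gauss expects a standard deviation, hence the square root.
    return [rng.gauss(mi, math.sqrt(alpha / qi)) for mi, qi in zip(m, q)]


rng = random.Random(7)
m, q = [0.0, 1.0], [5.0, 50.0]   # hypothetical posterior means and precisions
draws = [sample_weights(m, q, alpha=2.0, rng=rng) for _ in range(20000)]

# The empirical variance of the first weight should be close to alpha / q_0 = 0.4.
mean_0 = sum(d[0] for d in draws) / len(draws)
var_0 = sum((d[0] - mean_0) ** 2 for d in draws) / len(draws)
```

A weight with a large $q_i$ is drawn close to its mean, so well-observed features behave almost greedily, while rarely observed ones keep a wide spread and drive exploration.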

In [None]:
import numpy as np
import pandas as pd
import random

In [None]:
random.seed(84)
np.random.seed(7)  # Added

In [None]:
%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}

In [None]:
train_ops = pd.read_pickle("train_2020-08-04.pickle")
test_ops = pd.read_pickle("test_2020-10-30.pickle")

In [None]:
train_ops['interested'] = None
test_ops['interested'] = None

In [None]:
from ipywidgets import AppLayout, Button, GridspecLayout, Image, Layout 

In [None]:
def create_expanded_button(description, button_style):
    return Button(
        description=description,
        button_style=button_style,
        layout=Layout(height='auto', width='auto')
    )


def random_row():
    return random.randint(0, len(train_ops) - 1)


def get_banner(index, dataset):
    return Image(
        value=dataset.loc[index].banner,
        format='jpg',
        width=300,
        height=400,
    )


def plot_grid(df):
    n_rows = (len(df)//3) + (len(df) % 3 != 0)
    grid = GridspecLayout(n_rows , 3)

    cpt = 0
    for i in range(n_rows):
        for j in range(3):
            if cpt < len(df):
                grid[i, j] = get_banner(df.index[cpt], df)
                cpt += 1 
    return grid

In [None]:
from scipy.optimize import minimize

class OnlineLogisticRegression:

    # initializing
    def __init__(self, n_dim, lambda_=5, alpha=5.0):

        # hyperparameters: lambda_ is the initial precision of the prior
        # (L2 regularizer) and alpha scales the spread of the sampled weights
        self.lambda_ = lambda_
        self.alpha = alpha

        # initializing parameters of the model
        self.n_dim = n_dim
        self.m = np.zeros(self.n_dim)
        self.q = np.ones(self.n_dim) * self.lambda_

        # initializing weights
        self.w = np.random.normal(
            self.m,
            self.alpha * (self.q)**(-1.0),
            size = self.n_dim
        )

    # the loss function
    def loss(self, w, *args):
        X, y = args

        #######################################################################
        # Implement the computation of w                                      #
        #######################################################################

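        # Loss to minimize: Gaussian prior penalty plus logistic loss.
        # Note: the logistic term follows the y in {-1, +1} label convention;
        # with 0/1 labels, a 0 contributes a constant log(2) and no gradient.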
        loss = 0.5 * (self.q * (self.m - w)**2).sum(axis=0) + np.array([
                np.log(1. + np.exp((-1) * y[j] * w.dot(X[j])))
                for j in range(y.shape[0])
            ]).sum(axis=0)

        #######################################################################
        return loss

    # the gradient
    def grad(self, w, *args):
        X, y = args

        # Gradient of the logistic term, summed over the examples
        data_term = (-1) * np.array([
            y[j] *  X[j] / (1. + np.exp(y[j] * w.dot(X[j])))
            for j in range(y.shape[0])
        ]).sum(axis=0)

        # Add the gradient of the Gaussian prior penalty
        return self.q * (w - self.m) + data_term

    # method for sampling weights
    def get_weights(self):
        #######################################################################
        # Implement the computation beta_i                                    #
        #######################################################################
        
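        # Note: np.random.normal takes a standard deviation as its scale
        # argument, so alpha * q^{-1} is used here as a standard deviation;
        # pass np.sqrt(self.alpha / self.q) to treat it as a variance instead.
        weights = np.random.normal(
            self.m,
            self.alpha * (self.q)**(-1.0),
            size=self.n_dim
        )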
        
        #######################################################################
        
        return weights

    # fitting method
    def fit(self, X, y):

        # step 1, find w
        self.w = minimize(
            self.loss,
            self.w,
            args=(X, y),
            jac=self.grad,
            method="L-BFGS-B",
            options={'maxiter': 20, 'disp':True}
        ).x
        self.m = self.w

        # step 2, update q
        #######################################################################
        # Update the value of q based on the computation of the p_i
        #######################################################################

        p = np.array([
            1. + np.exp(-self.w.dot(X[j]))
            for j in range(y.shape[0])
        ])**(-1)

        self.q += np.array([
            np.array([
                (X[j][i]**2) * p[j]*(1 - p[j])
                for j in range(p.shape[0])
            ]).sum(axis=0)
            for i in range(self.q.shape[0])
        ])
        
        #######################################################################

    # probability output method, using weights sample
    def predict_proba(self, X, mode='sample'):

        # adding intercept to X
        #X = add_constant(X)

        # sampling weights after update
        self.w = self.get_weights()

        # using weight depending on mode
        if mode == 'sample':
            w = self.w # weights are samples of posteriors
        elif mode == 'expected':
            w = self.m # weights are expected values of posteriors
        else:
            raise Exception('mode not recognized!')

        # calculating probabilities
        proba = 1 / (1 + np.exp(-1 * X.dot(w)))
        return np.array([1-proba , proba]).T

In [None]:
left_button = create_expanded_button('Not Interested', 'danger')

current_row_number = random_row()
img = get_banner(current_row_number, train_ops)

right_button = create_expanded_button('Interested', 'success')


def on_button_clicked(b):
    update_values(b)
    row_number = choose_row_number()
    update_banner(b, row_number)


def update_values(b):
    global current_row_number
    
    interested = b.description == "Interested"
    train_ops.loc[current_row_number, 'interested'] = interested


def update_banner(b, row_number):
    global current_row_number

    current_row_number = row_number
    banner = train_ops.loc[current_row_number].banner
    img.value = banner
    

def choose_row_number():
    # Create the datasets
    X = pd.get_dummies(train_ops[train_ops.columns.difference(['operationcd', 'banner', 'interested'])])
    y = train_ops.interested
    
    # Fit the Model
    olr = OnlineLogisticRegression(
        n_dim=X.shape[1]
    )
    
    olr.fit(
        X[~y.isna()].values,
        y.dropna().values
    )

    X.loc[y.isna(), 'prediction'] = X[y.isna()].apply(olr.predict_proba, axis=1).apply(pd.Series)[1].values

    # Choose the row
    best_row = X.sort_values(by="prediction", ascending=False).index[0]
    return best_row

    
left_button.on_click(on_button_clicked)
right_button.on_click(on_button_clicked)

AppLayout(
    left_sidebar=left_button,
    center=img,
    right_sidebar=right_button
)

### Sum up operations seen

#### Liked

In [None]:
viewed_ops = train_ops[~train_ops['interested'].isna()]
plot_grid(viewed_ops[viewed_ops.interested])

#### Not Liked

In [None]:
plot_grid(viewed_ops[~viewed_ops.interested.astype(bool)])

### Redo Prediction

In [None]:
###############################################################################
# - Train the Online Logistic Regression model on the features and feedbacks  #
# - Make predictions on the test set                                          #
# - Add a new column to the test set with the predicted values                #
###############################################################################

## Prepare the data
feature_columns = ['secteur_principal', # Interesting columns
    'sous_secteur_principal', 'business_type', 'brand',
    'operation_type', 'front_secteur', 'front_sous_secteur']
all_data = pd.concat([train_ops, test_ops], ignore_index=True)
all_data = all_data.loc[:, feature_columns] # Select only columns of features
encoded_data = pd.get_dummies(all_data)  # Encode data

# Retrieve train and test set
# Note: OnlineLogisticRegression expects numpy.ndarray inputs, not pandas.DataFrame
train_encoded = np.array(encoded_data.iloc[viewed_ops.index,:])  # Only viewed ones
test_encoded = np.array(encoded_data.iloc[len(train_ops):,:])  # All test set

print(type(train_encoded))

# Build binary labels for training set (0 for False and 1 for True)
train_label = np.array([int(x) for x in viewed_ops.interested])

# Fit the Model
olr = OnlineLogisticRegression(n_dim=train_encoded.shape[1])
olr.fit(train_encoded, train_label)

# Predict on the test set
prediction = olr.predict_proba(test_encoded)[:,1]
test_ops['prediction'] = prediction
print(sorted(prediction, reverse=True))

###############################################################################

sorted_ops = test_ops.sort_values(by="prediction", ascending=False)

In [None]:
plot_grid(sorted_ops)

### Conclusions

In these notebooks, we implemented the Contextual Bandit problem and presented two algorithms to solve it. The first, $\epsilon$-greedy, uses a regular logistic regression to get its greedy estimates of the expected rewards $\theta(x)$. The second, Thompson Sampling, relies on the Online Logistic Regression to learn an independent normal distribution for each of the linear model weights, $\beta_i \sim \mathcal{N}(m_i, q_i^{-1})$. We draw samples from these Normal posteriors in order to randomize our bandit choices.

###############################################################################

Still with the same test customer, we accept 9 of the 20 proposed offers. The interesting offers are spread more evenly over the training session than with the $\epsilon$-greedy algorithm: the session alternates between different kinds of offers. Unlike the first, random algorithm, however, it seems to propose more appropriate content overall as the session continues. For example, the last four offers of our experiment were positive ones.

The predictions on the test samples might differ from the previous ones because this approach does not gather all the positive offers at the top. The advantages of Thompson sampling might not be obvious in this example because the algorithm is highly stochastic. Repeating the tests over more examples should better illustrate the efficiency of the randomization.

Here, it ensures that we test both interesting and uninteresting offers while keeping an eye on the customer's tastes.

Once again, the parameters of the algorithm could be fine-tuned. For example, increasing the parameter `alpha` favors exploration by impacting the variance of the distribution of the $\beta_i$.

###############################################################################

### Appendix

For reproducibility, set the random seeds to 84 for the `random` library and 7 for `numpy`. This leads to accepting the following offers (in chronological order): Maison-Villa Appartement France, Sur les Routes de France, Carambar, Canard-Duchene, Montagne Été, Village Nature, Sport-Elec, la Grèce et ses îles, Espagne.