# Likelihood-based models

This notebook outlines the likelihood-based approach to training on bandit feedback.

Before proceeding, we will study the output of the simulator in a little more detail.

In [1]:
import gym, reco_gym
from copy import deepcopy
from reco_gym import env_1_args
import matplotlib.pyplot as plt
import numpy as np

%matplotlib inline

%config InlineBackend.figure_format = 'retina'
plt.rcParams['figure.figsize'] = [6, 3]

ABTestNumberOfUsers = 500
NumberOfProducts = 10
NumberOfSamples = 20
env_1_args['phi_var'] = 0.0
env_1_args['sigma_mu_organic'] = 0.0
env_1_args['sigma_omega'] = 0
env_1_args['random_seed'] = 42
env_1_args['num_products'] = NumberOfProducts
env_1_args['K'] = 5
# Set once; the earlier assignment to 0 was immediately overridden by this value.
env_1_args['number_of_flips'] = 3

env = gym.make('reco-gym-v1')
env.init_gym(env_1_args)

In [2]:
data = deepcopy(env).generate_logs(ABTestNumberOfUsers)

# Logistic Regression Model

## Turn Data into Features

Now we are going to build a _Logistic Regression_ model.

The model will predict _the probability of a click_ from the following data:
* _`Views`_ is the total number of views of a particular _`Product`_ shown during _Organic_ _`Events`_ **before** a _Bandit_ _`Event`_.
* _`Action`_ is the _`Product`_ recommended at a _Bandit_ _`Event`_.

For example, assume that we have _`10`_ products. In _Organic_ _`Events`_, these products were shown to a user as follows:
<table>
    <tr>
        <th>Product ID</th>
        <th>Views</th>
    </tr>
    <tr>
        <td>0</td>
        <td>0</td>
    </tr>
    <tr>
        <td>1</td>
        <td>0</td>
    </tr>
    <tr>
        <td>2</td>
        <td>0</td>
    </tr>
    <tr>
        <td>3</td>
        <td>7</td>
    </tr>
    <tr>
        <td>4</td>
        <td>0</td>
    </tr>
    <tr>
        <td>5</td>
        <td>0</td>
    </tr>
    <tr>
        <td>6</td>
        <td>0</td>
    </tr>
    <tr>
        <td>7</td>
        <td>8</td>
    </tr>
    <tr>
        <td>8</td>
        <td>11</td>
    </tr>
    <tr>
        <td>9</td>
        <td>0</td>
    </tr>
</table>

When we want to know the probability of a click for _`Product`_ = _`8`_ given these _`Views`_, the input data for the model will be:

_`0 0 0 7 0 0 0 8 11 0`_ _**`8`**_

The first 10 numbers are the _`Views`_ of the _`Products`_ (see above); the last one is the _`Action`_.

The output will be two numbers:
* $0^{th}$ index: $1 - \mathbb{P}_c(P=p|V)$.
* $1^{st}$ index: $\mathbb{P}_c(P=p|V)$.

Here, $\mathbb{P}_c(P=p|V)$ is the probability of the click for a _`Product`_ $p$, provided that we have _`Views`_ $V$.
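To make the two-number output concrete, here is a minimal sketch of how a logistic model turns a feature vector into the pair $[1 - \mathbb{P}_c, \mathbb{P}_c]$. The function name `click_probabilities`, the weights and the bias are hypothetical placeholders chosen for illustration; a fitted model would learn its parameters from the logged data, and for brevity the feature vector here holds only the ten view counts.

```python
import numpy as np

def click_probabilities(features, weights, bias):
    """Return [P(no click), P(click)] for a single feature vector."""
    logit = float(np.dot(weights, features) + bias)
    p_click = 1.0 / (1.0 + np.exp(-logit))  # logistic (sigmoid) function
    return np.array([1.0 - p_click, p_click])

# View counts from the example table above.
features = np.array([0, 0, 0, 7, 0, 0, 0, 8, 11, 0], dtype=float)

# Hypothetical parameters, not learned from any data.
weights = np.full(features.shape[0], 0.01)
bias = -3.0

probs = click_probabilities(features, weights, bias)
print(probs)  # index 0: no click, index 1: click
```

The two entries always sum to one, so the click probability alone carries all of the information.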


In all of the following models, an _`Action`_ will not be used as a number; instead, it will be encoded as a one-hot _vector_.
In our current example, the _`Action`_ is _`8`_. Thus, it is encoded as:

_`0 0 0 0 0 0 0 0`_ _**`1`**_ _`0`_

Here,
* The vector of _`Actions`_ has a size equal to the _*number of `Products`*_, i.e. _`10`_.
* _`Action`_ _`8`_ is marked with a _`1`_ at index _`8`_ (_`Action`_ indices start at _`0`_).
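The encoding of the _`Action`_ can be sketched in a few lines of NumPy. The variable names below are illustrative only.

```python
import numpy as np

num_products = 10
action = 8  # the recommended Product; indices start at 0

# A vector of zeros with a single 1 at the position of the Action.
action_one_hot = np.zeros(num_products, dtype=np.int8)
action_one_hot[action] = 1

print(action_one_hot)
```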

In [42]:
import math
import numpy as np

def build_train_data(data):
    """
    Build Train Data

        Parameters:
            data: offline experiment logs
                the data contains both Organic and Bandit Events

        Returns:
            (outs, history, actions)
                outs: array of clicks (0 or 1), one per Bandit Event
                history: list of per-Product view counts preceding each Bandit Event
                actions: list of one-hot encoded Actions, one per Bandit Event
    """
    num_products = int(data.v.max() + 1)
    number_of_users = int(data.u.max()) + 1

    history = []
    actions = []
    outs = []

    for user_id in range(number_of_users):
        views = np.zeros((0, num_products))
        for _, user_datum in data[data['u'] == user_id].iterrows():
            if user_datum['z'] == 'organic':
                assert (math.isnan(user_datum['a']))
                assert (math.isnan(user_datum['c']))
                assert (not math.isnan(user_datum['v']))

                view = int(user_datum['v'])

                tmp_view = np.zeros(num_products)

                tmp_view[view] = 1

                # Prepend the latest view so that the most recent view comes first.
                views = np.append(tmp_view[np.newaxis, :], views, axis = 0)
            else:
                assert (user_datum['z'] == 'bandit')
                assert (not math.isnan(user_datum['a']))
                assert (not math.isnan(user_datum['c']))
                assert (math.isnan(user_datum['v']))

                action = int(user_datum['a'])
                action_flags = np.zeros(num_products, dtype = np.int8)
                action_flags[int(action)] = 1

                click = int(user_datum['c'])

                history.append(views.sum(0))
                actions.append(action_flags)
                outs.append(click)

    return np.array(outs), history, actions

In [43]:
clicks, history, actions = build_train_data(data)

In [44]:
# cross the history and the actions to produce interactions
np.vstack([np.kron(aa,hh) for hh, aa in zip(history, actions)])

40033

In [45]:
len(actions)

40033

In [48]:
history[6]

array([0., 6., 2., 0., 2., 0., 9., 0., 0., 0.])

In [50]:
actions[6]

array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0], dtype=int8)

In [52]:
np.kron(actions[6],history[6])

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 6., 2., 0.,
       2., 0., 9., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

(40033, 100)
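The structure of the Kronecker product above is easier to see after reshaping. With a one-hot action vector, `np.kron(action, history)` copies the history into the block of ten entries that belongs to the chosen action and leaves every other block at zero. The sketch below reuses the values printed for index 6; the names `crossed` and `blocks` are illustrative.

```python
import numpy as np

history = np.array([0., 6., 2., 0., 2., 0., 9., 0., 0., 0.])
action = np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0], dtype=np.int8)

# Entry i * 10 + j of the result equals action[i] * history[j].
crossed = np.kron(action, history)

# One row per action: only the row of the chosen action is non-zero.
blocks = crossed.reshape(10, 10)
chosen = int(action.argmax())
print(chosen, blocks[chosen])
```

A linear model on these crossed features effectively gets a separate weight vector over the history for each action.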

A sample of the training data is shown below.

In [10]:
print("Train Features:\n", train_features01[0:7])
print("Click (Outputs):\n", train_outs01[0:7])

Train Features:
 [[1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 6. 2. 0. 2. 0. 9. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]
Click (Outputs):
 [0 0 0 0 0 0 0]
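Each printed row appears to consist of the ten view counts followed by the ten-element one-hot action. As a minimal sketch, assuming that layout, a matrix of this shape could be assembled from lists like those returned by `build_train_data`. The toy lists below use three products and are stand-ins, not values from the logs.

```python
import numpy as np

# Toy stand-ins for the `history` and `actions` lists (3 products, 2 bandit events).
history = [np.array([1., 0., 0.]), np.array([0., 2., 1.])]
actions = [np.array([0, 1, 0], dtype=np.int8),
           np.array([1, 0, 0], dtype=np.int8)]

# Stack each list into a matrix, then place the two matrices side by side.
features = np.hstack([np.vstack(history), np.vstack(actions)])
print(features)
```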


In [36]:
data[0:27]

Unnamed: 0,a,c,ps,ps-a,t,u,v,z
0,,,,,0,0,0.0,organic
1,3.0,0.0,0.1,"[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, ...",1,0,,bandit
2,4.0,0.0,0.1,"[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, ...",2,0,,bandit
3,5.0,0.0,0.1,"[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, ...",3,0,,bandit
4,,,,,0,1,1.0,organic
5,2.0,0.0,0.1,"[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, ...",1,1,,bandit
6,8.0,0.0,0.1,"[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, ...",2,1,,bandit
7,4.0,0.0,0.1,"[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, ...",3,1,,bandit
8,,,,,4,1,6.0,organic
9,,,,,5,1,6.0,organic


The training data contains a record for every bandit event. The history up until the bandit event is used to build a user profile, or context vector. Here we simply count the number of times each product was viewed up until that point in time. This may lose some valuable information, since the timing and the sequence of the views are discarded, but it seems like a reasonable summary for producing a fixed-dimensional user vector (which is required by many machine learning methods).

This is called the feature engineering approach to producing a user embedding. Modelling approaches can also be employed; see, e.g., https://arxiv.org/abs/1904.10784
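The counting scheme described above can be sketched on a toy event stream for a single user. The helper `running_view_counts` and the dictionary layout are hypothetical simplifications of the logged data frame; only the keys `z`, `v`, `a` and `c` mirror its columns.

```python
import numpy as np

def running_view_counts(events, num_products):
    """Return one (view_counts, action, click) tuple per bandit event."""
    counts = np.zeros(num_products)
    rows = []
    for event in events:
        if event['z'] == 'organic':
            # An organic view updates the user's profile.
            counts[event['v']] += 1
        else:
            # A bandit event is recorded with the views seen so far.
            rows.append((counts.copy(), event['a'], event['c']))
    return rows

events = [
    {'z': 'organic', 'v': 1},
    {'z': 'bandit', 'a': 2, 'c': 0},
    {'z': 'organic', 'v': 6},
    {'z': 'organic', 'v': 6},
    {'z': 'bandit', 'a': 4, 'c': 0},
]

rows = running_view_counts(events, num_products=10)
for counts, action, click in rows:
    print(counts, action, click)
```

Copying the counts at each bandit event matters: without the copy, every record would share the same array and end up with the user's final counts.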