In [1]:
import json
import os
import shutil
os.chdir('..')

from IPython.display import display
import numpy as np
import pandas as pd

from mab2rec import BanditRecommender, LearningPolicy, NeighborhoodPolicy

tmp_dir = "tmp_dir"
os.makedirs(tmp_dir, exist_ok=True)

# Advanced

* The goal of this notebook is to present an overview of advanced functionality. 
* **Mab2Rec Recommender:** How to interact with a Mab2Rec recommender object, plus a brief overview of the bandit literature.
* **Persistency:** How to persist trained recommender algorithms.
* **Item Control:** Specifying items to use at both the *recommender-level* and at the *user-level*.
    * **Recommender level**: The list of items to train or score can be specified. If this list includes items not in the training data (or trained recommender), these items can be **warm started** given input `item_features`.
    * **User level**: For scoring, one can also specify a list of **eligible items** for each user that the recommender will respect.
* **Memory Efficiency:** The user features input can consume significant memory. The `user_features_dtypes` argument can be used to cast input features to data types that consume less memory. This is especially useful when many binary indicator features are used.


# Table of Contents
1. [Input Data](#Input-Data)
2. [Mab2Rec Recommender](#Mab2Rec-Recommender)
    1. [Input](#Input)
    2. [Arms](#Arms)
    3. [Fit](#Fit)
    4. [Recommend](#Recommend)
    5. [Warm Start](#Warm-Start)
3. [Persistency](#Persistency)
4. [Item Control](#Item-Control)
5. [Memory Efficiency](#Memory-Efficiency)
6. [Appendix: Intro to Bandits](#Appendix:-Intro-to-Bandits)
    1. [Context-Free Bandits](#Context-Free-Bandits)
    2. [Contextual Parametric Bandits](#Contextual-Parametric-Bandits)
    3. [Contextual Non-parametric Bandits](#Contextual-Non-Parametric-Bandits)

# Input Data

* Input data is as described in [Data Overview](https://github.com/fidelity/mab2rec/blob/main/notebooks/1_data_overview.ipynb).

In [2]:
# Basic input data
train_data = "data/data_train.csv"
test_data = "data/data_test.csv"
user_features = "data/features_user.csv"
item_features = "data/features_item.csv"

# Extended input data
eligibility_data = "data/extended/data_eligibility.csv"
user_features_dtypes = "data/extended/features_user_dtypes.json"

# Read data
train_df = pd.read_csv(train_data)
test_df = pd.read_csv(test_data)
user_features_df = pd.read_csv(user_features)
item_features_df = pd.read_csv(item_features)
eligibility_df = pd.read_csv(eligibility_data)

# Mab2Rec Recommender

## Input

* The mapping between bandit input parameters and recommender input data is as follows: 
    * **Decisions** correspond to the `item_id` column in `data_train.csv`. 
    * **Rewards** correspond to the `response` column in `data_train.csv`. 
    * **Contexts** correspond to the user features in `features_user.csv` for each `user_id` in `data_train.csv`.
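
To make the mapping concrete, the sketch below builds the three bandit inputs from a handful of in-memory records. The records, the feature values, and the `build_bandit_inputs` helper are hypothetical; only the column names `user_id`, `item_id` and `response` come from the data files described above.

```python
# Hypothetical interaction rows shaped like data_train.csv
interactions = [
    {"user_id": 1, "item_id": "Item1", "response": 0},
    {"user_id": 2, "item_id": "Item3", "response": 1},
    {"user_id": 1, "item_id": "Item2", "response": 1},
]

# Hypothetical user features shaped like features_user.csv (user_id -> feature vector)
features_by_user = {1: [0, 1, 1], 2: [1, 0, 1]}


def build_bandit_inputs(rows, features):
    """Return (decisions, rewards, contexts), aligned row by row."""
    decisions = [row["item_id"] for row in rows]
    rewards = [row["response"] for row in rows]
    # Each interaction takes the feature vector of the user who produced it
    contexts = [features[row["user_id"]] for row in rows]
    return decisions, rewards, contexts


decisions, rewards, contexts = build_bandit_inputs(interactions, features_by_user)
print(decisions)  # ['Item1', 'Item3', 'Item2']
print(rewards)    # [0, 1, 1]
print(contexts)   # [[0, 1, 1], [1, 0, 1], [0, 1, 1]]
```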

## Arms
* When a recommender is created it has no knowledge of the items, i.e., `arms`, that can be recommended.
* During the `fit` operation, the `arms` of the recommender are initialized with unique items found in `decisions`.

In [3]:
# Data
decisions = ['Item1', 'Item1', 'Item3', 'Item1', 'Item2', 'Item3']
rewards = [0, 1, 1, 0, 1, 0]
contexts = [[0, 0, 0], [1, 0, 1], [0, 1, 1], [0, 0, 0], [1, 1, 1], [0, 1, 0]]

# Bandit: LinGreedy learning policy with epsilon set to 0.1 and top-2
rec = BanditRecommender(LearningPolicy.LinGreedy(epsilon=0.1), top_k=2)

# Training: The fit operation initializes the arms
rec.fit(decisions, rewards, contexts)

rec.mab.arms

['Item1', 'Item2', 'Item3']

* Alternatively, the arms of a recommender can be **set** directly as follows.
* The `set_arms` function simply performs a series of `add_arm` and `remove_arm` operations.

In [4]:
# Bandit: LinGreedy learning policy with epsilon set to 0.1 and top-2
rec = BanditRecommender(LearningPolicy.LinGreedy(epsilon=0.1), top_k=2)

# Set arms explicitly
rec.set_arms(['Item2', 'Item3', 'Item4'])

rec.mab.arms

['Item2', 'Item3', 'Item4']
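
One way to picture `set_arms` as a series of additions and removals is as a difference between the current and the target arm lists. The `plan_arm_changes` helper below is a hypothetical illustration in plain Python and is not part of Mab2Rec; the order in which the library applies the operations is not specified here.

```python
def plan_arm_changes(current_arms, target_arms):
    """Return (arms_to_add, arms_to_remove), preserving input order."""
    current, target = set(current_arms), set(target_arms)
    to_add = [arm for arm in target_arms if arm not in current]
    to_remove = [arm for arm in current_arms if arm not in target]
    return to_add, to_remove


to_add, to_remove = plan_arm_changes(
    current_arms=['Item1', 'Item2', 'Item3'],
    target_arms=['Item2', 'Item3', 'Item4'],
)
print(to_add)     # ['Item4']
print(to_remove)  # ['Item1']
```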

* Arms can also be **added** or **removed** from the recommender as required.
* Added arms will not have any training data or parameters.

In [5]:
# Bandit: LinGreedy learning policy with epsilon set to 0.1 and top-2
rec = BanditRecommender(LearningPolicy.LinGreedy(epsilon=0.1), top_k=2)

# Training: The fit operation initializes the arms
rec.fit(decisions, rewards, contexts)

# Add arm
rec.add_arm("Item4")

# Remove arm
rec.remove_arm("Item1")

rec.mab.arms

['Item2', 'Item3', 'Item4']

## Fit 

* All bandit policies require historical `decisions` and corresponding `rewards` to be trained.
* Contextual learning policies and neighborhood policies require additional `contexts` data for training.
* The `fit` function is used for the initial training. 
* Subsequent model updates are performed through `partial_fit` for continuous learning.
* The time required to fit a recommender depends on the bandit policy, the number of samples and, for contextual policies, the number of features in the context.

In [6]:
# Data
decisions = ['Item1', 'Item1', 'Item3', 'Item1', 'Item2', 'Item3']
rewards = [0, 1, 1, 0, 1, 0]
contexts = [[0, 0, 0], [1, 0, 1], [0, 1, 1], [0, 0, 0], [1, 1, 1], [0, 1, 0]]

# Bandit: LinGreedy learning policy with epsilon set to 0.1 and top-2
rec = BanditRecommender(LearningPolicy.LinGreedy(epsilon=0.1), top_k=2)

# Training: Fit operation for the initial training
rec.fit(decisions, rewards, contexts)

# New feedback data becomes available
decisions_new = ['Item4', 'Item1', 'Item3']
rewards_new = [1, 1, 0]
contexts_new = [[0, 1, 1], [1, 1, 1], [0, 1, 0]]

# Partial fit for continuous learning
rec.partial_fit(decisions_new, rewards_new, contexts_new)
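
The idea behind continuous learning can be illustrated with a running mean that is updated as new rewards arrive, without revisiting old data. This is a context-free sketch under that assumption; the `RunningMean` class is hypothetical, and the actual update performed by `partial_fit` depends on the learning policy.

```python
class RunningMean:
    """Mean reward of one arm, updated incrementally."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, rewards):
        for reward in rewards:
            self.count += 1
            # Shift the mean towards the new reward by 1/count of the gap
            self.mean += (reward - self.mean) / self.count


item1 = RunningMean()
item1.update([0, 1, 0])  # rewards seen for Item1 in the initial data
item1.update([1])        # reward seen for Item1 in the new feedback
print(item1.count, round(item1.mean, 6))
```

After both updates the running mean equals the mean of all four rewards, which is what a full refit on the combined data would give for this simple statistic.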

## Recommend

* The recommend operation returns the top-*k* arms ranked by expected reward, or one such list per context if multiple contexts are given.
* The definition of top-*k* depends on the specified learning policy.
* Optionally, the score associated with each recommended arm can be returned via the `return_scores` parameter.
* Again, the definition of the score depends on the specified learning policy.

In [7]:
# Data
decisions = ['Item1', 'Item1', 'Item3', 'Item1', 'Item2', 'Item3']
rewards = [0, 1, 1, 0, 1, 0]
contexts = [[0, 0, 0], [1, 0, 1], [0, 1, 1], [0, 0, 0], [1, 1, 1], [0, 1, 0]]

# Bandit: LinGreedy learning policy with epsilon set to 0.1 and top-2
rec = BanditRecommender(LearningPolicy.LinGreedy(epsilon=0.1), top_k=2)

# Training: Fit operation for the initial training
rec.fit(decisions, rewards, contexts)

# Generate top-k recommendation for each context
rec.recommend([[1, 1, 0], [1, 1, 1], [1, 0, 1]], return_scores=True)

([['Item2', 'Item1'], ['Item2', 'Item1'], ['Item1', 'Item2']],
 [[0.6224593312018546, 0.5825702064623147],
  [0.679178699175393, 0.6607563687658172],
  [0.6607563687658172, 0.6224593312018546]])

* Specific arms can be excluded from the recommendations of each individual context via the `excluded_arms` parameter.

In [8]:
# Excluded arms for each individual user/context
excluded_arms = [['Item1'], [], ['Item3']]

# Generate top-k recommendation for each context
rec.recommend([[1, 1, 0], [1, 1, 1], [1, 0, 1]], excluded_arms=excluded_arms)

[['Item2', 'Item3'], ['Item2', 'Item1'], ['Item1', 'Item2']]
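
Conceptually, top-*k* selection with exclusions amounts to ranking the remaining arms by score and keeping the first *k*. The sketch below shows this for a single context; the `top_k_arms` helper and the score values are hypothetical and do not reproduce the library's internals.

```python
def top_k_arms(scores, k, excluded=()):
    """Return the k highest-scoring arms, skipping any excluded arm."""
    blocked = set(excluded)
    candidates = [arm for arm in scores if arm not in blocked]
    return sorted(candidates, key=lambda arm: scores[arm], reverse=True)[:k]


# Hypothetical scores for one context
scores = {'Item1': 0.58, 'Item2': 0.62, 'Item3': 0.40}

print(top_k_arms(scores, k=2))                      # ['Item2', 'Item1']
print(top_k_arms(scores, k=2, excluded=['Item1']))  # ['Item2', 'Item3']
```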

## Warm Start

* The [cold-start problem](https://en.wikipedia.org/wiki/Cold_start_(recommender_systems)) refers to situations where some items to be recommended have no historic data.
* The bandit recommender provides a simple warm start procedure that can be used to warm start cold arms using trained arms. 
* To run warm start, features of each of the recommender arms are required. 
* See [Feature Engineering Notebook](https://github.com/fidelity/mab2rec/blob/main/notebooks/2_feature_engineering.ipynb) for more details on creating item features.
* A distance parameter, `distance_quantile`, is also required. It specifies how "close" a cold arm has to be to a warm arm in order to be warm started.
* If `distance_quantile` is set to 1.0, all cold items will be warm started; if it is set to 0.0, none will be.

In [9]:
# Data
decisions = ['Item1', 'Item1', 'Item3', 'Item1', 'Item2', 'Item3']
rewards = [0, 1, 1, 0, 1, 0]
contexts = [[0, 0, 0], [1, 0, 1], [0, 1, 1], [0, 0, 0], [1, 1, 1], [0, 1, 0]]

# Bandit: LinGreedy learning policy with epsilon set to 0.1 and top-2
rec = BanditRecommender(LearningPolicy.LinGreedy(epsilon=0.1), top_k=2)

# Training: Fit operation for the initial training
rec.fit(decisions, rewards, contexts)

# Add new arm with no data
rec.add_arm('Item4')

# Warm start
arm_to_features = {'Item1': [1, 1], 'Item2': [1, 0], 'Item3': [0.5, 0.5], 'Item4': [1, 1]}
rec.warm_start(arm_to_features, distance_quantile=0.75)

# Generate top-k recommendation for each context
rec.recommend([[1, 1, 0], [1, 1, 1], [1, 0, 1]])

[['Item2', 'Item4'], ['Item2', 'Item4'], ['Item4', 'Item1']]
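
The role of `distance_quantile` can be sketched as follows: compute the pairwise distances between arm feature vectors, take the chosen quantile of those distances as a threshold, and let each cold arm borrow from its closest warm arm only if that arm lies within the threshold. This is a simplified illustration under stated assumptions (Euclidean distance, linear-interpolation quantile); the `quantile` and `warm_start_plan` helpers are hypothetical, and details such as the distance metric and tie handling may differ in the library.

```python
import math


def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with 0 <= q <= 1."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower, upper = math.floor(position), math.ceil(position)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def warm_start_plan(arm_to_features, warm_arms, distance_quantile):
    """Map each cold arm to its closest warm arm, if that arm is close enough."""
    arms = list(arm_to_features)
    # Distances between every pair of distinct arms
    pairwise = [
        math.dist(arm_to_features[a], arm_to_features[b])
        for i, a in enumerate(arms)
        for b in arms[i + 1:]
    ]
    threshold = quantile(pairwise, distance_quantile)

    plan = {}
    for arm in arms:
        if arm in warm_arms:
            continue
        closest = min(
            warm_arms,
            key=lambda warm: math.dist(arm_to_features[arm], arm_to_features[warm]),
        )
        if math.dist(arm_to_features[arm], arm_to_features[closest]) <= threshold:
            plan[arm] = closest
    return plan


arm_to_features = {'Item1': [1, 1], 'Item2': [1, 0], 'Item3': [0.5, 0.5], 'Item4': [1, 1]}
plan = warm_start_plan(arm_to_features, ['Item1', 'Item2', 'Item3'], distance_quantile=0.75)
print(plan)  # {'Item4': 'Item1'}
```

With these features, `Item4` has the same feature vector as `Item1`, so in this sketch it would borrow from `Item1`.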

# Persistency

## Save to Pickle

* Pickling the BanditRecommender object is an easy way to store the whole model within a single file.
* This can be achieved by giving an output pickle path to the `train` pipeline. 
* Alternatively, we can run `save_pickle` on the BanditRecommender object.

In [10]:
from mab2rec import BanditRecommender, LearningPolicy
from mab2rec.pipeline import train, score
from mab2rec.utils import save_pickle, load_pickle

# Create recommender with LinGreedy regression policy
rec = BanditRecommender(LearningPolicy.LinGreedy(epsilon=0.1), top_k=4)

# Pickle path to save the artifact
pickle_path = os.path.join(tmp_dir, 'rec.pkl')

# Train and save BanditRecommender
train(rec, data='data/data_train.csv', 
      user_features='data/features_user.csv',
      save_file=pickle_path)

# Alternatively, the returned BanditRecommender object could also be saved directly
save_pickle(rec, pickle_path)

## Load from Pickle

* Loading from the saved pickle file is just as easy.

In [11]:
# Load BanditRecommender from pickle file
rec = load_pickle(pickle_path)

# Recommendations from loaded model
df = score(rec, data='data/data_test.csv', 
           user_features='data/features_user.csv')

# Results
display(df.head())

# Clean-up
os.remove(pickle_path)

|   | user_id | item_id | score |
|---|---|---|---|
| 0 | 259 | 483 | 0.672446 |
| 1 | 259 | 474 | 0.656253 |
| 2 | 259 | 12 | 0.651795 |
| 3 | 259 | 64 | 0.650715 |
| 4 | 851 | 313 | 0.701218 |


In [12]:
# Clean up
shutil.rmtree(tmp_dir)
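
For readers unfamiliar with pickling, the save and load round trip amounts to serializing a Python object to a file and reading it back. The sketch below uses only the standard library and a plain dictionary as a stand-in for a trained recommender; the `state` contents are hypothetical, and nothing here is claimed about how `save_pickle` and `load_pickle` are implemented.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model
state = {"arms": ["Item1", "Item2", "Item3"], "top_k": 2}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "state.pkl")

    # Save: serialize the object into a single file
    with open(path, "wb") as fp:
        pickle.dump(state, fp)

    # Load: rebuild an equal object from the file
    with open(path, "rb") as fp:
        restored = pickle.load(fp)

print(restored == state)  # True
```

Pickle files can execute arbitrary code when loaded, so only files from trusted sources should be unpickled.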

# Item Control

## Train Items

* The list of items to be trained can be specified using the `item_list` argument in the **train** function.
* This allows one to only train a subset of the items that occur in the training data.
* Conversely, it can also be used to specify items that do not occur in the training data, which could be warm-started using appropriate `item_features`.

In [13]:
# Scenario: Train a subset of items in train data

# List of 10 items to train
item_list = train_df['item_id'].unique()[:10].tolist()

# Train
rec = BanditRecommender(LearningPolicy.Random())
train(rec, data=train_df, item_list=item_list)

print("Num items (train data): ", train_df['item_id'].nunique())
print("Num items (recommender): ", len(rec.mab.arms))

Num items (train data):  201
Num items (recommender):  10


In [14]:
# Scenario: Train a superset of items in train data and then warm-start

# Sample training data
train_sample_df = train_df.sample(frac=0.01, random_state=12)

# Set item list to be all items in the full training data
item_list = train_df['item_id'].unique().tolist()

# Train
rec = BanditRecommender(LearningPolicy.Random())
train(rec, data=train_sample_df, item_list=item_list, item_features=item_features_df,
      warm_start=True, warm_start_distance=0.75)

print("Num items (train data): ", train_sample_df['item_id'].nunique())
print("Num items (recommender): ", len(rec.mab.arms))

Num items (train data):  161
Num items (recommender):  201


## Score Items

* Similarly, the list of items to recommend can also be specified using the `item_list` argument in the **score** function.
* Items that are not in the `item_list` will be removed from the recommender and items in the `item_list` not in the recommender will be added. 
* As above, if untrained (cold) items are added to the recommender, they can be warm started using the corresponding `item_features`.

In [15]:
# Scenario: Score using a subset of items in trained recommender

# Train recommender on ALL data (i.e., no item list)
rec = BanditRecommender(LearningPolicy.Random())
train(rec, data=train_df)
print("Num items (recommender): ", len(rec.mab.arms))

# Score only 10 items
item_list = train_df['item_id'].unique()[:10].tolist()
df = score(rec, data=test_df, item_list=item_list)
print("Num items (scored recommendations): ", df['item_id'].nunique())

Num items (recommender):  201
Num items (scored recommendations):  10


In [16]:
# Scenario: Score superset of items in trained recommender and then warm-start

# Sample training data
train_sample_df = train_df.sample(frac=0.01, random_state=12)

# Train using sample data
rec = BanditRecommender(LearningPolicy.Random())
train(rec, data=train_sample_df)
print("Num items (recommender): ", len(rec.mab.arms))

# Score ALL items in data after warm start
item_list = train_df['item_id'].unique().tolist()
df = score(rec, data=test_df, item_list=item_list, item_features=item_features_df,
           warm_start=True, warm_start_distance=0.75)
print("Num items (scored recommendations): ", df['item_id'].nunique())

Num items (recommender):  161
Num items (scored recommendations):  201


## Item Eligibility per User

* It is also possible to account for more fine-grained **item eligibility**.
* This becomes important for ensuring that the user gets a recommendation that they haven't seen, or when certain items have very strict requirements on who is allowed to see them.
* To use this functionality, a list of eligible items is required for each user, as shown below.

In [17]:
eligibility_df.head()

|   | user_id | item_id |
|---|---|---|
| 0 | 259 | [432, 181, 302, 147, 186, 96, 550, 248, 498, 2... |
| 1 | 851 | [474, 4, 591, 660, 132, 268, 218, 265, 133, 49... |
| 2 | 712 | [479, 323, 333, 69, 322, 471, 269, 187, 433, 2... |
| 3 | 119 | [248, 194, 238, 118, 603, 427, 326, 185, 283, ... |
| 4 | 640 | [591, 402, 153, 582, 603, 432, 89, 298, 451, 3... |


In [18]:
# Train
rec = BanditRecommender(LearningPolicy.Random())
train(rec, data=train_df)

# Make recommendations that satisfy eligibility criteria
df = score(rec, data=test_df, item_eligibility=eligibility_df)

df.head()

|   | user_id | item_id | score |
|---|---|---|---|
| 0 | 259 | 134 | 0.72978 |
| 1 | 259 | 70 | 0.724034 |
| 2 | 259 | 188 | 0.719724 |
| 3 | 259 | 132 | 0.719464 |
| 4 | 259 | 523 | 0.718629 |
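
Per-user eligibility can be thought of as restricting the candidate items for each user before ranking. The sketch below illustrates that idea with hypothetical scores and eligibility lists; the `recommend_eligible` helper is not part of Mab2Rec, and for simplicity every user shares the same scores here.

```python
def recommend_eligible(scores, eligible_items, k):
    """Rank only the items a user is eligible for and keep the top k."""
    allowed = set(eligible_items)
    ranked = sorted(
        (item for item in scores if item in allowed),
        key=lambda item: scores[item],
        reverse=True,
    )
    return ranked[:k]


# Hypothetical item scores and per-user eligibility lists
scores = {134: 0.73, 70: 0.72, 188: 0.71, 523: 0.70}
eligibility = {259: [70, 523, 999], 851: [134]}

recommendations = {
    user: recommend_eligible(scores, items, k=2)
    for user, items in eligibility.items()
}
print(recommendations)  # {259: [70, 523], 851: [134]}
```

Item `999` is eligible for user `259` but unknown to the scorer, so it never appears in the output.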


# Memory Efficiency

* When working with large datasets, the memory-intensive default data types of DataFrames can cause out-of-memory errors, or lead to suboptimal performance.
* Mab2Rec allows data types to be specified for user features.
* The [Selective](https://github.com/fidelity/selective) library offers a [memory reduction utility](https://github.com/fidelity/selective/blob/master/feature/utils.py#L189) that can generate a DataFrame with lower memory footprint.
* Mab2Rec can directly utilize the data types of the DataFrame with reduced memory.

In [19]:
with open(user_features_dtypes) as fp:
    display(json.load(fp))

{'user_id': 'uint16',
 'u0': 'uint8',
 'u1': 'uint8',
 'u2': 'uint8',
 'u3': 'uint8',
 'u4': 'uint8',
 'u5': 'uint8',
 'u6': 'uint8',
 'u7': 'uint8',
 'u8': 'uint8',
 'u9': 'uint8',
 'u10': 'uint8',
 'u11': 'uint8',
 'u12': 'uint8',
 'u13': 'uint8',
 'u14': 'uint8',
 'u15': 'uint8',
 'u16': 'uint8',
 'u17': 'uint8',
 'u18': 'uint8',
 'u19': 'uint8',
 'u20': 'uint8',
 'u21': 'uint8',
 'u22': 'uint8',
 'u23': 'uint8',
 'u24': 'uint8',
 'u25': 'uint8',
 'u26': 'uint8',
 'u27': 'uint8',
 'u28': 'uint8',
 'u29': 'uint8',
 'u30': 'uint8',
 'u31': 'uint8'}

In [20]:
# Create recommender with LinUCB regression policy
rec = BanditRecommender(LearningPolicy.LinUCB(alpha=1.25))

# Train using specified data types for user features
train(rec, data='data/data_train.csv', 
      user_features='data/features_user.csv',
      user_features_dtypes=user_features_dtypes)

# Score using specified data types for user features
df = score(rec, data='data/data_test.csv', 
           user_features='data/features_user.csv',
           user_features_dtypes=user_features_dtypes)
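
A data types file like the one shown above could be derived by picking, for each column, the smallest unsigned integer type that can hold its largest value. The sketch below assumes non-negative integer features; the `smallest_unsigned_dtype` helper and the column maxima are hypothetical.

```python
import json


def smallest_unsigned_dtype(max_value):
    """Name of the smallest unsigned integer dtype that can store max_value."""
    for name, bits in (("uint8", 8), ("uint16", 16), ("uint32", 32), ("uint64", 64)):
        if 0 <= max_value < 2 ** bits:
            return name
    raise ValueError(f"no unsigned dtype can hold {max_value}")


# Hypothetical column maxima: one id column and two binary indicators
column_max = {"user_id": 943, "u0": 1, "u1": 1}

dtypes = {column: smallest_unsigned_dtype(value) for column, value in column_max.items()}
print(json.dumps(dtypes))  # {"user_id": "uint16", "u0": "uint8", "u1": "uint8"}
```

A binary indicator stored as `uint8` takes one byte per value, compared with eight bytes for a 64-bit integer.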

# Appendix: Intro to Bandits

## Context-Free Bandits

* Context-free bandits can be used as simple baseline recommenders.
* They are fit using data of items recommended in the past, i.e., `decisions`, and the corresponding responses observed for each item, i.e., `rewards`.

In [21]:
# Data
decisions = ['Item1', 'Item1', 'Item3', 'Item1', 'Item2', 'Item3']
rewards = [0, 1, 1, 0, 1, 0]

# Epsilon Greedy learning policy with random exploration set to 25%
rec = BanditRecommender(LearningPolicy.EpsilonGreedy(epsilon=0.25), top_k=2)
rec.fit(decisions, rewards)

# Generate top-k recommendation
rec.recommend()

['Item2', 'Item3']
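
The epsilon-greedy idea can be sketched in a few lines: estimate the mean reward of each arm, then either exploit the best arms or, with probability epsilon, explore at random. The helpers below are hypothetical and simplify exploration by randomizing the whole list at once, which is not necessarily how the library does it.

```python
import random
from collections import defaultdict


def mean_rewards(decisions, rewards):
    """Average observed reward per arm."""
    totals, counts = defaultdict(float), defaultdict(int)
    for arm, reward in zip(decisions, rewards):
        totals[arm] += reward
        counts[arm] += 1
    return {arm: totals[arm] / counts[arm] for arm in totals}


def epsilon_greedy_top_k(means, k, epsilon, rng):
    """Explore with probability epsilon, otherwise rank arms by mean reward."""
    arms = sorted(means)
    if rng.random() < epsilon:
        return rng.sample(arms, k)
    return sorted(arms, key=lambda arm: means[arm], reverse=True)[:k]


decisions = ['Item1', 'Item1', 'Item3', 'Item1', 'Item2', 'Item3']
rewards = [0, 1, 1, 0, 1, 0]

means = mean_rewards(decisions, rewards)
# With epsilon=0.0 the choice is purely greedy
greedy = epsilon_greedy_top_k(means, k=2, epsilon=0.0, rng=random.Random(0))
print(greedy)  # ['Item2', 'Item3']
```

On this data the mean rewards are 1/3 for `Item1`, 1.0 for `Item2` and 0.5 for `Item3`, so the greedy ranking is `Item2` followed by `Item3`.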

## Contextual Parametric Bandits

* Parametric bandits assume rewards to be random and distributed independently according to a probability distribution that is specific to each arm.
* The expected reward for each arm is estimated using a **parametric** model.
* The `LinGreedy` learning policy estimates the expected reward of each arm as a linear function of the context, with coefficients fit by regression on the contexts and rewards previously observed for that arm.
* The policy selects the top-*k* arms based on the predicted regression values with probability $1 - \epsilon$ and random arms with probability $\epsilon$.
* Parametric bandits are typically very efficient.
* They are fit using data of items recommended in the past (i.e., `decisions`), the corresponding response observed for each recommended item (i.e., `rewards`) and user features (i.e., `contexts`) associated with each of the recommendations.

In [22]:
# Data
decisions = ['Item1', 'Item1', 'Item3', 'Item1', 'Item2', 'Item3']
rewards = [0, 1, 1, 0, 1, 0]
contexts = [[0, 0, 0], [1, 0, 1], [0, 1, 1], [0, 0, 0], [1, 1, 1], [0, 1, 0]]

# LinGreedy learning policy with epsilon set to 0.1
rec = BanditRecommender(LearningPolicy.LinGreedy(epsilon=0.1), top_k=2)
rec.fit(decisions, rewards, contexts)

# Generate top-k recommendation for each context
rec.recommend([[1, 1, 0], [1, 1, 1], [0, 1, 0]])

[['Item2', 'Item1'], ['Item2', 'Item1'], ['Item2', 'Item3']]
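
The greedy part of a linear policy can be sketched as one ridge regression per arm, followed by ranking the arms by predicted reward for a given context. The sketch below is a plain-Python illustration under stated assumptions (an L2 penalty of 1.0, no intercept, no exploration); the `solve`, `ridge_fit` and `dot` helpers are hypothetical and the library's implementation may differ.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_fit(contexts, rewards, l2=1.0):
    """Coefficients minimizing squared error plus l2 times the squared norm."""
    d = len(contexts[0])
    a = [[(l2 if i == j else 0.0) + sum(x[i] * x[j] for x in contexts)
          for j in range(d)] for i in range(d)]
    b = [sum(x[i] * r for x, r in zip(contexts, rewards)) for i in range(d)]
    return solve(a, b)


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


decisions = ['Item1', 'Item1', 'Item3', 'Item1', 'Item2', 'Item3']
rewards = [0, 1, 1, 0, 1, 0]
contexts = [[0, 0, 0], [1, 0, 1], [0, 1, 1], [0, 0, 0], [1, 1, 1], [0, 1, 0]]

# Fit one regression per arm on that arm's own history
arms = sorted(set(decisions))
coefs = {}
for arm in arms:
    xs = [c for d, c in zip(decisions, contexts) if d == arm]
    rs = [r for d, r in zip(decisions, rewards) if d == arm]
    coefs[arm] = ridge_fit(xs, rs)

context = [1, 1, 0]
predicted = {arm: dot(coefs[arm], context) for arm in arms}
ranking = sorted(arms, key=lambda arm: predicted[arm], reverse=True)[:2]
print(ranking)  # ['Item2', 'Item1']
```

Under these assumptions the predicted rewards for context `[1, 1, 0]` are 0.5 for `Item2`, about 0.333 for `Item1` and 0.2 for `Item3`.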