### Batch Inference Use Case

This notebook walks through a use case in which the bandit algorithm uses batch inference: arms are assigned to a whole cohort of users at once, and the algorithm's estimates are updated only after that cohort's results are in. Real-time feedback is ideal, and it is the setting most commonly discussed when analyzing multi-armed bandits, but many applications of bandit algorithms involve delayed feedback. Discounts offered through email are one example, since time passes between when an email is sent and when the user opens it or converts using the discount code.
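To make the delayed-feedback setting concrete, here is a minimal sketch of how one batch of results might be folded into an arm's running conversion-rate estimate. The function `batch_update_rate` and its parameter names are hypothetical and are not taken from `banditcoot`; the sketch assumes the estimate is a simple running average weighted by the number of observations.

```python
def batch_update_rate(prior_rate, prior_count, num_times_chosen, num_successes):
    """Fold one batch of delayed results into a running conversion-rate estimate.

    Hypothetical helper for illustration only (not banditcoot's API).
    """
    new_count = prior_count + num_times_chosen
    if new_count == 0:
        # nothing observed yet, so keep the prior estimate unchanged
        return prior_rate, new_count
    # weighted average of the prior estimate and the batch's observed successes
    new_rate = (prior_rate * prior_count + num_successes) / new_count
    return new_rate, new_count


# an arm with a prior estimate of 5% backed by 100 pseudo-observations
# receives a batch in which it was shown 25 times and converted 3 times
rate, count = batch_update_rate(
    prior_rate=0.05, prior_count=100, num_times_chosen=25, num_successes=3
)
```

A larger prior count makes the estimate move more slowly in response to each batch, which is one way to read the starting counts used later in the notebook.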

In [1]:
from banditcoot.arms import BernoulliArm
from banditcoot.algorithms import EpsilonGreedy
import pandas as pd
import numpy as np
import seaborn as sns

In [None]:
n_iter                = 100
horizon               = 200
discounts             = [0.10, 0.20, 0.30, 0.40]
true_conversion_rates = [0.03, 0.10, 0.12, 0.13]
est_conversion_rates  = [0.05, 0.08, 0.12, 0.15]
starting_counts       = [100, 100, 100, 100]
arpu = [(1-i) * 69.99 for i in discounts]
arms = [BernoulliArm(p) for p in true_conversion_rates]

algo = EpsilonGreedy(
    epsilon = 0.2,
    n_arms = 4,
    rewards = arpu,
    conv_rates = est_conversion_rates,
    counts=starting_counts
)
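The internals of `EpsilonGreedy` are not shown in this notebook. As a rough sketch of the general epsilon-greedy rule (not necessarily how `banditcoot` implements it), the selection step could look like the following. The function `select_arm` here is a hypothetical standalone helper, and the estimated values are illustrative numbers only.

```python
import random


def select_arm(values, epsilon, rng=random):
    """Epsilon-greedy choice: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        # explore: pick any arm uniformly at random
        return rng.randrange(len(values))
    # exploit: pick the arm with the highest current estimated value
    return max(range(len(values)), key=lambda k: values[k])


rng = random.Random(0)                       # seeded for reproducibility
estimated_values = [3.15, 4.48, 5.88, 6.30]  # illustrative numbers only
choices = [select_arm(estimated_values, epsilon=0.2, rng=rng) for _ in range(1000)]
share_best = choices.count(3) / len(choices)
```

With `epsilon = 0.2` and four arms, the arm currently believed to be best would be expected to receive roughly 85% of selections: 80% from exploitation plus a quarter of the 20% spent exploring.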

In [None]:
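# expected revenue per user for each arm, under the true and the estimated conversion rates
# (print both, since a notebook cell only displays its last expression)
print(np.array(true_conversion_rates) * np.array(arpu))
print(np.array(est_conversion_rates) * np.array(arpu))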
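The same comparison can be worked through in plain Python to see which arm is best under each set of rates. This is a sketch using the parameter values from the setup cell; `base_price` and `best_arm` are names introduced here for illustration.

```python
base_price = 69.99  # full price implied by the arpu calculation above
discounts = [0.10, 0.20, 0.30, 0.40]
true_conversion_rates = [0.03, 0.10, 0.12, 0.13]
est_conversion_rates = [0.05, 0.08, 0.12, 0.15]

# revenue per converting user after applying each discount
arpu = [(1 - d) * base_price for d in discounts]

# expected revenue per user = conversion rate * revenue per conversion
true_values = [p * r for p, r in zip(true_conversion_rates, arpu)]
est_values = [p * r for p, r in zip(est_conversion_rates, arpu)]


def best_arm(values):
    """Index of the arm with the highest expected revenue per user."""
    return max(range(len(values)), key=values.__getitem__)


true_best = best_arm(true_values)  # arm 2, the 30% discount
est_best = best_arm(est_values)    # arm 3, the 40% discount
```

Under these inputs the starting estimates favor the 40% discount, while the true rates make the 30% discount the most valuable arm, so the algorithm has to learn its way off its initial belief.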

In [None]:
arm_values = pd.DataFrame(
    columns = [
        "sim_num",
        "cohort",
        "arm_0_value",
        "arm_1_value",
        "arm_2_value",
        "arm_3_value"
    ]
)
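for i in range(n_iter):

    # re-create the algorithm so that every simulation starts from the same priors
    # (lists are copied in case the algorithm updates them in place)
    algo = EpsilonGreedy(
        epsilon = 0.2,
        n_arms = 4,
        rewards = arpu,
        conv_rates = list(est_conversion_rates),
        counts = list(starting_counts)
    )

    for j in range(horizon):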

        # get cohort of users
        users = pd.DataFrame(
            data = {
                "cohort": j,
                "user_id": range(100)
            }
        )

        # choose arms for cohort
        users["arm"] = [algo.select_arm() for _ in users.user_id]

        # record whether a conversion occurs
        users["conversion"] = users.apply(lambda row: arms[row["arm"]].draw(), axis = 1).astype(int)

        # record revenue from chosen arms for cohort
        users["revenue"] = users.apply(lambda row: row["conversion"] * arpu[row["arm"]], axis = 1)

        # update estimated reward from k arms
        for k in range(len(arms)):
            
            update_values = users.query(f"arm=={k}") \
                .agg({
                    "user_id" : "count", 
                    "conversion" : "sum"
                })
            
            algo.batch_update(
                chosen_arm = k,
                num_times_chosen = update_values["user_id"], 
                num_successes = update_values["conversion"]
            )

        # record the current estimated value of each arm after this cohort
        current_values = pd.DataFrame(
            {
                "sim_num": i,
                "cohort": j,
                "arm_0_value": [algo.values[0]],
                "arm_1_value": [algo.values[1]],
                "arm_2_value": [algo.values[2]],
                "arm_3_value": [algo.values[3]]
            }
        )
        arm_values = pd.concat([arm_values, current_values], axis = 0, ignore_index = True)
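The per-arm aggregation performed inside the loop above (count of users and sum of conversions for each arm) can also be sketched in plain Python. The helper `aggregate_batch` below is hypothetical and only illustrates the quantities that get passed to `batch_update`; the sample cohort is made up.

```python
def aggregate_batch(chosen_arms, conversions, n_arms):
    """Return per-arm (times chosen, successes) for one cohort.

    Hypothetical helper for illustration only.
    """
    counts = [0] * n_arms
    successes = [0] * n_arms
    for arm, converted in zip(chosen_arms, conversions):
        counts[arm] += 1
        successes[arm] += int(converted)
    return counts, successes


# a tiny made-up cohort of six users
chosen_arms = [3, 3, 2, 0, 3, 2]
conversions = [1, 0, 0, 0, 1, 1]
counts, successes = aggregate_batch(chosen_arms, conversions, n_arms=4)
# counts    -> [1, 0, 2, 3]
# successes -> [0, 0, 1, 2]
```

Note that arm 1 was never chosen in this sample cohort, so its count is zero. Any batch update has to handle that case without dividing by zero.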