In [1]:
import bandit
import numpy as np
import matplotlib.pyplot as plt

Assume we're a business that sells one item and fulfills every order from one of two locations (stores). Every time an order comes in, we have a decision to make: which store do we try to fulfill it from?

In [52]:
import pandas as pd 

tse = bandit.TwoStoresEnv()

data_agg = pd.DataFrame.from_dict({
    'store_idx': [0, 1],
    'p_true': tse.ptfs[:,0],
    'n_obs': [80, 70],
},).set_index('store_idx')
for store_idx in [0, 1]:
    data_agg.loc[store_idx, 'y_obs'] = np.random.binomial(
        n=data_agg.loc[store_idx, 'n_obs'],
        p=data_agg.loc[store_idx, 'p_true'],
        )
data_agg['y_obs'] = data_agg['y_obs'].astype('int')
data_agg['p_obs'] = data_agg['y_obs'] / data_agg['n_obs']
data_agg

| store_idx | p_true | n_obs | y_obs | p_obs |
|---|---|---|---|---|
| 0 | 0.66 | 80 | 54 | 0.675 |
| 1 | 0.45 | 70 | 38 | 0.542857 |


Note that $p_{\text{true}}$ is the latent probability that a location fulfills an order successfully; in practice we only see the observed counts. In the standard supervised learning paradigm, we typically have at least one of two related goals:
1. Infer these probabilities from a labeled dataset;
2. Make predictions based on the inferred probabilities.

Let's fit a standard supervised learning model to this:

In [99]:
import torch
import pyro
import pyro.distributions as dist

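def bin_reg(X, y=None):
    """Binomial model with an independent probability-to-fulfill per store.

    X[:, 0] holds the store index (unused here) and X[:, 1] the number of
    orders observed at each store; y holds the number of successes.
    """
    n_obs = X[:, 1]  # number of observations per store

    # define prior over location probabilities-to-fulfill
    p_prior = dist.Uniform(low=0, high=1)

    # one latent probability and one binomial count per store
    with pyro.plate('store', 2):
        p = pyro.sample('p', p_prior)

        likelihood_dist = dist.Binomial(
            total_count=n_obs, probs=p,
        )
        pyro.sample(
            'y_obs',
            likelihood_dist,
            obs=y,
        )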

In [121]:
# Example: fitting a supervised learning model to the data

import torch

X = torch.tensor(data_agg.reset_index()[['store_idx','n_obs']].values)
y = torch.tensor(data_agg['y_obs'].values)

pyro.clear_param_store()

from pyro.infer import NUTS, MCMC

nuts_kernel = NUTS(bin_reg)
mcmc = MCMC(nuts_kernel, num_samples=2_000, warmup_steps=500)

mcmc.run(X, y)

# Posterior mean of p for each store, compared against tse.ptfs[:,0]
ptfs_pred = mcmc.get_samples()['p'].mean(axis=0)
for i in range(2):
    print(f"Store {i}: p_true = {tse.ptfs[i,0]:.3f} ;  p_pred = {ptfs_pred[i]:.3f}")

Sample: 100%|██████████| 2500/2500 [00:12, 203.81it/s, step size=8.54e-01, acc. prob=0.932]

Store 0: p_true = 0.660 ;  p_pred = 0.672
Store 1: p_true = 0.450 ;  p_pred = 0.541
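
As a sanity check on the sampler, note that this particular model has a closed-form posterior: a Uniform(0, 1) prior is a Beta(1, 1) distribution, which is conjugate to the Binomial likelihood, so each store's posterior is Beta(1 + y, 1 + n - y). The sketch below recomputes the posterior means without Pyro. The counts are hard-coded from the aggregated table above (they come from a single random draw), and the variable names are illustrative.

```python
import numpy as np

# Aggregated counts copied from the table above (one random draw)
n_obs = np.array([80, 70])  # orders sent to each store
y_obs = np.array([54, 38])  # orders successfully fulfilled

# Uniform(0, 1) prior == Beta(1, 1); Beta is conjugate to the Binomial
alpha_post = 1 + y_obs
beta_post = 1 + n_obs - y_obs

# Mean of a Beta(a, b) distribution is a / (a + b)
post_mean = alpha_post / (alpha_post + beta_post)

for i in range(2):
    print(f"Store {i}: posterior mean = {post_mean[i]:.3f}")
# Store 0: posterior mean = 0.671
# Store 1: posterior mean = 0.542
```

These agree with the MCMC estimates above (0.672 and 0.541) up to Monte Carlo error, which suggests the sampler is behaving as expected.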





Once we have fit a model of any kind, we can use its inferences and predictions to make decisions.

For example, based on the above results, we would conclude that fulfillment location 0 has a higher baseline probability of being able to fulfill orders, and with this in mind, it would make sense to always allocate orders to this location (remember, this is a very simplified problem).

## Quick decision theory

We can formalize this decision-making process by assigning a cost to each outcome. Let's assume a successful fulfillment costs us nothing, but a failure costs us \$5. We can write the decision loss (higher is worse!) for allocating an order to a particular location as

$$ L = 5 \left(1-y\right) $$

where $y \in \{0, 1\}$ indicates the eventual result of the allocation (1 for success, 0 for failure). Since the result of any allocation is uncertain at decision time, we work with the expected loss of the allocation decision instead:

$$ \mathbb{E}\left[L\right] = 5 \left( 1- \mathbb{E}\left[y\right]\right) $$
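
To spell out the step above: the loss is linear in $y$, so the expectation passes through it, and for a binary outcome $\mathbb{E}[y]$ is simply the probability of success $p$ at the chosen store. Treating $p$ as the store's estimated probability to fulfill (an assumption, since the true value is unknown), this gives

$$ \mathbb{E}\left[L\right] = \mathbb{E}\left[5\left(1-y\right)\right] = 5\left(1-\mathbb{E}\left[y\right]\right) = 5\left(1-p\right) $$

For example, with the posterior mean $p \approx 0.672$ inferred for store 0, the expected loss per allocation is roughly $5 \times (1 - 0.672) \approx 1.64$.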


Based on the inference above, the expected result varies by store, so each store has a different expected loss per allocation:

In [122]:
pd.DataFrame.from_dict({
    'store_idx': [0, 1],
    'ptfs_pred': ptfs_pred,
    'expected_loss': 5*(1-ptfs_pred),
})

|  | store_idx | ptfs_pred | expected_loss |
|---|---|---|---|
| 0 | 0 | 0.671967 | 1.640167 |
| 1 | 1 | 0.541167 | 2.294164 |


Thus our intuition is borne out by combining our probabilistic belief about how the world works (the PTFs inferred by our model) with the decision loss function. For this simple problem, following this decision analysis would lead us to always allocate orders to location 0, since it has the lower expected loss.

(Technically we would probably sum this expected loss over the total expected number of orders at each store, but we aren't modeling that in this simple problem)
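
The decision rule described in this section can be sketched as "pick the store with the smallest expected loss". The snippet below is a minimal illustration, not the notebook's own code: it assumes the closed-form Beta posterior implied by the Uniform prior in place of the MCMC samples, hard-codes the observed counts from the earlier table, and uses illustrative names such as `failure_cost` and `prob_store0_better`. It also estimates how likely it is, under the posterior, that store 0 really is the better store.

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed counts from the aggregated table (one random draw)
n_obs = np.array([80, 70])
y_obs = np.array([54, 38])
failure_cost = 5.0  # cost of a failed fulfillment; a success costs nothing

# Posterior draws of p per store (assumption: Beta posterior under a
# Uniform prior, used here instead of the MCMC samples)
p_samples = rng.beta(1 + y_obs, 1 + n_obs - y_obs, size=(10_000, 2))

# Expected loss per allocation, averaged over posterior uncertainty
expected_loss = (failure_cost * (1 - p_samples)).mean(axis=0)

# Decision rule: allocate to the store with the smallest expected loss
best_store = int(np.argmin(expected_loss))

# Posterior probability that store 0 has the higher PTF
prob_store0_better = (p_samples[:, 0] > p_samples[:, 1]).mean()

print(f"expected loss per store: {expected_loss.round(2)}")
print(f"best store: {best_store}")
print(f"P(p_0 > p_1 | data) = {prob_store0_better:.2f}")
```

Because the loss is linear in $p$, averaging over posterior samples gives the same ranking as plugging in the posterior means. The extra probability is a reminder that the choice of store 0 is likely correct but not certain given only 150 observed orders.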

## Other Approaches



In [123]:
agent = bandit.Agent()

