# Introducing a Two-Armed Bandit Problem

Imagine we are playing a game with two coins: one is fair and the other is unfair, landing on 'heads' 70% of the time. Every round, we select one coin to flip. If it lands on 'heads', we get a payoff of $1; if it lands on 'tails', we get nothing.

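These are Bernoulli trials, so the number of successes in a fixed number of flips follows a binomial distribution. The Beta distribution is the conjugate prior for both the Bernoulli and the binomial distribution: a Beta prior on p combined with either likelihood gives a Beta posterior.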
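
To make the conjugacy concrete, here is a minimal sketch, assuming a flat Beta(1, 1) prior and an illustrative 7 heads in 10 flips. The helper name `beta_update` is hypothetical and not part of the notebook; the update rule itself is the standard one: add the heads to the first parameter and the tails to the second.

```python
from scipy.stats import beta

def beta_update(a, b, heads, flips):
    """Return the Beta posterior parameters after `heads` successes in `flips` Bernoulli trials."""
    return a + heads, b + (flips - heads)

# Assumed for illustration: flat Beta(1, 1) prior, then 7 heads in 10 flips
a_post, b_post = beta_update(1, 1, heads=7, flips=10)
posterior = beta(a_post, b_post)

print(a_post, b_post)    # parameters of the Beta posterior
print(posterior.mean())  # posterior mean of p, a / (a + b)
```

The rest of this notebook uses a grid over p rather than this closed form, but the two approaches should agree closely for a flat prior.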



In [33]:
import numpy as np
import pandas as pd
from scipy.stats import binom
import plotly.graph_objects as go
import plotly.express as px

Goals:
1. Create an object to store the results and state of the game
2. When the object is initialized, it should initialize the game with two coins: one is unfair, the other is fair
3. Create a method that will flip a chosen coin and return the result/reward
4. Track results and plot the posterior distribution for the selected arm
5. Record the updated distribution based on the information given and update the priors

In [34]:
class game():
    def __init__(self):
        """Initialize the game with two arms, each with their unique probabilities 'p' of success"""
        if np.random.choice([0,1]) == 0:
            self.p = np.array([0.5, 0.7])
        else:
            self.p = np.array([0.7, 0.5])
        self.points = {0:0, 1:0}
        self.rounds = {0:0, 1:0}
        self.priors = {0:np.ones(1000), 1:np.ones(1000)}
        self.p_grid = np.linspace(0, 1, 1000)
        self.figs = {0:go.Figure(),
                     1:go.Figure()
                     }
    
    def make_bet(self, selected_arm: int, n=1):
        """Make 'n' bets on the selected arm; each bet earns a point when a uniform draw is at most the arm's probability p"""
        selection = self.p[selected_arm]
        wins = 0
        for _ in range(n):
            if np.random.uniform() <= selection:
                wins += 1
        self.points[selected_arm] += wins
        self.rounds[selected_arm] += n
        self._make_figs(selected_arm, wins, n)

    def score(self):
        print(f"points: {self.points}, rounds: {self.rounds}")
    
    def show_answers(self):
        print(self.p)

    def _make_figs(self, selected_arm, batch_k, batch_n):
        """Update the arm's posterior with the latest batch of flips and add it to the arm's figure"""
        n = self.rounds[selected_arm]
        k = self.points[selected_arm]

        # The stored prior starts out uniform and already reflects every earlier flip,
        # so the likelihood covers only the new batch; using the cumulative totals here would count earlier flips twice
        # binom.pmf returns the probability of batch_k successes in batch_n trials when the probability of success is p
        # We evaluate it for every candidate p on the grid from 0 to 1
        prob_data = binom.pmf(k=batch_k, n=batch_n, p=self.p_grid)  # Likelihood
        posterior = prob_data * self.priors[selected_arm]
        posterior = posterior / posterior.sum()
        self.priors[selected_arm] = posterior

        self.figs[selected_arm].update_traces(overwrite=True, marker_color="LightGrey", marker_opacity=0.2)
        self.figs[selected_arm].add_scatter(x=self.p_grid, y=posterior, mode='lines', marker_color='Blue')
        self.figs[selected_arm].update_layout(title=f"{k} successes out of {n} trials")
        self.figs[selected_arm].update_layout(showlegend=False)
        
    def show_fig(self, selected_arm):
        return self.figs[selected_arm]

In [35]:
g = game()
selected_arm = 0

In [36]:
g.make_bet(selected_arm, n=10)

In [37]:
g.score()

points: {0: 7, 1: 0}, rounds: {0: 10, 1: 0}


Now that we can initialize a game and simulate flipping a coin several times, we want to incorporate new information into our decision-making. That is, how can we use the results of the coin flips to determine which coin is the 'winner'?

We can use Bayesian updating to put that new knowledge to use. The key idea is that the posterior is proportional to the prior times the likelihood, where the likelihood is the probability of the observed data under each candidate model.


Recall Bayes' theorem:

$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$

The hard part is translating our problem into a Bayesian inference problem.

We can model the number of 'wins' above with a binomial distribution and compute the posterior probability of each candidate value of p.
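
In terms of Bayes' theorem above, $A$ is a candidate value of $p$ and $B$ is the observed data ($k$ heads in $n$ flips). As a sketch, assuming the candidates are restricted to a grid $p_1, \dots, p_m$ (the symbols $k$, $n$, and $p_j$ are introduced here for illustration), the update can be written as:

$$P(p_i \mid k, n) = \frac{P(k \mid n, p_i)\,P(p_i)}{\sum_{j=1}^{m} P(k \mid n, p_j)\,P(p_j)}, \qquad P(k \mid n, p) = \binom{n}{k}\,p^{k}(1-p)^{n-k}$$

The denominator plays the role of $P(B)$: it only rescales the numerator so that the posterior sums to 1 over the grid.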

In [38]:
binom.pmf(g.points[0], g.rounds[0], 0.5) # Likelihood of p=0.5: the probability of the observed wins if p were 0.5

0.11718750000000014
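
The single likelihood value above extends naturally to a whole grid of candidate values. The following is a minimal standalone sketch of that grid approximation, assuming a flat prior and the 7 heads in 10 flips from the run shown above (a fresh run will give different counts). The variable names are illustrative and independent of the `game` class.

```python
import numpy as np
from scipy.stats import binom

p_grid = np.linspace(0, 1, 1000)             # candidate values of p
prior = np.ones_like(p_grid)                 # flat prior over the grid
likelihood = binom.pmf(k=7, n=10, p=p_grid)  # probability of 7 heads in 10 flips at each p
posterior = likelihood * prior
posterior = posterior / posterior.sum()      # normalize so the grid sums to 1

p_map = p_grid[np.argmax(posterior)]         # grid point with the highest posterior probability
print(p_map)
```

With a flat prior the posterior peaks at the grid point closest to the observed proportion of heads.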

In [40]:
g.make_bet(selected_arm, 10)
g.show_fig(selected_arm)
# px.line(x=p_grid, y=posterior, title=f"Posterior Probability of a Bin({k}, {n}, p) with probability p given {k} successes and {n} trials")

Reference: [Statistical Rethinking Slides, Winter 2019](https://speakerdeck.com/rmcelreath/l02-statistical-rethinking-winter-2019?slide=22)

In [42]:
g.make_bet(selected_arm, n=100)
g.show_fig(selected_arm)
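
To come back to the question of which coin is the 'winner', one option is to compare the two arms' posteriors directly. The sketch below is an assumption about how that comparison could be done, not something taken from the notebook: it estimates the probability that one arm's p exceeds the other's, treating the arms as independent and using hypothetical tallies. The helper names `grid_posterior` and `prob_first_beats_second` are made up for illustration.

```python
import numpy as np
from scipy.stats import binom

def grid_posterior(k, n, p_grid):
    """Grid posterior for p under a flat prior, given k heads in n flips."""
    post = binom.pmf(k=k, n=n, p=p_grid)
    return post / post.sum()

def prob_first_beats_second(post_a, post_b):
    """P(p_a > p_b) for two independent posteriors defined on the same grid."""
    # Posterior mass of arm b strictly below each grid point
    below = np.cumsum(post_b) - post_b
    return float(np.sum(post_a * below))

p_grid = np.linspace(0, 1, 1000)
# Hypothetical tallies: 70 heads in 100 flips on one arm, 50 in 100 on the other
post_0 = grid_posterior(70, 100, p_grid)
post_1 = grid_posterior(50, 100, p_grid)

prob_0_wins = prob_first_beats_second(post_0, post_1)
prob_1_wins = prob_first_beats_second(post_1, post_0)
print(prob_0_wins, prob_1_wins)
```

Once both arms have been played, the same function could be applied to `g.priors[0]` and `g.priors[1]`, since they are stored on the same grid.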
