In [None]:
# Standard library
import random

# Basic data libraries
import numpy as np
import pandas as pd

# Graphs
from bokeh.io import show, output_notebook
from bokeh.plotting import figure
from bokeh.models import Panel, Tabs, HoverTool, ColumnDataSource
output_notebook()

# Generalization of the law of large numbers

## Introduction

In December 2020, Kaggle launched its annual [Santa's Christmas competition](https://www.kaggle.com/c/santa-2020/overview/environment-rules).
This year, participants were trying to solve a multi-armed bandit problem:
there is a set of 100 one-armed bandits (slot machines), each with a random initial probability
of a reward that is not known to the participants.
When pulled, each machine pays out a reward according to its current probability of a reward.
The goal is to collect the highest possible total reward while the number of pulls (trials) is limited.
There were two twists to the classic problem: your agent plays
against another agent, and the likelihood of a reward decreases by 3 % after
each machine pull (i.e. the probability of a reward after the $N$-th pull is $P_{N} = P_0 \cdot 0.97^N$).
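
The decay rule can be sketched in a few lines. The helper name `reward_probability` and its `decay` parameter are illustrative choices made here, not names taken from the competition environment.

```python
def reward_probability(p0, n_pulls, decay=0.97):
    """Probability of a reward after `n_pulls` pulls: P_N = P_0 * decay**N."""
    return p0 * decay ** n_pulls

# A machine that starts at 50 % drops to roughly 37 % after ten pulls.
for n in (0, 1, 2, 10):
    print(n, round(reward_probability(0.5, n), 4))
```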

To do well in the competition, one needs to estimate the initial likelihood of a reward $P_0$
of a machine as precisely as possible.
The original law of large numbers cannot be applied directly,
because it is stated only for random variables (results of trials) that are independent
and identically distributed. In our case, they are not identically distributed, as the
distribution changes after each trial: the pulls are not simple Bernoulli trials with a fixed
probability of success. We need to restate the law of large numbers
in such a way that the requirement of a constant probability of success is lifted.

## Generalization using history

The original law of large numbers works only with counts.
For one-armed bandits, we need to know only the number
of times we did or did not get a reward; for simple dice rolling,
we count the occurrences of each value.
The order of outcomes does not matter: it makes no difference
whether a one-armed bandit produces the history [0, 0, 1, 1] or [1, 0, 0, 1].
The counts are the same, and in both cases we estimate the probability of
a reward as 50 %, because the average converges to the expected value (in this case, the probability of a reward).
The information about the order is redundant.
This is a consequence of the fact that the probability of a reward does not change in time, i.e. $P_N = P_0$.
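
A small sketch makes the order-independence concrete. The function `stationary_likelihood` is a hypothetical helper written for this illustration; it assumes a constant reward probability `p` and outcomes coded as 1 (reward) and 0 (no reward).

```python
from itertools import permutations

def stationary_likelihood(history, p):
    """Probability that a machine with constant reward probability `p`
    produces `history` (1 = reward, 0 = no reward)."""
    likelihood = 1.0
    for outcome in history:
        likelihood *= p if outcome == 1 else 1.0 - p
    return likelihood

p = 0.3
# Evaluate every distinct ordering of two rewards and two losses.
orderings = set(permutations([0, 0, 1, 1]))
values = {round(stationary_likelihood(h, p), 12) for h in orderings}
print(values)  # one value only: the order of outcomes does not matter
```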

If the likelihood of a reward changes in time, we need to take the history into account.
The probability of a reward depends only on the initial probability and the number of trials made so far;
after the $N$-th trial it is $P_N = f(P_0, N)$. The history is composed of trial outcomes $E_i$
and has length $N$.
Once $f(P_0, N)$ is fixed, for each outcome $E_i$ in the history
and every possible initial probability $P_0$ we can compute
the probability that $P_0$ produces the outcome, $p_i = P(E_i | f, P_0)$.
Because the outcomes are independent given $P_0$ and $f$, these individual probabilities $p_i$ multiplied together give the probability $P_H$ that the initial probability
produces the whole history of outcomes when function $f$ is applied after every trial,
$P_H (f, P_0) = \prod_i p_i$.
The true initial probability is then most likely the one with the highest $P_H$.
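
Written out explicitly, the individual factors and the resulting estimate may look as follows. This is a sketch under assumptions introduced here: outcomes are coded as $E_i = 1$ (reward) or $E_i = 0$ (no reward), the index $i$ runs from $0$ to $N-1$ and counts the pulls made before the trial, and $\hat{P}_0$ denotes the estimate.

$$
p_i = f(P_0, i)^{E_i} \cdot \bigl(1 - f(P_0, i)\bigr)^{1 - E_i},
\qquad
\hat{P}_0 = \underset{P_0}{\arg\max} \; \prod_{i=0}^{N-1} p_i
$$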

> **The bandit's initial probability of a reward is most likely the one 
> which yields the history of outcomes with the highest probability.** 

Restated this way, the law works the same way as before when the probability of a reward does not change,
because $f$ can be just $f(P_0, N) = P_0$.
However, it now also works when the probability of a reward changes between trials.
The longer the history of outcomes, the more precise the estimate.
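
As a quick check of the constant case, the sketch below evaluates $P_H$ on a grid of candidate initial probabilities and compares the maximum with the plain average of the outcomes. The names `history_likelihood` and `constant`, the grid of 101 candidates, and the coding of outcomes as 1/0 are assumptions made for this illustration.

```python
import numpy as np

def history_likelihood(history, f, grid):
    """P_H for every candidate P_0 in `grid`; `history` holds 1 (reward) or 0 (none)."""
    likelihood = np.ones_like(grid)
    for n_pulls, outcome in enumerate(history):
        p = f(grid, n_pulls)  # reward probability before this pull
        likelihood *= p if outcome == 1 else 1.0 - p
    return likelihood

def constant(p0, n_pulls):
    """f(P_0, N) = P_0: the probability never changes."""
    return p0

grid = np.linspace(0, 1, 101)
history = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]  # 3 rewards in 10 pulls
estimate = grid[np.argmax(history_likelihood(history, constant, grid))]
print(estimate, np.mean(history))  # both are 0.3 up to floating-point rounding
```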

## Example

The example is from the Kaggle competition, where machines have a 3% probability
decay.
A machine is pulled three times.
The first and the third pull give a reward; the second pull gives nothing.
The machine produced the history [1, 0, 1].
Which initial probability is most likely to produce this history?

If the initial probability were 1, then after the first pull
the probability of success would be 0.97, and after the second pull it would be 0.9409.
The probability that the machine produces the history [1, 0, 1] is then:

$$
\underbrace{1}_{\substack{\text{Probability of success} \\ \text{on the first pull}}} \cdot \underbrace{(1-0.97)}_{\substack{\text{Probability of loss} \\ \text{on the second pull}}} \cdot \underbrace{0.9409}_{\substack{\text{Probability of success} \\ \text{on the third pull}}} = 0.0282
$$

With an initial probability of 0, the machine cannot produce this history at all:

$$ 
0 \cdot (1 - 0) \cdot 0 = 0
$$

Now consider the midpoint, an initial probability of 0.5. The probability of producing the history is:

$$ 
0.5 \cdot (1 - 0.485) \cdot 0.4705 = 0.1211
$$

We see that the initial probability of the machine is more likely to have been 0.5 than 1,
and an initial probability of 0 is completely out of the question.
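
The three products above can be reproduced with a short function. The name `history_probability` and its `decay` parameter are hypothetical and used only for this check; outcomes are coded as 1 (reward) and 0 (no reward).

```python
def history_probability(p0, history, decay=0.97):
    """Probability that initial probability `p0` produces `history`
    when the reward probability is multiplied by `decay` after every pull."""
    probability = 1.0
    for n_pulls, outcome in enumerate(history):
        p_success = p0 * decay ** n_pulls
        probability *= p_success if outcome == 1 else 1.0 - p_success
    return probability

for p0 in (1.0, 0.0, 0.5):
    print(p0, round(history_probability(p0, [1, 0, 1]), 4))
# 1.0 0.0282
# 0.0 0.0
# 0.5 0.1211
```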

The Kaggle problem is a bit easier than the general case, as the
initial probability was always an integer between 0 and 100 (in percent) and not a float.
The search for the most likely probability can be easily vectorized,
computing the probability of producing the history for all possible initial probabilities at once.
Specifically, for the history [1, 0, 1] with 3% decay, $P_H \propto P_0^2 (1 - 0.97 P_0)$, which peaks at $P_0 = 2/2.91 \approx 0.687$.
On the integer-percent grid, the most likely initial probability is therefore 0.69, and the probability of producing the history
is $P_H = 0.1481$.
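
A minimal sketch of such a vectorized search follows, assuming the integer-percent grid and the 3% decay described above. The variable names are illustrative. Running it prints the grid value that maximizes $P_H$ for the history [1, 0, 1].

```python
import numpy as np

grid = np.linspace(0, 1, 101)   # candidate P_0 values: 0 %, 1 %, ..., 100 %
history = [1, 0, 1]             # 1 = reward, 0 = no reward
decay = 0.97                    # 3 % decay after every pull

# One likelihood value per candidate, updated for all candidates at once.
likelihood = np.ones_like(grid)
for n_pulls, outcome in enumerate(history):
    p_success = grid * decay ** n_pulls
    likelihood *= p_success if outcome == 1 else 1.0 - p_success

best = int(np.argmax(likelihood))
print(f"most likely P_0 = {best / 100:.2f}, P_H = {likelihood[best]:.4f}")
```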

## Limitation

When $f(P_0, N)$ is set in such a way that $P_N$ grows or decays, as in the Kaggle case,
there is a possibility that the assumed $P_0$ will not converge to the true $P_0$.
As in the original law of large numbers, the more observations we have, the more
precise the estimate of $P_0$.
However, unlike with the original law of large numbers,
we can eventually reach a state where no further observations improve the estimate.
An extreme example of this may be $f(P_0, N) = \max(0, P_0-1 \cdot N)$, i.e.
after the first pull, the probability of a reward is zero.
Whatever the initial probability is, the assumed original probability
depends solely on the first trial.
If a reward is gained, the assumed probability will be $P_0 = 1$;
otherwise, it will be $P_0 = 0$.
No further trials can change the assumed $P_0$.
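
The extreme case can be checked numerically. The sketch below assumes the same 101-point grid as the Kaggle example; `estimate_p0` and `cliff` are hypothetical names introduced for this illustration.

```python
import numpy as np

def estimate_p0(history, f, grid=np.linspace(0, 1, 101)):
    """Grid value of P_0 that maximizes the probability of `history` (1/0 outcomes)."""
    likelihood = np.ones_like(grid)
    for n_pulls, outcome in enumerate(history):
        p = f(grid, n_pulls)
        likelihood *= p if outcome == 1 else 1.0 - p
    return grid[np.argmax(likelihood)]

def cliff(p0, n_pulls):
    """f(P_0, N) = max(0, P_0 - N): zero after the first pull."""
    return np.maximum(0.0, p0 - n_pulls)

# Extra losses after the first pull leave the estimate untouched.
print(estimate_p0([1], cliff), estimate_p0([1, 0, 0, 0, 0], cliff))  # 1.0 1.0
print(estimate_p0([0], cliff), estimate_p0([0, 0, 0, 0, 0], cliff))  # 0.0 0.0
```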

---

Cite as:

```
@article{wagner2021generalization,
  title   = "Generalization of the law of large numbers",
  author  = "Wagner, Jakub",
  journal = "somnambwl.netlify.app",
  year    = "2021",
  url     = "https://somnambwl.netlify.app/Science/LawOfLargeNumbers/"
}
```

## Python code

The following part contains the Python code that defines a class for a single machine. The code used in the competition, which also has to account for the opponent's pulls, is available in [another Kaggle notebook](https://www.kaggle.com/somnambwl/santa-2020).

In [None]:
def probability_change(initial_probability, N):
    """
    Computes probability to gain a reward after `N`-th pull.
    
    Parameters
    ==========
    initial_probability: np.array
        Possible initial probabilities to gain a reward when a machine is pulled.
    N: int
        Number of pulls
        
    Returns
    =======
    np.array or float
        Probability to gain a reward after `N`-th pull, one value per
        initial probability passed in.
    """
    return initial_probability * 0.97**N

class VendingMachine(object):
    """
    A class describing a single machine.
    
    Attributes
    ==========
    pulls: int
        How many times was this machine pulled?
    rewards: int
        How many times a reward came out of the machine after a pull?
    unknown: int
        How many times we don't know if there was a reward or not?
    losses: int
        How many times the machine was pulled and did not output a reward?
    history: list of str
        "S" is for success, "F" for failure, "?" for unknown result
        (when e.g. the opponent pulls the machine).
    probability_change: func
        Function that computes the probability to gain a reward out of the machine
        after the N-th pull.
    aop_history: list of float
        History of assumed original probabilities, i.e. what we thought
        that was the initial probability of a reward of the machine at a certain step.
        For example if it is [100, 50], after the first pull we estimated the initial
        probability to be 100 %, while after the second step we estimated it to 50 %.
    acp_history: list of float
        History of assumed current probabilities, i.e. what we thought
        that was the current probability of a reward at a certain step.
    """
    
    def __init__(self, probability_change):
        self.rewards = 0
        self.unknown = 0
        self.losses = 0
        self.pulls = 0
        self.history = []
        self.markers = []
        self.probability_change = probability_change
        self.aop_history = []
        self.acp_history = []
        
    def pulled(self, result):
        """
        Record my pull.
        
        Parameters
        ==========
        result: str
            "S" for success, "F" for loss and "?" for an unknown outcome.
        """
        self.pulls += 1
        result = str(result)
        if result == "S":
            self.rewards += 1
            self.markers.append("^")
        elif result == "?":
            self.unknown += 1  # attribute is named `unknown` in __init__
            self.markers.append("s")
        elif result == "F":
            self.losses += 1
            self.markers.append("v")
        self.history.append(result)
        self.aop_history.append(self.assumed_original_probability)
        self.acp_history.append(self.assumed_current_probability)
        
    def assume_probability_from_history(self):
        """
        Assumes original probability of reward.
        
        As we have the history of a vending machine, we can compute
        how likely each possible initial probability is. We go through
        all possible starting probabilities at once (the for cycle
        over candidates is hidden in the vectorized `linspace` array),
        and the probability that is most likely to give the current
        history of pulls is most likely the original probability.
        Unknown outcomes ("?") do not change the likelihood.
        """
        base = np.linspace(0, 1, 101)
        probabilities = np.ones(101)
        for N_pulls, event in enumerate(self.history):
            probability_of_success = self.probability_change(base, N_pulls)
            if event == "S":
                probabilities *= (probability_of_success)
            elif event == "F":
                probabilities *= (1-probability_of_success)
            elif event == "?":
                pass
        return np.argmax(probabilities) * 0.01
            
    @property
    def assumed_original_probability(self):
        """
        Returns the assumed original probability.
        """
        return self.assume_probability_from_history()
    
    @property
    def assumed_current_probability(self):
        """
        Return the assumed current probability of reward.
        
        As we can compute the original probability, we just apply
        the decay and that is the current probability of a reward.
        """
        return self.probability_change(self.assumed_original_probability, self.pulls)

In [None]:
class TrainingMachine(VendingMachine):
    """
    Extension for a single machine.
    
    In the competition, we select one of 100 machines and the environment
    returns an observation, whether we got a reward. We are now interested
    only in a single machine and estimating its initial probability of a reward,
    so this extension for a single machine adds an additional function that
    simulates a single pull.
    """
    
    def __init__(self, initial_probability, probability_change):
        super().__init__(probability_change)
        self.initial_probability = initial_probability
        self.probability = initial_probability
        self.probability_history = [initial_probability]
    
    def pull(self, me=True):
        """
        Simulate a single pull.
        
        Parameters
        ==========
        me: bool (True)
            If True, then we know the result - whether we got a reward or not.
            False is used if an opponent is present and we do not get an
            information about their rewards.
        """
        got_reward = random.random() < self.probability
        reward_string = "S" if got_reward else "F"
        if me:
            self.pulled(reward_string)
        else:
            self.pulled("?")
        self.probability = self.probability_change(self.initial_probability, self.pulls)
        self.probability_history.append(self.probability)
        
    def set_history(self, history):
        """
        Set a custom history.
        
        Useful for answering questions like
        'How much would assumed probability deviate from the true probability if,
        e.g. machine returned reward five times in a row?'
        
        Parameters
        ==========
        history: list of str
            "S" is for success, "F" for failure, "?" for unknown result
        """
        for reward in history:
            self.pulled(reward)
            self.probability = self.probability_change(self.initial_probability, self.pulls)
            self.probability_history.append(self.probability)        

## Playground

In the following cell, it is possible to try out different reward-changing functions, set different initial probability and the number of pulls.

In [None]:
def probability_change(initial_probability, N):
    """
    Computes probability to gain a reward after `N`-th pull.
    
    Parameters
    ==========
    initial_probability: np.array
        Possible initial probabilities to gain a reward when a machine is pulled.
    N: int
        Number of pulls
        
    Returns
    =======
    np.array or float
        Probability to gain a reward after `N`-th pull, one value per
        initial probability passed in.
    """
    return initial_probability * 0.97**N

tm = TrainingMachine(0.5, probability_change)
# tm.set_history(["S", "S", "S", "S", "S"])   # Set a custom history
for i in range(100):   # Simulate pulls
    tm.pull()

In [None]:
green = "#3FA719"
red = "#B33F3F"
blue = "#7AB8E8"
purple = "#855CBF"
gray = "#A7848E"


# First tab with original probability

y = np.array(tm.aop_history)
y2 = np.ones(len(y)+1) * tm.initial_probability
x = np.array(list(range(len(y))))+1
x2 = list(range(len(y)+1))
df = pd.DataFrame.from_records(zip(x, y), columns=["N_trials", "assumed_original_probability"])
source = ColumnDataSource(df)
m = np.array(tm.markers)

p1 = figure(plot_width=700, plot_height=500, y_range=(-0.1, 1.1))
l1 = p1.line(x="N_trials", y="assumed_original_probability", source=source, line_width=3, color=purple, alpha=0.7, legend_label="Assumed")
p1.line(x2, y2, line_width=3, color=gray, alpha=0.7, legend_label="True")
for marker_tag, marker_type, color in zip(["^", "v", "s"], ["triangle", "inverted_triangle", "square"], [green, red, blue]):
    p1.scatter(x[m==marker_tag], y[m==marker_tag], marker=marker_type, color=color, size=8, alpha=0.3)
p1.xaxis.axis_label = "Number of trials"
p1.yaxis.axis_label = "Assumed original probability"
ht1 = HoverTool(
    tooltips=[
        ("Assumed probability", "@assumed_original_probability" ),],
    mode='vline',
    renderers=[l1])
p1.add_tools(ht1)
tab1 = Panel(child=p1, title="Assumed initial probability")


# Second tab with current probability

y1 = np.array([None]+tm.acp_history)
y2 = np.array(tm.probability_history)
x1 = np.array(list(range(len(y1))))
x2 = np.array(list(range(len(y2))))
m = np.array(tm.markers)
df = pd.DataFrame.from_records(zip(x1, y1, y2), columns=["N_trials", "assumed_current_probability", "true_current_probability"])
source = ColumnDataSource(df)


p2 = figure(plot_width=700, plot_height=500, y_range=(-0.1, 1.1))
r1 = p2.line(x="N_trials", y="true_current_probability", source=source, line_width=3, color=gray, alpha=0.8, legend_label="True")
p2.line(x1, y1, line_width=3, color=purple, alpha=0.7, legend_label="Assumed")
for marker_tag, marker_type, color in zip(["^", "v", "s"], ["triangle", "inverted_triangle", "square"], [green, red, blue]):
    mask = np.array([False]+list(m==marker_tag))
    p2.scatter(x1[mask], y1[mask], marker=marker_type, color=color, size=8, alpha=0.4)
p2.xaxis.axis_label = "Number of trials"
p2.yaxis.axis_label = "Probability"
ht2 = HoverTool(
    tooltips=[
        ("True probability", "@true_current_probability"),
        ("Assumed probability", "@assumed_current_probability" ),],
    mode='vline',
    renderers=[r1])
p2.add_tools(ht2)
tab2 = Panel(child=p2, title="Assumed current probability")


show(Tabs(tabs=[tab1, tab2]), notebook_handle=True)