In [None]:
import numpy as np
import matplotlib.pyplot as plt

# 1.2 What is Probability Theory

**Probability Theory** is a mathematical framework for computing the probability of complex events, assuming we know the probabilities of the base events.

For example:

- Given P(X=H) = 0.5 and P(X=T) = 0.5 (the base event probabilities), what is the probability of observing the sequence HTTH in four flips (the complex event), assuming order matters?
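
As a minimal sketch of this example, assuming the four flips are independent (the text does not say so explicitly), the probability of an ordered sequence is the product of the base probabilities. The helper name `sequence_probability` is hypothetical.

```python
from itertools import product

# Base event probabilities for a fair coin, as given in the text.
p = {"H": 0.5, "T": 0.5}

def sequence_probability(seq):
    """Probability of an ordered sequence of flips, ASSUMING independence."""
    prob = 1.0
    for outcome in seq:
        prob *= p[outcome]
    return prob

p_htth = sequence_probability("HTTH")

# Sanity check: the probabilities of all 2**4 ordered sequences sum to 1.
total = sum(sequence_probability(s) for s in product("HT", repeat=4))
print(p_htth, total)
```

Each of the 16 ordered sequences is equally likely here, so HTTH has probability $0.5^4$.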

Suppose we encode the $i^{th}$ flip as $x_i = 1$ for `heads` and $x_i = -1$ for `tails`.

For the sum of 10000 flips, $S_{10000} = x_1 + x_2 + \dots + x_{10000}$, we typically observe that $S_{10000}$ is close to 0 relative to the number of flips.

In [None]:
# Generate k random coin flips, each encoded as +1 (heads) or -1 (tails), and sum them
# to see how close the total is to 0. Repeating the experiment n=100 times shows the
# distribution of the sums.
n = 100

def generate_flips(k, n):
    # np.random.rand(k, n) returns a k x n array of uniform samples from [0, 1)
    X = 2 * (np.random.rand(k, n) > 0.5) - 1
    S = np.sum(X, axis=0)  # sum over the k flips (rows); S has shape (n,)
    return S

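Using probability theory, we can bound how far $S_k$ strays from 0. For example, Hoeffding's inequality gives $P(|S_k| \geq 4\sqrt{k}) \leq 2e^{-8} \approx 0.067\%$ for every $k$. The red lines in the plots below mark $\pm 4\sqrt{k}$.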
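
With only n=100 experiments, an event this rare will almost never show up. A rough way to check the tail empirically is to run many more experiments. The sketch below samples the number of heads directly from a binomial distribution instead of storing every flip, which is a swapped-in shortcut rather than the method used in this notebook. The helper name `tail_frequency`, the seed, and the experiment count are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

def tail_frequency(k, n):
    """Fraction of n experiments in which |S_k| >= 4 * sqrt(k)."""
    # The number of heads in k fair flips is Binomial(k, 0.5),
    # and the +1/-1 sum is S_k = heads - tails = 2 * heads - k.
    heads = rng.binomial(k, 0.5, size=n)
    S = 2 * heads - k
    return np.mean(np.abs(S) >= 4 * np.sqrt(k))

freq = tail_frequency(k=1000, n=200_000)
print(freq)
```

The printed frequency is itself a noisy estimate, because only a handful of the experiments land in the tail.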

In [None]:
f, axs = plt.subplots(1, 3)
f.set_size_inches(21, 4)
ks = [100, 1000, 10000]
for ax, k in zip(axs, ks):
    # histogram of the n sums S_k, one per experiment
    ax.hist(generate_flips(k=k, n=n), bins=10)
    ax.set_xlim(-k, k)
    ax.grid()
    # red vertical lines mark +/- 4*sqrt(k)
    d = 4 * np.sqrt(k)
    ax.axvline(-d, color='r')
    ax.axvline(d, color='r')
    ax.set_xlabel('sum')
    ax.set_ylabel('frequency')
    ax.set_title(f'Summing k={k} Coin Flips Over n={n} Experiments')

Note that as $k \rightarrow \infty$, $\frac{4\sqrt{k}}{k} = \frac{4}{\sqrt{k}} \rightarrow 0$. Since $|S_k| < 4\sqrt{k}$ with probability at least $1 - 2e^{-8} \approx 99.93\%$ (by Hoeffding's inequality), we have $\frac{|S_k|}{k} < \frac{4}{\sqrt{k}}$ with that same probability, so the average $\frac{S_k}{k}$ concentrates around 0 as $k$ grows.

The formal mathematical structure of probability theory lets us determine the probabilities of complex events more precisely, and at lower computational cost, than the *Monte Carlo* simulations above.
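
To illustrate the contrast with simulation, the tail probability $P(|S_k| \geq 4\sqrt{k})$ can be computed exactly from the binomial distribution, with no random sampling. This is a sketch that assumes fair, independent flips. The helper name `exact_tail` is hypothetical.

```python
from math import ceil, comb, sqrt

def exact_tail(k):
    """Exact P(|S_k| >= 4 * sqrt(k)) for k fair, independent +1/-1 flips."""
    d = 4 * sqrt(k)
    # S_k = 2 * h - k, where h is the number of heads,
    # so S_k >= d is equivalent to h >= (k + d) / 2.
    h_min = ceil((k + d) / 2)
    upper = sum(comb(k, h) for h in range(h_min, k + 1))
    # By symmetry, the lower tail has the same probability as the upper tail.
    return 2 * upper / 2**k

for k in (100, 1000, 10000):
    print(k, exact_tail(k))
```

The sum uses exact integer arithmetic and converts to a float only in the final division, so the result does not depend on a random seed or on the number of experiments.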