In [None]:
import numpy as np
import matplotlib.pyplot as plt

# 1.2 What is Probability Theory

**Probability Theory** is a mathematical framework for computing the probability of complex events, assuming we know the probabilities of the base events.

For example:

- Given P(X=H) = 0.5 and P(X=T) = 0.5 (the base event probabilities), what is the probability P(X={HTTH}) (the complex event), assuming order matters?
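As a minimal sketch of how this question is answered, assuming the flips are independent (the example above does not state this explicitly), the probability of an ordered sequence is the product of the base probabilities. The helper name `sequence_probability` is made up for this illustration.

```python
from itertools import product

# Base event probabilities taken from the example above (a fair coin).
p = {'H': 0.5, 'T': 0.5}

def sequence_probability(seq):
    """Probability of one exact ordered sequence, assuming independent flips."""
    prob = 1.0
    for flip in seq:
        prob *= p[flip]
    return prob

p_htth = sequence_probability('HTTH')
print(p_htth)  # 0.0625, i.e. 1/16

# Sanity check: the probabilities of all 2**4 ordered sequences sum to 1.
total = sum(sequence_probability(seq) for seq in product('HT', repeat=4))
print(total)  # 1.0
```

Every ordered sequence of four fair flips has the same probability, so HTTH is no more or less likely than HHHH.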

Suppose we encode the $i^{th}$ flip as $x_i = 1$ for `heads` and $x_i = -1$ for `tails`.

We observe that the sum $S_{10000} = x_1 + x_2 + \dots + x_{10000}$ is small relative to the number of flips, i.e. $S_{10000} \approx 0$.

In [None]:
# we can generate k=10000 random coin flips represented by either 1 or -1 and sum the results to observe how close the answer is to 0
# repeating this experiment n=100 times allows us to view the distribution of experiment results
n=100
def generate_flips(k, n):
    # np.random.rand(k,n) creates a kxn matrix of random values in [0,1) sampled from a uniform distribution
    X = 2 * (np.random.rand(k,n) > 0.5) - 1
    S = np.sum(X, axis=0) # sums the k flips in each column -> S is a 1-D array of length n
    return S

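Using probability theory, we can show that $|S_k|$ rarely exceeds $4\sqrt{k}$: since $S_k$ has standard deviation $\sqrt{k}$, the normal approximation gives $P(|S_k| \geq 4\sqrt{k}) \approx 0.006\%$, so the sums plotted below should fall between the red lines.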

In [None]:
f, axs = plt.subplots(1,3)
f.set_size_inches(21,4)
ks = [100, 1000, 10000]
for i in range(0,3):
    ax=axs[i]
    k = ks[i]
    ax.hist(generate_flips(k=ks[i], n=n), bins=10)
    ax.set_xlim(-k, k)
    ax.grid()
    d = 4*np.sqrt(k)
    ax.plot([-d,-d], [0,30], 'r')
    ax.plot([d,d], [0,30], 'r')
    ax.set_xlabel('sum')
    ax.set_ylabel('frequency')
    ax.set_title(f'Summing k={k} Coin Flips Over n={n} Experiments')

Note that as $k \rightarrow \infty$, $\frac{4\sqrt{k}}{k} = \frac{4}{\sqrt{k}} \rightarrow 0$. Since $|S_k| < 4\sqrt{k}$ with probability greater than $99.99\%$, it follows that $\frac{|S_k|}{k} < \frac{4}{\sqrt{k}}$ with the same probability, so $\frac{|S_k|}{k}$ is almost certainly close to $0$ for large $k$.

The formal mathematical structure of Probability Theory allows us to determine the probabilities of complex events more precisely and with less computational cost than the *Monte Carlo* simulations above.
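To illustrate the contrast with simulation, here is a rough sketch that computes $P(|S_k| \geq 4\sqrt{k})$ exactly by counting head/tail sequences instead of sampling them. It assumes a fair coin and independent flips; the function name `tail_probability` and its parameter `c` are made up for this illustration.

```python
from math import sqrt

def tail_probability(k, c=4.0):
    """Exact P(|S_k| >= c*sqrt(k)) for k independent fair +/-1 flips."""
    threshold = c * sqrt(k)
    coeff = 1        # binomial coefficient C(k, 0)
    tail_count = 0   # number of length-k sequences whose sum lands in the tail
    for heads in range(k + 1):
        s = 2 * heads - k  # value of S_k when `heads` flips are +1 and the rest are -1
        if abs(s) >= threshold:
            tail_count += coeff
        # C(k, heads + 1) = C(k, heads) * (k - heads) / (heads + 1), exact in integers
        coeff = coeff * (k - heads) // (heads + 1)
    return tail_count / 2**k

for k in [100, 1000, 10000]:
    print(k, tail_probability(k))
```

The loop takes on the order of $k$ steps and involves no randomness, whereas a simulation would need a very large number of repetitions to estimate a probability this small with any accuracy.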

# 1.3 What is Statistics?

Whereas **Probability Theory** computes *complex probabilities* from known *base probabilities*, **Statistics** works in the opposite direction: it attempts to infer the *base probabilities* from **data** generated by some complex stochastic process.

For example:

- We flip a coin 1000 times and get 570 heads. Can we conclude that $P(Heads) = 0.5$, i.e. that the coin is fair?

Answer using the logic of Statistical Inference:

- Suppose the coin is fair.

- Using **probability theory**, calculate the probability of getting at least 570 heads assuming the coin is fair.

    - from our previous formulation we see that $S_{1000} = 570 - 430 = 140$

    - since $S_{1000} = 140 > 126.49 \approx 4\sqrt{1000}$ and $P(|S_k| \geq 4\sqrt{k})$ is very low, the probability of a result at least as extreme as $S_{1000} = 140$ must also be very low.

- If this probability is smaller than some pre-determined threshold, we can **reject** *with confidence* the *hypothesis* that the coin is fair (hypothesis testing).

    - since $S_k > 4\sqrt{k}$, we can reject with *confidence* the hypothesis that the coin is fair.
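The steps above can be sketched numerically. This is a minimal illustration, assuming independent flips; the threshold `alpha = 0.01` is a hypothetical choice, since the text does not fix a value.

```python
from math import comb

flips, heads_observed = 1000, 570

# P(at least 570 heads in 1000 flips of a fair coin), computed exactly:
# count the sequences with >= 570 heads and divide by the 2**1000 possible sequences.
favourable = sum(comb(flips, h) for h in range(heads_observed, flips + 1))
p_one_sided = favourable / 2**flips

# A fair coin is equally likely to deviate toward tails, so the probability of a
# deviation at least this large in either direction is twice the one-sided value.
p_two_sided = 2 * p_one_sided

alpha = 0.01  # hypothetical pre-determined threshold
print("one-sided probability:", p_one_sided)
print("two-sided probability:", p_two_sided)
print("reject the fair-coin hypothesis:", p_two_sided < alpha)
```

Using "at least 570 heads" rather than "exactly 570 heads" matters: the probability of any single exact count is small even for a fair coin, so the test looks at results at least as extreme as the one observed.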


Statistical inference can be used to test hypotheses across various real-world applications:

- does the number of votes received by a political party during an election fall within the expected probabilistic range, assuming the election was fair?

- does the measured difference in engagement between two versions of a webpage indicate a superior design or just normal statistical variation?

# 1.4 The Three Card Puzzle

Suppose we have `three cards in a hat`. One card is `blue on both sides`, one card is `red on both sides`, and the last card is `blue on one side and red on the other`. A card is drawn at random and laid down so that only one side is visible. Someone then proposes a wager: `you win 1 dollar if the colour on the other side is different` and `they win 1 dollar if the colour on the other side is the same`. They argue that, whichever colour is showing, there is one card with the other colour on the back and one card with the same colour on the back, so the bet is fair.

Let us test this scenario with a Monte Carlo simulation:

In [None]:
hat = [('red', 'red'), ('blue','blue'), ('red', 'blue')]
counts = {'same':0, 'diff': 0}

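for i in range(100):
    draw = int(np.random.randint(0,3))  # pick one of the three cards uniformly at random
    side = int(np.random.randint(0,2))  # pick which side of that card is visible

    # compare the hidden side, (side+1)%2, with the visible side
    if hat[draw][(side+1)%2] == hat[draw][side]: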
        counts['same'] += 1
    else:
        counts['diff'] += 1


print(counts)

plt.figure(figsize=(6,4))
plt.bar(x=list(counts.keys()), height=list(counts.values()), width=[0.3,0.3], align="center")
plt.title('Same vs Different Coloured Sides of Randomly Drawn Cards')
plt.grid()

We see that the results of the simulation do not agree with this argument. Only one of the three cards (the two-coloured card) results in a win, while the other two result in a loss, so you are twice as likely to lose as you are to win.

Thus your expected earnings per round are $1\$ \times (1/3) - 1\$ \times (2/3) \approx -0.33\$$, which means you lose roughly $33$ cents per round on average.

Another insightful interpretation is that, regardless of the colour you see, the visible face is twice as likely to belong to the single-coloured card of that colour as to the two-coloured card.

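The original argument sounds convincing but confuses **outcomes** with **events**. The equally likely outcomes are the six card faces, not the three cards, and the event "a red face is showing" contains both faces of the red-red card but only one face of the two-coloured card.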
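The same conclusion can be reached without simulation by listing the outcomes explicitly. The following is a small sketch that treats each (card, visible side) pair as one equally likely outcome; the variable names are chosen for this illustration.

```python
from fractions import Fraction

hat = [('red', 'red'), ('blue', 'blue'), ('red', 'blue')]

# Each outcome is a (card, visible side) pair; all six are equally likely.
outcomes = [(card, side) for card in hat for side in (0, 1)]

# Event "the hidden side has the same colour as the visible side".
same = [(card, side) for card, side in outcomes if card[side] == card[1 - side]]
p_same = Fraction(len(same), len(outcomes))
p_diff = 1 - p_same

# You win 1 dollar on "different" and lose 1 dollar on "same".
expected_winnings = 1 * p_diff - 1 * p_same

# Conditional view: among outcomes showing red, how often is the hidden side also red?
red_up = [(card, side) for card, side in outcomes if card[side] == 'red']
red_up_same = [(card, side) for card, side in red_up if card[1 - side] == 'red']
p_same_given_red = Fraction(len(red_up_same), len(red_up))

print(p_same)             # 2/3
print(expected_winnings)  # -1/3
print(p_same_given_red)   # 2/3
```

Three of the six outcomes show red, and two of those three belong to the red-red card, which is why seeing red makes the same-colour card twice as likely as the two-coloured one.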

# 1.5 History of Probability and Statistics

### Frequentist Perspective

- To assign a probability to an outcome is to say that the relative frequency of that outcome across repeated trials converges to the assigned probability.

- It is this conceptualization upon which probability theory is built.

- While this approach makes sense in situations where a random experiment can be repeated, repetition is not always possible.
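The frequentist reading can be illustrated with a short simulation. This is only a sketch: the seed, the number of flips, and the name `running_frequency` are arbitrary choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily, for reproducibility
p_heads = 0.5                   # probability assigned to the outcome "heads"

# Simulate repeated trials: True means heads.
flips = rng.random(100_000) < p_heads

# Relative frequency of heads after 1, 2, ..., 100000 trials.
running_frequency = np.cumsum(flips) / np.arange(1, flips.size + 1)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, running_frequency[n - 1])
```

The printed frequencies should drift toward `p_heads` as the number of trials grows, which is what the frequentist definition asserts.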

### Bayesian Perspective

- Useful in situations where it does not make sense to repeat an experiment multiple times, such as patient diagnosis or weather prediction.

- While both frequentist and Bayesian statistics use the same math, Bayesian statistics deals with combining and evaluating different pieces of evidence.

- Much of the discussion around Bayesian statistics is not mathematical, as we are dealing with opinions or pieces of evidence that are not necessarily quantifiable.