# Review of Probability

## CSCI E-83
## Stephen Elston

Probability theory is at the core of probabilistic programming, so a solid understanding of its principles is essential.

In [5]:
import pandas as pd
import numpy as np

## Axioms of probability

All probability distributions must have certain properties, which we refer to as the **axioms of probability**. These are:

- The probability of any event, A, is bounded between 0 and 1:  

$$0 \le P(A) \le 1 $$
- The probability of the sample space, S, is 1:  

$$P(S) = \sum_{All\ i}P(a_i) = 1 $$

- The probability of a union of mutually exclusive (disjoint) events is the sum of their probabilities:

$$P(A \cup B) = P(A) + P(B) \quad \text{if} \quad A \cap B = \emptyset $$

To make these ideas concrete, let's try an example. The code in the cell below creates a data frame with the probabilities of hair and eye color combinations. 

In [4]:
eyeHair = pd.DataFrame({'Black':[0.11, 0.03, 0.03, 0.01], 
                     'Brunette':[0.2, 0.14, 0.09, 0.05],
                     'Red':[0.04, 0.03, 0.02, 0.02],
                     'Blond':[0.01, 0.16, 0.02, 0.03]}, 
                      index = ['Brown', 'Blue', 'Hazel', 'Green'])
eyeHair

| Eye color | Black | Blond | Brunette | Red |
|---|---|---|---|---|
| Brown | 0.11 | 0.01 | 0.2 | 0.04 |
| Blue | 0.03 | 0.16 | 0.14 | 0.03 |
| Hazel | 0.03 | 0.02 | 0.09 | 0.02 |
| Green | 0.01 | 0.03 | 0.05 | 0.02 |


This table contains the bivariate distribution $p(hair,eye)$. For example, the probability that a subject in this sample has black hair and brown eyes is $p(black,brown) = 0.11$.

You can see that all of these probabilities are in the range $0 \le p(hair,eye) \le 1.0$, so they satisfy the first axiom. 

We can test if these probabilities add up to 1.0. 

In [19]:
np.array(eyeHair).sum()

0.9899999999999999

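The sum is 0.99 rather than exactly 1.0 because the table entries are given to only two decimal places. To within this rounding error, the probabilities add to 1.0 and satisfy the second axiom. 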
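The checks above can also be written out explicitly. The following is a minimal sketch, not a definitive test: it rebuilds the same table so that it runs on its own, and the tolerance is an assumption (each entry rounded to two decimals, so off by at most 0.005). The two cells chosen for the additivity check are arbitrary; any two distinct cells are mutually exclusive events.

```python
import pandas as pd

# Same joint probability table as above (rows: eye color, columns: hair color).
eyeHair = pd.DataFrame({'Black': [0.11, 0.03, 0.03, 0.01],
                        'Brunette': [0.2, 0.14, 0.09, 0.05],
                        'Red': [0.04, 0.03, 0.02, 0.02],
                        'Blond': [0.01, 0.16, 0.02, 0.03]},
                       index=['Brown', 'Blue', 'Hazel', 'Green'])

probs = eyeHair.to_numpy()

# Axiom 1: every probability lies in [0, 1].
in_range = bool(((probs >= 0.0) & (probs <= 1.0)).all())

# Axiom 2: the probabilities sum to 1, up to rounding of the table entries.
# Assumption: each entry may be off by at most 0.005, so the total may be
# off by at most 0.005 times the number of entries.
tolerance = 0.005 * probs.size
total = probs.sum()
sums_to_one = bool(abs(total - 1.0) <= tolerance)

# Axiom 3: two distinct cells cannot both occur for one subject, so the
# probability of "either one" is the sum of the two cell probabilities.
p_brown_black = eyeHair.loc['Brown', 'Black']
p_blue_blond = eyeHair.loc['Blue', 'Blond']
p_either = p_brown_black + p_blue_blond

print(in_range, sums_to_one, round(p_either, 2))
```

The tolerance here is deliberately loose. It is a worst-case bound, so a table that passes this check could still be slightly unnormalized.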

The question of independence or dependence is a bit more complicated, and will be addressed later. 

## Marginal distributions

In many cases 

## Conditional distributions and Bayes' Theorem

A probability distribution of one random variable can be conditionally dependent on another random variable. Bayes' theorem, also known as Bayes' rule, gives us a powerful tool to think about and analyze conditional probabilities. We can derive it by writing the joint probability of A and B in two ways:

$$P(A \cap B) = P(A|B)P(B)\\
P(B \cap A) = P(B|A)P(A)$$

Since intersection is symmetric:

$$P(A \cap B) = P(B \cap A)$$

This leads to Bayes' theorem as follows:

$$P(A|B)P(B) = P(B|A)P(A)\\
P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$

The last line is Bayes' theorem, which holds whenever $P(B) > 0$. 
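As a numerical check, Bayes' theorem can be applied to the hair and eye color table from earlier. The sketch below is illustrative only. It takes A = blond hair and B = blue eyes (an arbitrary choice), computes the marginals by summing the joint table over the other variable, and compares the directly computed conditional with the right-hand side of the theorem. The table is rebuilt here so that the block is self-contained.

```python
import pandas as pd

# Same joint probability table as above (rows: eye color, columns: hair color).
eyeHair = pd.DataFrame({'Black': [0.11, 0.03, 0.03, 0.01],
                        'Brunette': [0.2, 0.14, 0.09, 0.05],
                        'Red': [0.04, 0.03, 0.02, 0.02],
                        'Blond': [0.01, 0.16, 0.02, 0.03]},
                       index=['Brown', 'Blue', 'Hazel', 'Green'])

# Marginals: sum the joint table over the other variable.
p_hair = eyeHair.sum(axis=0)  # P(hair), indexed by hair color
p_eye = eyeHair.sum(axis=1)   # P(eye), indexed by eye color

# Conditionals from the definition P(A|B) = P(A and B) / P(B).
p_joint = eyeHair.loc['Blue', 'Blond']
p_blond_given_blue = p_joint / p_eye['Blue']
p_blue_given_blond = p_joint / p_hair['Blond']

# Bayes' theorem: P(Blond|Blue) = P(Blue|Blond) P(Blond) / P(Blue).
bayes_rhs = p_blue_given_blond * p_hair['Blond'] / p_eye['Blue']

print(round(p_blond_given_blue, 4), round(bayes_rhs, 4))
```

Both quantities are ratios of entries from the same table, so the comparison is unaffected by the table summing to 0.99 rather than exactly 1.0.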

## Independence
