# Probability Theory

### Probability vs Statistics

**Probability** - predictions about future events based on models or causes that we assume; ***predicting data***

**Statistics** - analyze data and past events to infer what those models or causes could be; ***using data to predict***

##### Basic Probability

* Probability of an Event = $P$
* Probability of a Complementary Event = $1-P$
* Probability of Composite Independent Events = $\prod_i{P_i}$
* The probability of any event must lie in the interval $[0, 1]$.

##### Example - Coin Flips

$P(\text{H}) = 0.5$

$P(\text{T}) = 1 - P(\text{H}) = P(\text{not H}) = 0.5$

Across $n$ coin flips, the probability of seeing $n$ heads is $P(\text{H})^n$.
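
The rules above can be checked with a few lines of Python. This is only a minimal sketch: the helper names `complement` and `all_independent` are made up for illustration and are not part of any library.

```python
import math

def complement(p):
    """Probability that an event with probability p does NOT occur."""
    return 1 - p

def all_independent(probs):
    """Probability that every one of several independent events occurs."""
    return math.prod(probs)

p_heads = 0.5
p_tails = complement(p_heads)

# Five heads in five fair flips: the product rule gives p_heads ** 5
p_five_heads = all_independent([p_heads] * 5)

print(p_tails, p_five_heads)
```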



### Binomial Distribution

(See OneNote notes for further examples as to how we derive the binomial distribution formula.)

The binomial distribution can be used when each trial has two possible outcomes and the trials are independent of one another with the same probability each time. Examples include coin flips where the outcome is heads or tails, whether a customer buys a product or not, or whether or not a transaction is fraudulent.

Let $n$ be the number of trials, e.g., the number of coin flips. Let $p$ be the probability of a particular outcome on a single trial, e.g., the probability of landing on heads. Let $k$ be the number of times that outcome occurs, e.g., the number of heads. Then the binomial distribution is defined as:

$P(k) = \frac{n!}{k!(n-k)!}p^k(1-p)^{n-k}$

The first factor counts the number of ways the particular outcome can occur $k$ times in $n$ trials. For instance, if we flip a coin five times, then the number of possible ways there can be three heads is:

$\frac{5!}{3!2!} = 10$

The second factor is the probability of any one of those sequences. If we continue the example with a biased coin where $p = 0.8$, then each sequence with three heads and two tails has probability:

$(0.8)^3(0.2)^2 = 0.02048$

Using the binomial distribution formula, the total probability is then $10 \times 0.02048 = 0.2048$.
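
The worked example can be reproduced in Python. This is a sketch using only the standard library; the function name `binomial_pmf` is an illustrative choice, not something from the original notes.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k occurrences in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Number of ways to get three heads in five flips
ways = comb(5, 3)

# Probability of exactly three heads with a biased coin, p = 0.8
prob = binomial_pmf(3, 5, 0.8)

print(ways, round(prob, 4))  # 10 0.2048
```

Summing `binomial_pmf(k, 5, 0.8)` over `k = 0, ..., 5` should give 1 (up to floating-point error), which is a quick sanity check on the formula.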

### Conditional Probability

When the outcome of one event depends on an earlier event, we must use conditional probabilities.

$P(A|B) = \frac{P(A \cap B)}{P(B)}$

where $|$ is read as *given* and $\cap$ represents *and*. So the above would read, "The probability of $A$ given $B$ is the probability of $A$ and $B$ divided by the probability of $B$."

$P(\text{positive} | \text{disease}) = \frac{P(\text{positive} \cap \text{disease})}{P(\text{disease})}$

$\text{likelihood} = \frac{\text{joint}}{\text{prior}}$

In the OneNote examples, we rearranged the above to compute

$P(\text{positive} \cap \text{disease}) = P(\text{positive} | \text{disease})P(\text{disease})$
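
To make the definition and its rearrangement concrete, here is a small sketch with made-up counts. The population of 1000, the 100 people with the disease, and the 90 who both have the disease and test positive are hypothetical numbers chosen for illustration.

```python
from fractions import Fraction

# Hypothetical counts, chosen only for illustration
total = 1000
disease = 100
positive_and_disease = 90

p_disease = Fraction(disease, total)             # P(disease)
p_joint = Fraction(positive_and_disease, total)  # P(positive and disease)

# Definition: P(positive | disease) = P(positive and disease) / P(disease)
p_pos_given_disease = p_joint / p_disease

# Rearranged form: the joint is the conditional times P(disease)
assert p_pos_given_disease * p_disease == p_joint

print(p_pos_given_disease)  # 9/10
```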


### Bayes Rule

Imagine we care about a variable that is hidden, meaning it cannot be measured directly, but we do have a test for it. Let's use the example of cancer and a test that comes back either positive or negative.

The knowledge we come to the problem with is known as our **prior**. In this case, our prior is the prevalence of cancer, where $\text{C}$ is our random variable for cancer.

**Prior**:  $P(\text{C})$

Once we begin testing, we obtain evidence that adds information to our prior. In this case, the evidence is a positive ($\text{T}_\text{p}$) or negative ($\text{T}_\text{n}$) test result.

**Sensitivity**: $P(\text{T}_\text{p} | \text{C})$ - true positive rate

**Specificity**: $P(\text{T}_\text{n} | \lnot\text{C})$ - true negative rate

Using our prior and our evidence, we can compute joint distributions. We will first compute the joint distribution for positive test cases.

$P(\text{C}, \text{T}_\text{p}) = P(\text{T}_\text{p} | \text{C})P(\text{C})$

$P(\lnot\text{C}, \text{T}_\text{p}) = P(\text{T}_\text{p} | \lnot\text{C})P(\lnot\text{C}) = (1 - P(\text{T}_\text{n}|\lnot\text{C}))(1 - P(\text{C}))$

Once we have our joint distributions, we can compute $P(\text{T}_\text{p})$. Note that the joint distributions are required because the conditional probabilities, $P(\text{T}_\text{p} | \text{C})$ and $P(\text{T}_\text{p} | \lnot\text{C})$, do not account for the prevalence of $\text{C}$.

$P(\text{T}_\text{p}) = P(\text{C}, \text{T}_\text{p}) + P(\lnot\text{C}, \text{T}_\text{p})$

We will use this value to normalize our joint distributions and compute our posterior probability distributions. We can think of the joint distributions as portions of a total area defined by $P({\text{T}_\text{p}})$. (See OneNote for example drawing.)

$P(\text{C} | \text{T}_\text{p}) = \frac{P(\text{C}, \text{T}_\text{p})}{P(\text{T}_\text{p})}$

$P(\lnot\text{C} | \text{T}_\text{p}) = \frac{P(\lnot\text{C}, \text{T}_\text{p})}{P(\text{T}_\text{p})}$

To compute the posterior probabilities given negative test results, $P(\text{C}|\text{T}_\text{n})$ and $P(\lnot\text{C}|\text{T}_\text{n})$, we can swap $\text{T}_\text{p}$ and $\text{T}_\text{n}$ in the above algorithm.
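
The steps above can be sketched as a short function. The function name and the input values (a prevalence of 0.01 and a sensitivity and specificity of 0.9 each) are assumptions for illustration, not figures from the notes.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """Return (P(C | Tp), P(not C | Tp)) using the steps described above."""
    # Joint probabilities for a positive test
    joint_c = sensitivity * prior                    # P(C, Tp)
    joint_not_c = (1 - specificity) * (1 - prior)    # P(not C, Tp)

    # Total probability of a positive test, used as the normalizer
    p_positive = joint_c + joint_not_c

    return joint_c / p_positive, joint_not_c / p_positive

# Hypothetical inputs, chosen only for illustration
p_c, p_not_c = posterior_given_positive(prior=0.01, sensitivity=0.9, specificity=0.9)
print(round(p_c, 4), round(p_not_c, 4))  # 0.0833 0.9167
```

With these assumed numbers, a positive result raises the probability of cancer from 1% to only about 8%, because the low prior means most positive tests come from the much larger group without cancer.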

### Simulating Events in Python

##### Flipping a Coin

In [18]:
import numpy as np
import matplotlib.pyplot as plt
from timeit import default_timer as timer
%matplotlib inline

In [3]:
np.random.randint(2, size=10000).mean()

0.4993

In [4]:
np.random.choice([0, 1], size=10000, p=[0.8, 0.2]).mean()

0.2059

##### Two Fair Coin Flips - Probability of Two Heads?

In [5]:
# 0 is Heads, 1 is Tails
tests = np.random.randint(2, size=(int(1e6), 2))
print(tests[:10])
(tests.sum(axis=1) == 0).mean()

[[1 0]
 [1 1]
 [0 0]
 [0 1]
 [1 1]
 [1 1]
 [1 0]
 [0 0]
 [0 0]
 [1 1]]


0.250035

##### Three Fair Coin Flips - Probability of Exactly One Head?

In [6]:
# 0 is Heads, 1 is Tails, so a row sum of 2 means two tails and exactly one head
tests = np.random.randint(2, size=(int(1e6), 3))
(tests.sum(axis=1) == 2).mean()

0.374306

##### Three Biased Coin Flips - $P(\text{H}) = 0.6$ - Probability of Exactly One Head?

In [7]:
# 0 is Heads (p = 0.6), 1 is Tails (p = 0.4); a row sum of 2 means exactly one head
tests = np.random.choice([0, 1], size=(int(1e6), 3), p=[0.6, 0.4])
(tests.sum(axis=1) == 2).mean()

0.287924
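
The simulated value can be compared with an exact calculation. The sketch below enumerates all eight outcomes of three flips with the standard library; the variable names are illustrative.

```python
from itertools import product

# Per-flip probabilities for the biased coin
p = {"H": 0.6, "T": 0.4}

# Sum the probability of every three-flip sequence with exactly one head
exact = sum(
    p[a] * p[b] * p[c]
    for a, b, c in product("HT", repeat=3)
    if (a, b, c).count("H") == 1
)

print(round(exact, 4))  # 0.288
```

This agrees with the binomial formula, $3 \times 0.6 \times 0.4^2 = 0.288$, and the simulation lands close to it.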

##### Die Roll - Probability of an Even Number?

In [8]:
# Faces are encoded 0-5 rather than 1-6; half of the values are even either way
tests = np.random.randint(6, size=int(1e6))
(tests % 2 == 0).mean()

0.500229

##### Two Dice Rolls - Probability of a Double?

In [19]:
start = timer()  # default_timer returns the current time in seconds
tests = np.random.randint(6, size=(int(1e6), 2))
# A double occurs when both dice in a row show the same face
print((tests[:, 0] == tests[:, 1]).mean())
timer() - start  # elapsed seconds