# Basics

Probability == predictions about the future (given a model, predict the data)
Statistics == analyzing data from the past to infer what the underlying models or causes could be (given data, infer the model)

## Probability Formula

`P(A) = 1 - P(¬A)` -- the probability of A equals 1 minus the probability of **not A**

## Examples

One coin flip with a fair coin
- `P(Heads) = .5`

Two flips
- `P(Heads, Heads) = .25` (probability of heads * probability of heads, .5 * .5)

Flips with a rigged coin
- `P(Heads) = 0.6` therefore `P(Tails) = 0.4`
- `P(Heads, Heads) = .36` (.6 * .6)

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

Notice that the probabilities of all possible outcomes add up to a total of `1`
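A quick sketch to check the "adds up to 1" observation for two flips of the rigged coin. It assumes the flips are independent and uses the `0.6` / `0.4` values from above; the variable names are just for illustration.

```python
from itertools import product

# Single-flip probabilities for the rigged coin (values from the notes above).
p = {"H": 0.6, "T": 0.4}

# Probability of each two-flip sequence, assuming the flips are independent.
outcomes = {a + b: p[a] * p[b] for a, b in product("HT", repeat=2)}

for seq, prob in outcomes.items():
    print(seq, round(prob, 2))

# The four outcome probabilities should sum to 1 (up to float rounding).
total = sum(outcomes.values())
print("total:", round(total, 2))
```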

Probability of getting exactly one heads in 3 coin flips (fair coin)
- Drawing a table out shows that 3 of the 8 outcomes contain exactly 1 heads. The probability of each outcome is `.125`.

`3 outcomes * .125 = .375`

![image-3.png](attachment:image-3.png)
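The same table can be built in code instead of by hand. This is only a sketch for a fair coin, where every sequence is equally likely, so counting rows is enough.

```python
from itertools import product

# Every possible sequence of 3 flips: 2**3 = 8 rows.
flips = list(product("HT", repeat=3))

# Keep the rows that contain exactly one heads.
one_head = [row for row in flips if row.count("H") == 1]

# Fair coin: each row has probability 1/8, so the answer is a simple ratio.
prob_one_head = len(one_head) / len(flips)
print(len(one_head), len(flips), prob_one_head)  # 3 8 0.375
```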


# Binomial Distribution

The **binomial distribution** gives the probability of getting a given number of successes in a string of independent 'coin flip like' events.

## Truth Tables

A truth table for 4 coin flips would look like this - it's a mapping of every possible outcome

| Case | Result |
|------|--------|
| 1 | HHHH |
| 2 | HHHT |
| 3 | HHTH |
| 4 | HHTT |
| 5 | HTHH |
| 6 | HTHT |
| 7 | HTTH |
| 8 | HTTT |
| 9 | THHH |
| 10 | THHT |
| 11 | THTH |
| 12 | THTT |
| 13 | TTHH |
| 14 | TTHT |
| 15 | TTTH |
| 16 | TTTT |

- With an even number of flips, some outcomes have an equal number of heads and tails (2 of each in the table above). With an odd number of flips (5, for example), no outcome can have an equal number of each result.
- When trying to calculate something like "how many outcomes have exactly one heads in 4 coin flips", think about how many places that "H" can sit in the result sequence. It'd be 4 places, so 4 outcomes.
- In another example with 5 coins
  - How many outcomes will have 2 heads?
    - 1 heads = 5 outcomes, 2 heads = 10
      - first head can be placed in 5 different locations, leaving 4 for the second (`5*4 = 20`). Each pair of locations gets counted twice (once per ordering), so to prevent double counting you divide the result by 2
        - `20/(2 * 1) = 10`
  - How many outcomes will have 3 heads?
    - H1 = 5 locations, H2 = 4 locations, H3 = 3 locations
      - `5*4*3 = 60` --> `60/(3 * 2 * 1)= 10`
      ![image.png](attachment:image.png)
  - 10 flips, how many outcomes have 4 heads?
    - 10 locations, then 9, then 8, then 7
      - `10*9*8*7 = 5040`
      - Remove duplicates by dividing by the number of orderings of the 4 heads (`4*3*2*1 = 24`)
        - `5040/24 = 210`
  - 10 flips, 5 heads?
    - `10*9*8*7*6 = 5040*6 = 30240`
    - `30240/(5*4*3*2*1) = 30240/120 = 252`
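The "place the heads, then divide out the duplicates" idea from the list above can be sketched as a small function. `count_outcomes` is a made-up name, and `math.comb` from the standard library is only used as a cross-check.

```python
from math import comb, factorial

def count_outcomes(n, k):
    """Number of n-flip sequences with exactly k heads."""
    placements = 1
    for i in range(k):
        placements *= n - i          # n * (n-1) * ... * (n-k+1) ordered placements
    # Each set of k positions was counted k! times, once per ordering.
    return placements // factorial(k)

for n, k in [(5, 2), (5, 3), (10, 4), (10, 5)]:
    print(n, k, count_outcomes(n, k), comb(n, k))
```

Both columns should agree: 10, 10, 210 and 252 for the four cases worked out above.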

## Binomial Counting

### Factorials

![image-2.png](attachment:image-2.png)

- Using factorials, where `n = # coin flips` and `k = # heads`, what formula gives the number of outcomes for `n=10 coin flips`, `k=5 heads`?
  - `n! / (k! * (n-k)!)`
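Plugging the numbers in as a worked check. The $\binom{n}{k}$ notation is my addition (it is the usual shorthand for this count, read "n choose k"):

$$
\binom{n}{k} = \frac{n!}{k!\,(n-k)!}
\qquad\Rightarrow\qquad
\binom{10}{5} = \frac{10!}{5!\,5!} = \frac{3628800}{120 \cdot 120} = \frac{3628800}{14400} = 252
$$

This agrees with the 252 found earlier by counting placements for 10 flips and 5 heads.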

## Binomial Probability

### Fair coin
- Probability is a number b/w 0-1
- Truth table size is numbers of "sides" or unique outcomes from an individual item (in this case, the outcomes are Heads/Tails, so 2 "sides") to the power of the number of flips
- coin flipped 10 times would have 2^10 possible outcomes, or 1024.
- To calculate probability for the examples below...
  - Calculate `n! / (k! * (n-k)!)`
  - Divide the number above by the Truth Table size/num possibilities

- 1 head (k) in 5 flips (n)

```
5! / (1! * (5-1)!)
120 / (1 * 4!) = 120/24 = 5
5 / 2^5 = 5/32 = 0.15625
```

- 3 heads (k) 5 flips (n)

```
5! / (3! * (5-3)!)
120 / (6 * 2)
120/12 = 10
10 / 2^5 = 10/32 = 0.3125
```
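The two steps (count the outcomes, divide by the truth table size) fit in one short function. A minimal sketch for a fair coin only; `fair_coin_probability` is a hypothetical helper name.

```python
from math import comb

def fair_coin_probability(n_flips, k_heads):
    """P(exactly k_heads in n_flips) for a fair coin."""
    table_size = 2 ** n_flips              # number of rows in the truth table
    favorable = comb(n_flips, k_heads)     # n! / (k! * (n-k)!)
    return favorable / table_size

print(fair_coin_probability(5, 1))   # 0.15625
print(fair_coin_probability(5, 3))   # 0.3125
print(fair_coin_probability(10, 5))  # 0.24609375
```

This only works because every row of a fair coin's truth table is equally likely.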

### Loaded coin

```
P(H) = 0.8
P(T) = 0.2
```

- Probability of exactly 1 head in 3 coin flips w/ the loaded coin? Do this with a truth table (matching cases in bold)

| Case | Result | 
|------|--------|
| 1 | HHH |
| 2 | HHT |
| 3 | HTH |
| 4 | **HTT** |
| 5 | THH |
| 6 | **THT** |
| 7 | **TTH** |
| 8 | TTT |

```
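P(HTT) = P(THT) = P(TTH) = .8 * .2 * .2 = 0.032
P(exactly 1 head) = 0.032 * 3 = 0.096
No further math required (no dividing by the truth table size, since the outcomes are not equally likely)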
```

![image-3.png](attachment:image-3.png)
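For a loaded coin the rows of the truth table are no longer equally likely, so each matching row has to be weighted by its own probability. A sketch of both routes, assuming independent flips: the closed form `C(n, k) * p^k * (1-p)^(n-k)` (the standard binomial formula, not spelled out in these notes) and a brute-force sum over the table. Function names are illustrative.

```python
from itertools import product
from math import comb, prod

def binomial_probability(n, k, p_heads):
    """P(exactly k heads in n independent flips) with P(H) = p_heads."""
    return comb(n, k) * p_heads**k * (1 - p_heads) ** (n - k)

def brute_force(n, k, p_heads):
    """Same quantity, summing the truth table rows that have exactly k heads."""
    p = {"H": p_heads, "T": 1 - p_heads}
    return sum(
        prod(p[side] for side in row)       # probability of this one row
        for row in product("HT", repeat=n)
        if row.count("H") == k
    )

print(round(binomial_probability(3, 1, 0.8), 3))  # 0.096
print(round(brute_force(3, 1, 0.8), 3))           # 0.096
```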

# Conditional Probability

Conditional probability: the probability of the outcome of one event given the outcome of another event that affects it

## Probability of Positive Test

Cancer probability
- P(C) = $P_{0}$
- P(Pos|C) = $P_{1}$
- P(Neg|¬C) = $P_{2}$

![image.png](attachment:image.png)

- What is the prob of getting a positive test for cancer for the general population, given the table above?
  - `0.1 * 0.9 + (1 - 0.1) * (1 - 0.8) = 0.27`
  - Law of total probability:
    ```
    P(Pos) = P(C) * P(Pos | C) + P(¬C) * P(Pos|¬C)

    P(¬C) = 1 - P(C) = 1 - .1

    P(Pos|¬C) = 1 - P(Neg|¬C) = 1 - .8

    .1(.9) + (1 - .1)(1 - .8) = .27
    ```
- Given that someone has cancer, the chance a test will come out positive is .9 (`P(Positive|Cancer) = 0.9`). `|` reads as "given": the probability of the first event is conditioned on the second (probability of a positive result given that the subject has cancer)
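The positive-test calculation above as a small function. This is a sketch using the same numbers (`P(C) = 0.1`, `P(Pos|C) = 0.9`, `P(Neg|¬C) = 0.8`); the parameter names `sensitivity` and `specificity` are the usual terms for the last two.

```python
def prob_positive(p_cancer, sensitivity, specificity):
    """Total probability of a positive test, P(Pos)."""
    p_no_cancer = 1 - p_cancer               # P(¬C)
    false_positive_rate = 1 - specificity    # P(Pos|¬C) = 1 - P(Neg|¬C)
    # P(Pos) = P(C) * P(Pos|C) + P(¬C) * P(Pos|¬C)
    return p_cancer * sensitivity + p_no_cancer * false_positive_rate

p_pos = prob_positive(p_cancer=0.1, sensitivity=0.9, specificity=0.8)
print(round(p_pos, 4))  # 0.27
```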

## Total Probability 

![image-2.png](attachment:image-2.png)

`P(Test) = P(Test|Disease) ⋅ P(Disease) + P(Test|¬Disease) ⋅ P(¬Disease)`

![image-3.png](attachment:image-3.png)



# Bayes Rule

## Prior and Posterior Probabilities

- Prior probability = probability before taking a test
- Test evidence = combined with the prior probability
- Posterior probability = resulting probability after taking the test result into account

`Prior probability + Test Evidence ---> Posterior Probability`

![image.png](attachment:image.png)

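Joint probabilities (the posterior before normalizing):
- `P(C, Pos) = P(C) * P(Pos|C)`
- `P(¬C, Pos) = P(¬C) * P(Pos|¬C)`

Dividing each of these by their sum gives the actual posteriors `P(C|Pos)` and `P(¬C|Pos)`.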







## Conditional Prob Normalization

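The **normalizer** is the sum of the two joint probabilities from the previous step (prior * test evidence for the cancer case, plus the same for the no-cancer case), which is how we got the .108 value below. That sum is the total probability of a positive test, `P(Pos)`. You can divide a joint probability of two events by the normalizer and get the posterior probability, e.g. `P(C|Pos) = P(C, Pos) / P(Pos)`.

![image.png](attachment:image.png)

Why this works: once the test has come back positive, only the two positive-test cases are still possible, so rescaling them by their sum makes them add up to 1.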

## Bayes Rule Diagram

| Term | Case            |
|------|-----------------|
| Prior Probability | P(Cancer) |
| Sensitivity | P(Pos\|Cancer) |
| Specificity | P(Neg\|not Cancer) |

1. Cancer hypothesis = Prior prob * sensitivity
2. No cancer hypot = (1 - Prior prob) * (1 - specificity)
3. Normalizer = Cancer hypot + No cancer hypot (normally != 1)

Normalize the hypotheses w/ the normalizer. The normalizer represents the prob of a positive test; it is the same number whichever hypothesis is true, so it can normalize both cases (cancer and no cancer)
1. Posterior prob (cancer) = Cancer hypot/normalizer
2. Posterior prob (no cancer) = No cancer hypot/normalizer
3. Adding posterior probs = 1
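The diagram steps for a positive test can be sketched as below. The no-cancer branch is written as `(1 - prior) * (1 - specificity)`, i.e. `P(¬C) * P(Pos|¬C)`. The inputs `0.01`, `0.9`, `0.9` are my assumption: they are not stated in the text, but they reproduce the .108 normalizer mentioned earlier.

```python
def bayes_positive(prior, sensitivity, specificity):
    """Posterior probabilities of cancer / no cancer after a positive test."""
    cancer_hypot = prior * sensitivity                  # P(C) * P(Pos|C)
    no_cancer_hypot = (1 - prior) * (1 - specificity)   # P(¬C) * P(Pos|¬C)
    normalizer = cancer_hypot + no_cancer_hypot         # P(Pos)
    return cancer_hypot / normalizer, no_cancer_hypot / normalizer, normalizer

# Assumed example values (not given explicitly in the notes).
post_c, post_not_c, norm = bayes_positive(prior=0.01, sensitivity=0.9, specificity=0.9)
print(round(norm, 3))        # 0.108
print(round(post_c, 4))      # 0.0833
print(round(post_not_c, 4))  # 0.9167
```

Even with a positive result, the posterior for cancer stays small under these assumed inputs because the prior is so low.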

## Equivalent Bayes Rule Diagram

The same algorithm works if the test is negative; you just replace Pos -> Neg
1. Cancer hypot = prior prob * (1-sensitivity)
2. No cancer hypot = (1 - prior prob) * specificity
3. Normalizer = cancer hypot + no cancer hypot

    ```
    P(Pos, C) = P(Pos|C) P(C)
    P(Neg, C) = P(Neg|C) P(C)
    P(Pos, ¬C) = P(Pos|¬C) P(¬C)
    P(Neg, ¬C) = P(Neg|¬C) P(¬C)
    ```

![image.png](attachment:image.png)

Normalizer would be .891 + .001 = .892

![image-2.png](attachment:image-2.png)

## Final Conditional Results

To normalize, you divide the joint probabilities (the two hypotheses) by the normalizer; the results are the posterior probabilities

![image.png](attachment:image.png)
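The negative-test version, as a sketch. The inputs `0.01`, `0.9`, `0.9` are again my assumption (not stated in the text); they reproduce the `.891 + .001 = .892` normalizer above.

```python
def bayes_negative(prior, sensitivity, specificity):
    """Posterior probabilities of cancer / no cancer after a negative test."""
    cancer_hypot = prior * (1 - sensitivity)      # P(C) * P(Neg|C)
    no_cancer_hypot = (1 - prior) * specificity   # P(¬C) * P(Neg|¬C)
    normalizer = cancer_hypot + no_cancer_hypot   # P(Neg)
    return cancer_hypot / normalizer, no_cancer_hypot / normalizer, normalizer

# Assumed example values (not given explicitly in the notes).
post_c, post_not_c, norm = bayes_negative(prior=0.01, sensitivity=0.9, specificity=0.9)
print(round(norm, 3))        # 0.892
print(round(post_c, 4))      # 0.0011
print(round(post_not_c, 4))  # 0.9989
```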