# Unit 2
In this section we look at conditional probability.

```python
import numpy as np
import matplotlib.pyplot as plt
```

# The Condition of Bayesville
In the example from the video, we discussed the probability of having a disease. An individual, Person1, tested positive for a rare disease (maybe 1% of people have it). The test is known to be 95% accurate, which still leaves room for error. Some key takeaways from the discussion:
- Even though the test was positive, it was still unlikely that the individual had the disease.
- Because the disease is rare, false positives were far more common than false negatives.
- It led to some questions:

We need to distinguish between the probability of having the disease (event $A$) given a positive test result (event $B$), and the probability of testing positive given that one has the disease:
\begin{equation}
  P(A|B)\neq P(B|A)
\end{equation}

Whenever new information is gathered, whether by observation or by collecting new evidence or data, our beliefs must be updated in light of it. Conditional probability addresses this fundamental question: how should we update our beliefs in light of the evidence we observe? Relating $P(A|B)$ to $P(B|A)$ gives rise to $\text{Bayes' rule}$, which, alongside the $\text{law of total probability}$, can be used to solve many problems.
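To see how far apart $P(A|B)$ and $P(B|A)$ can be, here is a minimal sketch of the disease example using Bayes' rule, with the law of total probability supplying the denominator $P(B)$. It assumes that "95% accurate" means both $P(B|A)=0.95$ and $P(B^c|A^c)=0.95$, and that exactly 1% of people have the disease; the function name `posterior_disease` and its parameters are hypothetical.

```python
def posterior_disease(prior=0.01, sensitivity=0.95, specificity=0.95):
    """Return P(A|B): probability of disease (A) given a positive test (B).

    Assumption: 'accuracy' means sensitivity = P(B|A) and
    specificity = P(B^c|A^c), both 0.95 by default.
    """
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|A^c)P(A^c)
    p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
    # Bayes' rule: P(A|B) = P(B|A)P(A) / P(B)
    return sensitivity * prior / p_positive


p_a_given_b = posterior_disease()
p_b_given_a = 0.95  # P(B|A), the assumed sensitivity
print(f"P(A|B) = {p_a_given_b:.3f}")  # P(A|B) = 0.161
print(f"P(B|A) = {p_b_given_a:.3f}")  # P(B|A) = 0.950
```

Under these assumptions a positive result raises the probability of disease from 1% to only about 16%, while $P(B|A)$ is 95%.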

## Thinking Conditionally
It is useful to think that _all probabilities are conditional_, because there is always background knowledge or assumptions built in. Conditional probability also allows complex problems to be decomposed into more manageable pieces.

# Definition and Intuition
If $A$ and $B$ are events with $P(B)>0$, then the _conditional probability_ of $A$ given $B$ is:
\begin{equation}
  P(A|B)=\frac{P(A\cap B)}{P(B)}
\end{equation}
In $P(A|B)$, $A$ is the event whose uncertainty we want to update, and $B$ is the new evidence we have observed.
- $P(A)$ is the _prior_ probability of $A$: before updating based on the evidence.
- $P(A|B)$ is the _posterior_ probability of $A$: after updating based on the evidence.
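A small sketch of the definition for a finite sample space with equally likely outcomes, using the naive definition for each probability. The die-roll events are an illustrative assumption, not from the video; `prob` and `cond_prob` are hypothetical helper names.

```python
from fractions import Fraction


def prob(event, omega):
    """Naive definition: |event| / |omega| for equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))


def cond_prob(a, b, omega):
    """P(A|B) = P(A and B) / P(B), defined only when P(B) > 0."""
    p_b = prob(b, omega)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return prob(a & b, omega) / p_b


omega = {1, 2, 3, 4, 5, 6}  # one roll of a fair die (illustrative)
A = {6}                     # the roll is a 6
B = {2, 4, 6}               # evidence: the roll is even

print("prior     P(A)   =", prob(A, omega))          # 1/6
print("posterior P(A|B) =", cond_prob(A, B, omega))  # 1/3
```

Learning that the roll is even doubles the probability of a 6: the prior $1/6$ becomes the posterior $1/3$.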

### Two Cards Example
A standard deck of 52 playing cards is shuffled well. Two cards are drawn randomly, one at a time without replacement. Let $A$ be the event that the first card is a heart, and $B$ be the event that the second card is red. Find $P(A|B)$ and $P(B|A)$.\
\
To solve this problem, recall what we have learned so far, namely the naive and general definitions of probability and the counting and sampling methods.\
\
By the naive definition and the multiplication rule:
\begin{equation}
  P(A\cap B)=\frac{13\cdot 25}{52\cdot 51}=\frac{25}{204}
\end{equation}
This holds because 13 of the 52 cards are hearts, and since a heart is red, only 25 (not 26) of the remaining 51 cards are red and satisfy event $B$. \
$P(A)$ is $\frac{13}{52}=\frac{1}{4}$. \
$P(B)=\frac{26\cdot 51}{52\cdot 51}=\frac{1}{2}$, because the second card can be any of the 26 red cards, and the first card can then be any of the remaining 51 cards. The multiplication rule does not require chronological ordering, so the draws can be counted in this order. Finally:
\begin{equation}
  P(A|B)=\frac{P(A\cap B)}{P(B)}=\frac{25/204}{1/2}=\frac{25}{102}
\end{equation}
\begin{equation}
  P(B|A)=\frac{P(B\cap A)}{P(A)}=\frac{25/204}{1/4}=\frac{25}{51}
\end{equation}
- Make sure the order of the conditioning is correct. Confusing $P(A|B)$ with $P(B|A)$ is known as the _prosecutor's fallacy_.
- The chronological order in which the cards were drawn _does not_ determine which conditional probabilities make sense: both $P(A|B)$ and $P(B|A)$ are meaningful.
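A brute-force check of these answers: the sketch below lists every ordered pair of distinct cards, so the naive definition applies directly and each probability is a ratio of counts. Representing cards as `(rank, suit)` tuples and the helper names `is_a` / `is_b` are illustrative choices, not part of the original example.

```python
from fractions import Fraction
from itertools import permutations

# Build a 52-card deck as (rank, suit) pairs; hearts and diamonds are red.
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(rank, suit) for suit in suits for rank in range(13)]
red = {"hearts", "diamonds"}

# All ordered draws of two cards without replacement (52 * 51 outcomes).
draws = list(permutations(deck, 2))


def is_a(draw):  # A: first card is a heart
    return draw[0][1] == "hearts"


def is_b(draw):  # B: second card is red
    return draw[1][1] in red


n = len(draws)
n_a = sum(is_a(d) for d in draws)
n_b = sum(is_b(d) for d in draws)
n_ab = sum(is_a(d) and is_b(d) for d in draws)

p_a_and_b = Fraction(n_ab, n)
p_a_given_b = Fraction(n_ab, n_b)
p_b_given_a = Fraction(n_ab, n_a)
print(p_a_and_b, p_a_given_b, p_b_given_a)  # 25/204 25/102 25/51
```

Because `permutations` keeps the order of the draws, events about the second card are as easy to count as events about the first, matching the point that chronological order does not restrict which conditional probabilities make sense.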

### Intuition with Pebble World
Consider a finite sample space with events $A$ and $B$, in which the pebbles have _total mass 1_. Pebbles in $A$ are crossed out and pebbles in $B$ are bolded.

| ~0~  ~0~  ~0~ | \
| ~0~  ~__0__~  __0__ | \
| __0__  __0__  __0__ | \

Learning that $B$ occurred, we can remove the pebbles in $B^c$. $P(A\cap B)$ is then the total mass of the pebbles remaining in $A$. Finally, we _renormalize_ so that the remaining mass totals 1, by dividing each remaining mass by $P(B)$; the renormalized mass in $A$ is $P(A|B)$.
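The remove-and-renormalize steps can be sketched in Python. This reads the grid above row by row (crossed out = in $A$, bold = in $B$) and assumes every pebble has mass $1/9$, since the picture does not specify the masses; the variable names are illustrative.

```python
from fractions import Fraction

# Each pebble is (in_A, in_B), read off the grid row by row.
# Assumption: all nine pebbles have equal mass 1/9 (the figure gives no masses).
pebbles = [
    (True, False), (True, False), (True, False),
    (True, False), (True, True), (False, True),
    (False, True), (False, True), (False, True),
]
mass = {i: Fraction(1, len(pebbles)) for i in range(len(pebbles))}

# Learning that B occurred: discard every pebble in B^c.
in_b = [i for i, (_, b) in enumerate(pebbles) if b]
p_b = sum(mass[i] for i in in_b)
# Mass of the surviving pebbles that are also in A: P(A and B).
p_a_and_b = sum(mass[i] for i in in_b if pebbles[i][0])

# Renormalize the surviving pebbles so their total mass is 1.
posterior_mass = {i: mass[i] / p_b for i in in_b}
assert sum(posterior_mass.values()) == 1

p_a_given_b = sum(posterior_mass[i] for i in in_b if pebbles[i][0])
print(p_b, p_a_and_b, p_a_given_b)  # 5/9 1/9 1/5
```

The renormalized mass in $A$ equals $P(A\cap B)/P(B)=(1/9)/(5/9)=1/5$, as the definition predicts.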

### Intuition with Frequentist Interpretation
Imagine repeating an experiment many times, randomly generating a long list of observed outcomes, each of them represented by a string of twenty-four 0's and 1's. $B$ is the event that the first digit is 1 and $A$ is the event that the second digit is 1. The conditional probability of $A$ given $B$ can then be thought of in a natural way: it is the fraction of times that $A$ occurs, restricting attention to the trials where $B$ occurs. Conditioning on $B$, we circle all the repetitions where $B$ occurred, and then we look at the fraction of circled repetitions in which event $A$ also occurred. \
\
In symbols, let $n_A, n_B, n_{AB}$ be the number of occurrences of $A$, $B$, $A\cap B$ respectively in a large number $n$ of repetitions of the experiment. The frequentist interpretation is that
\begin{equation}
  P(A)\approx \frac{n_A}{n},\quad P(B)\approx \frac{n_B}{n},\quad P(A\cap B)\approx \frac{n_{AB}}{n}
\end{equation}
From this frequentist view, $P(A|B)$ can be interpreted as $n_{AB}/n_B$ which is equal to $(n_{AB}/n)/(n_B/n)$ which translates to $P(A\cap B)/P(B)$.
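A simulation sketch of the frequentist view, using NumPy. It assumes each of the twenty-four digits is an independent, fair 0/1 draw (the text does not specify how the strings are generated), so under this assumption $P(A|B)$ should come out close to $1/2$; the sample size and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 100_000  # number of repetitions (arbitrary)

# Assumption: each digit is an independent fair 0/1 draw.
outcomes = rng.integers(0, 2, size=(n, 24))

b = outcomes[:, 0] == 1  # B: first digit is 1
a = outcomes[:, 1] == 1  # A: second digit is 1

n_a, n_b, n_ab = a.sum(), b.sum(), (a & b).sum()

# "Circle" the trials where B occurred; take the fraction where A also occurred.
freq_cond = n_ab / n_b
# The same number, written as a ratio of two estimated probabilities.
ratio_of_probs = (n_ab / n) / (n_b / n)

print(f"n_AB / n_B         = {freq_cond:.4f}")
print(f"(n_AB/n) / (n_B/n) = {ratio_of_probs:.4f}")
```

The two printed values agree because dividing numerator and denominator by $n$ does not change the ratio.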


## Bayes' Rule and the Law of Total Probability
The simple definition of conditional probability (a ratio of two probabilities) has some useful consequences.
### Consequence 1
For any events $A$ and $B$ with positive probabilities,
\begin{equation}
  P(A\cap B)=P(B)P(A|B)=P(A)P(B|A)
\end{equation}
This seems circular at first, because $P(A|B)$ was defined in terms of $P(A\cap B)$, but it is useful: it lets us move between joint and conditional probabilities without going back to the definition each time.
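As a quick check of Consequence 1, the sketch below plugs in the values from the two-cards example, using exact fractions so no rounding hides a mismatch.

```python
from fractions import Fraction

# Values from the two-cards example.
p_a = Fraction(1, 4)             # P(A): first card is a heart
p_b = Fraction(1, 2)             # P(B): second card is red
p_a_given_b = Fraction(25, 102)  # P(A|B)
p_b_given_a = Fraction(25, 51)   # P(B|A)

# Consequence 1: both products should equal P(A and B) = 25/204.
via_b = p_b * p_a_given_b
via_a = p_a * p_b_given_a
print(via_b, via_a)  # 25/204 25/204

# Rearranged, it converts one conditional into the other:
# P(A|B) = P(A) P(B|A) / P(B).
print(p_a * p_b_given_a / p_b)  # 25/102
```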
