# Bayes' Theorem

![Bayes' Theorem](bayes.png)

<a href="https://towardsdatascience.com/bayes-rule-with-a-simple-and-practical-example-2bce3d0f4ad0">Image courtesy of Tirthajyoti Sarkar</a>

In which I attempt to tackle some practice problems and then move on from probabilities to distributions.

## Part 1: Bayes Practice Problems

### Problem 1: IRAs

You work at a financial services company that offers individual retirement accounts (IRAs). To target high-potential customers, you want to find out whether people with children are more likely to invest in IRAs.

So far you have found:
- 30% of all Americans have IRAs.
- half of Americans have children.
- two-thirds of those with IRAs have children.

Based on this information, what is the probability that someone with children has an IRA? 

In [1]:
import numpy as np

In [3]:
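# Known quantities from the problem statement
p_children_ira = 2/3   # P(children | IRA)
p_ira = .3             # P(IRA)
p_children = .5        # P(children)

# Bayes' theorem: P(IRA | children) = P(children | IRA) * P(IRA) / P(children)
p_children_ira * p_ira / p_children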

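0.39999999999999997

About 40% of people with children have an IRA (the trailing digits are floating-point rounding), compared with 30% of all Americans, so people with children are more likely to invest.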
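
As a small sketch of the same calculation, the helper below (a hypothetical `bayes` function, not part of the original notebook) wraps Bayes' theorem and uses exact fractions, which sidesteps the floating-point noise seen in `0.39999999999999997`.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Return P(A | B) = P(B | A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# P(IRA | children) from P(children | IRA) = 2/3, P(IRA) = 3/10, P(children) = 1/2
p_ira_given_children = bayes(Fraction(2, 3), Fraction(3, 10), Fraction(1, 2))

print(p_ira_given_children)         # 2/5
print(float(p_ira_given_children))  # 0.4
```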

---

### Problem 2: Resident Satisfaction

Suppose I own a hotel and an apartment building. Times are tight, and I need to sell one of them. Either would earn me the exact same amount of money, so that doesn't help me to make my decision. Instead, I will make my decision based on my residents' satisfaction. 

- Whichever building has a higher percentage of satisfied people is the one I will choose to keep.
- _However_, if the percent of satisfied guests for my hotel and the percent of satisfied guests for my apartment building are within 10% of one another, I will sell neither until I can bring in some outside consultants to help me make my decision.
- Last winter, I administered a survey to *all* residents and found that 60% of them are satisfied.
- Among satisfied respondents, two out of every five came from the apartment building.
- One-third of my residents live in my apartment building.

What action will I take?

In [7]:
p_apartment_given_satisfied = 2/5   # P(apartment | satisfied)
p_hotel_given_satisfied = 3/5       # P(hotel | satisfied)
p_apartment = 1/3                   # P(apartment)
p_hotel = 2/3                       # P(hotel)
p_satisfied = .6                    # P(satisfied)

# Bayes' theorem: P(satisfied | building) = P(building | satisfied) * P(satisfied) / P(building)
p_satisfied_apartment = p_apartment_given_satisfied * p_satisfied / p_apartment
print(f'{round(p_satisfied_apartment, 2)*100}% of apartment residents are satisfied.')

p_satisfied_hotel = p_hotel_given_satisfied * p_satisfied / p_hotel
print(f'{round(p_satisfied_hotel, 2)*100}% of hotel residents are satisfied.')

72.0% of apartment residents are satisfied.
54.0% of hotel residents are satisfied.


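The apartment building's satisfaction rate (72%) exceeds the hotel's (54%) by 18 percentage points, which is more than the 10% margin, so I keep the apartment building and sell the hotel.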
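
The decision rule can be sketched in code as follows. The `decide` function and its return strings are hypothetical, and the sketch assumes "within 10%" means within 10 percentage points. It also checks the two conditional rates against the overall 60% satisfaction figure using the law of total probability.

```python
def decide(p_sat_apartment, p_sat_hotel, threshold=0.10):
    """Apply the decision rule from the problem statement.

    Assumes 'within 10%' means a gap of at most 10 percentage points.
    """
    if abs(p_sat_apartment - p_sat_hotel) <= threshold:
        return 'sell neither; hire consultants'
    if p_sat_apartment > p_sat_hotel:
        return 'sell the hotel'
    return 'sell the apartment building'

p_apartment, p_hotel, p_satisfied = 1/3, 2/3, 0.6
p_sat_apartment = (2/5) * p_satisfied / p_apartment  # P(satisfied | apartment)
p_sat_hotel = (3/5) * p_satisfied / p_hotel          # P(satisfied | hotel)

# Sanity check: the weighted rates should recover the overall satisfaction rate.
total = p_sat_apartment * p_apartment + p_sat_hotel * p_hotel
assert abs(total - p_satisfied) < 1e-12

print(decide(p_sat_apartment, p_sat_hotel))  # sell the hotel
```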

---

### Problem 3: Coin Game

Suppose you and your friend are playing a game. Your friend has laid four coins out in front of you. If you flip heads, you win a dollar from your friend. If you flip tails, you owe a dollar to your friend. However, the coins in front of you are not fair.
- One coin has an 80% chance of flipping heads. (Call this coin A.)
- One coin has a 60% chance of flipping heads. (Call this coin B.)
- One coin has a 40% chance of flipping heads. (Call this coin C.)
- One coin has a 10% chance of flipping heads. (Call this coin D.)

#### Problem 3 (a): Suppose you select one coin at random. That is, you don't know whether you selected coin A, B, C, or D. You flip heads. Given this data, what are the probabilities that you selected coin A, coin B, coin C, and coin D?

In [8]:
p_a = .25
p_b = .25
p_c = .25
p_d = .25

p_head_a = .8
p_head_b = .6
p_head_c = .4
p_head_d = .1

# P(heads) by the law of total probability: sum of P(heads | coin) * P(coin) over all coins
p_head = p_head_a * p_a + p_head_b * p_b + p_head_c * p_c + p_head_d * p_d

# Bayes' theorem: P(coin | heads) = P(heads | coin) * P(coin) / P(heads)
p_a_head = p_head_a * p_a / p_head
p_b_head = p_head_b * p_b / p_head
p_c_head = p_head_c * p_c / p_head
p_d_head = p_head_d * p_d / p_head

print(f'If I flip heads, there is a {round(p_a_head, 2)*100}% chance I flipped coin A.')
print(f'If I flip heads, there is a {round(p_b_head, 2)*100}% chance I flipped coin B.')
print(f'If I flip heads, there is a {round(p_c_head, 2)*100}% chance I flipped coin C.')
print(f'If I flip heads, there is a {round(p_d_head, 2)*100}% chance I flipped coin D.')

If I flip heads, there is a 42.0% chance I flipped coin A.
If I flip heads, there is a 32.0% chance I flipped coin B.
If I flip heads, there is a 21.0% chance I flipped coin C.
If I flip heads, there is a 5.0% chance I flipped coin D.
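
The same update can be written once for any number of hypotheses. This is a minimal sketch; the `posterior` function and the dictionary layout are illustrative choices, not part of the original notebook.

```python
def posterior(prior, likelihood):
    """Return P(hypothesis | data) for each hypothesis via Bayes' theorem."""
    # Joint probability P(data, hypothesis) = P(data | hypothesis) * P(hypothesis)
    joint = {h: prior[h] * likelihood[h] for h in prior}
    # P(data), by the law of total probability
    evidence = sum(joint.values())
    return {h: joint[h] / evidence for h in joint}

prior = {'A': 0.25, 'B': 0.25, 'C': 0.25, 'D': 0.25}
p_heads = {'A': 0.8, 'B': 0.6, 'C': 0.4, 'D': 0.1}

after_heads = posterior(prior, p_heads)
for coin, p in after_heads.items():
    print(f'P(coin {coin} | heads) = {p:.3f}')
```

Because the prior is passed in as an argument, the result of one call can be reused as the prior for the next flip.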


#### Problem 3 (b): Suppose you are using the same coin as before. That is, you _still_ don't know whether you selected coin A, B, C, or D - although you have a better idea now that you flipped heads on your first flip! On this second flip, you flip tails. Given this data, what are the probabilities that you selected coin A, coin B, coin C, and coin D?

$$
\begin{eqnarray*}
P(A|B\cap C) &=& \frac{P(A\cap B \cap C)}{P(B\cap C)} \\
\\
&=& \frac{P(A\cap B|C)P(C)}{P(B|C)P(C)} \\
\\
&=& \frac{P(A\cap B|C)}{P(B|C)} \\
\end{eqnarray*}
$$
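
One way to read this identity for the coin game, as a sketch: let $K_i$ (a label introduced here for illustration) be the event that coin $i$ was selected, $H_1$ heads on the first flip, and $T_2$ tails on the second flip. Setting $A = K_i$, $B = T_2$, and $C = H_1$, and assuming the flips are conditionally independent given the coin, the posterior from part (a) plays the role of the prior:

$$
\begin{eqnarray*}
P(K_i|T_2\cap H_1) &=& \frac{P(K_i\cap T_2|H_1)}{P(T_2|H_1)} \\
\\
&=& \frac{P(T_2|K_i)P(K_i|H_1)}{\sum_j P(T_2|K_j)P(K_j|H_1)} \\
\end{eqnarray*}
$$

For coin A, this gives roughly $0.2 \times 0.421 / 0.384 \approx 0.219$.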

In [9]:
p_tail_a = 1 - p_head_a
p_tail_b = 1 - p_head_b
p_tail_c = 1 - p_head_c
p_tail_d = 1 - p_head_d

# Use the posterior from part (a) as the new prior:
# P(tails, coin | heads) = P(tails | coin) * P(coin | heads)
p_tail_a_head = p_tail_a * p_a_head
p_tail_b_head = p_tail_b * p_b_head
p_tail_c_head = p_tail_c * p_c_head
p_tail_d_head = p_tail_d * p_d_head

# P(tails on flip 2 | heads on flip 1), by the law of total probability
p_tail_head = p_tail_a_head + p_tail_b_head + p_tail_c_head + p_tail_d_head

p_a_tail_head = p_tail_a_head / p_tail_head
p_b_tail_head = p_tail_b_head / p_tail_head
p_c_tail_head = p_tail_c_head / p_tail_head
p_d_tail_head = p_tail_d_head / p_tail_head

print(f'The probability I selected coin A is {round(p_a_tail_head, 2)*100}%.')
print(f'The probability I selected coin B is {round(p_b_tail_head, 2)*100}%.')
print(f'The probability I selected coin C is {round(p_c_tail_head, 2)*100}%.')
print(f'The probability I selected coin D is {round(p_d_tail_head, 2)*100}%.')

The probability I selected coin A is 22.0%.
The probability I selected coin B is 33.0%.
The probability I selected coin C is 33.0%.
The probability I selected coin D is 12.0%.


#### BONUS: Problem 3 (c): Suppose you are using the same coin as before. That is, you _still_ don't know whether you selected coin A, B, C, or D - although you have a better idea now that you flipped heads, then tails on your first two flips! On this third flip, you flip tails. Given this data, what are the probabilities that you selected coin A, coin B, coin C, and coin D?

In [10]:
# Use the posterior from part (b) as the new prior:
# P(tails, coin | tails, heads) = P(tails | coin) * P(coin | tails, heads)
p_tail_a_tail_head = p_tail_a * p_a_tail_head
p_tail_b_tail_head = p_tail_b * p_b_tail_head
p_tail_c_tail_head = p_tail_c * p_c_tail_head
p_tail_d_tail_head = p_tail_d * p_d_tail_head

# P(tails on flip 3 | heads, then tails), by the law of total probability
p_tail_given_tail_head = (p_tail_a_tail_head + p_tail_b_tail_head +
                          p_tail_c_tail_head + p_tail_d_tail_head)

p_a_tail_tail_head = p_tail_a_tail_head / p_tail_given_tail_head
p_b_tail_tail_head = p_tail_b_tail_head / p_tail_given_tail_head
p_c_tail_tail_head = p_tail_c_tail_head / p_tail_given_tail_head
p_d_tail_tail_head = p_tail_d_tail_head / p_tail_given_tail_head

print(f'The probability I selected coin A is {round(p_a_tail_tail_head, 2)*100}%.')
print(f'The probability I selected coin B is {round(p_b_tail_tail_head, 2)*100}%.')
print(f'The probability I selected coin C is {round(p_c_tail_tail_head, 2)*100}%.')
print(f'The probability I selected coin D is {round(p_d_tail_tail_head, 2)*100}%.')

The probability I selected coin A is 9.0%.
The probability I selected coin B is 27.0%.
The probability I selected coin C is 41.0%.
The probability I selected coin D is 23.0%.
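
The three parts of Problem 3 follow the same pattern: yesterday's posterior is today's prior. A minimal sketch of that loop is below; `update` and `posterior_after` are hypothetical helper names, and the sketch assumes flips are conditionally independent given the coin. Under that assumption the order of the flips should not matter, which the last lines check.

```python
P_HEADS = {'A': 0.8, 'B': 0.6, 'C': 0.4, 'D': 0.1}

def update(belief, flip):
    """One Bayesian update of the coin probabilities for a single flip ('H' or 'T')."""
    joint = {}
    for coin, p in belief.items():
        likelihood = P_HEADS[coin] if flip == 'H' else 1 - P_HEADS[coin]
        joint[coin] = likelihood * p
    total = sum(joint.values())  # probability of this flip given earlier flips
    return {coin: j / total for coin, j in joint.items()}

def posterior_after(flips):
    """Start from a uniform prior and update once per flip in the string."""
    belief = {coin: 0.25 for coin in P_HEADS}
    for flip in flips:
        belief = update(belief, flip)
    return belief

htt = posterior_after('HTT')  # heads, tails, tails as in parts (a) to (c)
tth = posterior_after('TTH')  # same flips in a different order

for coin in P_HEADS:
    print(f'coin {coin}: HTT = {htt[coin]:.3f}, TTH = {tth[coin]:.3f}')
```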


---

## Part 2: Moving from Probabilities to Distributions

### Problem 4: Prior Probabilities

#### Problem 4 (a): In Problem 3, before you had flipped any coin, what was the probability that you had selected coin A? Coin B? Coin C? Coin D?

**Answer:**

25% for each coin.

#### Problem 4 (b): What is the definition of a distribution?

**Answer:**

A distribution is the set of all possible values for a variable and how often they occur.

#### Problem 4 (c): What "named distribution" (i.e. a distribution that has a common name) could we apply to a situation where selecting coins A-D is equally likely?

**Answer:**

The discrete Uniform distribution: each of the four coins has the same probability, 1/4.

#### Problem 4 (d): Suppose that I have only one coin and I want to conduct inference on the probability of flipping heads, $p$. Note that $p$ is unknown. If I want to specify some prior distribution where all probabilities $p$ are equally likely, should I use a discrete or continuous distribution? Why?

**Answer:**

We want to conduct inference on our parameter $p$, which takes on an uncountably infinite number of values between 0 and 1. Since the values are uncountably infinite, we should pick a continuous distribution. 

#### Problem 4 (e): Suppose that I have only one coin and I want to conduct inference on the probability of flipping heads, $p$. Note that $p$ is unknown. If I want to specify some prior distribution where all probabilities $p$ are equally likely, then what named distribution might I use?

**Answer:**

A continuous Uniform distribution on $[0,1]$.

#### Problem 4 (f): Suppose I have only one coin with some unknown probability of flipping heads $p$. If I think 50% is the likeliest value for $p$ and, as we get farther away from 50%, that value is less and less likely, what named distribution could I use?

**Answer:**

- We might use a Normal distribution with a mean of 50%. In this case, the likeliest value for $p$ is 50% and each value for $p$ is less likely as we get farther from 50%. We'll want to select a standard deviation $\sigma$ that accurately reflects how much certainty we have in 50%. For example, if we use a $Normal(0.5, 0.5)$ (mean 0.5, $\sigma = 0.5$), then we're relatively uncertain about the true value. If we use a $Normal(0.5, 0.01)$, though, we're much more certain that 50% is the best value. (In this case, about 99.7% of the prior probability lies between 0.47 and 0.53, that is, within $3\sigma$ of the mean.)
    - A downside to this choice is that the Normal distribution is supported between $-\infty$ and $+\infty$. Thus, we'll have positive probabilities outside of $[0,1]$. We'll want to select a $\sigma$ value that minimizes how much [leakage](https://arxiv.org/abs/1201.3611) we observe.
- The Beta distribution is a great choice for probabilities because it's a continuous distribution defined between 0 and 1. In this case, we might select a $Beta(\alpha, \beta)$ where $\alpha = \beta$. If $\alpha = \beta$, then the distribution is symmetric. If $\alpha$ and $\beta$ are both greater than 1, then the mode (peak) will be at 0.5.
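
To put rough numbers on both options, the sketch below uses Python's standard-library `statistics.NormalDist` to measure how much of a Normal prior falls outside $[0,1]$, and the standard closed form $(\alpha - 1)/(\alpha + \beta - 2)$ for the mode of a Beta distribution. The helper names and the example value $\alpha = \beta = 5$ are illustrative assumptions.

```python
from statistics import NormalDist

def mass_outside_unit_interval(mu, sigma):
    """Probability that a Normal(mu, sigma) prior assigns to values outside [0, 1]."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(0.0) + (1.0 - dist.cdf(1.0))

def beta_mode(alpha, beta):
    """Mode of Beta(alpha, beta); this closed form only applies when both exceed 1."""
    if alpha <= 1 or beta <= 1:
        raise ValueError('closed-form mode requires alpha > 1 and beta > 1')
    return (alpha - 1) / (alpha + beta - 2)

for sigma in (0.5, 0.1, 0.01):
    leakage = mass_outside_unit_interval(0.5, sigma)
    print(f'sigma = {sigma}: mass outside [0, 1] = {leakage:.4f}')

narrow = NormalDist(0.5, 0.01)
within = narrow.cdf(0.53) - narrow.cdf(0.47)
print(f'Normal(0.5, 0.01) mass in [0.47, 0.53] = {within:.4f}')

print(beta_mode(5, 5))  # 0.5
```

With $\sigma = 0.5$, roughly a third of the prior mass lands outside $[0,1]$, which is why a small $\sigma$ or a Beta prior is the safer choice here.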

#### Problem 4 (g): When we listed out our prior probabilities for each coin, we got the same number of posterior probabilities. For example, when we had four prior probabilities (one for Coin A, one for Coin B, one for Coin C, and one for Coin D), then we had four posterior probabilities (one for Coin A, one for Coin B, one for Coin C, and one for Coin D). However, if we have one prior distribution, then we will get one posterior distribution. What are some advantages to being able to summarize our posterior with one distribution instead of a big list of probabilities?

**Answer:**

- If we only need to compute one posterior distribution instead of many posterior probabilities, then we don't have to do as many manual calculations.
- If we have an uncountable number of possibilities for our prior, we cannot actually enumerate all possible posteriors. Therefore, one posterior distribution is the only way for us to properly study posterior probabilities in this case.
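
As one illustration of "one prior distribution in, one posterior distribution out", the sketch below uses the standard Beta-Binomial conjugate update, which is not covered in the problems above: a $Beta(\alpha, \beta)$ prior on $p$ combined with observed heads and tails gives a $Beta(\alpha + \text{heads}, \beta + \text{tails})$ posterior. The function names are hypothetical, and the flips (heads, tails, tails) are borrowed from Problem 3 purely as example data.

```python
def beta_update(alpha, beta, heads, tails):
    """Posterior Beta parameters after observing flips, given a Beta(alpha, beta) prior."""
    return alpha + heads, beta + tails

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Beta(1, 1) is the continuous Uniform prior on [0, 1] from Problem 4 (e).
alpha_post, beta_post = beta_update(1, 1, heads=1, tails=2)

print(alpha_post, beta_post)             # 2 3
print(beta_mean(alpha_post, beta_post))  # 0.4
```

Two numbers describe the whole posterior, however many values of $p$ are possible, which is the convenience the answers above point to.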