# Week 5: Bayesian Inference

---

Useful reference: An Introduction to Bayesian Thinking (statswithr.github.io)
 
Reading materials: 
https://towardsdatascience.com/how-bayes-theorem-helped-win-the-second-world-war-7f3be5f4676c
Can A 250-Year-Old Mathematical Theorem Find A Missing Plane? : The Two-Way : NPR
https://www.colorado.edu/amath/sites/default/files/attached-files/vallverdu08.pdf

### Exercise 0:
 
1. What is Bayes Rule?
2. What are the terms?
3. How is it different to likelihood inference?


### Exercise 1:
We are interested in determining the probability that someone has covid given that the rapid test is positive.
 
We know the following pieces of information:
* The probability the rapid test is positive given that the person does have covid is 0.95.
* The probability the rapid test is negative given that the person does not have covid is 0.90.
* The probability that a member of the general population has covid is 0.20.

Questions:
1. What does each of the terms in Bayes' rule refer to in this calculation?
2. Why is it important to calculate probabilities like the one stated above?
 
### Exercise 2:
Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?
 
### Deriving posterior distributions
 
For the following cases 
* Prior: f(p)~Beta(a, b) Likelihood: f(x| p)~Binomial(n, p)
* Prior: f(mu)~Normal(a, b) Likelihood: f(x| mu, sigma)~Normal(mu, sigma) (sigma known)
* Prior: f(mu)~Laplace(a, b) Likelihood: f(x| mu, sigma)~Normal(mu, sigma) (sigma known)
 
1. Why might we use a Laplace instead of a Normal distribution?
2. Why are conjugate priors useful?
3. How do we solve the cases where we do not have a closed form?
4. If we take the regression example from Week 4, how can we estimate the parameters?
5. How can we determine the priors?
6. Why might the distribution of the priors be different?
7. Using a Normal prior for Beta and an inverse Gamma prior for sigma^2, can you derive the posterior distribution?
8. How does this case link to Ridge Regression?

---

## Exercise 0:

Bayes' theorem describes the probability of an event given prior knowledge of related events. It has the following formulation:

1. What is Bayes rule: $P(A|B) = \frac{P(B | A)P(A)}{P(B)}$
   1. Note, this follows from the definition of conditional probability, $P(A|B) = \frac{P(A \cap B)}{P(B)}$

2. What are the terms?:
   1. $P(A|B)$ is the conditional probability that $A$ happens given that $B$ is true. This is the *posterior*.
   2. $P(B|A)$ is the conditional probability that $B$ happens given that $A$ is true. Viewed as a function of $A$ for a fixed $B$, it can be interpreted as the likelihood of $A$: $P(B|A) = L(A|B)$.
   3. $P(A)$ and $P(B)$ are the probabilities of observing $A$ and $B$ without any conditions. These are known as the *prior* and *marginal* distributions respectively.

3. How is it different to Likelihood inference?:
   1. Likelihood inference relies solely on the observed data, whilst Bayesian inference combines the observed data with prior knowledge about the unknown quantity, expressed as a prior distribution.

## Exercise 1:

In this exercise, we want to determine the probability that someone has covid given that the rapid test is positive. Using Bayes' rule, this can be expressed as:

$P(\text{has covid} | \text{test positive}) = \frac{P(\text{test positive} | \text{has covid})P(\text{has covid})}{P(\text{test positive})}$

We know that:
1. $P(\text{test positive} | \text{has covid}) = 0.95$
2. $P(\text{test negative} | \neg \text{has covid}) = 0.90$
3. $P(\text{has covid}) = 0.2$

All we need is the marginal probability $P(\text{test positive})$. To find it, we note that:

$P(\text{test positive}) = P(\text{test positive} \cap \text{has covid}) + P(\text{test positive} \cap \neg \text{has covid})$

Then, using the definition of conditional probability,

$P(\text{test positive}) = P(\text{has covid})P(\text{test positive} | \text{has covid}) + P(\neg \text{has covid})P(\text{test positive} | \neg \text{has covid})$

where $P(\text{test positive} | \neg \text{has covid}) = 1 - 0.90 = 0.10$, so

$P(\text{test positive}) = 0.2 \times 0.95 + (1-0.2) \times (1-0.9) = 0.27$

Thus, substituting into Bayes' formula:

$P(\text{has covid} | \text{test positive}) = \frac{0.95 \times 0.2}{0.27} \approx 0.704 \approx 70 \%$
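
The calculation above can be checked with a few lines of Python. This is a minimal sketch, not part of the original exercise: the function name `posterior_positive` and the parameter names `sensitivity` and `specificity` are labels chosen here for the quantities 0.95 and 0.90 given in the question.

```python
def posterior_positive(prior, sensitivity, specificity):
    """Return P(has covid | test positive) using Bayes' rule.

    prior       : P(has covid)
    sensitivity : P(test positive | has covid)
    specificity : P(test negative | does not have covid)
    """
    false_positive_rate = 1.0 - specificity
    # Marginal P(test positive), via the law of total probability.
    evidence = prior * sensitivity + (1.0 - prior) * false_positive_rate
    return prior * sensitivity / evidence


p = posterior_positive(prior=0.20, sensitivity=0.95, specificity=0.90)
print(f"P(has covid | test positive) = {p:.3f}")  # prints 0.704
```

Re-running the function with a smaller prior (for example 0.01) shows how strongly the answer depends on how common the disease is in the population.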

## Exercise 2:

We define $C$ to be the event that there is a car behind door 1 (the door we picked), and $E$ the event that Monty opens a door with a goat behind it.
Then we want to find $P(C|E)$, which by Bayes' theorem has the form:

$P(C|E) = \frac{P(E|C)P(C)}{P(E)}$

We know that:
* $P(E|C) = 1$: the probability that Monty opens a door with a goat behind it, given there is a car behind door 1. This is 1 because he knows where the car is and always opens a door with a goat.
* $P(C) = \frac{1}{3}$: there are three doors to choose from, and without prior knowledge there is a 1 in 3 chance of picking the door with the car.
* $P(E) = 1$: Monty will always open a door with a goat behind it.

$P(C|E) = \frac{\frac{1}{3}*1}{1} = \frac{1}{3}$

Next, let $O$ be the event that the car is behind the remaining unopened door. We know that:

$P(O|E) + P(C|E) = 1$, as the car is either behind door 1 or behind the remaining door.

So $P(O|E) = 1-\frac{1}{3} = \frac{2}{3}$, and it is to our advantage to switch.
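
The $\frac{1}{3}$ versus $\frac{2}{3}$ result can also be checked by simulation. The following is a minimal sketch, assuming the host always opens a goat door that is not the contestant's pick and chooses at random when two such doors are available. The function names `play` and `win_rate` are illustrative.

```python
import random


def play(switch, rng):
    """Play one round; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the only door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    """Estimate the probability of winning by repeated play."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


print("stay:  ", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

With enough trials, the estimated win rate when staying should settle near $\frac{1}{3}$ and the rate when switching near $\frac{2}{3}$.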

### Deriving Posterior Distributions

Suppose $X$, $Y$ are two random variables with joint PDF or PMF $f_{X,Y}(x,y)$. Then the *marginal distribution* of $X$ is given by the PDF
$f_X(x) = \int f_{X,Y}(x,y)\,dy$ in the continuous case, or by the PMF
$f_X(x)= \sum_y f_{X,Y}(x,y)$ in the discrete case. This describes the probability distribution of $X$ alone.

**Conditional Distribution:** the conditional distribution of $Y$ given $X=x$ is defined by the PDF or PMF

$f_{Y|X}(y|x) = \frac{f_{X,Y}(x,y)}{f_X(x)}$

Doing the same for the other variable, we can decompose the joint PDF as follows:

$f_{X,Y}(x,y) = f_{Y|X}(y|x)f_X(x) = f_{X|Y}(x|y)f_Y(y)$

In Bayesian analysis, before any data are observed, the unknown parameter is modeled as a random variable $\Theta$ with probability distribution $f_\Theta(\theta)$, called the *prior distribution*.

When we have observed data $X$, we can model the joint distribution as

$f_{X,\Theta}(x, \theta) = f_{X|\Theta}(x|\theta)f_{\Theta}(\theta)$

and the marginal distribution of $X$ (continuous case) follows:

$f_X(x) = \int f_{X,\Theta}(x, \theta)\, d\theta = \int f_{X|\Theta}(x|\theta)f_{\Theta}(\theta)\, d\theta$

Hence the conditional distribution (**Posterior distribution**) of $\Theta$ given $X=x$ is

$f_{\Theta|X}(\theta|x) = \frac{f_{X,\Theta}(x,\theta)}{f_X(x)} = \frac{f_{X|\Theta}(x|\theta)f_\Theta(\theta)}{ \int f_{X|\Theta}(x|\theta')f_{\Theta}(\theta')\, d\theta' }$


This is usually summarised as $f_{\Theta|X}(\theta|x) \propto f_{X|\Theta}(x|\theta)f_\Theta(\theta)$, i.e.

$\text{Posterior density} \propto \text{Likelihood} \times \text{Prior density}$

1. Prior: f(p)~Beta(a, b) Likelihood: f(x| p)~Binomial(n, p)
2. Prior: f(mu)~Normal(a, b) Likelihood: f(x| mu, sigma)~Normal(mu, sigma) (sigma known)
3. Prior: f(mu)~Laplace(a, b) Likelihood: f(x| mu, sigma)~Normal(mu, sigma) (sigma known)

For case 1, the posterior satisfies $f_{P|X}(p|x) \propto f_{X|P}(x|p)f_P(p)$, where the likelihood is $Binomial(n, p)$ and the prior is $Beta(a, b)$:

$$f_{X|P}(x|p) = {n \choose x} p^x (1-p)^{n-x}$$

$$f_P(p) = \frac{p^{a -1}(1-p)^{b -1}}{B(a, b)}\quad \text{where } B(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a + b)}$$
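
Multiplying a Binomial likelihood with $x$ successes in $n$ trials by a $Beta(a, b)$ prior gives a function of $p$ proportional to $p^{a+x-1}(1-p)^{b+n-x-1}$, which is the kernel of a $Beta(a+x,\, b+n-x)$ density. The sketch below checks this numerically by normalising likelihood times prior on a grid and comparing it with that Beta density. The values of `a`, `b`, `n` and `x` are illustrative assumptions, not data from the exercise, and the helper names are hypothetical.

```python
import math


def beta_pdf(p, a, b):
    """Density of Beta(a, b) at p, for 0 < p < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))


def binomial_pmf(x, n, p):
    """Probability of x successes in n trials with success probability p."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def grid_posterior(x, n, a, b, grid):
    """Normalise likelihood * prior on an evenly spaced grid of p values."""
    unnormalised = [binomial_pmf(x, n, p) * beta_pdf(p, a, b) for p in grid]
    width = grid[1] - grid[0]
    total = sum(unnormalised) * width  # midpoint-rule estimate of the marginal
    return [u / total for u in unnormalised]


# Illustrative values (assumed): Beta(2, 2) prior, 7 successes in 10 trials.
a, b, n, x = 2.0, 2.0, 10, 7
m = 2000
grid = [(i + 0.5) / m for i in range(m)]  # midpoints, avoiding p = 0 and p = 1

numeric = grid_posterior(x, n, a, b, grid)
exact = [beta_pdf(p, a + x, b + n - x) for p in grid]
max_abs_diff = max(abs(u - v) for u, v in zip(numeric, exact))
print("largest gap between grid posterior and Beta(a + x, b + n - x):", max_abs_diff)
```

The same grid approach still works when the prior is not conjugate, although it becomes impractical once there are more than a few parameters.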


