# Module 2 Ungraded Lab 1: Introduction to Bayesian Inference

Ungraded labs are designed to give learners an opportunity to apply theoretical material presented in previous videos. The goals of this ungraded lab are to (1) apply Bayes' theorem in order to infer the value of an unknown parameter, (2) quantify your uncertainty about your inference, and (3) compare this "Bayesian" inference method with a frequentist method.  Ungraded labs can be completed when assigned or in conjunction with upcoming videos. Answers to the lab will be provided in upcoming videos.

# Problem #2: Maximum likelihood vs Bayesian inference

Imagine that, before class, I rolled a six-sided die and secretly recorded the outcome; this number is the target, $T$. Your goal in this problem is to guess $T$. 

There’s some evidence to take into account, however. Imagine that, after setting $T$, I roll the same six-sided die $10$ times. I don’t let you see how it lands, but each time I tell you (accurately) whether the number I just rolled was greater than, equal to, or less than $T$. After $10$ trials, you must guess what the target is. Here is the outcome of those $10$ trials:

$\mathbf{x} = (G, G, G, E, L, G, L, E, G, E)$

where 
- $G$ = the roll was greater than $T$
- $L$ = the roll was less than $T$, and 
- $E$ = the roll was equal to $T$.

Let's use Bayes' theorem to come up with a strategy for guessing $T$.

Bayes' theorem tells us that, for $t = 1,...,6$:

\begin{align*}
P(T = t \, | \, \mathbf{x}) = \frac{P(\mathbf{x} \, | \, T = t)P(T=t)}{P(\mathbf{x})}  = \frac{P(\mathbf{x} \, | \, T = t)P(T=t)}{\sum^6_{s=1} P(\mathbf{x} \, | \, T = s)P(T=s)},
\end{align*}

where 

- $P(\mathbf{x} \, | \, T = t)$ is the *likelihood function* of the data conditioned on $T = t$.

- $P(T=t)$ is the *prior probability* that $T=t$. Because $T$ was set by rolling a fair six-sided die, the prior is uniform: $P(T=t) = 1/6$ for $t = 1,...,6$.

- $P(\mathbf{x}) = \sum^6_{t=1} P(\mathbf{x} \, | \, T = t)P(T=t)$ is the probability of the *evidence*. The equality is given by the Law of Total Probability. 

- $P(T = t \, | \, \mathbf{x})$ is the *posterior probability* of $T = t$ given the observations $\mathbf{x}$.

In order to compute the likelihood function for a given $T=t$, we'll need the probabilities that a single roll is less than ($p_l$), equal to ($p_e$), or greater than ($p_g$) the target value $t$.

For example, if $T = 1$, then $p_l = 0$ because it is not possible to roll less than a $1$. $p_e = 1/6$, because that's the probability of rolling a $1$. And $p_g = 5 \times (1/6) = 5/6$, because rolling greater than $1$ means rolling a $2$ **or** a $3$ **or** ... a $6$. These outcomes are mutually exclusive, so the word **or** tells us to sum their probabilities, and since each face has probability $1/6$, we multiply by $5$.
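
The $T = 1$ example above can be checked by brute-force enumeration. The following is a minimal sketch, assuming a fair die; the variable names (`faces`, `target`, `probs`) are illustrative and not part of the lab.

```r
# Enumerate the six equally likely faces (assumption: the die is fair).
faces <- 1:6
target <- 1  # the worked example above uses T = 1

# The mean of a logical vector is the fraction of faces meeting the condition.
probs <- c(
  l = mean(faces < target),   # P(roll < T)
  e = mean(faces == target),  # P(roll = T)
  g = mean(faces > target)    # P(roll > T)
)
print(probs)  # l = 0, e = 1/6, g = 5/6
```

Changing `target` to another value between 1 and 6 gives the corresponding probabilities, which may help when checking a general formula.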

**(a) Write down a general formula (as a function of $t$) for $P(l \, | \, T = t)$, $P(e \, | \, T = t)$, and $P(g \, | \, T = t)$. This question is asking for a mathematical expression, not R code.**

**(b) Now, write an R function, called `p`, that takes in a value of $T$ and returns $P(l \, | \, T = t)$, $P(e \, | \, T = t)$, and $P(g \, | \, T = t)$.**

Next, let's calculate the likelihood function for a given $T=t$ together (we don't quite know how to do this on our own...yet).

Note that all of our rolls are independent, so the likelihood function, which is the joint probability mass function (pmf) of the data given $T = t$, is the product of the marginal pmfs, interpreted as a function of $t$.
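
To make the independence argument concrete, here is a rough sketch of how such a product could be computed in R. It assumes a fair die and encodes the data as the characters `"G"`, `"E"`, and `"L"`; the helper names `roll_probs` and `joint_prob` are hypothetical, chosen for this illustration rather than prescribed by the lab.

```r
# Observed outcomes from the problem statement.
x <- c("G", "G", "G", "E", "L", "G", "L", "E", "G", "E")

# Hypothetical helper: P(L), P(E), P(G) for one roll of a fair die given T = t,
# found by counting the faces below, equal to, and above t.
roll_probs <- function(t) {
  faces <- 1:6
  c(L = mean(faces < t), E = mean(faces == t), G = mean(faces > t))
}

# Hypothetical helper: by independence, the joint probability of the whole
# sequence is the product of the per-roll probabilities.
joint_prob <- function(t, data = x) {
  per_roll <- roll_probs(t)[data]  # look up each outcome's probability by name
  prod(per_roll)
}

joint_prob(2)  # joint probability of the observed sequence if T = 2
```

Because the product only depends on how many times each outcome occurs, the order of the entries in `x` does not affect the result.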

**(c) Write down the likelihood function for our data, $\mathbf{x}$, given above in the problem description.**

**(d) Now, write an R function, called `likelihood`, that takes in a value of $T$ and returns the likelihood function.**

**(e) Now, compute the probability of the data/evidence, $P(\mathbf{x}) = \sum^6_{t=1} P(\mathbf{x} \, | \, T = t)P(T=t)$. (Note that this is tricky in general, since it requires that we change $T$, and thus change $p_l$, $p_e$, and $p_g$. Luckily, we wrote a function to do that above and took it into account in our likelihood function!)**

**(f) Now we have all of the components for computing the posterior. So, compute the posterior for each $T=t$ and plot it using a bar chart.**

**(g) Interpret the posterior distribution from the previous part. What value of $T$ would you guess and why? How did your belief update from your prior?**

**(h) Another strategy, as you know, is maximum likelihood estimation (MLE). MLE says that our best guess about $T$ should be the value of $T$ that maximizes the likelihood function over all values of $T$ ($1,...,6$). Calculate the MLE for the data $\mathbf{x}$. In the first (markdown) cell, give a mathematical description of MLE for this problem. In the second (R) cell, compute the MLE.**


Luckily, we've already found the likelihood function. So MLE is just a matter of computing $P(\mathbf{x} \, ; \, T = t)$ for $t = 1,...,6$ and choosing the "argmax", i.e., the value of $t$ that gives the highest probability: $\hat{t}_{MLE} = \arg\max_{t \in \{1,...,6\}} P(\mathbf{x} \, ; \, T = t)$. This is straightforward to do in R.
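
One way this might look is sketched below. It assumes a fair die, and the function `likelihood` here is only a stand-in for whatever you wrote in part (d); `t_values`, `lik`, and `mle` are illustrative names.

```r
# Observed outcomes from the problem statement.
x <- c("G", "G", "G", "E", "L", "G", "L", "E", "G", "E")

# Stand-in likelihood for T = t, assuming a fair die:
# product of the per-roll probabilities of the observed outcomes.
likelihood <- function(t) {
  probs <- c(L = (t - 1) / 6, E = 1 / 6, G = (6 - t) / 6)
  prod(probs[x])
}

# Evaluate the likelihood at every candidate value of T.
t_values <- 1:6
lik <- sapply(t_values, likelihood)

# The MLE is the candidate with the largest likelihood.
mle <- t_values[which.max(lik)]
mle  # 2
```

Note that `which.max` returns the first maximizer, so if two values of $t$ tied for the largest likelihood only the smaller one would be reported.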

**(i) Now change your prior to be `prior = c(1/6, 1/6, 2/6, 1/6, 1/6, 0)` and recompute the posterior distribution. Do you have a different guess? What might justify this prior? Can maximum likelihood estimation take into account this kind of adjustment?**

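- Our guess is now $T = 3$, the value with the highest posterior probability under the new prior.
- This prior may be justified if, for example, we have prior information that $T = 3$ is more likely than the other values and that $T = 6$ is impossible.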
- Maximum likelihood estimation does not formally take into account prior information, so it cannot take into account this adjustment.
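
The answers above can be checked numerically. The following is a sketch only, assuming a fair die and using a stand-in `likelihood` function in place of the one from part (d); the helper `posterior_from` is a hypothetical name introduced for this illustration.

```r
# Observed outcomes from the problem statement.
x <- c("G", "G", "G", "E", "L", "G", "L", "E", "G", "E")

# Stand-in likelihood for T = t, assuming a fair die.
likelihood <- function(t) {
  probs <- c(L = (t - 1) / 6, E = 1 / 6, G = (6 - t) / 6)
  prod(probs[x])
}
lik <- sapply(1:6, likelihood)

# Hypothetical helper: Bayes' theorem for a discrete prior over t = 1,...,6.
posterior_from <- function(prior) {
  evidence <- sum(lik * prior)  # law of total probability
  setNames(lik * prior / evidence, 1:6)
}

uniform_prior <- rep(1 / 6, 6)
new_prior <- c(1/6, 1/6, 2/6, 1/6, 1/6, 0)

posterior_uniform <- posterior_from(uniform_prior)
posterior_new <- posterior_from(new_prior)

round(posterior_new, 3)
which.max(posterior_uniform)  # most probable t under the uniform prior: 2
which.max(posterior_new)      # most probable t under the new prior: 3
```

Under the uniform prior the most probable value coincides with the MLE, while doubling the prior weight on $T = 3$ is enough to shift the guess, since the likelihoods at $t = 2$ and $t = 3$ are close.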