Make sure you fill in any place that says `YOUR CODE HERE` or `YOUR ANSWER HERE`, and list your collaborators below:

In [None]:
COLLABORATORS = ""

---

In this problem, you will be deriving the *maximum likelihood estimate* (MLE) for the binomial distribution, the *maximum a posteriori* (MAP) estimate for the beta-binomial model, and the *posterior mean* (PM) estimate for the beta-binomial model. 

**You must show all your work, and format your derivations using LaTeX. Equations that do not use LaTeX will not receive full credit.** 

To align your equations, you may find the `align*` LaTeX environment useful. Take a look at the source of the following cell for an example of how to use it (note that the `&` symbol is how it determines where to line up the equations, and the `\\` is a line break):

$$
\begin{align*}
f(x) &= \int x^2\ \mathrm{d}x\\
&= \frac{1}{3} x^3
\end{align*}
$$

---
## Part A (0.3 points)

First, we will derive the MLE for the binomial distribution. The equation for the binomial likelihood is:

$$
\mathcal{L}(\theta; n, k) = \frac{n!}{k!(n-k)!} \theta^k (1 - \theta)^{(n-k)}
$$

where $\theta$ is the probability of observing $X=1$, $n$ is the number of times we have observed a value of $X$, and $k$ is the number of times we have observed $X=1$.
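To build some intuition before deriving anything, it can help to evaluate the likelihood for a few values of $\theta$. The following is a minimal sketch, not part of the required solution: the function name `binomial_likelihood` and the data values ($n=10$, $k=7$) are assumptions chosen only for illustration.

```python
import math


def binomial_likelihood(theta, n, k):
    """Evaluate L(theta; n, k) = n! / (k! (n-k)!) * theta^k * (1 - theta)^(n - k)."""
    return math.comb(n, k) * theta**k * (1 - theta) ** (n - k)


# Hypothetical data: n = 10 observations, of which k = 7 were X = 1.
n, k = 10, 7
for theta in (0.3, 0.5, 0.7, 0.9):
    value = binomial_likelihood(theta, n, k)
    print(f"theta = {theta:.1f}  likelihood = {value:.4f}")
```

Note that the data ($n$ and $k$) are held fixed while $\theta$ varies, which is what makes this a likelihood function of $\theta$ rather than a probability distribution over $k$.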

<div class="alert alert-success">In order to derive the maximum likelihood estimate for the binomial likelihood, we will need to take the derivative of the *log*-likelihood function, $\log\mathcal{L}(\theta; n, k)$. Write down the equation for this log-likelihood.</div>

YOUR ANSWER HERE

<div class="alert alert-success">
Now, find the value of $\theta$ which maximizes this log-likelihood. Call this variable $\theta_{MLE}$. To do this, you will need to compute the derivative of $\log\mathcal{L}(\theta; n, k)$ with respect to $\theta$. Make sure you write down all your steps! The last step of your derivation should be an equation for $\theta_{MLE}$ in terms of $k$ and $n$.
</div>
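If you would like to sanity-check your derivation numerically, one option is to confirm that the derivative of the log-likelihood is approximately zero at your candidate value. The sketch below uses a central finite difference; the helper names, the data values, and the placeholder `candidate` are all hypothetical, and it does not replace the written derivation.

```python
import math


def log_likelihood(theta, n, k):
    """Log of the binomial likelihood, computed directly from its definition."""
    return math.log(math.comb(n, k) * theta**k * (1 - theta) ** (n - k))


def numerical_derivative(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Hypothetical data, chosen only for illustration.
n, k = 10, 7

# Placeholder: replace 0.5 with the value your own derivation gives.
candidate = 0.5
slope = numerical_derivative(lambda t: log_likelihood(t, n, k), candidate)
print(f"slope of the log-likelihood at theta = {candidate}: {slope:.6f}")
```

A slope far from zero means the candidate is not a stationary point of the log-likelihood for these data.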

YOUR ANSWER HERE

---
## Part B (0.4 points)

Now, we will derive the MAP estimate of the beta-binomial model. The equation for the beta distribution is:

$$
p(\theta; \alpha, \beta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha, \beta)}
$$

where $B(\alpha, \beta)$ is the [Beta function](http://en.wikipedia.org/wiki/Beta_function). Combined with the binomial likelihood, the posterior distribution is given by:

$$
p(\theta\ |\ n, k; \alpha, \beta)= \frac{1}{Z}p(n, k\ |\ \theta)p(\theta; \alpha, \beta)
$$

where $Z$ is a normalizing constant.

To find the MAP estimate of $\theta$ (which we will call $\theta_{MAP}$), we need to find the maximum of this posterior distribution. To do this, we can again compute the log-posterior and then take the derivative with respect to $\theta$, just as we did when computing the MLE for the binomial distribution.
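The procedure described above can also be approximated numerically, which is a useful check on an analytic answer. The following sketch evaluates the log of the unnormalized posterior on a grid of $\theta$ values and picks the largest; the function names, the grid resolution, and the parameter values are assumptions made for illustration. The constants $Z$ and $B(\alpha, \beta)$ are dropped because they do not depend on $\theta$ and so do not change where the maximum is.

```python
import math


def unnormalized_posterior(theta, n, k, alpha, beta):
    """Binomial likelihood times beta prior, without the 1/Z normalization."""
    likelihood = math.comb(n, k) * theta**k * (1 - theta) ** (n - k)
    # B(alpha, beta) does not depend on theta, so it is left out here.
    prior = theta ** (alpha - 1) * (1 - theta) ** (beta - 1)
    return likelihood * prior


def grid_map_estimate(n, k, alpha, beta, num_points=999):
    """Return the interior grid point of (0, 1) with the highest log-posterior."""
    grid = [(i + 1) / (num_points + 1) for i in range(num_points)]
    return max(
        grid,
        key=lambda t: math.log(unnormalized_posterior(t, n, k, alpha, beta)),
    )


# Hypothetical data and prior parameters, chosen only for illustration.
n, k = 10, 7
alpha, beta = 2.0, 2.0
print(grid_map_estimate(n, k, alpha, beta))
```

The grid answer is only accurate to the grid spacing, so expect it to agree with a closed-form expression to about three decimal places here.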

<div class="alert alert-success">
Find the value for $\theta_{MAP}$ in terms of $\alpha$, $\beta$, $n$, and $k$. Make sure you show all your steps!
</div>

<div class="alert alert-warning">
Note: the $\alpha$ and $\beta$ parameters used here are NOT the same as $V_H$ and $V_T$ in the lecture slides. So, your final answer will not look identical to the MAP equation given in lecture. If you look at how $V_H$ and $V_T$ are defined with respect to $\alpha$ and $\beta$ in the slides, though, you should find that your answer here is actually the same as the one in lecture if you substitute the variables in correctly.
</div>

YOUR ANSWER HERE

<div class="alert alert-success">
As we have seen, MLE and MAP are two different ways to estimate the parameter $\theta$ based on data that we have seen. Explain what the difference between these two methods is. What does $\theta_{MAP}$ take into account that $\theta_{MLE}$ does not?
</div>

YOUR ANSWER HERE

---
## Part C (0.3 points)

Now, we will derive the posterior mean of the beta-binomial model. The beta-binomial model has a nice property called *conjugacy*, which means that the posterior distribution has the same form as the prior distribution. In this case, our prior is a beta distribution, so the posterior is also beta. So, our first task is to determine the actual form (i.e., the parameters) of this new beta distribution.
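As a numerical cross-check for this part, the posterior mean can be approximated without knowing the posterior's closed form: integrate the unnormalized posterior to estimate $Z$, then integrate $\theta$ times the posterior. The sketch below uses a simple midpoint rule; the function names, step count, and parameter values are assumptions for illustration, and it assumes $\alpha \geq 1$ and $\beta \geq 1$ so that the integrand stays bounded near 0 and 1.

```python
def unnormalized_posterior(theta, n, k, alpha, beta):
    """Likelihood times prior with all factors that are constant in theta dropped."""
    return theta ** (k + 0) * (1 - theta) ** (n - k) * theta ** (alpha - 1) * (
        1 - theta
    ) ** (beta - 1)


def numerical_posterior_mean(n, k, alpha, beta, steps=100_000):
    """Approximate E[theta | n, k] with a midpoint-rule integral over (0, 1)."""
    width = 1.0 / steps
    thetas = [(i + 0.5) * width for i in range(steps)]
    weights = [unnormalized_posterior(t, n, k, alpha, beta) for t in thetas]
    z = sum(weights) * width  # numerical estimate of the normalizing constant
    return sum(t * w for t, w in zip(thetas, weights)) * width / z


# Hypothetical data and prior parameters, chosen only for illustration.
n, k = 10, 7
alpha, beta = 2.0, 2.0
print(numerical_posterior_mean(n, k, alpha, beta))
```

Once you have expressions for $\alpha^\prime$ and $\beta^\prime$, you can plug the same values into your formula for $\theta_{PM}$ and compare it with this numerical estimate.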

<div class="alert alert-success">
Use Bayes' rule to compute the posterior over $\theta$, which you should call $p(\theta\ |\ n, k; \alpha, \beta)$. You can lump any constant terms (i.e., those that do not include $\theta$) together into a single normalizing constant $\frac{1}{Z}$. Your final answer should be in terms of $Z$, $\theta$, $\alpha$, $\beta$, $n$, and $k$.
</div>

YOUR ANSWER HERE

<div class="alert alert-success">
Compare your equation for the posterior distribution with the general form of the beta distribution above. What are the parameters of this new beta distribution, which we'll denote $\alpha^\prime$ and $\beta^\prime$? What is $Z$? Write the equations for $\alpha^\prime$, $\beta^\prime$, and $Z$ in the cell below (they should be in terms of $\alpha$, $\beta$, $n$, and/or $k$):
</div>

YOUR ANSWER HERE

<div class="alert alert-success">
Now that we have the parameters of the posterior distribution, we can compute the posterior mean, $\theta_{PM}$. Look up the mean of the beta distribution [on Wikipedia](http://en.wikipedia.org/wiki/Beta_distribution), and report two equations: first, what $\theta_{PM}$ is in terms of $\alpha^\prime$ and $\beta^\prime$; and second, what $\theta_{PM}$ is in terms of $\alpha$, $\beta$, $k$, and $n$.
</div>

<div class="alert alert-warning">
Note: as above, the $\alpha$ and $\beta$ parameters used here are NOT the same as $V_H$ and $V_T$ in the lecture slides. So, your final answer will not look identical to the PM equation given in lecture. If you look at how $V_H$ and $V_T$ are defined with respect to $\alpha$ and $\beta$ in the slides, though, you should find that your answer here is actually the same as the one in lecture if you substitute the variables in correctly.
</div>

YOUR ANSWER HERE

---

Before turning this problem in, remember to do the following steps:

1. **Restart the kernel** (Kernel$\rightarrow$Restart)
2. **Run all cells** (Cell$\rightarrow$Run All)
3. **Save** (File$\rightarrow$Save and Checkpoint)

<div class="alert alert-danger">After you have completed these three steps, ensure that the following cell has printed "No errors!". If it has <b>not</b> printed "No errors!", then your code has a bug in it and has thrown an error! Make sure you fix this error before turning in your problem set.</div>

In [None]:
print("No errors!")