# APMTH 207: Advanced Scientific Computing: 
## Stochastic Methods for Data Analysis, Inference and Optimization
## Homework #8
**Harvard University**<br>
**Spring 2017**<br>
**Instructors: Rahul Dave**<br>
**Due Date:** Friday, March 31st, 2017 at 11:59pm

**Instructions:**

- Upload your final answers as well as your iPython notebook containing all work to Canvas.

- Structure your notebook and your work to maximize readability.

## Problem 1: Application of Data Augmentation

A plant nursery in Cambridge is experimentally cross-breeding two types of hibiscus flowers: blue and pink. The goal is to create an exotic flower whose petals are pink with a ring of blue on each. 

There are four types of child plant that can result from this cross-breeding: 

  - Type 1: blue petals
  - Type 2: pink petals 
  - Type 3: purple petals
  - Type 4: pink petals with a blue ring on each (the desired effect). 

Out of 197 initial cross-breedings, the nursery obtained the following distribution over the four types of child plants: 
$$Y = (y_1, y_2, y_3, y_4) = (125, 18, 20, 34)$$
where $y_i$ represents the number of child plants that are of type $i$.

The nursery then consulted a famed Harvard plant geneticist, who informed them that the probability of obtaining each type of child plant in any single breeding experiment is as follows:
$$ \frac{\theta+2}{4}, \frac{1-\theta}{4}, \frac{1-\theta}{4}, \frac{\theta}{4}.$$
Unfortunately, the geneticist did not specify the quantity $\theta$.

Clearly, the nursery is interested in understanding how many cross-breedings they must perform, on average, in order to obtain a certain number of child plants with the exotic blue rings. To do this they must be able to estimate $\theta$. 

The owners of the nursery, being top students in AM207, decided to model the experiment in hopes of discovering $\theta$ using the results from their 197 initial experiments. 

They chose to model the observed data using a multinomial model and thus calculated the likelihood to be:
$$ p(y  \vert  \theta) \propto (2+\theta)^{y_1} (1-\theta)^{y_2+y_3}  \, \theta^{y_4}
$$

Being good Bayesians, they also imposed a prior on $\theta$, $\rm{Beta}(a, b)$.

Thus, the posterior is:
$$ p(\theta \vert  Y) \propto \left( 2+\theta \right)^{y_1} (1-\theta)^{y_2+y_3} \, \theta^{y_4} \, \theta^{a-1} \, (1-\theta)^{b-1}. $$

If the nursery owners are able to sample from the posterior, they will be able to understand the distribution of $\theta$ and make appropriate estimates.
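
Before any sampling, the unnormalized posterior above can be evaluated on a grid to see roughly where its mass sits. The following is a minimal sketch, assuming a flat prior ($a = b = 1$), which the problem does not specify; the function name `log_unnormalized_posterior` and the grid resolution are illustrative choices.

```python
import numpy as np

def log_unnormalized_posterior(theta, y, a=1.0, b=1.0):
    """Log of (2+theta)^y1 (1-theta)^(y2+y3) theta^y4 theta^(a-1) (1-theta)^(b-1)."""
    y1, y2, y3, y4 = y
    return (y1 * np.log(2.0 + theta)
            + (y2 + y3 + b - 1.0) * np.log(1.0 - theta)
            + (y4 + a - 1.0) * np.log(theta))

y = (125, 18, 20, 34)                        # observed counts from the problem
theta_grid = np.linspace(0.001, 0.999, 999)  # stay inside (0, 1) to avoid log(0)
step = theta_grid[1] - theta_grid[0]

# a = b = 1 is an assumed flat prior, not part of the problem statement.
log_post = log_unnormalized_posterior(theta_grid, y, a=1.0, b=1.0)

# Subtract the maximum before exponentiating for numerical stability,
# then normalize so the density integrates to 1 over the grid.
density = np.exp(log_post - log_post.max())
density /= density.sum() * step

theta_mode = theta_grid[np.argmax(density)]
theta_mean = np.sum(theta_grid * density) * step
```

A grid like this only works because $\theta$ is one-dimensional; it is useful mainly as a sanity check on whatever the sampler produces.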

### Part A: Sampling using data augmentation

Realizing that it would be difficult to sample from the posterior directly, and after being repeatedly frustrated by attempts at Metropolis-Hastings and Gibbs sampling for this model, the nursery owners decided to augment their model and hopefully obtain a friendlier looking distribution that allows for easy sampling.

They augment the data with a new variable $z$ such that:
$$z + (y_1 - z) = y_1.$$
That is, using $z$, we are breaking $y_1$, the number of type 1 child plants, into two subtypes with counts $y_1 - z$ and $z$. Let the probabilities of obtaining these two subtypes be $1/2$ and $\theta/4$, respectively, so that together they sum to $(\theta+2)/4$. Now we can interpret $y_1$ as the total number of trials in a binomial experiment. Thus, the new likelihood can be written as
$$ p(y, z  \vert  \theta) \propto \binom{y_{1}}{z} \left (\frac{1}{2} \right )^{y_1-z} \left(\frac{\theta}{4} \right )^{z}  (1-\theta)^{y_2+y_3}  \, \theta^{y_4}
$$


Derive the joint posterior $p(\theta, z  \vert  y)$ and sample from it using Gibbs sampling.

Visualize the distribution of theta and, from this distribution, estimate the probability of obtaining a type 4 child plant (with the blue rings) in any cross-breeding experiment.

**Solutions for TFS**

In the first instance, where you have four outcomes each with some probability, we're setting up the observed counts of the four outcomes (the $y_i$'s) as a multinomial trial. The likelihood is then just the product of each outcome's probability raised to the number of times that outcome happened (the first probability raised to $y_1$, etc.), times some normalizing constants (all involving just the $y_i$'s and constant factors of $1/4$):
$$ 
p(y  \vert  \theta) \propto (2+\theta)^{y_1} (1-\theta)^{y_2+y_3}  \, \theta^{y_4}
$$
Now, since the normalizing constant contains only $y$'s, we can just drop it (if we're only interested in sampling).

Now, when we introduce a latent variable $Z$, we are essentially breaking the first outcome into two separate outcomes. To put it simply, we now have ***five*** outcomes instead of four. We still set up the likelihood as a multinomial trial (now with five categories instead of four), so the likelihood is still proportional to the product of each outcome's probability raised to the number of times that outcome happened ($y_1 - z$ and $z$ for the two new outcomes, etc.).

$$ p(y, z  \vert  \theta) \propto \binom{y_{1}}{z} \left (\frac{1}{2} \right )^{y_1-z} \left(\frac{\theta}{4} \right )^{z}  (1-\theta)^{y_2+y_3}  \, \theta^{y_4}
$$

So where did the $\binom{y_{1}}{z}$ come from? Well, write out the normalizing constant for the new multinomial model with five categories. Some parts of this constant involve $z$ and $y_1$, and since $z$ is not a constant (we'll be sampling it later), we can't drop these parts. If you keep the parts of the normalizing constant involving $z$ (and a bit extra), then you'll get the coefficient $\binom{y_{1}}{z}$.
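
To make that step concrete, here is the multinomial coefficient written out, with $n = y_1 + y_2 + y_3 + y_4$ denoting the total number of cross-breedings (the symbol $n$ is introduced here only for this illustration). The five category counts are $y_1 - z$, $z$, $y_2$, $y_3$ and $y_4$, and the "bit extra" is a factor of $y_1!$ multiplied into both the numerator and the denominator:

$$
\frac{n!}{(y_1 - z)! \, z! \, y_2! \, y_3! \, y_4!} = \frac{n!}{y_1! \, y_2! \, y_3! \, y_4!} \cdot \frac{y_1!}{z! \, (y_1 - z)!} = \frac{n!}{y_1! \, y_2! \, y_3! \, y_4!} \binom{y_{1}}{z}
$$

The first factor on the right involves only the observed $y_i$'s, so it can be dropped; the second factor, $\binom{y_{1}}{z}$, depends on $z$ and has to stay.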


Ok, so to get the joint posterior $p(\theta, z \vert y)$, which is proportional to the joint $p(y, z, \theta)$, we multiply the likelihood by the prior:
$$ p(\theta, z  \vert  y) \propto \binom{y_{1}}{z} \left (\frac{1}{2} \right )^{y_1-z} \left(\frac{\theta}{4} \right )^{z}  (1-\theta)^{y_2+y_3}  \, \theta^{y_4} \theta^{a-1} (1-\theta)^{b-1}
$$

To get each conditional, you just write down the factors from the joint that include the relevant variable!

$$
p(z | \theta, y)\propto \binom{y_{1}}{z} \left (\frac{1}{2} \right )^{y_1-z} \left(\frac{\theta}{4} \right )^{z} 
$$

and 

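$$
p(\theta | z, y)\propto \theta^{z} \, (1-\theta)^{y_2+y_3}  \, \theta^{y_4} \theta^{a-1} (1-\theta)^{b-1} = \theta^{z + y_4 + a -1} \, (1-\theta)^{y_2+y_3 + b - 1}
$$

Note that the factor $\left(\frac{\theta}{4}\right)^{z}$ depends on $\theta$, so $\theta^{z}$ has to be kept here (only the constant $4^{-z}$ is dropped).

So $p(\theta | z, y)$ looks like a beta pdf! Since a $Beta(\alpha, \beta)$ density is proportional to $\theta^{\alpha - 1}(1-\theta)^{\beta - 1}$, we have $\theta | z, y \sim Beta(z + y_4 + a, y_2+y_3 + b)$.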

And $p(z | \theta, y)$ looks like a binomial pmf, except that $\frac{1}{2} + \frac{\theta}{4} = \frac{2 + \theta}{4}$ is not equal to 1 (in a binomial pmf, the two probabilities being raised to powers need to sum to 1). Ok, so we just multiply the expression by $\frac{[4/(2 + \theta)]^{(y_1 - z) + z}}{[4/(2 + \theta)]^{y_1}}$, which equals 1, and get

$$
p(z | \theta, y)\propto \binom{y_{1}}{z} \left (\frac{1}{2} \right )^{y_1-z} \left(\frac{\theta}{4} \right )^{z} \frac{[4/(2 + \theta)]^{(y_1 - z) + z}}{[4/(2 + \theta)]^{y_1}}= \left(\frac{2 + \theta}{4}\right)^{y_1} \binom{y_{1}}{z} \left (\frac{2}{2 + \theta} \right )^{y_1-z} \left(\frac{\theta}{2 + \theta} \right )^{z}
$$

But again $\left(\frac{2 + \theta}{4}\right)^{y_1}$ is constant with respect to $z$ so when we're sampling $z$ we can just drop this factor. That is, we just have

$$
p(z | \theta, y)\propto  \binom{y_{1}}{z} \left (\frac{2}{2 + \theta} \right )^{y_1-z} \left(\frac{\theta}{2 + \theta} \right )^{z}
$$

and that is a binomial distribution $Bin\left(y_1, \frac{\theta}{2 + \theta} \right )$.
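
As a quick numerical sanity check of this normalization argument, the sketch below normalizes the unnormalized weights over $z = 0, \dots, y_1$ and compares them with the binomial pmf. The value $\theta = 0.6$ is an arbitrary test value, and the helper names are illustrative.

```python
import math

def unnormalized_z_weight(z, theta, y1):
    """binom(y1, z) * (1/2)^(y1 - z) * (theta/4)^z, the kernel of p(z | theta, y)."""
    return math.comb(y1, z) * 0.5 ** (y1 - z) * (theta / 4.0) ** z

def binomial_pmf(z, n, p):
    """Binomial(n, p) probability mass at z."""
    return math.comb(n, z) * p ** z * (1.0 - p) ** (n - z)

theta, y1 = 0.6, 125   # theta is an arbitrary test value; y1 comes from the data

weights = [unnormalized_z_weight(z, theta, y1) for z in range(y1 + 1)]
total = sum(weights)   # should equal ((2 + theta) / 4) ** y1
normalized = [w / total for w in weights]

p = theta / (2.0 + theta)
max_gap = max(abs(normalized[z] - binomial_pmf(z, y1, p)) for z in range(y1 + 1))
```

Here `total` plays the role of the factor $\left(\frac{2 + \theta}{4}\right)^{y_1}$ that was dropped, and `max_gap` should be zero up to floating-point error.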


So your Gibbs sampler:

0. randomly initialize $\theta$ and $z$
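1. sample $\theta | z, y \sim Beta(z + y_4 + a, y_2+y_3 + b)$ (the $\left(\frac{\theta}{4}\right)^{z}$ factor in the joint contributes the $z$)
2. sample $z | \theta, y \sim Bin\left(y_1, \frac{\theta}{2 + \theta} \right )$, then repeat steps 1 and 2.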
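
A minimal sketch of these steps in Python with NumPy follows. It assumes a flat prior ($a = b = 1$), and the chain length, burn-in, thinning interval and seed are arbitrary choices; none of these values come from the problem statement. It uses the conditionals read off from the joint, $\theta | z, y \sim Beta(z + y_4 + a, y_2 + y_3 + b)$ (the $z$ comes from the $(\theta/4)^z$ factor) and $z | \theta, y \sim Bin\left(y_1, \frac{\theta}{2+\theta}\right)$. The function name `gibbs_sampler` is illustrative.

```python
import numpy as np

def gibbs_sampler(y, a=1.0, b=1.0, n_iter=20000, burn_in=2000, thin=5, seed=0):
    """Gibbs sampler for (theta, z) in the augmented model.

    a, b, n_iter, burn_in, thin and seed are assumed defaults,
    not values given in the problem.
    """
    y1, y2, y3, y4 = y
    rng = np.random.default_rng(seed)

    # Step 0: random initialization.
    theta = rng.uniform()
    z = rng.integers(0, y1 + 1)

    thetas = np.empty(n_iter)
    zs = np.empty(n_iter, dtype=int)
    for i in range(n_iter):
        # Step 1: theta | z, y ~ Beta(z + y4 + a, y2 + y3 + b)
        theta = rng.beta(z + y4 + a, y2 + y3 + b)
        # Step 2: z | theta, y ~ Binomial(y1, theta / (2 + theta))
        z = rng.binomial(y1, theta / (2.0 + theta))
        thetas[i] = theta
        zs[i] = z

    # Discard burn-in, then keep every `thin`-th draw.
    return thetas[burn_in::thin], zs[burn_in::thin]

y = (125, 18, 20, 34)
theta_samples, z_samples = gibbs_sampler(y)
theta_hat = theta_samples.mean()
```

To visualize the result, a histogram of `theta_samples` (for example with matplotlib's `plt.hist(theta_samples, bins=50, density=True)`) is usually enough, along with a trace plot to check mixing.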

After burn-in and thinning, plot the samples of $\theta$ (ignore the $z$'s; we don't care about them). 

Finally, make a point estimate from the samples of $\theta$, e.g. compute the mean or mode.

Use this estimate of $\theta$ to get the probability of obtaining the desired outcome in the cross-breeding experiment, which is $\theta/4$ for a type 4 child plant.
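
One way to turn posterior draws into the quantities the nursery cares about is sketched below. The function name `summarize_type4`, the target of 10 ring plants, and the tiny placeholder array are all illustrative assumptions; in practice the input would be the thinned $\theta$ samples from the Gibbs sampler. The expected number of cross-breedings uses the fact that, for a fixed $\theta$, obtaining `n_target` type 4 plants takes `n_target / (theta / 4)` crosses on average.

```python
import numpy as np

def summarize_type4(theta_samples, n_target=10):
    """Summaries for the type 4 probability theta/4 from posterior draws of theta."""
    theta_samples = np.asarray(theta_samples, dtype=float)
    p4_samples = theta_samples / 4.0          # probability of a type 4 plant
    return {
        "theta_mean": theta_samples.mean(),
        "p_type4_mean": p4_samples.mean(),
        "p_type4_interval": np.quantile(p4_samples, [0.025, 0.975]),
        # Average the expected number of crosses over the posterior draws.
        "expected_crosses": np.mean(n_target / p4_samples),
    }

# Placeholder values used only so the snippet runs on its own;
# they are NOT real posterior samples.
placeholder_samples = np.array([0.55, 0.60, 0.65, 0.70])
summary = summarize_type4(placeholder_samples, n_target=10)

# Plug-in alternative: use the point estimate of theta directly.
plug_in_crosses = 10 / (summary["theta_mean"] / 4.0)
```

Because $1/\theta$ is convex, averaging `n_target / p4_samples` over the draws gives a slightly larger number than plugging in the posterior mean of $\theta$; either can be reported as long as it is clear which one was used.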