In [1]:
import numpy as np
from matplotlib import pyplot as plt
#import pandas as pd
import bokeh
from bokeh.plotting import figure, output_notebook, show
from bokeh.models import HoverTool, Label
output_notebook()

David Diaz

SEFS 590F (Bayesian Models) - Winter 2016-17

## Homework 2

### 1. You have 5 data points drawn from a normal distribution (the data are below, or you can simulate these data using the included R code). Use these data to answer the following questions:

```
set.seed(11)
y=rnorm(5,20,5)
# data points:
# 17.04484, 20.13297, 12.41723, 13.186723, 25.89245
```

#### a. Following a method similar to what we covered in class, create a likelihood profile for the data given they come from a Normal distribution.  Think about how to do this for two parameters. Using the likelihood profile that you come up with, determine the parameter (mean and standard deviation/variance) values by finding the maximum values of the profile.  Use your estimated parameter values to conduct a likelihood ratio test with the data generating values (mean=20, sd=5). 

In [6]:
# Generate samples drawing from a normal distribution
mean, stdev, num_samples = 20, 5, 5
observations = np.random.normal(loc=mean, scale=stdev, size=num_samples)
print("observations: ", observations)

observations:  [ 15.70263134  13.22032908  25.27322987  18.54415547  14.4493287 ]


In [7]:
from scipy.stats import norm

#ranges of means and standard deviations to evaluate
mus = np.arange(0, 40, 0.01)
sigmas = np.arange(0.01, 20, 0.01)

# likelihood profile for the mean given the observations:
# hold stdev at its data-generating value and vary the mean
# (the maximizing mean does not depend on which stdev is held fixed)
mu_likelihoods = [np.prod(norm.pdf(observations, mu, stdev)) for mu in mus]
most_likely_mean = mus[np.argmax(mu_likelihoods)]

# likelihood profile for the stdev given the observations:
# hold the mean at its most likely value and vary the stdev
sigma_likelihoods = [np.prod(norm.pdf(observations, most_likely_mean, sigma)) for sigma in sigmas]
most_likely_sigma = sigmas[np.argmax(sigma_likelihoods)]
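
The cell above profiles one parameter at a time. As a rough sketch of how the same idea might extend to both parameters at once, the log-likelihood can be evaluated on a two-dimensional grid and the joint maximum read off. This assumes the five data points listed in the assignment rather than the simulated `observations`; the names `mu_grid`, `sigma_grid`, and `log_lik` and the grid step sizes are illustrative choices, not part of the notebook.

```python
import numpy as np
from scipy.stats import norm

# Data points listed in the assignment (R: set.seed(11); rnorm(5, 20, 5))
y = np.array([17.04484, 20.13297, 12.41723, 13.186723, 25.89245])

# Illustrative grids; ranges and step sizes are assumptions chosen for speed
mu_grid = np.arange(10, 30, 0.05)
sigma_grid = np.arange(1, 15, 0.05)

# log_lik[i, j] = summed log density of the data at (mu_grid[i], sigma_grid[j])
log_lik = norm.logpdf(
    y[None, None, :],
    loc=mu_grid[:, None, None],
    scale=sigma_grid[None, :, None],
).sum(axis=2)

# Grid location of the joint maximum
i, j = np.unravel_index(np.argmax(log_lik), log_lik.shape)
mu_hat, sigma_hat = mu_grid[i], sigma_grid[j]
print("grid MLE: mu =", mu_hat, "sigma =", sigma_hat)
```

Working on the log scale avoids the underflow that a product of many small densities can cause; the location of the maximum is unchanged.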

In [8]:
# create an interactive plot using bokeh
# subplot 1: Means
s1 = figure(title="Likelihood of Means Given Observations", x_axis_label='Mu', y_axis_label='Likelihood')
s1.line(mus, mu_likelihoods, line_width=2) # line of likelihood
s1.circle(most_likely_mean, np.max(mu_likelihoods), size=10) # most likely mean
# subplot 2: StDevs
s2 = figure(title="Likelihood of StDevs Given Observations", x_axis_label='Sigma', y_axis_label='Likelihood')
s2.line(sigmas, sigma_likelihoods, line_width=2)
s2.circle(most_likely_sigma, np.max(sigma_likelihoods), size=10)
from bokeh.layouts import row  # explicit import of the layout helper
p = row(s1, s2, sizing_mode="scale_width")
show(p)

In [9]:
# likelihood ratio
np.prod(norm.pdf(observations, most_likely_mean, most_likely_sigma))/ \
np.prod(norm.pdf(observations, mean, stdev))

2.1388284411768903
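
The value above is a raw ratio of likelihoods. A likelihood ratio test is usually reported as the statistic $2\ln(L_{max}/L_0)$, compared against a chi-square distribution whose degrees of freedom equal the number of parameters fixed under the null hypothesis (two here: the mean and the standard deviation). The following is a minimal sketch under the assumption that the five data points from the assignment are used; with only five observations the chi-square approximation is rough, so the p-value should be read as indicative only.

```python
import numpy as np
from scipy.stats import norm, chi2

# Data points listed in the assignment
y = np.array([17.04484, 20.13297, 12.41723, 13.186723, 25.89245])

# Maximum likelihood estimates (sample mean, standard deviation with divisor n)
mu_hat, sigma_hat = y.mean(), y.std(ddof=0)

# Log-likelihood at the MLE and at the data-generating values (mean=20, sd=5)
ll_mle = norm.logpdf(y, mu_hat, sigma_hat).sum()
ll_null = norm.logpdf(y, 20, 5).sum()

# Test statistic and asymptotic p-value; two parameters are fixed under the null
lrt_stat = 2 * (ll_mle - ll_null)
p_value = chi2.sf(lrt_stat, df=2)
print("LRT statistic:", lrt_stat, "p-value:", p_value)
```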

#### b. Find the maximum likelihood estimator by taking the derivative of the likelihood (or the log likelihood as we discussed in class). 
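
As a sketch of the derivation this question asks for, assuming $n$ independent observations $y_1, \dots, y_n$ from a Normal distribution with mean $\mu$ and standard deviation $\sigma$, the log likelihood is

$$\ln L(\mu, \sigma \mid y) = -\frac{n}{2}\ln(2\pi) - n\ln\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \mu)^2$$

Setting the partial derivative with respect to each parameter to zero gives

$$\frac{\partial \ln L}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i - \mu) = 0 \quad\Rightarrow\quad \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

$$\frac{\partial \ln L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(y_i - \mu)^2 = 0 \quad\Rightarrow\quad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{\mu})^2$$

Note that the variance estimate divides by $n$, not $n - 1$, so it differs from the usual sample variance.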

#### c. Write an R function to estimate the parameters through optimization (or a “solver”) – you can use one similar to what we did for the binomial in class, slide 25 of Lecture 4 notes. 
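
The question asks for an R function; since the cells in this notebook are written in Python, the following is a hedged Python analogue rather than the solver shown in the lecture notes. It minimizes the negative log-likelihood with `scipy.optimize.minimize`. The function name `neg_log_lik`, the starting values, and the choice to optimize $\sigma$ on the log scale (which keeps it positive) are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Data points listed in the assignment
y = np.array([17.04484, 20.13297, 12.41723, 13.186723, 25.89245])

def neg_log_lik(params, data):
    """Negative Normal log-likelihood; params = (mu, log(sigma))."""
    mu, log_sigma = params
    return -norm.logpdf(data, loc=mu, scale=np.exp(log_sigma)).sum()

# Starting values are arbitrary guesses
result = minimize(
    neg_log_lik, x0=[15.0, np.log(3.0)], args=(y,), method="Nelder-Mead"
)
mu_opt, sigma_opt = result.x[0], np.exp(result.x[1])
print("optimizer MLE: mu =", mu_opt, "sigma =", sigma_opt)
```

A numerical optimizer stops at a convergence tolerance, so its answer would be expected to agree with the closed-form estimates only to a few decimal places.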

#### d. Did you get the same answer using these three methods or did you find similar answers but not exactly the same? Explain why you might get some differences in the methods, and what is the exact answer you think is correct (did one of your methods provide that exact answer?).

### 2. Coates and Burton (1999) studied the influence of light availability on growth rate of species of conifers in northwestern interior cedar-hemlock forests of British Columbia. They used the model, 

$\mu_i = \frac{\alpha (L_i - c)}{\frac{\alpha}{\gamma} + (L_i - c)}$

where:  
$\mu_i$ = prediction of growth rate of i<sup>th</sup> hemlock tree (cm/year)  
$\alpha$ = maximum growth rate (cm/year)  
$\gamma$ = slope of curve at low light (cm/year)  
$c$ = light index where growth = 0 (unitless)  
$L_i$ = measured index of light availability for the i<sup>th</sup> hemlock tree, i.e. the proportion of the hemisphere above canopy open to light × 100 (unitless)  

The light limitation data are included in the datasheet (<b>HemlockLight.csv</b>). We will work through the likelihood for this problem in a slightly different approach.

#### a. Create an R script (or similar) to carry out the following actions: set $\alpha = \gamma = c = 2$ and predict the mean growth rate based on those values and the observed light using the above model. Plot your predicted values and the observed growth rate against the observed light, using different colors or symbols to distinguish the predicted from the observed growth rates.
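
A minimal Python sketch of the prediction step might look like the following. The function name `predict_growth` is hypothetical, and the light values are made-up placeholders because the columns of `HemlockLight.csv` are not shown here; the real light index would be read from that file.

```python
import numpy as np

def predict_growth(light, alpha, gamma, c):
    """Mean growth rate from the light-limitation model quoted above."""
    shifted = light - c
    return alpha * shifted / (alpha / gamma + shifted)

# Hypothetical light index values standing in for HemlockLight.csv
light = np.array([5.0, 20.0, 40.0, 60.0, 80.0])

# All three parameters set to 2, as the question specifies
predicted = predict_growth(light, alpha=2.0, gamma=2.0, c=2.0)
print(predicted)
```

The predictions could then be overlaid on the observed growth rates with `plt.scatter`, using a different color for each series.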

#### b.  You can now calculate the likelihood of your parameter estimates by using the observed and predicted values.  Use the ```dnorm``` command to calculate the probability density of the observed value given the predicted value for each of your data points.  To do so, you will need to estimate the standard deviation; start with $\sigma=2$.  What is the likelihood of your model (with the specified parameters) given the observed data?  Repeat the predictions and calculate the likelihood (or log-likelihood) with 2 different sets of values for $\alpha, \gamma, c, \sigma$.  Plot the predictions for each set of parameters on the same graph with the data.  Which set is best based on the likelihood, and is that consistent with what you see in the predictions?
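
In Python, `scipy.stats.norm.logpdf` plays the role of R's `dnorm(..., log = TRUE)`. The sketch below sums the log densities of the observed growth rates around the model predictions for two parameter sets. The helper names and both data arrays are hypothetical placeholders, not values from `HemlockLight.csv`, and the second parameter set is an arbitrary choice.

```python
import numpy as np
from scipy.stats import norm

def predict_growth(light, alpha, gamma, c):
    """Mean growth rate from the light-limitation model."""
    shifted = light - c
    return alpha * shifted / (alpha / gamma + shifted)

def log_likelihood(growth, light, alpha, gamma, c, sigma):
    """Summed Normal log density of observed growth around the prediction."""
    mu = predict_growth(light, alpha, gamma, c)
    return norm.logpdf(growth, loc=mu, scale=sigma).sum()

# Hypothetical stand-in data (not the real hemlock measurements)
light = np.array([5.0, 20.0, 40.0, 60.0, 80.0])
growth = np.array([1.2, 2.1, 1.7, 2.3, 1.8])

# Starting set from the question, then one arbitrary alternative
ll_a = log_likelihood(growth, light, alpha=2, gamma=2, c=2, sigma=2)
ll_b = log_likelihood(growth, light, alpha=2, gamma=2, c=2, sigma=0.5)
print("log-likelihoods:", ll_a, ll_b)
```

The parameter set with the larger (less negative) log-likelihood is the better supported of the two.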

#### c. This trial and error method is not very efficient.  You can find the parameter estimates using the ```nls``` function in R, which does non-linear estimation for normally distributed data.  I’m including code here for you to follow (below), but you will have to set up $x$ and $y$ appropriately with the dataset you read in, and you should look at the help on ```nls``` and at its methods: summary, predict, coef, and residuals.  The start values I have provided in the code below are not good, so you will have to do some trial and error with the start values. Report the parameter estimates you determine based on this approach, including $\sigma$.

#### d. Lastly, suppose that a previous study estimated $\alpha$ as 35 with a standard deviation of 4.25. Incorporate these prior data in your MLE estimate of $\alpha$. Hint: create a likelihood function for the probability of the new value of $\alpha$ conditional on the previous value and its standard deviation. Take the log of this prior likelihood and add this to the log likelihood of the new value of $\alpha$ given the data. This produces a total log likelihood including the prior and current data. Describe what happens to the estimate of $\alpha$ relative to the one you obtained earlier. What is the effect of increasing the prior standard deviation on the new estimate? What happens when it shrinks? 
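
One way to read the hint is as a penalized log-likelihood: the log density of $\alpha$ under a Normal prior with mean 35 and standard deviation 4.25 is added to the data log-likelihood before maximizing. The sketch below illustrates this on synthetic data generated from the model with made-up "true" parameters, since the hemlock data are not reproduced here. All function names, the starting values, and the use of Nelder-Mead are assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def predict_growth(light, alpha, gamma, c):
    """Mean growth rate from the light-limitation model."""
    shifted = light - c
    return alpha * shifted / (alpha / gamma + shifted)

def neg_total_log_lik(params, light, growth, prior_mean, prior_sd):
    """Negative of (data log-likelihood + log prior density of alpha)."""
    alpha, gamma, c, log_sigma = params
    mu = predict_growth(light, alpha, gamma, c)
    data_ll = norm.logpdf(growth, loc=mu, scale=np.exp(log_sigma)).sum()
    prior_ll = norm.logpdf(alpha, loc=prior_mean, scale=prior_sd)
    return -(data_ll + prior_ll)

# Synthetic stand-in data; the generating values are arbitrary
rng = np.random.default_rng(0)
light = rng.uniform(5, 90, size=50)
growth = predict_growth(light, 25.0, 2.0, 4.0) + rng.normal(0, 2.0, size=50)

start = [25.0, 2.0, 4.0, np.log(2.0)]
alpha_by_sd = {}
for prior_sd in (0.5, 4.25, 50.0):
    fit = minimize(
        neg_total_log_lik, start,
        args=(light, growth, 35.0, prior_sd),
        method="Nelder-Mead",
        options={"maxiter": 4000, "maxfev": 4000},
    )
    alpha_by_sd[prior_sd] = fit.x[0]
    print("prior sd", prior_sd, "-> alpha estimate", fit.x[0])
```

Comparing the three fits gives a feel for how a tighter prior pulls the estimate of $\alpha$ toward the prior mean, while a very wide prior leaves it close to the data-only estimate.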

```
#Rcode
x = Light
y = ObservedGrowthRate
model = nls(y ~ a * (x - c)/(a/s + x - c), trace = TRUE, start = c(a = 5, s = 5, c = 5))
summary(model)
p = coef(model)
a.hat = p[1]
s.hat = p[2]
c.hat = p[3]
yhat = predict(model)
logLik(model)

plot(x, y, ylab = "Growth rate (cm/yr)", xlab = ("Light Availability")) # plot data
points(x, yhat, col = "red") # plot predictions 
```
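
For readers following along in Python, `scipy.optimize.curve_fit` is a rough counterpart to the `nls` call above, with `p0` playing the role of `start`. This is a sketch on synthetic data generated from the model with arbitrary parameter values, not a fit to `HemlockLight.csv`; `predict_growth` is a hypothetical helper name. For Normal errors, the maximum likelihood estimate of $\sigma$ is the root mean squared residual (divisor $n$), which differs slightly from the residual standard error that `summary(model)` reports in R.

```python
import numpy as np
from scipy.optimize import curve_fit

def predict_growth(light, alpha, gamma, c):
    """Mean growth rate from the light-limitation model."""
    shifted = light - c
    return alpha * shifted / (alpha / gamma + shifted)

# Synthetic stand-in data; the generating values are arbitrary
rng = np.random.default_rng(1)
light = rng.uniform(5, 90, size=50)
growth = predict_growth(light, 25.0, 2.0, 4.0) + rng.normal(0, 2.0, size=50)

# Least-squares fit; p0 holds the starting values for alpha, gamma, c
params, cov = curve_fit(predict_growth, light, growth, p0=[20.0, 1.0, 1.0])
alpha_hat, gamma_hat, c_hat = params

# ML estimate of sigma from the residuals (divisor n)
residuals = growth - predict_growth(light, *params)
sigma_hat = np.sqrt(np.mean(residuals**2))
print("alpha, gamma, c:", params, "sigma:", sigma_hat)
```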

<b>Reference:</b>  
Coates, K.D., Burton, P.J., 1999. Growth of planted tree seedlings in response to ambient light levels in northwestern interior cedar-hemlock forests of British Columbia. <i>Canadian Journal of Forest Research</i> <b>29</b>(9): 1374-1382. doi: 10.1139/x99-091