
**University of Edinburgh**

**School of Mathematics**

**Bayesian Data Analysis, 2020/2021, Semester 2**

**Daniel Paulin & Nicolò Margaritella**

**Workshop 2: Introduction to JAGS**

**Note**: Before starting this practical, you might want to spend
some time looking at the JAGS examples discussed during Lecture 2. If
you have already done so, go directly to question 1.
The code below loads JAGS.

In [2]:
#This code loads a compiled version of JAGS and rjags from a zip file on Google Drive, and loads rjags. It should only take a few seconds.
#IMPORTANT: Go to the Kaggle Settings (right hand side) and enable the Internet option before running this.
system("wget --no-check-certificate -r 'https://docs.google.com/uc?export=download&id=1i7BlQ21kT4ZnYUjAxa8P-eCbe-Zfiz8j' -O /kaggle/working/kaggle_JAGS.zip")
system("unzip /kaggle/working/kaggle_JAGS.zip")
system("rm /kaggle/working/kaggle_JAGS.zip")
#Each system() call starts its own shell, so cd and make must be chained in one call
system("cd /kaggle/working/JAGS-4.3.0 && make install")
library(rjags,lib.loc="/kaggle/working")
#If it ran correctly, you should see 
#Loading required package: coda
#Linked to JAGS 4.3.0
#Loaded modules: basemod,bugs

Loading required package: coda

Linked to JAGS 4.3.0

Loaded modules: basemod,bugs



**1.  Analysis of binomial data: revisiting the drug example**

**The aim of this question is to re-do most parts of questions 1 and 2
from the first practical but now using `JAGS`. Remember the context:
a new drug is being considered for relief of chronic pain, with the
success rate $\theta$ being the proportion of patients experiencing
pain relief. According to past information, a Beta(9.2, 13.8) prior
distribution was suggested. This drug had 15 successes out of 20
patients.**

**(i) Compute the posterior mean, standard deviation and a $95\%$
credible interval. Compare with the exact results.**

**(ii) What is the probability that the true success rate is greater
than 0.6? Compare with the exact result.**

**(iii) Suppose 40 more patients were entered into the study. What is
the chance that at least 25 of them experience pain relief?
Compare with the exact result.**

**(iv) Conduct the 'prior/data compatibility check', i.e., calculate
   the predictive probability of observing at least $15$ successes
   under this prior. Compare with the exact result.**

**(v) In practical 1 we then considered a mixture prior, under which
most drugs (95%) are assumed to come from the
stated Beta(9.2, 13.8) prior, but there is a small chance that
the drug might be a 'winner'. 'Winners' were assumed to have a
$\text{Beta}(12,3)$ prior distribution. What is now the chance
that the response rate is greater than 0.6? Compare with the
exact result.**

**(vi) Under this mixture prior, what is the posterior predictive
probability that at least 25 out of 40 new patients experience
pain relief?**

**(vii) For this mixture prior, repeat the prior/data compatibility
    test performed previously. Are the data more compatible with
    this mixture prior? Compare with the exact result.**
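
For the "compare with the exact result" parts under the single Beta(9.2, 13.8) prior, conjugacy gives closed forms that the JAGS estimates can be checked against. The following is a minimal base R sketch, not the model solution: the helper name `pbetabinom.upper` is introduced here for illustration, and the mixture-prior parts (v) to (vii) are not covered.

```r
# Exact conjugate results for the single Beta(9.2, 13.8) prior.
a <- 9.2; b <- 13.8          # prior hyperparameters
y <- 15;  n <- 20            # observed successes and number of patients

# The posterior is Beta(a + y, b + n - y) = Beta(24.2, 18.8)
a.post <- a + y
b.post <- b + n - y

post.mean <- a.post / (a.post + b.post)
post.sd   <- sqrt(a.post * b.post /
                  ((a.post + b.post)^2 * (a.post + b.post + 1)))
post.ci   <- qbeta(c(0.025, 0.975), a.post, b.post)   # equal-tailed 95% interval
p.gt.06   <- 1 - pbeta(0.6, a.post, b.post)           # P(theta > 0.6 | data)

# Beta-binomial upper tail: P(Y >= k) when Y | theta ~ Bin(m, theta)
# and theta ~ Beta(a, b). (Helper name introduced for this sketch.)
pbetabinom.upper <- function(k, m, a, b) {
  j <- k:m
  sum(choose(m, j) * exp(lbeta(j + a, m - j + b) - lbeta(a, b)))
}

p.pred.25of40  <- pbetabinom.upper(25, 40, a.post, b.post)  # part (iii)
p.prior.15of20 <- pbetabinom.upper(15, 20, a, b)            # part (iv)

print(c(mean = post.mean, sd = post.sd))
print(post.ci)
print(c(p.gt.06 = p.gt.06, p.pred = p.pred.25of40, p.prior = p.prior.15of20))
```

The Monte Carlo estimates from JAGS should agree with these values up to simulation error, which shrinks as the number of iterations grows.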

**2.  Simple linear regression with robustification**

**Winning Olympic Men's Long Jump Distances (adapted from
 Witmer, 2017)**
 
**The data are the winning men's long jump distances (m) from 1900
through 2008. You will fit a linear regression of the distances as a
function of Olympic year: 
$$\begin{aligned}
Jump & = & \beta_0 + \beta_1 \mbox{Year} + \epsilon\end{aligned}$$
three different ways: standard frequentist approach, a Bayesian
approach assuming normal errors, and a Bayesian approach assuming a
$t$ distribution for errors.**

**Run the following commands in `R` to begin (this will install and load the package Stat2Data and load the Long Jump dataset).**

In [3]:
install.packages("Stat2Data", lib="/kaggle/working")
library(Stat2Data,lib.loc="/kaggle/working")
data("LongJumpOlympics")   #Makes the dataset available in this R session
Jump <- LongJumpOlympics$Gold
Year <- LongJumpOlympics$Year

**1.  Carry out some exploratory data analysis:**

**1.1.  Plot Jump vs Year. What does the relationship look like?**

**1.2.  Fit a simple linear regression on Jump against Year using the `lm` function,
and make a plot of the data with the fitted line overlaid using the `abline` function.**

**1.3.  Based on this model, every 4 years we would expect the jump
distance to change by what amount?**


**1.4.  Plot the residuals against Year (using the `residuals`
function). One year stands out, which one is it?**

**1.5.  For a more detailed residual analysis, type `par(mfrow=c(2,2))`, and use the `plot` function operating
on the `lm` object (you'll see 4 plots).**

**1.6.  Remove the outlier from the data set and refit the model,
then recompute the above residual diagnostics. What do you
observe?**

**2.  Carry out a Bayesian linear regression analysis using `rjags`.
As in the frequentist case assume $\epsilon$ $\sim$ Normal(0,
$\sigma^2$). Use the following priors for the three parameters:
$$\begin{aligned}
\beta_0, \beta_1 & \sim & \mbox{Normal} \left ( \mu_0=0, \tau_0=0.001 \right ) \\
\tau & \sim & \mbox{Gamma} \left ( a=0.1, b=0.1 \right )
\end{aligned}$$**

**2.1.  Write the *model* statement, which includes the likelihood
calculation and the prior distribution. Include a
calculation of $\sigma = 1/\sqrt{\tau}$.**

**2.2.  Create an `R` object for the data, which includes Jump,
   Year, $n$=26 and the values of the prior hyperparameters
   $\mu_0$, $\tau_0$, $a$ and $b$.**

**2.3.  Create an `R` object for 3 sets of initial values; e.g.,**

In [4]:
#Each chain needs its own named list; the names must match the nodes in your model
my.inits <- list(list(b0=0.1,b1=0.2,tau=0.1),
           list(b0=-1,b1=3,tau=0.3),
           list(b0=1,b1=0,tau=.8))

**2.4.  Execute `jags.model` using the above objects. Note
`n.chains` should be set equal to 3. How many unobserved
stochastic nodes were there? How many observed?**

**2.5.  Use `update` to carry out an initial MCMC run (burn-in) with
1,000 iterations.**

**2.6.  Now make a longer MCMC run using the `coda.samples` function
with 10,000 iterations and have the results for $\beta_0$,
$\beta_1$, and $\sigma$ returned.**
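
Steps 2.1 to 2.6 can be sketched as follows. This is a minimal outline under stated assumptions rather than the model solution: it runs on simulated stand-in data (`x.demo` and `y.demo` are names introduced here, so the real `Year` and `Jump` vectors are left untouched), the node names `beta0`, `beta1` and `tau` are one possible choice, and the sampling step is skipped when `rjags` is not available.

```r
# Simulated stand-in data; in the workshop pass Year and Jump instead.
set.seed(2021)
x.demo <- seq(1900, 2008, by = 4)
y.demo <- 7 + 0.014 * (x.demo - 1900) + rnorm(length(x.demo), sd = 0.25)

# 2.1 Model statement: likelihood, priors and the derived sigma
model.string <- "
model {
  for (i in 1:n) {
    mu[i] <- beta0 + beta1 * x[i]
    y[i] ~ dnorm(mu[i], tau)      # JAGS parameterises dnorm by the precision tau
  }
  beta0 ~ dnorm(mu0, tau0)
  beta1 ~ dnorm(mu0, tau0)
  tau   ~ dgamma(a, b)
  sigma <- 1 / sqrt(tau)
}"

# 2.2 Data list, including the prior hyperparameters
demo.data <- list(y = y.demo, x = x.demo, n = length(y.demo),
                  mu0 = 0, tau0 = 0.001, a = 0.1, b = 0.1)

# 2.3 One named list of initial values per chain
demo.inits <- list(list(beta0 = 0.1, beta1 = 0.2, tau = 0.1),
                   list(beta0 = -1,  beta1 = 3,   tau = 0.3),
                   list(beta0 = 1,   beta1 = 0,   tau = 0.8))

# 2.4 to 2.6 Compile, burn in, then sample (only if rjags can be loaded)
if (requireNamespace("rjags", quietly = TRUE)) {
  demo.model <- rjags::jags.model(textConnection(model.string),
                                  data = demo.data, inits = demo.inits,
                                  n.chains = 3)
  update(demo.model, n.iter = 1000)
  demo.res <- rjags::coda.samples(demo.model,
                                  variable.names = c("beta0", "beta1", "sigma"),
                                  n.iter = 10000)
}
```

When the model compiles, `jags.model` prints the numbers of observed and unobserved stochastic nodes, which is what step 2.4 asks about.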

**2.7.  Plot the results from `coda.samples`. These are the trace
plots. Do you think that the chains have converged for each
of the 3 parameters?**

**2.8.  You may have noticed from the trace plots that $\beta_0$ and
$\beta_1$ are mixing slowly. That's indicative of
significant autocorrelation. Use the `acf` function to see
how much correlation there is. For example, if the results
from `coda.samples` are called `res`, for a parameter named
`beta0`, you can write `acf(res[[1]][,"beta0"],lag.max=100)`.**

**2.9.  Also take a look at the effective sample sizes per
parameter, e.g., `effectiveSize(res[[1]][,"beta0"])`.**
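
To see how autocorrelation and effective sample size are linked, it can help to look at a toy chain with known autocorrelation. The sketch below simulates an AR(1) series in base R (it is not MCMC output) and uses the AR(1) approximation $n(1-\rho)/(1+\rho)$ for the effective sample size. `effectiveSize` in `coda` uses a different, spectral estimate, so its numbers need not match this rough formula.

```r
# Toy AR(1) "chain" showing how autocorrelation shrinks the effective sample size.
set.seed(42)
n.iter <- 5000
rho    <- 0.9
chain  <- as.numeric(arima.sim(model = list(ar = rho), n = n.iter))

acf.out <- acf(chain, lag.max = 100, plot = FALSE)
rho.hat <- acf.out$acf[2]            # element 1 is lag 0, element 2 is lag 1

# Rough effective sample size under the AR(1) approximation
ess.rough <- n.iter * (1 - rho.hat) / (1 + rho.hat)
print(c(lag1 = rho.hat, ess = ess.rough))
```

With a lag-1 autocorrelation near 0.9, the 5,000 correlated draws carry roughly as much information as a few hundred independent ones.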

**2.10. In Lecture 2, the Gelman-Rubin (Brooks-Gelman-Rubin)
statistic was discussed. This is a quantitative measure of
apparent convergence that is based upon the degree of
overlap of 2 or more chains after each iteration. The BGR
statistic roughly approximates the ratio of the variability
between chains to the variability within chains (like an F
statistic in ANOVA). The general idea of the statistic is
that the ratio of those two measures should be around 1
at convergence, thus BGR=1 is "good". Use the `coda` package
function called `gelman.plot` to plot the BGR statistic for
each of the parameters against the MCMC iteration. And use
`gelman.diag` for numerical summaries. What do you think
about convergence now?**
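
The between-chain versus within-chain comparison described above can be computed by hand for a single parameter. The function below is a simplified sketch of the potential scale reduction factor (the name `rhat.simple` is introduced here); `gelman.diag` additionally corrects for sampling variability, so its output will differ slightly.

```r
# Simplified potential scale reduction factor for one parameter.
# `draws` is an (iterations x chains) matrix.
rhat.simple <- function(draws) {
  n <- nrow(draws)
  W <- mean(apply(draws, 2, var))      # average within-chain variance
  B <- n * var(colMeans(draws))        # between-chain variance
  var.plus <- (n - 1) / n * W + B / n  # pooled estimate of the posterior variance
  sqrt(var.plus / W)
}

set.seed(1)
mixed <- matrix(rnorm(3000), ncol = 3)        # three chains from the same distribution
stuck <- sweep(mixed, 2, c(-3, 0, 3), "+")    # the same chains shifted to different levels

print(rhat.simple(mixed))   # close to 1
print(rhat.simple(stuck))   # well above 1
```

Chains that explore the same region give a value near 1, while chains stuck at different levels inflate the between-chain term and push the statistic upwards.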

**2.11. Centring the covariate, in this case Year, sometimes helps
convergence. Modify your Model statement slightly by
creating a new variable `meanY`, and then subtract that from
the `Year[i]` values in the for loop. Repeat the above
steps. How does convergence now look? Use the `summary`
function on the JAGS output to examine the posterior means
and standard deviations for $\beta_0$, $\beta_1$, and
$\sigma$. How do the posterior means for $\beta_1$ and
$\sigma$ compare to the maximum likelihood estimates
obtained in 1.2?**
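
One way to see why centring can help is to look at the correlation between the intercept and slope estimates. The sketch below uses `lm` on a simulated response over a grid of Olympic-style years (`x.raw` and `y.sim` are stand-ins introduced here, not the real data). With vague priors the posterior correlation between $\beta_0$ and $\beta_1$ tends to mirror this pattern, which is a plausible reason for the slow mixing seen without centring.

```r
# Intercept and slope estimates are almost perfectly correlated when the
# covariate sits far from zero; centring removes that correlation.
set.seed(7)
x.raw <- seq(1900, 2008, by = 4)                                        # stand-in for Year
y.sim <- 7 + 0.014 * (x.raw - 1900) + rnorm(length(x.raw), sd = 0.25)   # simulated response

fit.raw     <- lm(y.sim ~ x.raw)
x.centred   <- x.raw - mean(x.raw)
fit.centred <- lm(y.sim ~ x.centred)

cor.raw     <- cov2cor(vcov(fit.raw))[1, 2]
cor.centred <- cov2cor(vcov(fit.centred))[1, 2]
print(c(raw = cor.raw, centred = cor.centred))

# The slope is unchanged; only the intercept is re-expressed
print(c(coef(fit.raw)[2], coef(fit.centred)[2]))
```

After centring, the intercept is the expected response at the mean year rather than at year 0, so its value changes while the slope does not.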

**3.1.  *Robustifying the regression.* As was noted in Lecture 2, the
effects of extreme observations or "outliers" on regression
results can be diminished by using a $t$ distribution for the
observations. For simplicity, assume a $t$ distribution with 3
df for the distribution of errors. Revise the JAGS model code
accordingly (continuing to work with the centred covariate) and
re-run. Recall from Lecture 3 that the necessary change to the
code is to replace `dnorm` with `dt` and add an additional
argument to `data` for the df (=3). How did the posterior mean
of $\beta_1$ change? Compare it to the estimate in
1.2 when the extreme observation is removed.**
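
The change from normal to $t$ errors can be sketched as below. This is an illustrative outline under assumptions, not the model solution: it uses simulated stand-in data with one planted extreme value (`x.demo`, `y.demo`), computes the covariate mean inside the model as `meanX`, lets JAGS generate initial values, and skips sampling when `rjags` is not available.

```r
# Simulated stand-in data with one planted extreme observation.
set.seed(3)
x.demo <- seq(1900, 2008, by = 4)
y.demo <- 7 + 0.014 * (x.demo - 1900) + rnorm(length(x.demo), sd = 0.25)
y.demo[18] <- y.demo[18] + 1

model.t.string <- "
model {
  meanX <- mean(x[])
  for (i in 1:n) {
    mu[i] <- beta0 + beta1 * (x[i] - meanX)   # centred covariate
    y[i] ~ dt(mu[i], tau, df)                 # Student t likelihood in place of dnorm
  }
  beta0 ~ dnorm(mu0, tau0)
  beta1 ~ dnorm(mu0, tau0)
  tau   ~ dgamma(a, b)
  sigma <- 1 / sqrt(tau)                      # a scale parameter under t errors
}"

# The degrees of freedom enter through the data list
demo.data.t <- list(y = y.demo, x = x.demo, n = length(y.demo),
                    mu0 = 0, tau0 = 0.001, a = 0.1, b = 0.1, df = 3)

if (requireNamespace("rjags", quietly = TRUE)) {
  model.t <- rjags::jags.model(textConnection(model.t.string),
                               data = demo.data.t, n.chains = 3)
  update(model.t, n.iter = 1000)
  res.t <- rjags::coda.samples(model.t,
                               variable.names = c("beta0", "beta1", "sigma"),
                               n.iter = 10000)
  print(summary(res.t)$statistics)
}
```

Only the likelihood line and the extra `df` entry differ from the normal-error model; the priors are left as they were.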

**3.  Nonlinear Regression. Newton's law of cooling, from Bates and Watts (2008)**

**The following data are measurements over a 41 minute period of the
temperature of a bore after that bore had been rubbed inside "a
stationary cylinder and pressed against the bottom by means of a
screw". The bore was turned by a team of horses (this is an
experiment with friction from 1798 by a Count Rumford).**


In [None]:
 #minutes
 elapsed.time <- c(4,5,7,12,14,16,20,24,28,31,34,37.5,41) 
 #Fahrenheit
 temperature <- c(126,125,123,120,119,118,116,115,114,113,112,111,110) 

**An underlying theoretical model based on Newton's law of cooling
suggests that temperature should decline over time according to the
following model. 
$$\begin{aligned}
\mbox{temperature} & = & 60 +70e^{-\theta \;\mbox{elapsed.time}}
\end{aligned}$$
You are to evaluate this model, i.e., make
estimates of $\theta$ using classical and Bayesian techniques.**

**(i)  Plot temperature against time (use the `scatter.smooth` function
to draw a nonparametric regression line through the points).**

**(ii) Fit the Newton's law of cooling model above using a classical approach that assumes the
model errors are iid
Normal(0,$\sigma^2$). 
$$\begin{aligned}
 \mbox{temperature} & \sim & \mbox{Normal} \left ( 60 +70e^{-\theta \; \mbox{elapsed.time}}, \sigma^2 \right )
\end{aligned}$$
Use the `nls` function in R. The format of
`nls` in this case: `nl.1 <- nls(formula= temperature ~ 60 + 70*exp(-theta*elapsed.time), start=list(theta=initial.theta))`**
**where `initial.theta` is an initial guess as to what $\theta$ is.**

**One way to get an estimate of $\theta$ is to "linearize"
Newton's law of cooling as follows: 
$$\begin{aligned}
-\ln \left ( \frac{(\mbox{temperature}-60)}{70} \right )  & = &  \theta \; \mbox{elapsed.time}
\end{aligned}$$
and then fit the resulting linear model
with the `lm` function.**

In [None]:
y <- -log((temperature-60)/70)
out <- lm(y ~ -1 + elapsed.time)  

**Use the estimated coefficient in `out` as the value of
`initial.theta`. After fitting the model, plot the fit and
the observations.**
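
Putting these pieces together, the starting value and the `nls` fit might look like the sketch below. The data vectors are repeated from the earlier cell so that the block runs on its own; taking the coefficient with `coef(out)` is one straightforward way to obtain `initial.theta`.

```r
# Data repeated from the cell above so this sketch is self-contained
elapsed.time <- c(4,5,7,12,14,16,20,24,28,31,34,37.5,41)                 # minutes
temperature  <- c(126,125,123,120,119,118,116,115,114,113,112,111,110)   # Fahrenheit

# Starting value from the linearised model (regression through the origin)
y   <- -log((temperature - 60) / 70)
out <- lm(y ~ -1 + elapsed.time)
initial.theta <- unname(coef(out)["elapsed.time"])

# Nonlinear least squares with 60 and 70 held fixed
nl.1 <- nls(temperature ~ 60 + 70 * exp(-theta * elapsed.time),
            start = list(theta = initial.theta))
print(coef(nl.1))
```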

In [None]:
plot(temperature ~ elapsed.time,xlab="Time",ylab="",
main="Friction Experiment Data")
lines(elapsed.time,fitted(nl.1),col="red")

**How does the fit look?**

**(iii) Instead of using 60 and 70 in Newton's law of cooling as known values, refit the model
estimating the coefficients.**

In [None]:
nl.2 <- nls(formula=temperature ~ beta0 + beta1*exp(-theta*elapsed.time),
start=list(beta0=50,beta1=50, theta=initial.theta))

**Compare the estimated coefficients to the assumed values and
plot the fitted line over the top of the previous plot. Has the
fit improved?**

**(iv) Use JAGS to fit two Bayesian nonlinear regression models: one
based on Newton's law of cooling, as in (ii), and another where all three
coefficients are estimated, as in (iii). Assume that temperatures are
normally distributed in the likelihood model.**

**In both cases use exponential distributions for the priors for
$\theta$ and then for $\beta_0$ and $\beta_1$ (to ensure that
the posterior distributions are positive valued). To pick the
exponential distribution hyperparameter, say $\alpha$, note that
if $X \sim$Exp$(\alpha)$, $\mathbb{E}[X]$ = $1/\alpha$. Pick a
large value for the hyperparameter for $\theta$ such that the
expected value of $\theta$ is less than 1. For the 2nd model (as in (iii)), select hyperparameter values for  $\beta_0$ and $\beta_1$ such that the expected values are 60 and
70, respectively. Note: in JAGS, the exponential density is
written `theta ~ dexp(a)` given hyperparameter $a$.**
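
Because $\mathbb{E}[X] = 1/\alpha$, each rate is simply the reciprocal of the desired prior mean. The sketch below shows one possible set of choices for the three-coefficient model. The rate of 10 for $\theta$ and the Gamma(0.1, 0.1) prior on the precision are assumptions made here for illustration, the names `x`, `y`, `r0`, `r1` and `rt` are introduced for this sketch, and sampling is skipped when `rjags` is not available.

```r
# Exponential prior rates chosen from the desired prior means, E[X] = 1 / rate
rate.beta0 <- 1 / 60     # prior mean 60
rate.beta1 <- 1 / 70     # prior mean 70
rate.theta <- 10         # prior mean 0.1, one possible choice below 1

model.nl.string <- "
model {
  for (i in 1:n) {
    mu[i] <- beta0 + beta1 * exp(-theta * x[i])
    y[i] ~ dnorm(mu[i], tau)
  }
  beta0 ~ dexp(r0)
  beta1 ~ dexp(r1)
  theta ~ dexp(rt)
  tau   ~ dgamma(0.1, 0.1)     # assumed prior on the precision
  sigma <- 1 / sqrt(tau)
}"

# Data repeated from the earlier cell so this sketch is self-contained
elapsed.time <- c(4,5,7,12,14,16,20,24,28,31,34,37.5,41)
temperature  <- c(126,125,123,120,119,118,116,115,114,113,112,111,110)
nl.data <- list(y = temperature, x = elapsed.time, n = length(temperature),
                r0 = rate.beta0, r1 = rate.beta1, rt = rate.theta)

if (requireNamespace("rjags", quietly = TRUE)) {
  nl.model <- rjags::jags.model(textConnection(model.nl.string),
                                data = nl.data, n.chains = 3)
  update(nl.model, n.iter = 1000)
  nl.res <- rjags::coda.samples(nl.model,
                                variable.names = c("beta0", "beta1", "theta", "sigma"),
                                n.iter = 10000)
  print(summary(nl.res)$statistics)
}
```

For the first model, with 60 and 70 treated as known, replace `beta0` and `beta1` in the expression for `mu[i]` by the constants and drop their priors and rates.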

**Compare the posterior means for $\theta$ in both models with the
frequentist estimates.\
Likewise compare the posterior means for
$\beta_0$ and $\beta_1$ for the second model.**


**4.  Multiple Linear Regression\
Factors Affecting Extinction Times of 62 Land Bird Species,
adapted from Albert, 2009\
The data are taken from Ramsey and Schafer (1997), who took them
from Pimm et al. 1988, and are available in the `LearnBayes` package
as the object `birdextinct`. Land birds on 16 small islands had been
observed annually during breeding surveys over a period of several
decades. Some 62 species went extinct at some point and the
objective is to examine the relationship between the years till
extinction and three different covariates: the initial average
number of nesting pairs observed (`nesting`), the physical size of
the birds (an indicator variable `size` with 1=small and 0=large),
and migratory status (an indicator variable `status` with
1=resident, 0=migratory).**

**To begin, do the following in `R`.**



In [None]:
library(LearnBayes)
data(birdextinct)
n            <- nrow(birdextinct)
extinct.time <- birdextinct$time
avg.no.nests <- birdextinct$nesting
size.ind     <- birdextinct$size   # 0 = large, 1 = small
mig.ind      <- birdextinct$status # 0 = migratory, 1 = resident

**1.  Exploratory data analysis and data transformation**

**1.1.  Look at the histogram of `extinct.time`. It is strongly
right skewed (there are few species with times till
extinction that are long relative to most species).
Therefore make the response variable the natural log of
`extinct.time`:**

In [None]:
log.extinct <- log(extinct.time)

**1.2.  Make 4 plots of $y$=`log.extinct`: histogram of $y$,
scatterplot of $y$ against `avg.no.nests`, side-by-side
boxplots of $y$ for small and large birds, and side-by-side
boxplots of $y$ for resident and migratory birds. Hint: To make side-by-side boxplots use the `split` function; e.g., `boxplot(split(log.extinct,size.ind),main='vs Size')`**

**1.3.  How would you describe the relationships between the 3
covariates and time till extinction?**

**2. Fit a classical multiple linear regression of `log.extinct`
on the three covariates:**

In [None]:
extinct.mlr <- lm(log.extinct ~ avg.no.nests + size.ind + mig.ind)

**2.1 Examine the estimated coefficients. How do they compare to your
 conclusions from the EDA (Exploratory Data Analysis)?**

**3. Use JAGS to fit a Bayesian multiple regression analysis.**

**3.1. Centre the `avg.no.nests` covariate. Also, use 3 sets of
 initial values for the parameters.**

**3.2. Plot the JAGS output to see the trace plots. (Type
 `par(ask=TRUE)` in order to see all 5 plots. Then type
 `par(ask=FALSE)` to turn the option off.)**

**3.3. Use the Gelman-Rubin diagnostics to check for convergence.**

**3.4. Plot the autocorrelation functions.**

**3.5. Examine the effective sample sizes.**

**3.6. Calculate studentised residuals, draw a QQ-plot to check
normality, plot posterior mean fitted values, and carry out
posterior predictive checks for the minimum and maximum
log.extinct times. (See the mtcars example in the R code for Lecture 1 on the Learn site for example code to do this.)**