**University of Edinburgh**\
**School of Mathematics**\
**Bayesian Data Analysis, 2020/2021, Semester 2**\
**Daniel Paulin & Nicolò Margaritella**

**Solutions for Workshop 4: Bayesian Generalised
Linear Models (GLMs) for count and binary data**


In [None]:
#This code loads a compiled version of JAGS and rjags from a zip file on Onedrive, and loads rjags. It should only take a few seconds.
#IMPORTANT: Go to the Kaggle Settings (right hand side click on K icon) and enable the Internet option in Settings before running this.
system("wget --no-check-certificate -r 'https://uoe-my.sharepoint.com/:u:/g/personal/dpaulin_ed_ac_uk/EX_-yUc-bIZJhLXHcZxpOj8Ba6dwC15X_MjYoox-xM2KlQ?download=1' -O /kaggle/working/kaggle_JAGS.zip")
system("unzip /kaggle/working/kaggle_JAGS.zip")
system("rm /kaggle/working/kaggle_JAGS.zip")
system("cd /kaggle/working/JAGS-4.3.0")
system("make install")
library(rjags,lib.loc="/kaggle/working")
#If it ran correctly, you should see 
#Loading required package: coda
#Linked to JAGS 4.3.0
#Loaded modules: basemod,bugs

#In case you are still experiencing difficulties with this, please use the following code (this compiles and installs JAGS from the source, it takes 6-7 minutes):
#system("wget https://sourceforge.net/projects/mcmc-jags/files/JAGS/4.x/Source/JAGS-4.3.0.tar.gz -P /kaggle/working")
#system("tar xvfz /kaggle/working/JAGS-4.3.0.tar.gz")
#system("cd /kaggle/working/JAGS-4.3.0")
#system("/kaggle/working/JAGS-4.3.0/configure")
#system("make")
#system("make install")
#install.packages("rjags", lib="/kaggle/working")
#library(rjags,lib.loc="/kaggle/working")

# 1.  **Modelling fatal airline accidents from 1976 through 2001.**

   **This exercise has been taken largely from a short course held at the University of Copenhagen in
January 2013 and from notes by Gurrin, Carstensen, Hojsgaard, and Ekstrom. The dataset `airline.RData` is available on
Learn, but it will be downloaded automatically by the code below.**

   **The fields are:**  
- **Year1975 (number of years after 1975),**
- **Year,**
- **Fatal (number of fatal airline accidents),**
- **Miles (total passenger miles, in units of $10^{11}$ miles; e.g., a value of $3.683$ means $3.683 \times 10^{11}$ miles $= 368.3$ billion miles),**
- **Rate (fatalities per $10^{11}$ passenger miles).**
    
   **You will be fitting 3 separate Poisson models to Fatal.**
    

**(i) Conduct some exploratory data analysis:**
- **Plot fatalities against year. Which year had the most fatalities?**
- **Plot miles flown against year. What do you see?**
- **Now plot the rate against year. What do you think about how dangerous flying is?**



In [None]:
system("wget --no-check-certificate -r 'https://drive.google.com/uc?export=download&id=19pnVieqVqqxnPLjvb-pYSEVCWoYs5rHr' -O /kaggle/working/airline.RData")
# You need to enable the Internet in Settings in Kaggle (right hand side menu) before running this
#airlines<-read.csv("/kaggle/working/airline.csv",header=TRUE,sep="\t")
load("/kaggle/working/airline.RData")


In [None]:
par(mfrow=c(2,2))
hist(airlines$fatal,main="Annual airline fatalities: 1976-2001")
plot(airlines$year,airlines$fatal,type="b",main="Fatalities by Year")
plot(airlines$year,airlines$miles,type="b",main="Miles (10^11) by Year")
plot(airlines$year,airlines$rate,type="b",
main="Fatalities/Mile (10^11) by Year")


**<u>Constant Expected Fatality Model</u>. Assume that the number of fatalities each year
comes from a single Poisson distribution with unknown mean parameter.**

**(ii a) Carry out a frequentist analysis using the `glm` function, the Poisson family and
the default log link function (Hint: the formula $y\sim1$ fits a model with constant
mean). Report the MLE on the original, non-transformed scale.**



In [None]:
# fits a constant mu
m1.log <- glm(fatal ~ 1,family=poisson(link="log"),data=airlines)
round(exp(m1.log$coefficients),5)

**(ii b) Fit the model with the sqrt link function. Square the estimated coefficient to see
how much the MLE changed.**

In [None]:
m1.sqrt <- glm(fatal ~ 1,family=poisson(link="sqrt"),data=airlines)
round((m1.sqrt$coefficients)^2,5)

**(ii c) Use JAGS to carry out a Bayesian analysis of this constant mortality model using a
Gamma (a,b) prior for $\mu$ with parameters a=1 and b=0.02.**

**Use 3 initial values for $\mu$ of 10, 50, and 100. Check for convergence and mixing in 5 ways:
trace plots, BGR statistic, autocorrelation plots, effective sample size calculations, and
seeing if the Monte Carlo error < 1/20th the standard deviation.**
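The 1/20 rule above compares the Monte Carlo standard error (MCSE) of the posterior mean with the posterior standard deviation. coda's `summary()` already reports a time-series SE for this purpose; as a rough, self-contained alternative, the sketch below swaps in the simpler batch-means method. The function names `mcse_batch` and `mc_error_ok` and the default of 50 batches are illustrative assumptions, not part of the workshop code.

```r
# Batch-means estimate of the Monte Carlo standard error (MCSE) of a posterior mean.
# x: numeric vector of draws from ONE chain, e.g. (not run here)
#    as.numeric(results.const.B[[1]][, "mu"])
mcse_batch <- function(x, n.batches = 50) {
  m <- floor(length(x) / n.batches)             # draws per batch
  x <- x[seq_len(m * n.batches)]                # drop leftover draws at the end
  batch.means <- colMeans(matrix(x, nrow = m))  # column j holds the j-th consecutive batch
  sd(batch.means) / sqrt(n.batches)
}

# Rule of thumb from the exercise: MCSE below 1/20 of the posterior sd
mc_error_ok <- function(x, n.batches = 50) {
  mcse_batch(x, n.batches) < sd(x) / 20
}

# Toy check on independent draws, where the MCSE should be near sd / sqrt(N) = 0.01
set.seed(1)
x.iid <- rnorm(10000, mean = 24, sd = 1)
mcse_batch(x.iid)
mc_error_ok(x.iid)
```

For strongly autocorrelated chains the batches need to be long compared with the autocorrelation time, otherwise batch means will underestimate the MCSE.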

**What is the posterior mean for $\mu$? Obtain the 95% symmetric credible interval for $\mu$.**

In [None]:
# Create data input for JAGS
airlines.data <- list(n=nrow(airlines),fatal=airlines$fatal)
# Create initial values for JAGS
num.chains <- 3
airlines.inits <- list(list(mu=10),list(mu=50),list(mu=100))
# Create model block for JAGS
airlines.model <- "model {
# data that will be read in are n and fatal
#Hyperparameters
gamma.a <- 1
gamma.b <- 0.02
# prior
mu ~ dgamma(gamma.a,gamma.b)
#Likelihood
for(i in 1:n){fatal[i] ~ dpois(mu) }
}"
# Run JAGS
burnin <- 2000
inference.length <- 10000
results.const.A <- jags.model(file=textConnection(airlines.model),
data=airlines.data, inits=airlines.inits,
n.chains=num.chains, quiet = TRUE)
#
update(results.const.A, n.iter=burnin)
#
results.const.B <- coda.samples(results.const.A, variable.names=c("mu"),
n.iter=inference.length)
#
# Check for convergence before looking at posterior dist and summaries
plot(results.const.B)
gelman.plot(results.const.B)
gelman.diag(results.const.B)
autocorr.plot(results.const.B[[1]][,"mu"],main="Poisson mu")
effectiveSize(results.const.B[[1]][,"mu"])
summary(results.const.B) 
# posterior mean = 24.4
# 95% CI [22.55, 26.34]
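Because the Gamma prior is conjugate to the Poisson likelihood, this model also has a closed-form posterior, $\mu \mid y \sim \text{Gamma}(a + \sum_i y_i,\ b + n)$ in the shape/rate parametrisation used by JAGS `dgamma`, which gives a quick check on the JAGS output. A minimal sketch on toy counts rather than the airline data; `gamma_poisson_posterior` is a hypothetical helper name.

```r
# Closed-form posterior for the constant-mean model:
# prior mu ~ Gamma(a, b) with shape a and rate b (as in JAGS dgamma),
# y_i ~ Poisson(mu), i = 1..n  =>  mu | y ~ Gamma(a + sum(y), b + n)
gamma_poisson_posterior <- function(y, a = 1, b = 0.02, level = 0.95) {
  shape <- a + sum(y)
  rate  <- b + length(y)
  probs <- c((1 - level) / 2, (1 + level) / 2)   # equal-tailed (symmetric) interval
  list(shape = shape,
       rate  = rate,
       mean  = shape / rate,
       ci    = qgamma(probs, shape = shape, rate = rate))
}

# Usage with the workshop data (not run here):
# gamma_poisson_posterior(airlines$fatal)

# Toy counts, not the airline data
post <- gamma_poisson_posterior(c(20, 25, 30))
post$mean   # (1 + 75) / (0.02 + 3)
post$ci
```

Applied to `airlines$fatal`, the result should agree with the JAGS summary up to Monte Carlo error.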

**(ii d) Modify your JAGS code to predict the number of fatal accidents in 2002 and produce a 95% predictive
interval. To do this, modify the input data by increasing n to n+1 and append an NA
value to the vector of fatal values, and then request that `fatal[27]` be output in the
`coda` call:**

`airlines.data <- list(n=nrow(airlines)+1,fatal=c(airlines$fatal,NA))
length(airlines.data$fatal)
...
results.const.B <- coda.samples(results.const.A,
variable.names=c("mu","fatal[27]"),n.iter=10000)`

In [None]:
airlines.data <- list(n=nrow(airlines)+1,fatal=c(airlines$fatal,NA))
#
results.const.A <- jags.model(file=textConnection(airlines.model),
data=airlines.data, inits=airlines.inits,n.chains=num.chains, quiet = TRUE)
#
update(results.const.A, n.iter=burnin)
#
results.const.B <- coda.samples(results.const.A,
variable.names=c("mu","fatal[27]"),n.iter=10000)
# Usual way of summarising
summary(results.const.B) # The interval is [15, 35]
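Integrating $\mu$ out of the Poisson likelihood against its Gamma posterior gives a negative binomial predictive distribution, so the sampled interval for `fatal[27]` can also be checked exactly. In R's `dnbinom(size, prob)` parametrisation, a Gamma(shape, rate) mixture of Poissons is negative binomial with `size = shape` and `prob = rate / (rate + 1)`. The sketch below (hypothetical helper name, toy data) compares the exact quantiles with a two-stage simulation.

```r
# Posterior predictive for a new count under the constant-mean model:
# mu | y ~ Gamma(shape, rate) and y.new | mu ~ Poisson(mu)
#   =>  y.new | y ~ NegBin(size = shape, prob = rate / (rate + 1))   (R's dnbinom parametrisation)
gamma_poisson_predictive <- function(y, a = 1, b = 0.02, level = 0.95) {
  shape <- a + sum(y)
  rate  <- b + length(y)
  probs <- c((1 - level) / 2, (1 + level) / 2)
  qnbinom(probs, size = shape, prob = rate / (rate + 1))
}

# Usage with the workshop data (not run here):
# gamma_poisson_predictive(airlines$fatal)

# Toy check: the exact interval agrees with simulating mu first, then y.new
set.seed(2)
y.toy    <- c(20, 25, 30)
pi.exact <- gamma_poisson_predictive(y.toy)
mu.draws <- rgamma(1e5, shape = 1 + sum(y.toy), rate = 0.02 + length(y.toy))
pi.sim   <- quantile(rpois(1e5, mu.draws), c(0.025, 0.975))
pi.exact
pi.sim
```

The two-stage simulation is exactly what JAGS does when `fatal[27]` is NA: it draws `mu`, then draws the missing count given `mu`.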

**<u>Constant Fatality Rate, per mile, Model</u>. Use JAGS to fit a Poisson model where $\mu$
is proportional to the number of passenger miles, namely, $\mu_{i} = \lambda*\text{miles}_{i}$.**

**(iii a) In the JAGS code, mu[i] <- lambda * miles[i] and fatal[i]~dpois(mu[i]). Thus $\lambda$ is the new
parameter with its own prior, hyperparameters, and initial values, and you need to add miles
to the data block. For hyperparameters, multiply those calculated in (ii c) by 0.1. Use initial
values for $\lambda$ of 1, 5, and 50. As before, check for convergence and mixing.**

**Predict the number of fatal accidents for 2002 assuming $20 \times 10^{11}$ passenger miles were
flown. Look at the trace plots and BGR statistic. What is the posterior mean for $\lambda$? What is the
95% predictive interval for the number of accidents in 2002?**

In [None]:
# --- (iii) Constant Fatality Rate (per mile)
# --- Model is now Poisson(mu[i] = lambda*miles[i])
airlines.miles.data <- list(n=nrow(airlines)+1,fatal=c(airlines$fatal,NA),
miles=c(airlines$miles,20))
# Create initial values for JAGS
num.chains <- 3
airlines.miles.inits <- list(list(lambda=1),list(lambda=5), list(lambda=50))
# Create model block for JAGS
airlines.miles.model <- "model {
# data that will be read in are n, fatal, and miles
#Hyperparameters
gamma.a <- 0.1
gamma.b <- 0.002
# prior
lambda ~ dgamma(gamma.a,gamma.b)
#Likelihood
for(i in 1:n){
mu[i] <- lambda*miles[i]
fatal[i] ~ dpois(mu[i]) }
}"
results.miles.A <- jags.model(file=textConnection(airlines.miles.model),
data=airlines.miles.data,
inits=airlines.miles.inits,
n.chains=num.chains, quiet = TRUE)
#
update(results.miles.A, n.iter=burnin)
#
results.miles.B <- coda.samples(results.miles.A,
variable.names=c("lambda","fatal[27]"),
n.iter=inference.length)
# (Convergence checks)
plot(results.miles.B)
gelman.plot(results.miles.B)
gelman.diag(results.miles.B)
autocorr.plot(results.miles.B[[1]][,"lambda"],main="Poisson rate lambda")
effectiveSize(results.miles.B[[1]][,"lambda"])
summary(results.miles.B)
#Posterior Mean Lambda = 2.30
#Predictive interval fatal[27]= [33,60]
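The same conjugacy holds with an exposure term: if $\lambda \sim \text{Gamma}(a, b)$ and $y_i \sim \text{Poisson}(\lambda m_i)$, then $\lambda \mid y \sim \text{Gamma}(a + \sum_i y_i,\ b + \sum_i m_i)$, and a new count with exposure $m_{\text{new}}$ is negative binomial with `size` $= a + \sum_i y_i$ and `prob` $= (b + \sum_i m_i)/(b + \sum_i m_i + m_{\text{new}})$. A minimal sketch (hypothetical helper name, toy data) that could be used to sanity-check the JAGS results above:

```r
# Constant rate per mile: lambda ~ Gamma(a, b), y_i ~ Poisson(lambda * m_i)
rate_model_summary <- function(y, m, m.new, a = 0.1, b = 0.002, level = 0.95) {
  shape <- a + sum(y)          # posterior shape
  rate  <- b + sum(m)          # posterior rate
  probs <- c((1 - level) / 2, (1 + level) / 2)
  list(lambda.mean = shape / rate,
       lambda.ci   = qgamma(probs, shape = shape, rate = rate),
       # negative binomial posterior predictive for a year with exposure m.new
       pred.int    = qnbinom(probs, size = shape, prob = rate / (rate + m.new)))
}

# Usage with the workshop data (not run here):
# rate_model_summary(airlines$fatal, airlines$miles, m.new = 20)

# Toy data and a simulation check of the predictive interval
set.seed(3)
y.toy <- c(10, 12)
m.toy <- c(4, 5)
res <- rate_model_summary(y.toy, m.toy, m.new = 20)
lambda.draws <- rgamma(1e5, shape = 0.1 + sum(y.toy), rate = 0.002 + sum(m.toy))
pred.sim <- quantile(rpois(1e5, lambda.draws * 20), c(0.025, 0.975))
res
pred.sim
```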

**<u>Rate as a Function of Time Model</u>. What if you modeled the mean parameter $\mu$ as a linear function of time, i.e., for year $t$: $\mu(t) = \beta_0 + \beta_1 t$? $\beta_1$ is presumably negative, as fatal accidents are decreasing with
time. What could be a problem?**
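One way to see the problem: with $\beta_0 > 0$ and $\beta_1 < 0$, the linear mean eventually becomes negative, which is not a valid Poisson mean:

$$\mu(t)=\beta_0+\beta_1 t<0 \iff t>-\frac{\beta_0}{\beta_1} \qquad (\beta_0>0,\ \beta_1<0)$$

so predictions far enough into the future would be impossible, and any parameter values giving $\mu(t) \le 0$ for an observed year would have to be excluded.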

**To avoid this potential problem but allow for a time effect on $\mu$, we will now model the rate
parameter $\lambda$, as an exponentiated linear function of (centred) time:
$$\lambda(t)=\exp{(\beta_0 + \beta_1 (t-\bar{t}))}$$**

**The resulting Poisson parameter for year t:
$$\mu(t)=\lambda(t)*\text{miles}(t)=\exp{(\beta_0 + \beta_1 (t-\bar{t}))}*\text{miles}(t)$$**

**With a log link function for $\mu(t)$, the resulting transformation is:
$$\log(\mu(t))=\beta_0 + \beta_1 (t-\bar{t})+\log(\text{miles}(t))$$**

**which is not "entirely" a linear function of $t$ because of the $\log(\text{miles}(t))$ term. However, the log-transformed
rate parameter is linear in time: $\log(\lambda(t)) = \beta_0+\beta_1(t-\bar{t})$. When the link-transformed expected value
is the sum of a linear combination of covariates and a known constant,
in this case $\log(\text{miles}(t))$, that constant is called an *offset*.**
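An offset is simply a covariate whose coefficient is fixed at 1, so the remaining coefficients describe the log *rate* per unit of exposure. The sketch below checks this on simulated data (hypothetical exposures, times and true coefficients chosen to resemble the airline setting; not the airline data): the offset fit recovers the true $\beta_0, \beta_1$, and estimating the coefficient of $\log(\text{exposure})$ freely gives a value close to 1.

```r
# Simulated data (hypothetical, not the airline data)
set.seed(42)
n.sim    <- 200
exposure <- runif(n.sim, 1, 20)              # e.g. passenger miles
t.c      <- seq(-5, 5, length.out = n.sim)   # centred time
b0.true  <- 0.9
b1.true  <- -0.07
y.sim    <- rpois(n.sim, exp(b0.true + b1.true * t.c) * exposure)

# log(exposure) enters as an offset: its coefficient is fixed at 1
fit.offset <- glm(y.sim ~ t.c + offset(log(exposure)), family = poisson)
coef(fit.offset)                             # close to (0.9, -0.07)

# Estimating the coefficient of log(exposure) freely should give roughly 1
fit.free <- glm(y.sim ~ t.c + log(exposure), family = poisson)
coef(fit.free)["log(exposure)"]
```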

**(iv a) Calculate crude estimates of $\beta_0$ and $\beta_1$ by using `lm` to regress $log(fatal(t))$ on year $t$
(Year1975) with an offset. Here's one way to do this in R:**




In [None]:
airlines$ctrd.time <- airlines$year1975-mean(airlines$year1975)
m2 <- lm(log(fatal) ~ ctrd.time + offset(log(miles)),data=airlines)

**What is the estimated year effect (on the log scale)? How does the expected rate change
from year t to t + 1 (what is the multiplicative effect)?**

In [None]:
airlines$ctrd.time <- airlines$year1975-mean(airlines$year1975)
m2 <- lm(log(fatal) ~ ctrd.time + offset(log(miles)),data=airlines)
exp(coef(m2)[2]) # The multiplicative factor between t and t+1
## ctrd.time
## 0.9325075
# about a 7% decrease per year

**(iv b) Use the `glm` function to calculate the MLEs for $\beta_0$ and $\beta_1$ in this model. What is the estimated year effect now? And how does the expected rate change each year?**

In [None]:
m2.glm <- glm(fatal ~ ctrd.time + offset(log(miles)), data=airlines,
family = poisson)
coef(m2.glm)
## (Intercept) ctrd.time
## 0.93551488 -0.06874189
exp(coef(m2.glm)[2])
## ctrd.time
## 0.9335676
#m2.glm$deviance
## [1] 22.54528
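A log-scale slope $\beta_1$ corresponds to a yearly change of $100(e^{\beta_1}-1)\%$ in the expected rate. Since the deviance is also shown, a quick over-dispersion check is the ratio of residual deviance to residual degrees of freedom (26 years minus 2 coefficients, i.e. 24); a value near 1 is consistent with the Poisson assumption. A minimal sketch using the numbers printed above (`pct_change` is a hypothetical helper):

```r
# Percentage change per year in the expected rate for a log-scale slope
pct_change <- function(slope) 100 * (exp(slope) - 1)

slope.glm <- -0.06874189          # coef(m2.glm)[2] printed above
pct_change(slope.glm)             # about -6.6% per year

# Residual deviance / residual df for the (iv b) fit; in the notebook this is
# m2.glm$deviance / m2.glm$df.residual
disp.ratio <- 22.54528 / 24
disp.ratio
```

The ratio of about 0.94 suggests no obvious over-dispersion, so the Poisson model seems adequate here.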

**(iv c) Carry out a Bayesian analysis of this model. You will need to specify priors for $\beta_0$ and
$\beta_1$. For simplicity, use wide Normal priors with mean 0 and variance 1000. Use 3 sets of
initial values for $\beta_0$ and $\beta_1$ randomly drawn from normal distributions with variance 100. Check for convergence and mixing as before.**

**Again predict 2002 fatal accidents using miles = 20, and plug in time=27. How does
the posterior mean for $\beta_1$ compare to the MLE? How does the 95% prediction interval
for 2002 differ from the previous results?**

In [None]:
# ---- Bayesian model (iv c)
# Model is now Poisson(mu[i] = exp(beta0 + beta1*(time[i]-mean(time[])))*miles[i])
airlines.time.data <- list(n=nrow(airlines)+1,fatal=c(airlines$fatal,NA),
time=c(airlines$year1975,27),
miles=c(airlines$miles,20))
# Create initial values for JAGS
num.chains <- 3
airlines.time.inits <- function(){ 
beta0 <- rnorm(1,0,10)
beta1 <- rnorm(1,0,10)
return( list(beta0=beta0, beta1=beta1) )
}
# Create model block for JAGS
airlines.time.model <- "model {
# data that will be read in are n, fatal, time and miles
# prior
beta0 ~ dnorm(0,0.001)
beta1 ~ dnorm(0,0.001)
#Likelihood
for(i in 1:n) {
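# note: mean(time[]) includes the appended prediction year 27, so the centring (mean 14)
# differs slightly from ctrd.time in glm (mean 13.5); this shifts beta0 but not beta1
log(mu[i]) <- beta0+beta1*(time[i]-mean(time[])) + log(miles[i])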
fatal[i] ~ dpois(mu[i]) }
}"
# Run JAGS to the completion of the "adaption" stage
results.time.A <- jags.model(file=textConnection(airlines.time.model),
data=airlines.time.data,
inits=airlines.time.inits,
n.chains=num.chains, quiet = TRUE)
#
update(results.time.A, n.iter=burnin)
#
results.time.B <- coda.samples(results.time.A,
variable.names=c("beta0","beta1",
"fatal[27]"),n.iter=inference.length)
# Convergence checks
plot(results.time.B)
gelman.plot(results.time.B)
gelman.diag(results.time.B)
autocorr.plot(results.time.B[[1]][,"beta0"],main="Poisson rate b0")
autocorr.plot(results.time.B[[1]][,"beta1"],main="Poisson rate b1")
effectiveSize(results.time.B[[1]][,"beta0"])
effectiveSize(results.time.B[[1]][,"beta1"])
summary(results.time.B)


# 2.  **Binary data: Low Birth Weights.**

**These birth weight data for 189 infants born in Massachusetts,
USA, are from Hosmer and Lemeshow (2000; Applied Logistic Regression). The dataset `lowbwt.RData` is available on
Learn, but it will be downloaded automatically by the code below. The primary response
variable, `LowBwt`, is an indicator of whether or not the infant's birth weight was less than 2500g
(LowBwt = 1 if `Bwt`<2500g, 0 otherwise). There are several potential covariates, including:**

- **`Mother.age`**
- **`Mother.wt`**
- **`Race` (1, 2, 3 for White, Black, and Other)**
- **`Smoke` (1 for yes, 0 for no)**

In [None]:
system("wget --no-check-certificate -r 'https://drive.google.com/uc?export=download&id=1ptk7a1NN2pUqeWV-afUEKetMIwVB76xU' -O /kaggle/working/lowbwt.RData")
# You need to enable the Internet in Settings in Kaggle (right hand side menu) before running this
load("/kaggle/working/lowbwt.RData")
#The loaded data is contained in the bwt dataframe
head(bwt)

**(i) Perform some exploratory data analysis and comment on your results.**

In [None]:
par(mfrow=c(2,3))
scatter.smooth(bwt$Bwt ~ bwt$Mother.age,main="Bwt vs Age")
scatter.smooth(bwt$Bwt ~ bwt$Mother.wt,main="Bwt vs Mother's Wt")
# split() orders the groups by LowBwt value: 0 (Normal) first, then 1 (LowBwt)
boxplot(split(bwt$Mother.age,bwt$LowBwt),main="Mother's Age: Normal vs LowBwt",
names=c("Normal","LowBwt"))
boxplot(split(bwt$Mother.wt,bwt$LowBwt),main="Mother's Wt: Normal vs LowBwt",
names=c("Normal","LowBwt"))
boxplot(split(bwt$Bwt,bwt$Race),main="Bwt vs Race",
names=c("White","Black","Other"))
boxplot(split(bwt$Bwt,bwt$Smoke),main="Bwt vs Smoking Status",
names=c("Not Smoke","Smoke"))

**(ii) Use `glm` to fit the following 3 binomial models for the logistic transformation of the probability
of a low birthweight, $p$. The continuous covariates are standardized, not just centred.**

- **(A) $\log(p/(1-p))= \beta_0 + \beta_1 \dfrac{\text{Mother.age}-\overline{\text{Mother.age}}}{sd_\text{Mother.age}}$**
- **(B) $\log(p/(1-p))= \beta_0 + \beta_1 \dfrac{\text{Mother.wt}-\overline{\text{Mother.wt}}}{sd_\text{Mother.wt}}$**
- **(C) $\log(p/(1-p))= \beta_0 + \beta_1 I_\text{Smoke}$**

**Here's example R code for the model (A):**

In [None]:
bwt$age.std <- scale(bwt$Mother.age)[,1]
m.age <- glm(LowBwt ~ age.std,family=binomial(link="logit"),data=bwt)
coef(m.age)

Note: when the data are Bernoulli (n=1), a vector of 1's and 0's can be used as the
response variable in the `glm` function. Interpret the slope coefficients for the 3 models.
E.g., when the mother's age increases by one standard deviation, what
happens, on average, to the odds of a low-birthweight infant?

In [None]:
bwt$age.std <- scale(bwt$Mother.age)[,1]
bwt$wt.std <- scale(bwt$Mother.wt)[,1]
m.age <- glm(LowBwt ~ age.std,family=binomial(link="logit"),data=bwt)
coef(m.age)
## (Intercept) age.std
## -0.804115 -0.271043
m.wt <- glm(LowBwt ~ wt.std,family=binomial(link="logit"),data=bwt)
coef(m.wt)
## (Intercept) wt.std
## -0.8266562 -0.4298929
m.smoke <- glm(LowBwt ~ Smoke,family=binomial(link="logit"),data=bwt)
coef(m.smoke)
## (Intercept) Smoke
## -1.0870515 0.7040592
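On the odds scale each slope acts multiplicatively: $e^{\beta_1}$ is the factor by which the odds of a low-birthweight infant change for a one standard deviation increase in the standardized covariate (or for smokers versus non-smokers). Using the coefficients printed above:

```r
# Slopes copied from the glm output above; in the notebook use e.g. exp(coef(m.age)[2])
slopes <- c(age.std = -0.271043, wt.std = -0.4298929, Smoke = 0.7040592)
odds.ratios <- exp(slopes)
round(odds.ratios, 3)
```

So, in these single-covariate models, one standard deviation of extra maternal age multiplies the odds by roughly 0.76, one standard deviation of extra maternal weight by roughly 0.65, and smoking roughly doubles the odds.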

**(iii) Carry out a Bayesian analysis for the Mother's age model (A) using JAGS.**

**Use Gaussian prior distributions for $\beta_0$ and $\beta_1$ with mean 0. Choose prior precisions small enough that the
change of probability in the inverse-logit curve can happen within an interval ($x_u-x_l$) of 0.5 units of the
standardized covariate, and that the centre of the probability change ($x_m$) can lie anywhere between
-3 and 3 (Hint: look at the relations between ($x_u-x_l$), $\beta_1$, $x_m$ and $\beta_0$ on slide 56 of Lecture 4).**

**Check sensitivity to priors by trying smaller or larger precisions for $\beta_1$ and $\beta_0$.
Use at least 3 sets of initial values of $\beta_0$ and $\beta_1$ so that the BGR statistic can be calculated.
Compare the results to the classical point estimates from (ii).**
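The lecture slide is not reproduced here, so the sketch below *assumes* that "the change of probability" means $p$ moving from 0.01 to 0.99. For $p = \text{logit}^{-1}(\beta_0 + \beta_1 x)$ this gives $x_u - x_l = (\text{logit}(0.99) - \text{logit}(0.01))/|\beta_1|$ and $x_m = -\beta_0/\beta_1$. Under that assumption the calculation recovers the precisions used in the solution code (`tau0` about $1/60^2$, `tau1` $= 1/20^2$); treat it as one possible reading of the hint rather than the slide's exact definition.

```r
# ASSUMPTION: "the change" of p = plogis(beta0 + beta1 * x) is read as p going
# from 0.01 to 0.99. Then
#   x_u - x_l = (qlogis(0.99) - qlogis(0.01)) / |beta1|   and   x_m = -beta0 / beta1
width    <- 0.5                                        # x_u - x_l on the standardized scale
beta1.sz <- (qlogis(0.99) - qlogis(0.01)) / width      # |beta1| needed, about 18.4
sd.beta1 <- 20                                         # prior sd, rounded up from beta1.sz
sd.beta0 <- 3 * sd.beta1                               # |beta0| = |beta1| * |x_m| with |x_m| <= 3
tau1 <- 1 / sd.beta1^2                                 # JAGS dnorm uses precision = 1/variance
tau0 <- 1 / sd.beta0^2
c(beta1.sz = beta1.sz, tau0 = tau0, tau1 = tau1)
```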

In [None]:
lowbwt.model <- "model {
# data that will be read in are n, LowBwt, x=bwt$Mother.age
tau0 <- ###
tau1 <- ###
beta0 ~ dnorm(0,tau0)
beta1 ~ dnorm(0,tau1)
#Likelihood
for(i in 1:n){
logit(p[i]) <- beta0+beta1*(x[i]-mean(x[]))/sd(x[])
# LowBwt[i] ~ dbern(p[i]) # alternative distribution when Bernoulli
LowBwt[i] ~ dbin(p[i],1)}
}"

In [None]:
# Create data list
lowbwt.data <- list(n=nrow(bwt),LowBwt=bwt$LowBwt,x=bwt$Mother.age)
# Create initial values for JAGS
num.chains <- 3
lowbwt.inits <- list(list(beta0=-1,beta1=0),
list(beta0=1,beta1=-1),
list(beta0=2,beta1=1))
# Create model block for JAGS
lowbwt.model <- "model {
# data that will be read in are n, LowBwt, x
# prior
tau0 <- 0.00028 # 1/60^2
tau1 <- 0.0025 # 1/20^2
beta0 ~ dnorm(0,tau0)
beta1 ~ dnorm(0,tau1)
#Likelihood
for(i in 1:n) {
logit(p[i]) <- beta0+beta1*(x[i]-mean(x[]))/sd(x[])
# LowBwt[i] ~ dbern(p[i]) #an alternative when #trials=1
LowBwt[i] ~ dbin(p[i],1)}
}"
#Run JAGS to the completion of the "adaption" stage
burnin <- 5000
inference.length <- 10000
results.A <- jags.model(file=textConnection(lowbwt.model),
data=lowbwt.data, inits=lowbwt.inits,
n.chains=num.chains, quiet = TRUE)
#
update(results.A, n.iter=burnin)
#
results.B <- coda.samples(results.A,
variable.names=c("beta0","beta1"),n.iter=inference.length)
# Convergence checks 
plot(results.B)
gelman.plot(results.B)
gelman.diag(results.B)
autocorr.plot(results.B[[1]][,"beta0"],main="Intercept")
autocorr.plot(results.B[[1]][,"beta1"],main="Slope")
effn.b0 <- effectiveSize(results.B[[1]][,"beta0"])
effn.b1 <- effectiveSize(results.B[[1]][,"beta1"])
cat("Given a chain of length", inference.length,"Effective n=",
round(effn.b0),round(effn.b1),"\n")
#
summary(results.B)
# compare to glm results
print(coef(m.age))

**(iv) This exercise is to give you experience creating indicator variables in JAGS when the covariate
is categorical. Model the logit transform of $p$, the probability of low birthweight, as a function
of race. Create 2 indicator variables, one for White and one for Black, so that when both
indicators equal 0, the race is Other. You can use the following lines of code to add these
indicator variables to your data frame.**

In [None]:
n <- nrow(bwt)
bwt$White.Ind <- bwt$Black.Ind <- numeric(n)
bwt$White.Ind[bwt$Race==1] <- 1
bwt$Black.Ind[bwt$Race==2] <- 1
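The same indicators can be built with `as.integer()`, or taken from `model.matrix()` once `Race` is a factor with Other as the reference level. A sketch on a toy vector (not the `bwt` data):

```r
# Toy race codes (1 = White, 2 = Black, 3 = Other), not the bwt data
race <- c(1, 2, 3, 1, 3, 2)

# Direct construction of the two indicators
White.Ind <- as.integer(race == 1)
Black.Ind <- as.integer(race == 2)

# The same columns from model.matrix, with "Other" as the reference level
race.f <- relevel(factor(race, levels = 1:3, labels = c("White", "Black", "Other")),
                  ref = "Other")
X <- model.matrix(~ race.f)
colnames(X)   # "(Intercept)" "race.fWhite" "race.fBlack"
```

`glm` and INLA build such columns automatically from a factor, whereas JAGS needs them passed explicitly as numeric data.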

**(iv a) Run the JAGS model with random initial values. Choose non-informative prior distributions
(if in doubt, carry out a prior sensitivity analysis). Conduct the usual convergence diagnostics.**

**Use the posterior samples to compute the posterior expected odds of a low-birthweight infant for each race; e.g.,**
$$\dfrac{Pr(\text{LowBwt|Other})}{1-Pr(\text{LowBwt|Other})}=\exp{(\beta_0)}$$

$$\dfrac{Pr(\text{LowBwt|White})}{1-Pr(\text{LowBwt|White})}=\exp{(\beta_0+\beta_\text{White})}$$

**(The ratios of these odds, such as $\exp(\beta_\text{White})$, are the odds ratios relative to Other.) Also calculate $90\%$ credible intervals for these odds. You can use
`do.call(rbind.data.frame, results.race.B)`, where `results.race.B` is the object containing the MCMC chains, to combine the simulations of the $\beta$s and
manipulate them directly, or modify your JAGS model to include these derived quantities and
re-run the code.**


In [None]:
lowbwt.race.data <- list(n=dim(bwt)[1], LowBwt=bwt$LowBwt,
White.Ind=bwt$White.Ind, Black.Ind=bwt$Black.Ind)
# Create initial values for JAGS
num.chains <- 3
lowbwt.race.inits <- function(){list(beta0= rnorm(1,0,2),
b.White= rnorm(1,0,2),
b.Black= rnorm(1,0,2)) }
# Create model block for JAGS
lowbwt.race.model <- "model{ 
# prior
tau <- 0.001
beta0 ~ dnorm(0,tau)
b.White ~ dnorm(0,tau)
b.Black ~ dnorm(0,tau)
#Likelihood
for(i in 1:n) {
logit(p[i]) <- beta0 + b.White*White.Ind[i] + b.Black*Black.Ind[i]
LowBwt[i] ~ dbin(p[i],1)
}
}"
# Run JAGS to the completion of the "adaption" stage
burnin <- 5000
inference.length <- 10000
results.race.A <- jags.model(file=textConnection(lowbwt.race.model),
data=lowbwt.race.data, inits=lowbwt.race.inits,
n.chains=num.chains, quiet = TRUE)
update(results.race.A, n.iter=burnin)
results.race.B <- coda.samples(results.race.A,
variable.names=c("beta0","b.White","b.Black"),n.iter=inference.length)
# (Convergence checks not shown in the document)

In [None]:
fit.const<-do.call(rbind.data.frame, results.race.B)
ORothers <- exp(fit.const$beta0)
ORblack <- exp(fit.const$beta0 + fit.const$b.Black)
ORwhite <- exp(fit.const$beta0 + fit.const$b.White)
mean(ORothers)
## [1] 0.608
quantile(ORothers, c(0.05,0.95))
## 5% 95%
## 0.390 0.880
mean(ORblack)
## [1] 0.785
quantile(ORblack, c(0.05,0.95))
## 5% 95%
## 0.367 1.397
mean(ORwhite)
## [1] 0.320
quantile(ORwhite, c(0.05,0.95))
## 5% 95%
## 0.207 0.457

**(iv b) Knowing the actual probabilities of an outcome is also of interest. Compute the following quantities and the corresponding $95\%$ CIs by modifying your JAGS code or directly manipulating the MCMC
simulations:**
$$Pr(\text{LowBwt|Other})=inv.logit(\beta_0)$$

$$Pr(\text{LowBwt|Black})=inv.logit(\beta_0+ \beta_\text{Black})$$

$$Pr(\text{LowBwt|White})=inv.logit(\beta_0+ \beta_\text{White})$$

**Note: In JAGS, `ilogit` is the inverse of the logit function, returning a value in (0,1). In
`R` you can define the function yourself or use the built-in `plogis()`.**

In [None]:
invlogit <- function(x){1/(1+exp(-x))}
Pothers <- invlogit(fit.const$beta0)
Pblack <- invlogit(fit.const$beta0 + fit.const$b.Black)
Pwhite <- invlogit(fit.const$beta0 + fit.const$b.White)
mean(Pothers)
## [1] 0.373
quantile(Pothers, c(0.05,0.95)) # note: a 90% interval; c(0.025,0.975) gives the 95% CI
## 5% 95%
## 0.281 0.458
mean(Pblack)
## [1] 0.423
quantile(Pblack, c(0.05,0.95))
## 5% 95%
## 0.268 0.583
mean(Pwhite)
## [1] 0.240
quantile(Pwhite, c(0.05,0.95))
## 5% 95%
## 0.171 0.314

**(v) Implement the Bayesian analysis of the mother's age model (A) in INLA using the same priors as in (iii). Compare the results with what you have obtained in (iii).
Discuss how you could also include the covariate `Mother.wt` and the categorical covariates `Race` and `Smoke` in an alternative model. Compare how well these two models fit the data using the marginal likelihood, DIC, and NSLCPO criteria (the first model using only the mother's age, and the second using all 4 covariates).
<br>
Hint: in INLA, binary data with logistic link function can be handled by the call
<br>
inla(formula,family="binomial", control.family=list(link="logit"),data=data,...)
<br>
If there are multiple trials in each row (which is not the case here), the Ntrials argument has to be used to indicate the number of trials (see the code for Lecture 4 for an example on the Beetles dataset).**

The code below loads INLA.

In [None]:
#This code unzips an installation of R-INLA from an online source, and loads INLA
#IMPORTANT: Go to the Kaggle Settings (right hand side) and enable the Internet option before running this.
system("wget --no-check-certificate -r 'https://uoe-my.sharepoint.com/:u:/g/personal/dpaulin_ed_ac_uk/EUNBvDg_EJVFqSZJA3Xz7LsB5cVgqYk0HWWnOp74_Dr28A?download=1' -O /kaggle/working/kaggle_INLA.zip")
system("unzip /kaggle/working/kaggle_INLA.zip")
system("rm /kaggle/working/kaggle_INLA.zip")
library(INLA,lib.loc="/kaggle/working")
#If INLA has been successfully loaded, you should see the following:
#This is INLA_20.03.17 built 2021-01-02 20:27:47 UTC.
#See www.r-inla.org/contact-us for how to get help.
#To enable PARDISO sparse library; see inla.pardiso()

#The following code does the full installation. You can try it if the previous code fails, but this takes longer.
#install.packages("INLA",repos=c(getOption("repos"),INLA="https://inla.r-inla-download.org/R/stable"), dep=TRUE,lib="/kaggle/working")
#library(INLA,lib.loc="/kaggle/working")

In [None]:
#### Bayesian analysis for the Low Birth Weights data with INLA####

#This list encodes the means and precisions of the Gaussian prior for the regression coefficients beta
#This is the same prior that we used for JAGS
prior.beta <- list(mean.intercept = 0, prec.intercept = 0.00028,
                    mean = 0, prec = 0.0025)

#We create standardized covariates for mother's age and weight
bwt$age.std <- scale(bwt$Mother.age)[,1]
bwt$wt.std <- scale(bwt$Mother.wt)[,1]

#We fit the model (A) based on the mother's age with INLA

m.age.I <- inla(LowBwt ~ age.std,family="binomial", control.family=list(link="logit"), control.fixed=prior.beta, data = bwt,control.compute=list(cpo=TRUE,dic=TRUE))
summary(m.age.I)

The results are very similar to those obtained in (iii).
The regression coefficient for the (standardized) mother's age has a posterior mean of -0.273, meaning that older mothers are less likely to give birth to babies with low birth weight.
This is somewhat contrary to the common belief that the chance of birth defects increases with the mother's age, which suggests that other age-related factors (such as economic deprivation) might be at play for this particular outcome.

In [None]:
#We fit another model using the mother's age, weight, race, and whether they smoke
n <- nrow(bwt)
bwt$Race.fct=vector(mode="character",length=n)
bwt$Race.fct[bwt$Race==1] <- 'White'
bwt$Race.fct[bwt$Race==2] <- 'Black'
bwt$Race.fct[bwt$Race==3] <- 'Other'
bwt$Race.fct=as.factor(bwt$Race.fct)

prior.beta <- list(mean.intercept = 0, prec.intercept = 0.00028,
                    mean = 0, prec = 0.0025)


m.4.cov.I <- inla(LowBwt ~ age.std+wt.std+Race.fct+Smoke,family="binomial", control.family=list(link="logit"), control.fixed=prior.beta, data = bwt,control.compute=list(cpo=TRUE,dic=TRUE))
summary(m.4.cov.I)

In this case, we can see that the effect of age is much less pronounced (the posterior mean of its regression coefficient becomes -0.121, compared to -0.273). The mother's weight is an important factor, with higher maternal weight reducing the chance of low birth weight. Smoking increases the chance of low birth weight significantly, while being white reduces the chance of low birth weight dramatically compared with the black and other races.

Finally, we print out the model comparison criteria below.

In [None]:
cat("Marginal log-likelihood of model 1:",m.age.I$mlik[1],"\n")
cat("Marginal log-likelihood of model 2:",m.4.cov.I$mlik[1],"\n")

cat("DIC of model 1:",m.age.I$dic$dic,"\n")
cat("DIC of model 2:",m.4.cov.I$dic$dic,"\n")

cat("NSLCPO of model 1:",-sum(log(m.age.I$cpo$cpo)),"\n")
cat("NSLCPO of model 2:",-sum(log(m.4.cov.I$cpo$cpo)),"\n")

So according to DIC and NSLCPO, the second model, which includes all 4 covariates, fits the data better. By contrast, according to the marginal likelihood, the first model is better than the second.
However, the marginal likelihood can be more sensitive to the choice of prior than the other two criteria, and less sensitive to the fit to the data, so the other two criteria should be given more weight when comparing these models,
especially when our goal is prediction (NSLCPO is a cross-validation-type criterion, and DIC is also closer to cross-validation than the marginal likelihood is).
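The CPO values above come from INLA's own approximation. For a comparable criterion from JAGS output, one common (though potentially unstable) estimator of $\text{CPO}_i$ from $S$ posterior draws is the harmonic mean $\text{CPO}_i \approx \big(\frac{1}{S}\sum_s 1/p(y_i\mid\theta_s)\big)^{-1}$. A minimal sketch, computed on the log scale for numerical stability; `nslcpo_from_loglik` is a hypothetical name.

```r
# NSLCPO = -sum_i log CPO_i, with CPO_i estimated by the harmonic-mean identity
#   CPO_i = 1 / mean_s( 1 / p(y_i | theta_s) )
# loglik: S x n matrix with loglik[s, i] = log p(y_i | theta_s)
nslcpo_from_loglik <- function(loglik) {
  log.cpo <- apply(loglik, 2, function(l) {
    a <- max(-l)                     # log-sum-exp shift for numerical stability
    -(a + log(mean(exp(-l - a))))    # log CPO_i = -log mean_s exp(-l_s)
  })
  -sum(log.cpo)
}
```

With JAGS, one possible route (untested here) is to also monitor `p` in `coda.samples`, stack the chains with `as.matrix()` into a draws-by-observations matrix `P`, and build the log-likelihood matrix with `t(apply(P, 1, function(p) dbinom(bwt$LowBwt, 1, p, log = TRUE)))`. The numbers will not match INLA's exactly, since the two use different approximations.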