-
Notifications
You must be signed in to change notification settings - Fork 12
/
asme_prac05_poisson.Rmd
287 lines (198 loc) · 9.82 KB
/
asme_prac05_poisson.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
---
title: "7: Poisson regression in cohort studies"
subtitle: "R 4 ASME"
authors: Authors – Andrea Mazzella, Lakmal Mudalige, & Daniel J Carter
---
-------------------------------------------------------------------------------
## Contents
* Poisson regression models
* simple
* adjusting for covariates
* checking for interaction
* checking for linear associations
-------------------------------------------------------------------------------
## 0. Packages and options
```{r message=FALSE, warning=FALSE}
# Load packages
library(haven)
library(magrittr)
library(survival)
library(summarytools)
library(epiDisplay)
library(tidyverse)
# Limit significant digits to 3, reduce scientific notation
options(digits = 3, scipen = 9)
```
-------------------------------------------------------------------------------
## 1. Data import and exploration
Get ondrate.dta in the same folder as this .Rmd notebook, import it, and explore it.
This dataset contains information from a cohort study on ~1500 people living in Nigeria, on the effect of onchocerciasis (river blindness, a parasitic infection) on optic nerve disease.
```{r}
ond <- read_dta("ondrate.dta") %>% mutate_if(is.labelled, as_factor)
glimpse(ond)
```
* Outcome: `disc2` (optic nerve disease)
* Exposure: `mfpermg` (microfilariae per mg, a measure of severity of the onchocercal infection)
* (Possible) Confounders: `age` (at start), `sex`
* Time:
- `pyrs` (person-years, already calculated)
- `start` (date of entry into study)
- `end` (date of exit from study)
Data management:
```{r}
ond %<>%
mutate(id = as.factor(id),
disc2 = as.integer(disc2),
sex = as.factor(case_when(sex == 1 ~ "male", sex == 2 ~ "female")))
summary(ond)
```
-------------------------------------------------------------------------------
## 2. Calculation of incidence rates
Calculate incidence rates of Optic Nerve Disease by age, sex, and microfilarial load.
```{r}
# Create survival object
ond_surv <- ond %$% Surv(time = as.numeric(start) / 365.25,
time2 = as.numeric(end) / 365.25,
event = disc2)
# Rates by age
pyears(ond_surv ~ age, ond, scale = 1) %>% summary(n = F, rate = T, ci.r = T, scale = 1000)
# Rates by sex
pyears(ond_surv ~ sex, ond, scale = 1) %>% summary(n = F, rate = T, ci.r = T, scale = 1000)
# Rates by microfilarial load
pyears(ond_surv ~ mfpermg, ond, scale = 1) %>% summary(n = F, rate = T, ci.r = T, scale = 1000)
```
We can see that the incidenceL
- increases dramatically with age
- is about the same regardless of gender
- increases when microfilarial load is above 10.
-------------------------------------------------------------------------------
## 3. Poisson regression
Now perform the same three analyses but with Poisson regression.
Poisson regression in R uses the `glm()` formula – but it also needs to include `offset(log(<person-years>))` as a covariate. This offset term is included because a Poisson regression assumes an underlying Poisson distribution, which models count outcome data.
A rate is just a count / personyears. To model just the count when we have a rate, we would therefore have to multiply each side of the Poisson model by the number of personyears. Since Poisson regression uses a log link as part of the generalised linear model, we have to take the log of this new term on the RHS of the model. Hence we end up with this offset term to ensure the assumptions of Poisson regression still apply.
{epiDisplay} has a function to simplify the Poisson regression output: `idr.display()`.
```{r}
# Age
ond %>%
glm(disc2 ~ age + offset(log(pyrs)), family = "poisson", .) %>%
idr.display()
# Sex
ond %>% glm(disc2 ~ sex + offset(log(pyrs)), family = "poisson", data = .) %>%
idr.display()
# Microfilarial load
ond %>% glm(disc2 ~ mfpermg + offset(log(pyrs)), family = "poisson", data = .) %>%
idr.display()
```
-------------------------------------------------------------------------------
## 4. Potential confounders
Is either age or sex associated with microfilarial load? Cross-tabulate them and check the percentages.
```{r}
ond %$% ctable(age, mfpermg)
ond %$% ctable(sex, mfpermg)
```
Older people tend to have higher infection markers: age is a potential confounder.
The sex distribution is balanced, so sex is unlikely to be a confounder.
Note that at this point in ASME, you now know that confounding is a phenomenon outside of the data and is better determined from an appropriate causal diagram. Why is it that just because the microfilarial load appears to be the same across gender we cannot rule out gender as a potential confounder? Looking at the dataset description for this study, why might we still expect a causal role for gender in incidence of onchocerciasis? (Hint: who is and is not eligible to receive ivermectin?)
-------------------------------------------------------------------------------
## 5. Poisson regression with covariates
Adjust for age, sex, and both with Poisson regression. Is there any indication of confounding?
```{r}
# Adjusting for age only
glm(disc2 ~ mfpermg + age + offset(log(pyrs)),
family = "poisson",
data = ond) %>% idr.display()
# Adjusting for sex only
glm(disc2 ~ mfpermg + sex + offset(log(pyrs)),
family = "poisson",
data = ond) %>% idr.display()
# Adjusting for both age and sex
pois_full <- glm(disc2 ~ mfpermg + age + sex + offset(log(pyrs)),
family = "poisson",
data = ond)
idr.display(pois_full)
```
The age-adjusted rate ratios for microfilarial loads are reduced: it is possible that age is a confounder. The RRs remain almost the same after adjusting for sex.
-------------------------------------------------------------------------------
## 6. Poisson and Interaction
As logistic regression, Poisson models also assume that there is no interaction, so you need to check this assumption with a likelihood ratio test. You do this exactly like you do with logistic regression.
Check for interaction between microfilarial load and 1. age, 2. sex.
Note also that you should be thinking epidemiologically - why are we looking for this interaction term? Is there a reason to believe that interaction or effect modification is present?
```{r}
# Interaction between MF and age
mf_age_simple <- glm(disc2 ~ mfpermg + age + offset(log(pyrs)),
family = "poisson",
data = ond)
mf_age_intera <- glm(disc2 ~ mfpermg * age + offset(log(pyrs)),
family = "poisson",
data = ond)
lrtest(mf_age_intera, mf_age_simple)
```
No evidence against the null hypothesis of no interaction from this test - note the large difference in degrees of freedom though - the LR test generally has low power.
```{r}
# Interaction between MF and sex
mf_sex_simple <- glm(disc2 ~ mfpermg + sex + offset(log(pyrs)),
family = "poisson",
data = ond)
mf_sex_intera <- glm(disc2 ~ mfpermg * sex + offset(log(pyrs)),
family = "poisson",
data = ond)
lrtest(mf_sex_intera, mf_sex_simple)
```
No evidence against the null hypothesis of no interaction between MF and sex.
-------------------------------------------------------------------------------
## 7. Poisson and linearity
Assess for a linear relationship with age by treating it as a numerical variaboes (as opposed to categorical) in a simple Poisson model.
```{r}
# Age as numeric
ond$age.n <- as.integer(ond$age)
# Continuous specification
age_cont <- glm(disc2 ~ age.n + offset(log(pyrs)), family = "poisson", data = ond)
idr.display(age_cont)
# Categorical specification
age_catego <- glm(disc2 ~ age + offset(log(pyrs)), family = "poisson", data = ond)
# LRT
lrtest(age_catego, age_cont)
```
There is no evidence against the null hypothesis of a linear increase in the log(rate) from one age group to the next, so we conclude that age could be used as a continuous variable.
-------------------------------------------------------------------------------
## 8. Back to Whitehall
Open whitehal.dta and use Poisson regression to examine the effect of job grade on cardiac mortality, adjusting for ageband and smoking status simultaneously.
```{r include=FALSE}
whitehall <- read_stata("whitehall17.dta")
glimpse(whitehall)
```
The whitehall dataset requires some data management (see session 6) but we'll now do the bare minimum for these Poisson models (not recommended for an actual dataset!):
- age bands as a factor variable;
- job grade as a binary factor variable;
- number of person-years for each person;
- smoking status as a 3-level factor variable.
```{r}
# Explore age distribution
whitehall %>% ggplot(aes(agein)) + geom_histogram() + theme_bw()
# Categorise age in 5-year bands
whitehall$ageband <- cut(whitehall$agein, seq(40, 70, 5)) %>% as.factor()
# Factorise job grade
whitehall$grade <- as.factor(whitehall$grade)
# Create person-years of follow up
whitehall$pyrs <- whitehall %$% as.numeric(timeout - timein)
# Group all current smokers together
whitehall$smok3 <- ifelse(whitehall$smok == 1, 1,
ifelse(whitehall$smok == 2, 2,
3)) %>% as.factor()
```
Now we can fit the models.
```{r}
# Model including job grade
with_grade <- glm(chd ~ offset(log(pyrs)) + grade + smok3 + ageband,
family = poisson(),
data = whitehall)
idr.display(with_grade)
# Model without job grade
wout_grade <- glm(chd ~ offset(log(pyrs)) + smok3 + ageband,
family = poisson(),
data = whitehall)
idr.display(wout_grade)
# LRT
lrtest(with_grade, wout_grade)
```
--------------------------------------------------------------------------------