forked from statOmics/PSLS
-
Notifications
You must be signed in to change notification settings - Fork 0
/
07-Anova.Rmd
616 lines (457 loc) · 23.5 KB
/
07-Anova.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
---
title: "7. Analysis of Variance"
author: "Lieven Clement"
date: "statOmics, Ghent University (https://statomics.github.io)"
---
<a rel="license" href="https://creativecommons.org/licenses/by-nc-sa/4.0"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a>
```{r setup, include=FALSE, cache=FALSE}
knitr::opts_chunk$set(
include = TRUE, comment = NA, echo = TRUE,
message = FALSE, warning = FALSE, cache = TRUE
)
library(Rmisc)
library(tidyverse)
```
# Prostacyclin Example
Researchers study the effect of arachidonic acid on prostacyclin level in blood plasma. They use 3 different concentrations of arachidonic acid:
- low,
- medium and
- high dose
Each treatment is adopted to 12 rats. They measure the prostacyclin levels in blood plasma using an elisa fluorescence measurement.
```{r}
prostacyclin <- read_tsv("https://raw.githubusercontent.com/GTPB/PSLS20/master/data/prostacyclin.txt")
prostacyclin$dose <- as.factor(prostacyclin$dose)
head(prostacyclin)
```
---
## Data exploration
```{r}
prostacyclin %>%
ggplot(aes(x = dose, y = prostac, fill = dose)) +
geom_boxplot() +
geom_point(position = "jitter") +
ylab("prostacyclin (ng/ml)")
prostacyclin %>%
ggplot(aes(sample = prostac)) +
geom_qq() +
geom_qq_line() +
facet_grid(~dose)
```
The data in the three groups is approximately Normally distributed with equal variance:
\[Y_i \vert \text{group j} \sim N(\mu_j,\sigma^2),\]
with $j= \text{1, 2, 3}$
## Research Question
Research question can translated in the following hypotheses
- $H_0$: the arachidonic acid concentration has no effect on the mean prostacyclin level in blood plasma in rats
\[
H_0:\mu_1=\mu_2 = \mu_3
\]
- $H_1$: the arachidonic acid concentration has an effect on the mean prostacyclin level in blood plasma in rats, which implies that the at least two means are different.
In terms of the model parameters this becomes
\[
H_0:\mu_1=\mu_2 = \mu_3
\]
and
\[H_1: \exists\ j,k \in \{1,\ldots,g\} : \mu_j\neq\mu_k\]
Alternative approach: split null hypothesis in partial hypotheses:
\[
H_{0jk}: \mu_j=\mu_k \text{ versus } H_{1jk}: \mu_j \neq \mu_k
\]
Each hypothesis can be tested via two-sample t-tests $\rightarrow$ multiple testing problem + loss of power.
$\rightarrow$ assess $H_0:\mu_1=\mu_2=\mu_3$ with a *single test*.
# Analyse of Variance
Correct solution for testing problem: ANalysis Of VAriance (ANOVA)
- We develop the method for 3 groups (prostacyclin example)
- Model data with linear model by using dummy variables.
- 1 dummy variable less than the number of groups. Here we need 2 dummy variables.
- Generalizing to g groups $g>3$ is trivial (extra dummy variables)
## Model
\begin{eqnarray}
Y_i &=& g(x_{i1},x_{i2}) + \epsilon_i\\
Y_i &=& \beta_0+\beta_1 x_{i1} +\beta_2 x_{i2} +\epsilon_i
\end{eqnarray}
- $Y_i$ de outcome for observation $i$ ($i=1,\ldots, n$)
\vspace{7pt}
- $\epsilon_i\text{i.i.d.} N(0,\sigma^2)$
\vspace{7pt}
- and dummy variables
$$x_{i1} = \left\{ \begin{array}{ll}
1 & \text{ if observation $i$ belongs to middle dose group (M)} \\
0 & \text{ if observation $i$ belongs to other dose group} \end{array}\right.$$
$$x_{i2} = \left\{ \begin{array}{ll}
1 & \text{ if observation $i$ belongs to high dose group (H)} \\
0 & \text{if observation $i$ belongs to other dose group} \end{array}\right. .$$
\vspace{7pt}
- Low dose group (L) with $x_{i1}=x_{i2}=0$ is *reference group*
Regression-model can be rewritten as a model for each group :
\vspace{-20pt}
\begin{eqnarray*}
Y_{i\vert \text{dose=L}} &=& \beta_0+\epsilon_i \\
Y_{i\vert \text{dose=M}} &=& \beta_0+\beta_1+ \epsilon_i \\
Y_{i\vert \text{dose=H}} &=& \beta_0+\beta_2 + \epsilon_i
\end{eqnarray*}
with $\epsilon_i \sim N(0,\sigma^2)$
\vspace{10pt}
Interpretation of model parameters:
\vspace{-20pt}
\begin{eqnarray*}
\beta_0 &=& \text{E}\left[Y_i \mid \text{treatment with low dose group L}\right] \\
\beta_1 &=& (\beta_0+\beta_1)-\beta_0 = \text{E}\left[Y_i \mid \text{treatment M}\right] - \text{E}\left[Y_i \mid \text{treatment L}\right] \\
\beta_2 &=& (\beta_0+\beta_2)-\beta_0 = \text{E}\left[Y_i \mid \text{treatment H}\right]-\text{E}\left[Y_i \mid \text{treatment L}\right].
\end{eqnarray*}
1. $\beta_0$ is the mean outcome for group L
\vspace{7pt}
2. $\beta_1$ is effect (difference in mean concentration) of group M vs group L
\vspace{7pt}
3. $\beta_2$ is effect of group H vs group L
We reformulate the model by using $\mu$-notations:
\vspace{-7pt}
\begin{eqnarray*}
Y_{i\vert \text{dose=L}} &=& \beta_0+\epsilon_i = \mu_1+\epsilon_i \\
Y_{i\vert \text{dose=M}} &=& \beta_0+\beta_1+ \epsilon_i = \mu_2+\epsilon_i \\
Y_{i\vert \text{dose=H}} &=& \beta_0+\beta_2 + \epsilon_i = \mu_3+\epsilon_i .
\end{eqnarray*}
with $\epsilon_i \sim N(0,\sigma^2)$ and
$$ \mu_j = \text{E}\left[Y_i \mid \text{treatment group } j\right].$$
Original null hypothese
$$H_0:\mu_1=\mu_2=\mu_3$$
can be formulated as
$$H_0: \beta_1=\beta_2=0.$$
Model allows us to use all methods from linear regression.
- Parameter estimators for means, variances and standard errors
\vspace{10pt}
- Inference: Confidence intervals, hypothesis tests
\vspace{10pt}
- Test $H_0: \beta_1=\beta_2=0$ with $F$-test.
## Prostacyclin example
```{r}
model1 <- lm(prostac ~ dose, data = prostacyclin)
summary(model1)
```
# Sum of squares and Anova
Similar to simple linear regression we will use sum of squares to derive the F-test.
\vspace{-10pt}
\begin{eqnarray*}
\text{SSR}&=&\sum\limits_{i=1}^n (\hat Y_i -\bar Y)^2\\
&=& \sum\limits_{i=1}^n (\hat{g} (x_{i1},x_{i2}) - \bar Y)^2\\
&=& \sum\limits_{i=1}^n (\hat\beta_0+\hat\beta_1x_{i1}+\hat\beta_2x_{i2}) - \bar Y)^2\\
&=& \sum\limits_{i=1}^{n_1} (\hat\beta_0 - \bar Y)^2 +\sum\limits_{i=1}^{n_2} (\hat\beta_0 + \hat\beta_1 - \bar Y)^2+\sum\limits_{i=1}^{n_3} (\hat\beta_0 + \hat\beta_2 - \bar Y)^2\\
&=& \sum\limits_{i=1}^{n_1} (\bar Y_1- \bar Y)^2 +\sum\limits_{i=1}^{n_2} (\bar Y_2- \bar Y)^2+\sum\limits_{i=1}^{n_3} (\bar Y_3 - \bar Y)^2\\
\end{eqnarray*}
with $n_1$, $n_2$ en $n_3$ the number of observations in each group (here $n-1=n_2=n_3=12$).
\begin{eqnarray*}
\text{SSR}&=&\sum\limits_{i=1}^n (\hat Y_i -\bar Y)^2
\end{eqnarray*}
- Sum of squares is again equivalent with comparison of model (1) and reduced model with an intercept, only.
- For reduced model the intercept is estimated by the sample mean.
- This sum of squares has g-1=2 degrees of freedom:
- g=3 model parameters - 1 parameter to estimate overall sample mean or
- g=3 par. in complex model - 1 par. in reduced model.
## Decomposition of Total Sum of Squares
- The convention in the Anova setting is to denote the sum of squares as SST, the **Sum of Squares of the Treatment (treatment)** or as SSBetween.
- The sum of squares of the regression indeed reflects the variability between the groups.
- The corresponding mean sum of squares becomes $\text{MST}=\text{SST}/2$.
The decomposition of SSTot can be written as
\[
\text{SSTot} = \text{SST} + \text{SSE}
\]
### SSTot
\vspace{10pt}
```{r echo=FALSE, warning=FALSE}
par(mfrow = c(1, 2))
jitIk <- runif(36, -.2, .2) + rep(1:3, each = 12)
plot(prostac ~ dose, data = prostacyclin, xlab = "Arachidonic acid dose ", ylab = "Prostacyclin (ng/ml)", cex.axis = 1.5, cex.lab = 1.5, cex.main = 1.5)
points(jitIk, prostacyclin$prostac, col = col, pch = 19)
points(jitIk, prostacyclin$prostac, col = 4)
points(1:3, predict(model1, data.frame(dose = factor(c(10, 25, 50)))), pch = 17, col = c("bisque", "coral", "darkcyan"), cex = 1.5)
points(1:3, predict(model1, data.frame(dose = factor(c(10, 25, 50)))), pch = 2, col = 1, cex = 1.5)
abline(h = mean(prostacyclin$prostac), lty = 1)
for (i in 1:36) lines(rep(jitIk[i], 2), c(mean(prostacyclin$prostac), prostacyclin$prostac[i]), col = 4, lty = 2)
jitIk <- runif(36, -.2, .2) + rep(1:3, each = 12)
plot(rep(1, 36), prostacyclin$prostac - mean(prostacyclin$prostac), xaxt = "none", ylab = "Deviations", cex.lab = 1.5, cex.main = 1.5, cex.axis = 1.5, col = as.character(prostacyclin$col), xlim = c(1, 3), pch = 19, xlab = "")
points(rep(1, 36), prostacyclin$prostac - mean(prostacyclin$prostac), pch = 1, col = 4)
axis(at = 1:3, labels = c(expression(paste(y[i], " - ", bar(y))), expression(paste(bar(y)[j], " - ", bar(y))), expression(paste(y[i], " - ", bar(y)[j]))), side = 1, cex.axis = 1.5)
```
### SST
\vspace{10pt}
```{r echo=FALSE, warning=FALSE}
par(mfrow = c(1, 2))
plot(prostac ~ dose, data = prostacyclin, xlab = "Arachidonic acid dose ", ylab = "Prostacyclin (ng/ml)", cex.axis = 1.5, cex.lab = 1.5, cex.main = 1.5)
points(jitIk, prostacyclin$prostac, col = col, pch = 19)
points(jitIk, prostacyclin$prostac, col = 4)
points(1:3, predict(model1, data.frame(dose = factor(c(10, 25, 50)))), pch = 17, col = c("bisque", "coral", "darkcyan"), cex = 1.5)
points(1:3, predict(model1, data.frame(dose = factor(c(10, 25, 50)))), pch = 2, col = 2, cex = 1.5)
abline(h = mean(prostacyclin$prostac), lty = 1)
for (i in 1:3) lines(rep(i, 2), c(mean(prostacyclin$prostac), predict(model1, data.frame(dose = levels(prostacyclin$dose)[i]))), col = 2, lty = 2)
plot(rep(1, 36), prostacyclin$prostac - mean(prostacyclin$prostac), xaxt = "none", ylab = "Deviations", cex.lab = 1.5, cex.main = 1.5, cex.axis = 1.5, col = as.character(prostacyclin$col), xlim = c(1, 3), pch = 19, xlab = "")
points(rep(1, 36), prostacyclin$prostac - mean(prostacyclin$prostac), pch = 1, col = 4)
points(rep(2, 3), predict(model1, data.frame(dose = factor(c(10, 25, 50)))) - mean(prostacyclin$prostac), pch = 17, col = unique(prostacyclin$col), cex = 1.5)
points(rep(2, 3), predict(model1, data.frame(dose = factor(c(10, 25, 50)))) - mean(prostacyclin$prostac), pch = 2, cex = 1.5, col = 2)
axis(at = 1:3, labels = c(expression(paste(y[i], " - ", bar(y))), expression(paste(bar(y)[j], " - ", bar(y))), expression(paste(y[i], " - ", bar(y)[j]))), side = 1, cex.axis = 1.5)
```
### SSE
\vspace{10pt}
```{r echo=FALSE, warning=FALSE}
par(mfrow = c(1, 2))
plot(prostac ~ dose, data = prostacyclin, xlab = "Arachidonic acid dose ", ylab = "Prostacyclin (ng/ml)", cex.axis = 1.5, cex.lab = 1.5, cex.main = 1.5)
points(jitIk, prostacyclin$prostac, col = col, pch = 19)
points(jitIk, prostacyclin$prostac, col = 1)
points(1:3, predict(model1, data.frame(dose = factor(c(10, 25, 50)))), pch = 17, col = c("bisque", "coral", "darkcyan"), cex = 1.5)
points(1:3, predict(model1, data.frame(dose = factor(c(10, 25, 50)))), pch = 2, col = 2, cex = 1.5)
for (i in 1:3) lines(c(i - .2, i + .2), rep(predict(model1, data.frame(dose = levels(prostacyclin$dose)[i])), 2), col = c("bisque", "coral", "darkcyan")[i])
abline(h = mean(prostacyclin$prostac), lty = 1)
for (i in 1:36) lines(rep(jitIk[i], 2), c(prostacyclin$prostac[i], model1$fitted[i]), col = 1, lty = 2)
plot(rep(1, 36), prostacyclin$prostac - mean(prostacyclin$prostac), xaxt = "none", ylab = "Deviations", cex.lab = 1.5, cex.main = 1.5, cex.axis = 1.5, col = as.character(prostacyclin$col), xlim = c(1, 3), pch = 19, xlab = "")
points(rep(1, 36), prostacyclin$prostac - mean(prostacyclin$prostac), pch = 1, col = 4)
points(rep(2, 3), predict(model1, data.frame(dose = factor(c(10, 25, 50)))) - mean(prostacyclin$prostac), pch = 17, col = unique(prostacyclin$col), cex = 1.5)
points(rep(2, 3), predict(model1, data.frame(dose = factor(c(10, 25, 50)))) - mean(prostacyclin$prostac), pch = 2, col = 2, cex = 1.5)
points(rep(3, 36), model1$res, pch = 19, col = as.character(prostacyclin$col))
points(rep(3, 36), model1$res, pch = 1)
axis(at = 1:3, labels = c(expression(paste(y[i], " - ", bar(y))), expression(paste(bar(y)[j], " - ", bar(y))), expression(paste(y[i], " - ", bar(y)[j]))), side = 1, cex.axis = 1.5)
```
## Anova test
Test $H_0: \beta_1=\beta_2=0$ with $F$-test.
\[
F = \frac{\text{MST}}{\text{MSE}}
\]
with
- $\text{MST}=\text{SST}/(g-1)$
\vspace{10pt}
- $\text{MSE}=\text{SSE}/(n-p)$
\vspace{10pt}
- Test statistic compares the variability explained by model (MST) with the residual variability (MSE)
or
- Variability between groups (MST) to variability within groups (MSE)
\vspace{10pt}
- Under $H_0$: $F \sim F_{g-1,n-g}$, with g=3.
## Anova Table
| |Df|Sum Sq|Mean Sq|F value|Pr(>F)|
|---|---|---|---|---|---|
|Treatment|d.f. SST|SST|MST|F-statistiek|p-waarde|
|Error|d.f. SSE|SSE|MSE| | |
```{r}
anova(model1)
```
### F-distribution with critical value ($\alpha$=5%) and observed F-statistic for prostacyclin example
```{r prostacF, out.width='100%', fig.asp=.8, fig.align='center',echo=FALSE}
grid <- seq(0, 17, .01)
df1 <- anova(model1)[1, 1]
df2 <- anova(model1)[2, 1]
fval <- anova(model1)[1, 4]
crit <- qf(0.95, df1, df2)
reject <- c(crit, grid[which(grid > crit)])
accept <- c(grid[which(grid < crit)], crit)
plot(grid, df(grid, df1, df2), type = "l", ylab = "Density", xlab = "F-statistic", cex.axis = 1.5, cex.lab = 1.5)
polygon(c(0, accept, crit, 0), c(0, df(accept, df1, df2), 0, 0), col = "blue", border = "blue")
text(crit / 2, .97, labels = "accept\n95%", col = "blue", cex = 1.5)
polygon(c(crit, reject, 15, crit), c(0, df(reject, df1, df2), 0, 0), col = "red", border = "red")
abline(v = crit, col = "red", lwd = 2)
text(crit + (fval - crit) / 2, .97, labels = "reject\n5%", col = "red", cex = 1.5)
text(pos = 4, crit, df(crit, 2, 33), labels = paste0("F(0.05,", df1, ",", df2, ")"), col = "red", cex = 1.5)
text(pos = 4, fval, df(crit, df1, df2), labels = paste0("f=", round(fval, 1)), col = "darkorange", cex = 1.5)
abline(v = fval, col = "darkorange", lwd = 2, lty = 2)
text(15.5, .97, labels = paste0("p-value\n", format(anova(model1)[1, 5], digits = 2)), col = "darkorange", cex = 1.5)
arrows(x0 = 17.5, x1 = fval, y0 = .9, y1 = .9, col = "darkorange")
```
### F-distributions with different number of degrees of freedom in the nominator and denominator
```{r ftheo, out.width='100%', fig.asp=.8, fig.align='center',echo=FALSE}
plot(grid, df(grid, 1, 5), type = "l", ylab = "Density", xlab = "F-statistic", xlim = c(0, 5), ylim = c(0, 1.5), lwd = 2, cex.axis = 1.5, cex.lab = 1.5)
lines(grid, df(grid, 5, 5), type = "l", col = 2, lwd = 2)
lines(grid, df(grid, 10, 30), type = "l", col = 3, lwd = 2)
lines(grid, df(grid, 20, 30), type = "l", col = 4, lwd = 2)
lines(grid, df(grid, 50, 50), type = "l", col = 5, lwd = 2)
legend("topright", lty = 1, col = c(1, 2, 3, 4, 5), legend = c("F(1,5)", "F(5,5)", "F(10,30)", "F(20,30)", "F(50,50)"), lwd = 2, cex = 1.5)
```
### Prostacyclin example: which groups are different?
```{r}
summary(model1)
```
With model output we can assess if the mean prostacyclin concentration differs between middle and low dose group ($\beta_1$: dose25), and, between high and low dose group ($\beta_2$: dose50).
The p-values do not account for multiple testing.
# Post hoc analysis: Multiple comparisons of means
## Naive method
In the first part we developed the $F$-test to assess
$$ H_0: \mu_1=\cdots = \mu_g \text{ versus } H_1: H_1: \exists\ j,k \in \{1,\ldots,g\} : \mu_j\neq\mu_k$$
- If we reject $H_0$ we conclude that at least two means are different
- The method does not allow to identify which means are different.
A first naive method is to split $H_0$ in partial hypotheses
$$H_{0jk}: \mu_j=\mu_k \text{ versus } H_{1jk}: \mu_j \neq \mu_k$$
- Falsify partial null hypotheses with two-sample $t$-testen
- Comparison of group $j$ with group $k$ with two-sample $t$-test under the equality of means:
$$T_{jk} = \frac{\bar{Y}_j-\bar{Y}_k}{S_p\sqrt{\frac{1}{n_j}+\frac{1}{n_k}}} \sim t_{n-2}$$
With
- $S_p^2$ the pooled variance estimator,
$$S_p^2 = \frac{(n_j-1)S_j^2 + (n_k-1)S_k^2}{n_j+n_k-2}$$
- with $S_j^2$ and $S_k^2$ the sample variances of group $j$ en $k$, respectively.
In ANOVA context we assume that the variance of **all** $g$ groups is equal, i.e. the residual variance $\sigma^2$.
- Use of $S_p^2$ is not efficient because it does not make use of all data
- We can gain efficiency by using MSE
$$\text{MSE}= \sum_{j=1}^g \frac{(n_j-1)S_j^2}{n-g}$$
- The $t$-tests are thus best based on
$$T_{jk} = \frac{\bar{Y}_j-\bar{Y}_k}{\text{MSE}\sqrt{\frac{1}{n_j}+\frac{1}{n_k}}} \sim t_{n-g}.$$
---
```{r}
with(prostacyclin, pairwise.t.test(prostac, dose, "none"))
```
When we perform $m$-tests on the $\alpha$ significance level we cannot correctly control the type I error.
## We show with simulation that naive method does not work
1. We simulate from an ANOVA model with $g=3$ groups.
2. The means in the ANOVA model are equal to each other, so that $$H_0: \mu_1=\mu_2=\mu_3$$.
3. For each simulated dataset we conduct $m=3$ pairwise two-sample $t$-test
4. As soon as one of the $p$-values is below significance level $\alpha=5\%$, we reject $H_0: \mu_1=\mu_2=\mu_3$ because two means are different according to the $t$-tests.
5. We rapport the relative frequency of rejection of the global null hypothesis, i.e. the probability on a type I error $H_0: \mu_1=\mu_2=\mu_3$.
```{r}
g <- 3 # number of treatments (g=3)
ni <- 12 # number of observations in each group
n <- g * ni # total number of observation
alpha <- 0.05 # significance level of individual test
N <- 10000 # number of simulations
set.seed(302) # seed to reproduce results exactly
trt <- factor(rep(1:g, ni)) # factor
cnt <- 0 # counter for erroneous rejections
for (i in 1:N) {
# if (i%%1000==0) cat(i,"/",N,"\n")
y <- rnorm(n)
tests <- pairwise.t.test(y, trt, "none")
reject <- min(tests$p.value, na.rm = T) < alpha
if (reject) cnt <- cnt + 1
}
cnt / N
```
---
- Probability on the type I error equals `r round(cnt/N,3)*100`%
- It is more then twice $\alpha=5$%.
- If we repeat the simulation with g = 5 groups (i.e. 10 pairwise t-tests) we find a type I error of 28.0% instead of the desired 5%.
- The simulation study illustrates the **multiplicity** problem
- Classical p-values can only be compared with the significance level $\alpha$, if the conclusion is based on a single p-value.
- Here the final decision is based on $m=g\times(g-1)/2$ $p$-values.
- We first discuss on the extension of the concept of type I errors and then introduce some solutions
## Family-wise error rate
- When $m>1$ tests are used to make one decision it is necessary to correct for the risk on false positive results (type I errors).
- Most procedures for multiple testing assume that *all $m$ null hypotheses are true*.
- So one tries to control the *risk on at least 1 false positive* on the **family wise error rate (FWER) $\alpha_F$**, typical $\alpha_F=0.05$.
## Bonferroni correction
When we conduct $m$ independent test each on the significance level $\alpha$, then
\begin{eqnarray*}
\alpha_F&=&\text{P}[\text{at least 1 Type I fout}]\\
&=&1-(1-\alpha)^m \leq m\alpha
\end{eqnarray*}
- If we assess 5 tests on the 5% significance level then the FWER $\approx 25\%$. {10pt}
- By conducting them at the 1% significance level the FWER $\approx 5\%$.
- The Bonferroni correction controls the FWER on $\alpha_F$ by setting $$\alpha=\alpha_F/m$$ for each of the $m$ pairwise comparisons
An alternative approach is to report
1. *adjusted p-values* that can be compared to the FWER $\alpha_F$ level: $$\tilde{p}=min(m\times p,1)$$
2. and $(1-\alpha_F/m)100\%$ confidence intervals.
### Prostacyclin example
```{r}
with(prostacyclin, pairwise.t.test(prostac, dose, data = prostacyclin, p.adjust.method = "bonferroni"))
```
- The conclusions remain similar, except that the FWER is now controlled at $\alpha_F=5\%$ and that the $\tilde{p}$-values are larger with a factor 3.
The same analysis can be conducted in the `multcomp` R package that is developed for multiple testing in linear models.
```{r}
library(multcomp)
```
```{r}
model1.mcp <- glht(model1, linfct = mcp(dose = "Tukey"))
summary(model1.mcp, test = adjusted("bonferroni"))
```
Note, that the user has to define custum functions to obtain Bonferonni adjusted confidence intervals.
- Bonferonni confidence intervals are not implemented because better methods exist for multiple testing.
- The function below is added here merely for completeness, but we will generally use the default method for multiple testing in multcomp.
```{r}
calpha_bon_t <- function(object, level) {
abs(
qt(
(1 - level) / 2 / nrow(object$linfct),
object$df
)
)
}
```
```{r}
confint(model1.mcp, calpha = calpha_bon_t)
```
### Evaluate Bonferroni method via simulation
```{r}
g <- 3 # number of treatments (g=3)
ni <- 12 # number of observations in each group
n <- g * ni # totaal number observation
alpha <- 0.05 # significance level of individual test
N <- 10000 # number of simulaties
set.seed(302) # seed to reproduce results exactly
trt <- factor(rep(1:g, ni)) # factor
cnt <- 0 # counter for erroneous rejections
for (i in 1:N) {
# if (i%%1000==0) cat(i,"/",N,"\n")
y <- rnorm(n)
tests <- pairwise.t.test(y, trt, "bonferroni")
reject <- min(tests$p.value, na.rm = T) < alpha
if (reject) cnt <- cnt + 1
}
cnt / N
```
- We find an FWER of `r round(cnt/N*100,1)`%, which is slightly conservative.
- For simulations of $g=5$ group the FWER is $4.1\%$ (more conservative).
- By using Bonferroni the probability on at least one false positive result is lower than $< \alpha_F$.
- Power loss because the real FWER is smaller than 5%
## Tukey Method
- Less conservatieve
- Implementation approximates the null distribution of posthoc tests via simulations
- Results can change slightly if the posthoc analysis is repeated
- Details on the method falls outside the scope of the short course
- Is the default method in the multcomp package:
- adjusted p-values
- adjusted confidence intervals
### Captopril example
```{r}
model1.mcp <- glht(model1, linfct = mcp(dose = "Tukey"))
summary(model1.mcp)
```
```{r}
confint(model1.mcp)
```
```{r out.width='100%', fig.asp=.8, fig.align='center'}
plot(confint(model1.mcp))
```
### Evaluate Tukey method
```{r}
g <- 3 # number of treatments (g=3)
ni <- 12 # number of observations in each group
n <- g * ni # totaal number observation
alpha <- 0.05 # significance level of individual test
N <- 10000 # number of simulations
set.seed(302) # seed to reproduce results exactly
trt <- factor(rep(1:g, ni)) # factor
cnt <- 0 # counter for erroneous rejections
for (i in 1:N) {
# if (i%%1000==0) cat(i,"/",N,"\n")
y <- rnorm(n)
m <- lm(y ~ trt)
m.mcp <- glht(m, linfct = mcp(trt = "Tukey"))
tests <- summary(m.mcp)$test
reject <- min(as.numeric(tests$pvalues), na.rm = T) < alpha
if (reject) cnt <- cnt + 1
}
cnt / N
```
# Conclusions: Prostacyclin example
Entire analysis for prostacyclin example
1. Anova before posthoc tests: F-test has a higher power than pairwise t-test
- F-test uses all data
- For F-test we do not need to correct for multiple testing: one test is conducted for the general omnibus hypothesis
```{r}
model1 <- lm(prostac ~ dose, data = prostacyclin)
anova(model1)
```
```{r}
model1.mcp <- glht(model1, linfct = mcp(dose = "Tukey"))
summary(model1.mcp)
```
```{r}
confint(model1.mcp)
```
- There is an extreme significant effect of arachidonic acid on the average prostacyclin blood concentration in rats ($p<0.001$).
The average prostacyclin concentration is higher in the high dose group than in the low and moderate dose group (both p-values are smaller than $p<0.001$).
- The average concentration in the high dose group is `r round(confint(model1.mcp)$confint[2,1],1)`ng/ml (95% CI [`r paste(round(confint(model1.mcp)$confint[2,2:3],1),collapse=",")`]ng/ml) and `r round(confint(model1.mcp)$confint[3,1],1)`ng/ml (95% BI [`r paste(round(confint(model1.mcp)$confint[3,2:3],1),collapse=",")`]ng/ml) higher than in the low and middle dose group, respectively.
- The difference in average prostacyclin concentration between the moderate and low dose group is not significant (p=`r round(summary(model1.mcp)$test$pvalues[1],2)`).
(All p-values and confidence intervals for post-hoc tests are corrected for multiple testing using the Tukey method).