-
Notifications
You must be signed in to change notification settings - Fork 4
/
2019-04-15-custom-contrasts-in-emmeans.Rmd
244 lines (160 loc) · 11.3 KB
/
2019-04-15-custom-contrasts-in-emmeans.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
---
title: Custom contrasts in emmeans
author: Ariel Muldoon
date: '2019-04-15'
slug: custom-contrasts-emmeans
categories:
- r
- statistics
tags:
- analysis
- teaching
- emmeans
draft: FALSE
description: "One of the nice things about emmeans is that you can build custom comparisons for any groups or combinations of groups. I give an example showing how to set these up."
---
```{r setup, include = FALSE, message = FALSE, purl = FALSE}
knitr::opts_chunk$set(comment = "#")
devtools::source_gist("2500a85297b742c6f2fb3a14549f5851",
filename = 'render_toc.R')
# Make dataset here that will use later
# Two factors plus one log-normal response
set.seed(16)
dat = data.frame(sub.rate = rep(c("A.1", "A.2", "B.1", "B.2", "control"), each = 5) )
y = with(dat, 5 +
-1*(sub.rate == "A.2") +
.5*(sub.rate == "B.1") +
1*(sub.rate == "B.2") +
2*(sub.rate == "control") +
rnorm(n = 25, mean = 0, sd = 1) )
dat$resp = round(y, 1)
```
Following up on a [previous post](https://aosmith.rbind.io/2019/03/25/getting-started-with-emmeans/), where I demonstrated the basic usage of package **emmeans** for doing post hoc comparisons, here I'll demonstrate how to make custom comparisons (aka *contrasts*). These are comparisons that aren't encompassed by the built-in functions in the package.
Remember that you can explore the available built-in **emmeans** functions for doing comparisons via `?"contrast-methods"`.
## Table of Contents
```{r toc, echo = FALSE, purl = FALSE}
render_toc("2019-04-15-custom-contrasts-in-emmeans.Rmd")
```
# Reasons for custom comparisons
There are a variety of reasons you might need custom comparisons instead of some of the standard, built-in ones. One common scenario that I see a lot is when we have a single control group for multiple factors, so the factors aren't perfectly crossed. This comes up, e.g., when doing experiments that involve applying different substances (like fertilizers) at varying rates. One factor is the different substances applied and the other is different application rates. However, the control is applying nothing or water or something like that. There aren't different rates of the control to apply, so there is a single control group for both factors.
Rather than trying to fit a model with multiple factors, focusing on main effects and the interaction, such data can be analyzed with a *simple effects* model. This is where the two (or more) factors of interest have been combined into a single factor for analysis. Such an analysis focuses on the effect of the two factors combined. We can use post hoc comparisons to estimate the overall effects of individual factors.
# R packages
I will load **magrittr** for the pipe in addition to **emmeans**.
```{r}
library(emmeans) # v. 1.3.3
library(magrittr) # v. 1.5
```
# The dataset and model
I've made a small dataset to use as an example. The response variable is `resp` and the two factors of interest have been combined into a single factor `sub.rate` that has 5 levels: `A.1`, `A.2`, `B.1`, `B.2`, and `control`.
One factor, which I'm thinking of as the *substance* factor, is represented by `A` and `B` (and the control). The second, the *rate* factor, is represented by `1` and `2`.
```{r}
dat = structure(list(sub.rate = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L,
5L, 5L), .Label = c("A.1", "A.2", "B.1", "B.2", "control"), class = "factor"),
resp = c(5.5, 4.9, 6.1, 3.6, 6.1, 3.5, 3, 4.1, 5, 4.6, 7.3,
5.6, 4.8, 7.2, 6.2, 4.3, 6.6, 6.5, 5.5, 7.1, 5.4, 6.7, 6.8,
8.5, 6.1)), row.names = c(NA, -25L), class = "data.frame")
str(dat)
```
I will use a simple, linear model for analysis.
```{r}
fit1 = lm(resp ~ sub.rate, data = dat)
```
# Treatment vs control comparisons
The simple effects model makes it easy to get comparisons for each factor combination vs the control group with `emmeans()`. I'll use `trt.vs.ctrlk` to do this since the control is the last level of the factor.
```{r}
emmeans(fit1, specs = trt.vs.ctrlk ~ sub.rate)
```
We may also be interested in some other comparisons, though. In particular, we might want to do some overall comparisons across the two factors. We will need custom contrasts for this.
# Building custom contrasts
Custom contrasts are based on the estimated marginal means output from `emmeans()`. The first step to building custom contrasts is to calculate the estimated marginal means so we have them to work with. I will name this output `emm1`.
```{r}
emm1 = emmeans(fit1, specs = ~ sub.rate)
emm1
```
I'm going to start with a relatively simple example. I will compare mean `resp` of the `A.2` group to the `B.2` group via custom contrasts.
Building a custom contrast involves pulling out specific group means of interest from the `emmeans()` output. We *pull out* a group mean by making a vector to represent the specific mean of interest. In this vector we assign a `1` to the mean of the group of interest and a `0` to the other groups.
For example, to pull out the mean of `A.2` from `emm1` we will make a vector with 5 values in it, one for each row of the output. The second value will be a `1`, since the mean of `A.2` is on the second row of `emm1`. All the other values in the vector will be `0`.
Below is the vector that represents the `A.2` mean.
```{r}
A2 = c(0, 1, 0, 0, 0)
```
Similarly, to pull out the mean of `B.2` we'll have a vector of 5 values with a `1` as the fourth value. The `B.2` group is on the fourth row in `emm1`.
```{r}
B2 = c(0, 0, 0, 1, 0)
```
When building custom contrasts via vectors like this, the vectors will always be the same length as the number of rows in the `emmeans()` output. I always calculate and print the estimated marginal means prior to building the vectors so I am certain of the number of rows and the order of the groups in the output.
# The contrast() function for custom comparisons
Once we have the vectors that represent the means we are interested in comparing, we actually do the comparisons via the `contrast()` function. Since we are interested in a *difference* in mean response, we take the difference between the vectors that represent the means.
Taking the difference between vectors can be done inside or outside `contrast()`. In this example I'm doing it inside.
The `contrast()` function takes an `emmGrid` object (i.e., output from `emmeans()`) as the first argument. We give the comparison we want to do via a list passed to the `method` argument.
Here I want to calculate the difference in mean `resp` of `A.2` and `B.2`. I subtract the `B2` vector from the `A2` vector. The output is the difference in mean `resp` between these groups.
```{r}
contrast(emm1, method = list(A2 - B2) )
```
# Using named lists for better output
Unfortunately you can't tell what comparisons was done in the output above `r emo::ji("thinking")`. We can use a named list in `method` to make the output more understandable.
```{r}
contrast(emm1, method = list("A2 - B2" = A2 - B2) )
```
# Using "at" for simple comparisons
Note that I didn't need to do a custom contrast to do this particular comparison. I could have gotten the comparison I wanted by using the `at` argument with `pairwise` in `emmeans()` and choosing just the two groups I was interested in.
```{r}
emmeans(fit1, specs = pairwise ~ sub.rate,
at = list(sub.rate = c("A.2", "B.2") ) )
```
# Multiple custom contrasts at once
Multiple custom contrasts can be done simultaneously in `contrast()` by adding more comparisons to the `method` list. I'll demonstrate this by doing the simple example comparison twice, changing only which group mean is subtracted from the other.
I name both elements of the list for ease of interpretation. I find naming the list of comparisons to be a key part of doing these custom contrasts.
```{r}
contrast(emm1, method = list("A2 - B2" = A2 - B2,
"B2 - A2" = B2 - A2) )
```
Doing multiple comparisons at once means a multiple comparisons adjustment can be done as needed. In addition, we can use the `confint()` function do get confidence intervals for the comparisons.
I'll add a multivariate-$t$ adjustment via `adjust = "mvt"` and then get confidence intervals for the comparisons. Remember we can get both confidence intervals and tests for comparisons via `summary()` with `infer = TRUE`.
```{r}
twocomp = contrast(emm1, method = list("A2 minus B2" = A2 - B2,
"B2 minus A2" = B2 - A2),
adjust = "mvt") %>%
confint()
twocomp
```
# More complicated custom contrasts
Now that we've seen a simple case, let's do something slightly more complicated (and realistic). What if we want to compare the `A` group to the `B` group overall, regardless of the application rate?
This is a *main effect* comparison, so I need to average over the effect of the rate factor in order to estimate the overall effect of the levels of the substance factor.
To do this comparison I need the means for all four non-control factor levels. I'll print `emm1` again here so I remember the order of the output before starting to write out the vectors that represent the group means.
```{r}
emm1
```
I'll need all means that involve `A` or `B`, which is the first four group means in `emm1`. I'll make a vector to represent each of these group means.
While typing these vectors out isn't too hard, since they only contain 5 values, when I have many groups and so really long vectors I sometimes use `rep()` to repeat all the 0 values.
```{r}
A1 = c(1, 0, 0, 0, 0)
A2 = c(0, 1, 0, 0, 0)
B1 = c(0, 0, 1, 0, 0)
B2 = c(0, 0, 0, 1, 0)
```
The vectors I made represent means for combinations of substance and rate. I want to compare the *overall* substance group means, though. This can be done by *averaging over* the two rates. This involves literally taking the average of, e.g., `A1` and `A2` vectors to get a vector that represents the overall `A` mean.
```{r}
Aoverall = (A1 + A2)/2
Boverall = (B1 + B2)/2
```
Now that we have vectors to represent the overall means we can do comparison of mean `resp` of the `A` group vs `B` group overall in `contrast()`.
```{r}
contrast(emm1, method = list("A - B" = Aoverall - Boverall) )
```
Custom contrasts are all built in this same basic way. You can also build your own contrast function if there is some contrast you do all the time that is not part of **emmeans**. See the [custom contrasts section](https://cran.r-project.org/web/packages/emmeans/vignettes/comparisons.html#linfcns) of the **emmeans** vignette for more info.
# Just the code, please
```{r getlabels, echo = FALSE, purl = FALSE}
labs = knitr::all_labels()
labs = labs[!labs %in% c("setup", "toc", "getlabels", "allcode", "makescript")]
```
```{r makescript, include = FALSE, purl = FALSE}
options(knitr.duplicate.label = "allow") # Needed to purl like this
input = knitr::current_input() # filename of input document
output = here::here("static", "script", paste(tools::file_path_sans_ext(input), "R", sep = ".") )
knitr::purl(input, output, documentation = 0, quiet = TRUE)
```
Here's the code without all the discussion. Copy and paste the code below or you can download an R script of uncommented code [from here](/script/2019-04-15-custom-contrasts-in-emmeans.R).
```{r allcode, ref.label = labs, eval = FALSE, purl = FALSE}
```