Consistency of emmeans tidiers (with other tidiers) #692

crsh · 2019-04-11T19:44:33Z

While working on a PR to add a new emmeans-tidier, I noticed that the emmeans-tidiers have some internal and external inconsistencies:

The lsmobj-method uses the common arguments conf.int and conf.level as defined in the param_confint template. The other methods (e.g., emmGrid) do not provide these arguments and instead rely on the argument names native to the emmeans summary()-methods (e.g., infer and level).
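To illustrate what harmonizing these arguments might involve, here is a minimal base-R sketch. The helper name `confint_to_emmeans_args()` is hypothetical (it is not part of broom or emmeans), and the choice to always request tests is an assumption made only for this example. As I understand the emmeans documentation, `infer` takes up to two logicals (intervals, then tests) and `level` is the confidence level.

```r
# Hypothetical helper (not part of broom or emmeans): translate broom-style
# arguments into the argument names used by the emmeans summary() methods.
confint_to_emmeans_args <- function(conf.int = FALSE, conf.level = 0.95) {
  stopifnot(
    is.logical(conf.int), length(conf.int) == 1L,
    is.numeric(conf.level), conf.level > 0, conf.level < 1
  )
  list(
    # c(show confidence intervals?, show tests and p-values?)
    # Assumption for this sketch: tests are always requested.
    infer = c(conf.int, TRUE),
    level = conf.level
  )
}

emm_args <- confint_to_emmeans_args(conf.int = TRUE, conf.level = 0.9)
str(emm_args)
```

With emmeans installed, a tidier could pass such a list on to `summary()`, for example via `do.call()`.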

I haven't looked exhaustively at other methods, but I additionally noticed some inconsistencies compared to other contrast tidiers, specifically tidy.TukeyHSD():

tidy.TukeyHSD() reports the contrasted conditions in a column labelled comparison in the form of a-b. In contrast, the emmeans tidiers return the same information in two columns labelled level1 and level2 (containing a and b).

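library("broom")
library("emmeans")

# Two-way ANOVA on the built-in warpbreaks data
fm1 <- aov(breaks ~ wool + tension, data = warpbreaks)
thsd <- TukeyHSD(fm1, "tension", ordered = TRUE)
tidy(thsd)

# The same pairwise comparisons via emmeans
emmp <- pairs(emmeans(fm1, ~ tension))
tidy(emmp)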

In the tibble returned by tidy.TukeyHSD(), the column containing p-values is labelled adj.p.value. In contrast, the emmeans tidiers label this column p.value regardless of whether it has been adjusted for multiple comparisons or not (see code above). Unless I missed something, the use of adj.p.value is currently unique to tidy.TukeyHSD().
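The difference between a single comparison column and separate level1/level2 columns is only one of representation. The base-R sketch below converts between the two. It is an illustration rather than what either tidier does internally, and it assumes that factor levels contain no hyphen.

```r
fm1 <- aov(breaks ~ wool + tension, data = warpbreaks)
thsd <- TukeyHSD(fm1, "tension", ordered = TRUE)

# TukeyHSD stores each factor's comparisons as a matrix whose row names
# have the form "a-b".
comparison <- rownames(thsd$tension)

# Split "a-b" into two columns (assumes no factor level contains a hyphen).
parts <- do.call(rbind, strsplit(comparison, "-", fixed = TRUE))
two_col <- data.frame(level1 = parts[, 1], level2 = parts[, 2])

# And back again: paste the two columns into a single label.
one_col <- paste(two_col$level1, two_col$level2, sep = "-")

two_col
one_col
```

The split is lossy whenever a level name itself contains the separator, which is one argument for keeping the contrast in a single column.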

It seems desirable to keep things consistent across methods where possible, and particularly within the set of tidiers for a given package. I would therefore suggest the following changes, which I'd be willing to implement in a PR:

1. Add the arguments conf.int and conf.level to all emmeans tidiers.
2. Change reporting of contrast pairs in either tidy.TukeyHSD() or the emmeans-methods. I'm not sure which of the two is preferable here.
3. Either use adj.p.value in emmeans tidiers whenever p-values are adjusted for multiple comparisons or use p.value in tidy.TukeyHSD(). Again I'm not sure which is preferable.

Any thoughts?


alexpghayes · 2019-04-21T20:53:01Z

1. Yes! Please do!
2. Your call. If you are uncertain why don't you post example output from both here and we can discuss more.
3. Use adj.p.value in the emmeans tidiers.

Thanks for taking this on! Let me know how I can help!

crsh · 2019-05-02T22:48:42Z

> Your call. If you are uncertain why don't you post example output from both here and we can discuss more.

Sure. I've looked around a little and it seems tidy.TukeyHSD() used to have an argument to separate the comparison column into level1 and level2, but it was removed recently to be consistent with the multcomp tidiers. So it might make sense to also look at those tidiers. Doing so, I've noticed a couple of other things:

library("broom")
fm1 <- aov(breaks ~ wool + tension, data = warpbreaks)
thsd <- TukeyHSD(fm1, "tension")
as.data.frame(tidy(thsd))

     term comparison  estimate   conf.low  conf.high adj.p.value
1 tension        M-L -10.000000 -19.35342 -0.6465793 0.033626219
2 tension        H-L -14.722222 -24.07564 -5.3688015 0.001121788
3 tension        H-M  -4.722222 -14.07564  4.6311985 0.447421021

library("emmeans")
emmp <- pairs(emmeans(fm1, ~ tension))
as.data.frame(tidy(emmp, infer = c(TRUE, TRUE), level = 0.95))

  level1 level2  estimate std.error df   conf.low  conf.high statistic     p.value
1      M      L -10.000000  3.872378 50 -19.35342 -0.6465793 -2.582393 0.033626219
2      H      L -14.722222  3.872378 50 -24.07564 -5.3688015 -3.801856 0.001121788
3      H      M  -4.722222  3.872378 50 -14.07564  4.6311985 -1.219463 0.447421021

Unless I'm mistaken, I think the column df should be renamed to parameter, as is the case in other methods, right?
In contrast to the tidy.TukeyHSD()-output, there is no term column. Should there be one?

library("multcomp")
glhs <- summary(glht(fm1, linfct = mcp(tension = "Tukey")))
as.data.frame(tidy(glhs))

    lhs rhs   estimate std.error statistic     p.value
1 M - L   0 -10.000000  3.872378 -2.582393 0.033699296
2 H - L   0 -14.722222  3.872378 -3.801856 0.001109948
3 H - M   0  -4.722222  3.872378 -1.219463 0.447461036

The multcomp tidiers require manually calling summary() prior to tidying, whereas the emmeans tidiers do so for you. Consequently, it's not possible to obtain test statistics and confidence intervals in a single call (the tidiers are separate), and there are no conf.int or conf.level arguments. tidy.glht seems to be what quick = TRUE should do: it returns only lhs, rhs, and estimate.
As discussed with respect to emmeans, these tidiers should probably also use adj.p.value, if applicable, rather than p.value.
In contrast to the tidy.TukeyHSD()-output, there is no term column. Should there be one?
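The "if applicable" rule for adj.p.value could be sketched as follows. The helper `label_p_column()` and its `adjust` argument are hypothetical names used only for this illustration. The raw p-values are recomputed from the t statistics and the 50 degrees of freedom in the emmeans output above, and the Holm adjustment is simply a stand-in for whatever adjustment a method applies.

```r
# Hypothetical helper (not part of broom): name the p-value column according
# to whether a multiplicity adjustment was applied.
label_p_column <- function(x, adjust = "none") {
  stopifnot(is.data.frame(x), "p.value" %in% names(x))
  if (!identical(adjust, "none")) {
    names(x)[names(x) == "p.value"] <- "adj.p.value"
  }
  x
}

# Unadjusted two-sided p-values from the t statistics reported above (df = 50).
t_stat <- c(-2.582393, -3.801856, -1.219463)
raw <- data.frame(
  contrast = c("M - L", "H - L", "H - M"),
  p.value  = 2 * pt(-abs(t_stat), df = 50)
)

# Apply an adjustment with base R's p.adjust(); Holm is used as an example.
adjusted <- transform(raw, p.value = p.adjust(p.value, method = "holm"))

names(label_p_column(raw))                        # column stays "p.value"
names(label_p_column(adjusted, adjust = "holm"))  # column becomes "adj.p.value"
```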

Regarding point 2.: I like the multcomp approach here. It specifies the contrast and the null value. I'm not sure about the column names. Are lhs and rhs used like this in other tidiers? Otherwise contrast and null.value might be more descriptive?

crsh · 2019-06-12T12:07:06Z

Sorry to bother, @alexpghayes, but do you have any thoughts on this? I'd like to tackle this soon; some developments in my own package depend on this.

alexpghayes · 2019-06-14T21:39:15Z

Apologies!

> Your call. If you are uncertain why don't you post example output from both here and we can discuss more.

> ...it seems tidy.TukeyHSD() used to have an argument to separate the comparison column into level1 and level2 but it was removed recently to be consistent with the multcomp tidiers.

I am happy with or without this argument, although I think providing it would be nice. My main concern is a consistent interface.

> Unless I'm mistaken, I think the column df should be renamed to parameter, as is the case in other methods, right?

I don't think so. We typically do this for htest objects but not regression / contrast tables.

> In contrast to the tidy.TukeyHSD()-output, there is no term column. Should there be one?

I'm not sure. If there is a way to select between multiple contrasts, or in general to get contrasts for multiple categorical features, then yes.

> The multcomp tidiers require manually calling summary() prior to tidying, whereas the emmeans tidiers do so for you. Consequently, it's not possible to obtain test statistics and confidence intervals in a single call (the tidiers are separate), and there are no conf.int or conf.level arguments. tidy.glht seems to be what quick = TRUE should do: it returns only lhs, rhs, and estimate.

Let's move over to the emmeans approach then! Also, we're getting rid of quick, so no need to implement special behavior there.

> As discussed with respect to emmeans, these tidiers should probably also use adj.p.value, if applicable, rather than p.value.

Agree.

> Regarding point 2.: I like the multcomp approach here. It specifies the contrast and the null value. I'm not sure about the column names. Are lhs and rhs used like this in other tidiers? Otherwise contrast and null.value might be more descriptive?

I like contrast and null.value as well!

crsh · 2019-07-04T20:15:15Z

Thanks, so to summarize I'll do the following:

`emmeans`

add the arguments conf.int and conf.level
if applicable, change p.value to adj.p.value
if applicable, add term column
use contrast (with A - B) and null.value columns

`multcomp`

include summary()-call
if applicable, change p.value to adj.p.value
if applicable, add term column
use contrast (with A - B) and null.value columns

`TukeyHSD`

use contrast (with A - B) and null.value columns
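As a rough picture of the target layout, the base-R sketch below arranges TukeyHSD output with term, contrast, null.value, and adj.p.value columns. The function `tidy_tukey_sketch()` is a hypothetical stand-in, not broom's implementation. It assumes that factor levels contain no hyphen and that the null value of each pairwise difference is 0.

```r
# Hypothetical sketch (not broom's implementation): arrange TukeyHSD output
# in the column layout summarized above.
tidy_tukey_sketch <- function(x) {
  stopifnot(inherits(x, "TukeyHSD"))
  pieces <- lapply(names(x), function(term) {
    m <- x[[term]]  # matrix with columns "diff", "lwr", "upr", "p adj"
    data.frame(
      term        = term,
      # "A-B" becomes "A - B" (assumes no level name contains a hyphen)
      contrast    = gsub("-", " - ", rownames(m), fixed = TRUE),
      null.value  = 0,
      estimate    = unname(m[, "diff"]),
      conf.low    = unname(m[, "lwr"]),
      conf.high   = unname(m[, "upr"]),
      adj.p.value = unname(m[, "p adj"]),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, pieces)
}

fm1 <- aov(breaks ~ wool + tension, data = warpbreaks)
res <- tidy_tukey_sketch(TukeyHSD(fm1))
res
```

Because TukeyHSD() returns one matrix per model term, the term column falls out naturally here. For emmeans and multcomp objects it would have to be derived from the contrast specification.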

alexpghayes · 2019-07-19T19:36:28Z

Exactly!

@alexpghayes

* Adds tidy-method form summary_emm-objects.
* Updates NEWS
* Apply suggestions from code review
* Makes requested changes for emmeans tidiers. (#691)
  - Adds joint_tests()-example
  - Removes strict = FALSE from tests
* Roxygenizes documentation.
* Improves consistency of post-hoc comparison tidies (i.e. glht, stats::TukeyHSD, and emmeans). See #692
* Fixes failing tests and roxygenizes.
* Adds NEWS entry.
* Fixes more failing tests.
* Fix bug in emmeans and multcomp examples.
* Implement revision suggested by @alexpghayes

Co-authored-by: alex hayes <alexpghayes@gmail.com>

github-actions · 2021-03-07T00:23:38Z

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

alexpghayes added consistency/specification feature-request new-tidiers labels Aug 7, 2019

alexpghayes mentioned this issue Aug 15, 2019

Adds tidy-method form summary_emm-objects. #691

Merged

crsh mentioned this issue Nov 21, 2019

Improvements of consistency for post-hoc hypothesis test tidiers #788

Merged


crsh closed this as completed Jun 16, 2020

crsh mentioned this issue Sep 27, 2020

Map effect.size to estimate in emmeans tidiers #941

Merged

crsh mentioned this issue Dec 21, 2020

emmeans tidier can return both std.error and conf.low at the same time #962

Merged

github-actions bot locked and limited conversation to collaborators Mar 7, 2021
