Column name '.pred' for prediction intervals? #921

bwiernik · 2020-08-31T19:06:47Z

The interval column names in augment() are always called .conf.low and .conf.high, regardless of whether "confidence" or "prediction" is specified for interval. I think this is potentially confusing and may be misleading in reports if the user doesn't rename the columns. I suggest changing the column names to .pred.low and .pred.high when interval = "prediction" (cf. in the metafor package, confidence intervals are labeled ci and prediction intervals labeled pi: https://github.com/wviechtb/metafor/blob/acacdcaa56ce6ed9c7c3239c15007b14d03012c8/man/predict.rma.Rd#L47).

broom::augment(lm(mpg ~ hp + disp, data = mtcars), interval = "prediction")
#> # A tibble: 32 x 12
#>    .rownames   mpg    hp  disp .fitted .conf.low .conf.high .resid .std.resid
#>    <chr>     <dbl> <dbl> <dbl>   <dbl>     <dbl>      <dbl>  <dbl>      <dbl>
#>  1 Mazda RX4  21     110  160     23.1     16.6        29.7 -2.15      -0.702
#>  2 Mazda RX~  21     110  160     23.1     16.6        29.7 -2.15      -0.702
#>  3 Datsun 7~  22.8    93  108     25.1     18.6        31.7 -2.35      -0.776
#>  4 Hornet 4~  21.4   110  258     20.2     13.5        26.8  1.23       0.408
#>  5 Hornet S~  18.7   175  360     15.5      8.82       22.1  3.24       1.08 
#>  6 Valiant    18.1   105  225     21.3     14.7        27.9 -3.20      -1.06 
#>  7 Duster 3~  14.3   245  360     13.7      7.02       20.4  0.575      0.194
#>  8 Merc 240D  24.4    62  147.    24.7     18.1        31.4 -0.344     -0.115
#>  9 Merc 230   22.8    95  141.    24.1     17.5        30.7 -1.30      -0.428
#> 10 Merc 280   19.2   123  168.    22.6     16.1        29.1 -3.39      -1.11 
#> # ... with 22 more rows, and 3 more variables: .hat <dbl>, .sigma <dbl>,
#> #   .cooksd <dbl>

^{Created on 2020-08-31 by the reprex package (v0.3.0)}

The text was updated successfully, but these errors were encountered:

simonpcouch · 2020-08-31T19:24:23Z

Thanks for the suggestion! I think this is a great discussion to have now, before prediction intervals make it into a CRAN version.

I think I agree here, except that the interval argument is just passed to the relevant predict() method in the current implementation. I'm not sure about the best way to generalize this column renaming to accommodate an arbitrary interval argument other than "confidence" or "prediction".

@grantmcdermott Do you have a sense for how common the interval argument is among predict methods? Have you come across any options other than confidence/prediction/none?

Related to #908.

bwiernik · 2020-08-31T19:36:42Z

Edited with more information.

In the stats predict methods, only lm and nls have interval arguments, both taking c("none", "confidence", "prediction") as choices.

In terms of other possible types that might exist, the other one that comes to mind is credibility/credible ("cred") used in Bayesian analyses (but I'm not aware of any Bayesian package using a predict method). Perhaps a good general solution would be to take the first four characters of the interval argument?

grantmcdermott · 2020-08-31T21:21:47Z

Yeah, I tend to agree.

As it stands, it should be fairly straightforward to implement @bwiernik's suggestion and take the first four letters of the interval argument.

However, a potential concern I have is that you'd need to check compatibility with — and adjust output for — any other augment.x methods that accept a "interval" argument. For example, augment.rq() (see here). EDIT: My phrasing here was a little confusing, so let me say "consistency" instead of "compatibility".

(It's something of an aside, but this seems as good a place as any to mention... the augment.lm() method that I created in #908 differs slightly in its implementation to augment.rq() above. I think the former is cleaner and, while I may not be an unbiased spectator, it's probably a good idea to choose one approach and stick with it going forward.)

Another, even simpler option, is to follow with the predict() naming conventions and use .upr and .lwr as the column names regardless of the interval argument.

bwiernik · 2020-09-01T04:19:14Z

I like .upr and .lwr or .lo and .hi a lot versus "high" and "low".

alexpghayes · 2020-09-01T21:54:47Z

We had this discussion a while back and basically decided to use conf.low and conf.high as generic names for upper and lower bounds for intervals, largely for backwards compatibility reasons. I understand the appeal of new column names but my rather strong preference at the moment is to follow this prior precedent.

…

On Mon, Aug 31, 2020, 11:19 PM Brenton M. Wiernik ***@***.***> wrote: I like .upr and .lwr a lot versus "high" and "low". — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#921 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ADTBG277WJSLSC623MBGIQTSDRY5DANCNFSM4QQ22LCQ> .

bwiernik · 2020-09-01T23:20:11Z

Backwards compatibility with what though? Most other packages use something like .lwr and .upr; cl.lb and cl.ub; or ci.lb and ci.ub. I think broom might be one of the only packages using “conf”.

simonpcouch · 2020-09-02T02:45:00Z

Thanks for the note on previous discussions, @alexpghayes.

@bwiernik Backwards compatibility with tidiers from previous versions of the broom package. See the tidy.boot method, for example. :-)

Closing for now--thanks again for bringing this up.

bwiernik · 2020-09-03T17:56:26Z

I understand that backwards compatibility can be important, just wanted to chime in that I had several students who were confused in my office hour this morning asking "why does it always compute a confidence interval when I ask for a prediction interval". They didn't notice that the numbers were changing, only that the column labels were the same.

grantmcdermott · 2020-09-03T20:47:01Z

My $0.02:

I'm not sure if backwards compatibility is quite the issue here because we are dealing with different tidiers (i.e. tidy vs augment). So the interval column names attached to these two tidiers could, in principle, differ. Just like we have "estimate" (tidy) vs ".fitted" (augment).

Of course, it doesn't make sense to change "conf.low" and "conf.high" for tidy. But AFAIK there are very few augment methods that currently support interval columns. In fact, augment.rq and brand new (dev version only) augment.lm are the only two that I can think of off the top of my head.

TL;DR The more I think about this, the more .lwr and .upr seem like the most sensible and consistent names for augment intervals.

simonpcouch · 2020-09-03T23:07:20Z

Thanks for yalls input. I'd be glad to review a PR, and agree that .lwr and .upr are sensible names for interval columns in augment() output. :-)

alexpghayes · 2020-09-03T23:35:53Z

If we're going to revisit this I'd say I'm also strongly opposed to abbreviations and would prefer .lower and .upper here. @simonpcouch Let's make sure to sync before a final decision happens.

bwiernik · 2020-09-04T20:53:30Z

I've opened a pull request changing the columns to .lower and .upper

github-actions · 2021-03-06T00:25:06Z

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

simonpcouch closed this as completed Sep 2, 2020

simonpcouch reopened this Sep 3, 2020

bwiernik mentioned this issue Sep 4, 2020

Change names of interval columns in augment() for clarity #925

Merged

simonpcouch closed this as completed in #925 Sep 18, 2020

github-actions bot locked and limited conversation to collaborators Mar 6, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Column name '.pred' for prediction intervals? #921

Column name '.pred' for prediction intervals? #921

bwiernik commented Aug 31, 2020

simonpcouch commented Aug 31, 2020

bwiernik commented Aug 31, 2020 •

edited

grantmcdermott commented Aug 31, 2020 •

edited

bwiernik commented Sep 1, 2020 •

edited

alexpghayes commented Sep 1, 2020 via email

bwiernik commented Sep 1, 2020 •

edited

simonpcouch commented Sep 2, 2020

bwiernik commented Sep 3, 2020

grantmcdermott commented Sep 3, 2020

simonpcouch commented Sep 3, 2020

alexpghayes commented Sep 3, 2020

bwiernik commented Sep 4, 2020

github-actions bot commented Mar 6, 2021

Column name '.pred' for prediction intervals? #921

Column name '.pred' for prediction intervals? #921

Comments

bwiernik commented Aug 31, 2020

simonpcouch commented Aug 31, 2020

bwiernik commented Aug 31, 2020 • edited

grantmcdermott commented Aug 31, 2020 • edited

bwiernik commented Sep 1, 2020 • edited

alexpghayes commented Sep 1, 2020 via email

bwiernik commented Sep 1, 2020 • edited

simonpcouch commented Sep 2, 2020

bwiernik commented Sep 3, 2020

grantmcdermott commented Sep 3, 2020

simonpcouch commented Sep 3, 2020

alexpghayes commented Sep 3, 2020

bwiernik commented Sep 4, 2020

github-actions bot commented Mar 6, 2021

bwiernik commented Aug 31, 2020 •

edited

grantmcdermott commented Aug 31, 2020 •

edited

bwiernik commented Sep 1, 2020 •

edited

bwiernik commented Sep 1, 2020 •

edited