Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Augment lm interval #908

Merged
merged 7 commits into from Aug 12, 2020
Merged

Augment lm interval #908

merged 7 commits into from Aug 12, 2020

Conversation

grantmcdermott
Copy link
Contributor

Fixes #467 and #199

I haven't tested it thoroughly (will let CI do its magic and also build the docs) but looks good on my system. Reprex from the new doc examples:

devtools::load_all('~/Documents/Projects/broom/')
#> Loading broom

library(ggplot2)
library(dplyr, warn.conflicts = FALSE)

mod <- lm(mpg ~ wt + qsec, data = mtcars)

augment(mod, mtcars, interval = "confidence") %>%
  select(.fitted, contains(".conf"), everything())
#> # A tibble: 32 x 20
#>    .fitted .conf.low .conf.high .rownames   mpg   cyl  disp    hp  drat    wt
#>      <dbl>     <dbl>      <dbl> <chr>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1    21.8      20.4       23.2 Mazda RX4  21       6  160    110  3.9   2.62
#>  2    21.0      19.9       22.2 Mazda RX…  21       6  160    110  3.9   2.88
#>  3    25.3      24.0       26.6 Datsun 7…  22.8     4  108     93  3.85  2.32
#>  4    21.6      20.3       22.9 Hornet 4…  21.4     6  258    110  3.08  3.22
#>  5    18.2      17.1       19.2 Hornet S…  18.7     8  360    175  3.15  3.44
#>  6    21.1      19.4       22.7 Valiant    18.1     6  225    105  2.76  3.46
#>  7    16.4      15.0       17.9 Duster 3…  14.3     8  360    245  3.21  3.57
#>  8    22.2      20.7       23.7 Merc 240D  24.4     4  147.    62  3.69  3.19
#>  9    25.1      22.2       28.0 Merc 230   22.8     4  141.    95  3.92  3.15
#> 10    19.4      18.4       20.4 Merc 280   19.2     6  168.   123  3.92  3.44
#> # … with 22 more rows, and 10 more variables: qsec <dbl>, vs <dbl>, am <dbl>,
#> #   gear <dbl>, carb <dbl>, .resid <dbl>, .std.resid <dbl>, .hat <dbl>,
#> #   .sigma <dbl>, .cooksd <dbl>

# predict on new data
newdata <- mtcars %>%
  head(6) %>%
  mutate(wt = wt + 1)

# ggplot2 example where we also construct 95% prediction interval
mod2 <- lm(mpg ~ wt, data = mtcars) ## simpler bivariate model since we're plotting in 2D

au <- augment(mod2, newdata = newdata, interval = "prediction")

ggplot(au, aes(wt, mpg)) + 
  geom_point() +
  geom_line(aes(y = .fitted)) + 
  geom_ribbon(aes(ymin = .conf.low, ymax = .conf.high), col = NA, alpha = 0.3)

Created on 2020-08-05 by the reprex package (v0.3.0)

Minor notes:

  • I also fixed a typo in the augment.lm() docs and changed the coefficient plot example. (No ways this should be reflecting fit +/-se rather than the actual conf.low and conf.high bounds.)

@simonpcouch
Copy link
Collaborator

Thank you so much @grantmcdermott!

A few thoughts..

  • In the GH actions builds, it looks like a few tidiers are failing because their call to augment_newdata() doesn't include an interval argument. Could you set the default for the interval argument to FALSE in augment_newdata()?
  • I think the API looks good, but want to confirm it aligns with other broom functionality. @alexpghayes, does an interval argument with c("none", "confidence", "prediction") options look good to you?

@alexpghayes
Copy link
Collaborator

Yeah that sounds good to me.

@simonpcouch
Copy link
Collaborator

Okay, great. @grantmcdermott, whenever you get the chance, if you could make the change mentioned in the above comment and re document(), this looks good to merge. :-)

@grantmcdermott
Copy link
Contributor Author

grantmcdermott commented Aug 9, 2020

Thanks gents, I'll try to get around to this later today.

I'm a little reticent to set interval=FALSE, though b/c it's mixing argument types (logical vs chars). If this were my own project I'd probably try to pass it through the ellipsis .... Are you fine with me trying that instead?

EDIT: Okay, I decided to go with interval=NULL instead as a middle ground. (Using the ellipsis would have required quite a bit more refactoring of other tidiers and documentation, which I'd rather avoid.)

- Avoids it getting taken up by methods that don't support an interval prediction argument (e.g. augment.loess)
@grantmcdermott
Copy link
Contributor Author

@simonpcouch We'll see what the CI turn up. But running devtools::tests() locally I only encounter one failing test related to the missing "interval" glossary term. That should be fixed by this related PR, which is currently pending.

@grantmcdermott grantmcdermott mentioned this pull request Aug 11, 2020
@simonpcouch
Copy link
Collaborator

I totally see your point on the argument types—thanks for checking me there.

The only remaining failure (besides the modeltests one) is just due to a need for one more devtools::document(), so I'll go ahead and merge and then run it. Thanks so much!

@github-actions
Copy link

github-actions bot commented Mar 6, 2021

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 6, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Prediction intervals in augment()?
3 participants