Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get_predicted(): features and caveats #315

Open
8 of 15 tasks
DominiqueMakowski opened this issue Mar 8, 2021 · 18 comments
Open
8 of 15 tasks

get_predicted(): features and caveats #315

DominiqueMakowski opened this issue Mar 8, 2021 · 18 comments
Labels
enhancement 💥 Implemented features can be improved or revised get_predicted Function specific issues

Comments

@DominiqueMakowski
Copy link
Member

DominiqueMakowski commented Mar 8, 2021

Here's a list of all features / issues etc., to have a global view and avoid opening multiple issues.

@DominiqueMakowski DominiqueMakowski added the enhancement 💥 Implemented features can be improved or revised label Mar 8, 2021
@IndrajeetPatil
Copy link
Member

Do you think this function should ultimately be assigned to its own package?

A package like prediction, predicted, etc. (not sure which names are available on CRAN).

I would also recommend having an alias for this function called model_predictions.

This would mean the ecosystem would have a consistent function naming schemas and also mirror functions for broom functions:

broom::tidy     <->  parameters::model_parameters
broom::glance   <->  performance::model_performance
broom::augment  <->  prediction::model_prediction

What do you think? cc @strengejacke, @mattansb

@strengejacke
Copy link
Member

Well, we have the modelbased package. I think @DominiqueMakowski just wanted to put the workhorse into insight, and then the "visible" user function will be in modelbased.

@IndrajeetPatil
Copy link
Member

the "visible" user function will be in modelbased

I see, gotcha!

What do you think about calling this function model_prediction?

@DominiqueMakowski
Copy link
Member Author

DominiqueMakowski commented Mar 8, 2021

Yes correct, the idea was to have the heavy lifting done at the insight level, mostly cos it is useful elsewhere (performance indices for instance or such). Then, modelbased will be a user-facing package focusing mostly on visualization of models (including visualization of links, contrasts, marginal means etc). I plan to add nice vignettes to modelbased dicsussing a model-based approach to stats, how to make inference and derive indices from models etc.

the function are currently named estimate_link and estimate_response but given the new API of get_predicted I might change to something like estimate_relationship() and estimate_response() or something like this

@strengejacke
Copy link
Member

But we would still use both emmeans and predict, right?

@DominiqueMakowski
Copy link
Member Author

yes yes for estimate_means and estimate_contrasts as the internals of that are definitly beyond us

@strengejacke
Copy link
Member

@DominiqueMakowski
Copy link
Member Author

mmh we'll see, but I feel like for modelbased we could really merge your and indra's experience with model visualization to make something really neat and powerful.

@strengejacke
Copy link
Member

Support for other models, here I list some special cases:

@strengejacke
Copy link
Member

"clmm" is only supported by emmeans, has not predict.

@mattansb
Copy link
Member

I've added prediction intervals for Binomial and Poisson models:

library(insight)
library(magrittr)

glm(am ~ cyl + mpg + hp, data = mtcars,
    family = binomial()) %>% 
  get_predicted(predict = "prediction") %>% 
  as.data.frame() %>% head() %>% zapsmall()
#>                   Predicted CI_low CI_high
#> Mazda RX4         0.1762632      0       1
#> Mazda RX4 Wag     0.1762632      0       1
#> Datsun 710        0.6499998      0       1
#> Hornet 4 Drive    0.2489592      0       1
#> Hornet Sportabout 0.2268359      0       1
#> Valiant           0.0064952      0       0

glm(gear ~ ., data = mtcars,
    family = poisson()) %>% 
  get_predicted(predict = "prediction") %>% 
  as.data.frame() %>% head() %>% zapsmall()
#>                   Predicted CI_low CI_high
#> Mazda RX4          4.266709      1       9
#> Mazda RX4 Wag      4.192287      1       9
#> Datsun 710         4.031290      1       8
#> Hornet 4 Drive     3.043687      0       7
#> Hornet Sportabout  2.967043      0       7
#> Valiant            2.890863      0       7

Created on 2021-03-10 by the reprex package (v1.0.0)

The Gaussian PIs still work as they use to (:

As I noted somewhere prior, each family has its own PI method, so adding each these analytical solutions is somewhat tedious... It might be easier to simulate population values and get the 95% ETI from there instead.


Also, these columns should really be PI_low and PI_high...

mattansb added a commit that referenced this issue Mar 10, 2021
@DominiqueMakowski
Copy link
Member Author

@mattansb I hope you found the code clear and logical :)

One thing to make sure of:

Following your comments, currently the PIs for Bayesian models are obtained via their bespoke function:

out <- as.data.frame(rstantools::predictive_interval(x, newdata = data, prob = ci))

instead of manually computing them from the draws. But does their function does something else than that? Or is it unnecessarily re-sampling draws from the posterior? (note that the iterations were obtained via posterior_predict).

@mattansb
Copy link
Member

instead of manually computing them from the draws. But does their function does something else than that?

Nope - that's exactly what it does:

https://github.com/stan-dev/rstanarm/blob/dee0a2d45bf42b2df791072041151b753edd6af9/R/predictive_interval.R#L82-L91

@DominiqueMakowski
Copy link
Member Author

DominiqueMakowski commented Mar 10, 2021

thanks for addressing my source-code-checking laziness 😁 - I'll drop that exception then

@mattansb
Copy link
Member

@strengejacke @mattansb is it possible to get PIs for GAMs? Can we use the same method that for lm/glms?

The code I wrote should also work with GLM GAMs. Is there any reason the same won't apply for gaussian models?

@strengejacke
Copy link
Member

@DominiqueMakowski Currently, some methods like get_predicted.glmmTMB() directly check the predict argument, while others like get_predicted.lmerMod() call .get_predicted_args() to check the input. This leads to the situation that we have methods that have predict = c("expectation", "link", "prediction", "response", "relation") in their usage, while others just have predict = "expectation". Can this be harmonized?

@strengejacke
Copy link
Member

strengejacke commented Jul 26, 2021

See 4aa8e44, else

rez <- as.data.frame(get_predicted(x, iterations = 5))
fails

@bwiernik
Copy link
Contributor

Prediction intervals for glmmTMB: Also, why are confidence intervals so different in respect to lme4 (could be related to issue above)

Note that by default, glmmTMB uses Wald CIs on fixed effects/log variance components, whereas lme4 uses profile likelihood CIs on fixed effects/variance components

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement 💥 Implemented features can be improved or revised get_predicted Function specific issues
Projects
None yet
Development

No branches or pull requests

5 participants