Explained Variance for random slopes model #601

Open
a-difabio opened this issue Aug 1, 2023 · 4 comments
Labels
docs 📚 Something to be adressed in docs and/or vignettes

Comments

@a-difabio

I am trying to plot the proportion of explained variance from a linear model with random effects on intercept and slopes. However, the function r2_nakagawa() gives me two warnings regarding the random slopes.
I am attaching a minimal reproducible example, showing the same warnings that I get when using my own data.

  • As for the first, what exactly does "not accurate" mean? Is it impossible to obtain a reliable measure of the explained variance at all? Reading the Nakagawa et al. 2017 paper, I got the impression that getting an R2 for random-slope models was not a problem:

The descriptions were initially limited to random-intercept GLMMs, but have later been extended to random-slope GLMMs

  • Regarding the second warning, the "Time" random slope is already in the model as a fixed effect. Is this a bug in r2_nakagawa(), or am I missing something?

Thank you for your work on this super useful package, by the way :)

library(lme4)
#> Loading required package: Matrix
library(performance)
m <- lmer(weight ~ Time * Diet + (1 + Time | Chick), data = ChickWeight)
r2_nakagawa(m, by_group = TRUE)
#> Warning: Model contains random slopes. Explained variance by levels is not
#>   accurate.
#> Warning: Random slopes not present as fixed effects. This artificially inflates
#>   the conditional random effect variances.
#>   Solution: Respecify fixed structure!
#> # Explained Variance by Level
#> 
#> Level   |        R2
#> -------------------
#> Level 1 | 9.180e-04
#> Chick   |     0.861

Created on 2023-08-01 with reprex v2.0.2
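
The premise of the second question, that every random slope also appears as a fixed effect, can be checked by comparing term names. The following is only a sketch using the standard lme4 accessors fixef() and VarCorr(); it is not the check that r2_nakagawa() performs internally, and the variable names are chosen here for illustration.

```r
library(lme4)

# Same model as in the reprex above
m <- lmer(weight ~ Time * Diet + (1 + Time | Chick), data = ChickWeight)

# Names of the fixed-effect coefficients
fixed_terms <- names(fixef(m))

# Names of the random-effect terms for the Chick grouping factor
random_terms <- colnames(VarCorr(m)$Chick)

# Random slopes are all random terms except the intercept
random_slopes <- setdiff(random_terms, "(Intercept)")

# Random slopes that have no matching fixed-effect coefficient
missing_as_fixed <- setdiff(random_slopes, fixed_terms)
print(missing_as_fixed)
```

If `missing_as_fixed` is empty, every random slope has a corresponding fixed effect, which is what the question assumes for this model.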

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.2 (2022-10-31)
#>  os       Ubuntu 22.04.1 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en_US.UTF-8
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Etc/UTC
#>  date     2023-08-01
#>  pandoc   2.19.2 @ /usr/lib/rstudio-server/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version  date (UTC) lib source
#>  boot          1.3-28   2021-05-03 [3] CRAN (R 4.2.2)
#>  cli           3.6.1    2023-03-23 [1] CRAN (R 4.2.2)
#>  digest        0.6.31   2022-12-11 [1] CRAN (R 4.2.1)
#>  evaluate      0.20     2023-01-17 [1] CRAN (R 4.2.1)
#>  fastmap       1.1.1    2023-02-24 [1] CRAN (R 4.2.1)
#>  fs            1.6.1    2023-02-06 [1] CRAN (R 4.2.1)
#>  glue          1.6.2    2022-02-24 [2] RSPM (R 4.2.0)
#>  htmltools     0.5.4    2022-12-07 [1] CRAN (R 4.2.1)
#>  insight       0.19.3   2023-06-29 [1] CRAN (R 4.2.2)
#>  knitr         1.42     2023-01-25 [1] CRAN (R 4.2.2)
#>  lattice       0.20-45  2021-09-22 [3] CRAN (R 4.2.2)
#>  lifecycle     1.0.3    2022-10-07 [1] CRAN (R 4.2.1)
#>  lme4        * 1.1-34   2023-07-04 [1] CRAN (R 4.2.2)
#>  magrittr      2.0.3    2022-03-30 [2] RSPM (R 4.2.0)
#>  MASS          7.3-58.1 2022-08-03 [3] CRAN (R 4.2.2)
#>  Matrix      * 1.5-3    2022-11-11 [1] CRAN (R 4.2.1)
#>  minqa         1.2.5    2022-10-19 [1] CRAN (R 4.2.1)
#>  nlme          3.1-160  2022-10-10 [3] CRAN (R 4.2.2)
#>  nloptr        1.2.2.3  2021-11-02 [1] CRAN (R 4.2.1)
#>  performance * 0.10.4.1 2023-08-01 [1] https://easystats.r-universe.dev (R 4.2.2)
#>  purrr         1.0.1    2023-01-10 [2] RSPM (R 4.2.0)
#>  R.cache       0.16.0   2022-07-21 [1] CRAN (R 4.2.1)
#>  R.methodsS3   1.8.2    2022-06-13 [1] CRAN (R 4.2.1)
#>  R.oo          1.25.0   2022-06-12 [1] CRAN (R 4.2.1)
#>  R.utils       2.12.1   2022-10-30 [1] CRAN (R 4.2.1)
#>  Rcpp          1.0.10   2023-01-22 [1] CRAN (R 4.2.2)
#>  reprex        2.0.2    2022-08-17 [2] RSPM (R 4.2.0)
#>  rlang         1.1.1    2023-04-28 [1] CRAN (R 4.2.2)
#>  rmarkdown     2.20     2023-01-19 [1] CRAN (R 4.2.1)
#>  rstudioapi    0.14     2022-08-22 [2] RSPM (R 4.2.0)
#>  sessioninfo   1.2.2    2021-12-06 [2] RSPM (R 4.2.0)
#>  styler        1.8.1    2022-11-07 [1] CRAN (R 4.2.1)
#>  vctrs         0.6.2    2023-04-19 [2] RSPM (R 4.2.0)
#>  withr         2.5.0    2022-03-03 [2] RSPM (R 4.2.0)
#>  xfun          0.37     2023-01-31 [1] CRAN (R 4.2.1)
#>  yaml          2.3.7    2023-01-23 [1] CRAN (R 4.2.1)
#> 
#>  [1] /home/rstudio/.R4.2_libs
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/local/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
@bwiernik
Contributor

bwiernik commented Aug 1, 2023

With random slopes, the value of R2 varies as a function of the value of the predictor that has the random slope, so there is no single R2 value for the model.

I would recommend against using any sort of R2 for this sort of model. Or, if you really need one, use a generic R2 formula based on the model's predicted values. Note that such a summary is still a poor index of model performance, because it doesn't describe predictive power for any specific individual and depends on the exact cases in your sample.

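# `model` is the fitted model (`m` in the example above)
# Response as a numeric vector
resp <- datawizard::to_numeric(
  insight::get_response(model, verbose = FALSE),
  dummy_factors = FALSE,
  preserve_levels = TRUE
)
mean_resp <- mean(resp, na.rm = TRUE)
# Predicted values, without confidence intervals
pred <- insight::get_predicted(model, ci = NULL, verbose = FALSE)
# Generic R2: 1 - residual sum of squares / total sum of squares
R2 <- 1 - sum((resp - pred)^2) / sum((resp - mean_resp)^2)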

@a-difabio
Author

I see your point, but actually I am not dead set on obtaining an R2 value, especially if it is not a good measure for this kind of model. I tried out the performance::icc() function, but I see that it has the same problem with random slopes in the model.

In general, what I would like to obtain is some kind of measure of the importance of the grouping levels (ideally shown with a bar plot). Is there a way to do this kind of analysis (which I see people call "variance decomposition") on random-slope mixed models, using the performance package?

@bwiernik
Contributor

bwiernik commented Aug 2, 2023

Any variance decomposition statistic (R2, ICC, etc.) has the same basic issue: there isn't any single value that describes the importance of the grouping variable for all cases. Instead, it depends on the value of the predictor with the random slope.

This post describes the issue well. https://stats.stackexchange.com/a/318418/364001

Sometimes you see ICC curves that plot the ICC for varying levels of the predictor, like this image

Personally, I suggest just interpreting the SDs of the random intercepts and slopes in your model.
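
To illustrate both suggestions, here is a minimal sketch, assuming the model `m` from the reprex in the original post. It first prints the random-effect SDs, then computes an ICC curve over Time. For a random intercept and one random slope, the between-chick variance at a given Time is tau00 + 2 * Time * tau01 + Time^2 * tau11, and the ICC at that Time is this variance divided by itself plus the residual variance. The variable names (`tau00`, `tau01`, `tau11`, `icc_curve`) are chosen here for illustration and are not part of the performance package.

```r
library(lme4)

# Same model as in the reprex above
m <- lmer(weight ~ Time * Diet + (1 + Time | Chick), data = ChickWeight)

# SDs (and correlation) of the random intercepts and slopes, plus residual SD
print(VarCorr(m))

# Variance-covariance matrix of the random effects for Chick
vc <- VarCorr(m)$Chick
tau00 <- vc["(Intercept)", "(Intercept)"]  # random-intercept variance
tau11 <- vc["Time", "Time"]                # random-slope variance
tau01 <- vc["(Intercept)", "Time"]         # intercept-slope covariance
sigma2 <- sigma(m)^2                       # residual variance

# Between-chick variance implied by the model at each observed Time
time_grid <- sort(unique(ChickWeight$Time))
var_between <- tau00 + 2 * time_grid * tau01 + time_grid^2 * tau11

# ICC as a function of Time
icc_curve <- data.frame(
  Time = time_grid,
  ICC = var_between / (var_between + sigma2)
)
print(icc_curve)

# Plot the curve; there is one ICC per value of Time, not one per model
plot(ICC ~ Time, data = icc_curve, type = "b", ylim = c(0, 1),
     ylab = "ICC (Chick)")
```

The curve makes the point above concrete: the share of variance attributable to Chick changes with Time, so any single number summarizes only one point on it.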

@a-difabio
Author

In the end I changed my approach a bit, and now I have a model without random slopes, so I think it should work fine.

Do you know why in the help for the r2_nakagawa function, under "Details", it says

The random effect variances are actually the mean random effect variances, thus the r-squared value is also appropriate for mixed models with random slopes or nested random effects (see Johnson, 2014).

even if the R2 is not a "good" measure when there are random slopes? I'm not being argumentative, just trying to understand the problem better.
Thank you very much for your reply, in any case :)

@strengejacke strengejacke added the docs 📚 Something to be adressed in docs and/or vignettes label Sep 1, 2023