ICC above 1 for glmmTMB with beta_family #664
Comments
Just to add some information (in case it's helpful): if I try to do the same thing with a gaussian model, I do not seem to get the error. And it seems to be the same thing that affects the R2 (or at least they're both affected).

glmm_gaussian <- glmmTMB::glmmTMB(total_scaled ~ time_of_day + sex + (1 | animal_id),
  data = paired_long_ld,
  family = gaussian
)

glmm_beta <- glmmTMB::glmmTMB(total_scaled ~ time_of_day + sex + (1 | animal_id),
  data = paired_long_ld,
  family = beta_family
)

performance::compare_performance(glmm_gaussian, glmm_beta)
# Some of the nested models seem to be identical and probably only vary in their random effects.
#
# # Comparison of Model Performance Indices
#
# Name          |   Model | AIC (weights) | AICc (weights) | BIC (weights) | R2 (cond.) | R2 (marg.) |   ICC |  RMSE | Sigma
# --------------------------------------------------------------------------------------------------------------------------
# glmm_gaussian | glmmTMB | -51.4 (<.001) | -50.8 (<.001)  | -37.8 (<.001) |      0.658 |      0.134 | 0.605 | 0.128 | 0.144
# glmm_beta     | glmmTMB | -68.9 (>.999) | -68.3 (>.999)  | -55.3 (>.999) |      1.037 |      0.200 | 1.047 | 0.129 | 9.221

As this is not-yet-published data I can't share it publicly, but if one of the devs shoots me a message/email, I'll be happy to share it along with some reproducible code.
Alright, I'll have to share some data (here). Now I'm getting a negative ICC:

library(glmmTMB)
library(performance)
library(readr)    # needed for read_csv()
library(ggplot2)  # needed for the plot below

# See distributions in image below
df_weird <- read_csv("minimal_data.csv")

ggplot(df_weird, aes(value)) +
  geom_histogram() +
  facet_grid(
    rows = vars(condition_b),
    cols = vars(condition_a)
  )

# Fit model
glmm_weird <- glmmTMB::glmmTMB(value ~ condition_a + condition_b + (1 | id),
  data = df_weird,
  family = beta_family
)

# Check model
performance::check_model(glmm_weird)

# See model performance
performance::model_performance(glmm_weird)
# # Indices of model performance
#
# AIC       | AICc      | BIC       | R2 (cond.) | R2 (marg.) |    ICC |  RMSE | Sigma
# ------------------------------------------------------------------------------------
# -1140.006 | -1139.720 | -1106.099 |      1.230 |      1.095 | -1.419 | 0.139 | 6.207

The data doesn't look too crazy, and the model check doesn't look completely awful - though notice the missing subplot.
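(As an aside, here is a minimal sketch of where such out-of-range values come from, based on the variance components that insight extracts: the adjusted ICC is the random-effect variance over random-effect plus residual variance, and for beta models the residual part includes a distribution-specific variance that can come out negative - cf. the "distribution-specific variance is negative" warning mentioned in this thread - pushing the ratio below 0 or above 1.)

# Inspect the variance components underlying the ICC (sketch, not a fix)
vc <- insight::get_variance(glmm_weird)
vc$var.random        # between-cluster (random-effect) variance
vc$var.residual      # residual variance, includes the distribution-specific part
vc$var.distribution  # distribution-specific variance; negative values break the ratio
# Adjusted ICC as assembled from these components:
vc$var.random / (vc$var.random + vc$var.residual)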
Hey there! Just wanted to check whether this is on the radar? :-) I know there's probably plenty of stuff to work on, I just think it would be worth some attention, as this currently seems to be giving output that is actually wrong.
Yes, it is :-) But it indeed needs some more thorough digging into the details.
Perfect, just wanted to make sure. :-) Do you have any rough idea of when it will be dug into? A month, a quarter, half a year?
Seems that some issues could be fixed with #883; results look better now (though beta-family still isn't well tested - not in other packages either).

library(glmmTMB)
library(performance)

# See distributions in image below
df_weird <- readr::read_csv("c:/Users/Daniel/Downloads/minimal_data.csv")
#> Rows: 512 Columns: 4
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): condition_a, condition_b
#> dbl (2): id, value
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

glmm_weird <- glmmTMB::glmmTMB(value ~ condition_a + condition_b + (1 | id),
  data = df_weird,
  family = beta_family
)

# Check model
performance::check_model(glmm_weird)
#> `check_outliers()` does not yet support models of class `glmmTMB`.

# See model performance
performance::model_performance(glmm_weird)
#> # Indices of model performance
#>
#> AIC       | AICc      | BIC       | R2 (cond.) | R2 (marg.) |   ICC |  RMSE | Sigma
#> -----------------------------------------------------------------------------------
#> -1140.006 | -1139.720 | -1106.099 |      1.000 |      0.890 | 1.000 | 0.139 | 6.207

Created on 2024-06-12 with reprex v2.1.0
Sounds great, looking forward to testing it out! Are the R2 and ICC in your new run-through realistic? I.e., wouldn't an ICC of 1.0 still be unrealistic?
Yes, still found some minor issues. Now it's:

AIC       | AICc      | BIC       | R2 (cond.) | R2 (marg.) |   ICC |  RMSE | Sigma
-----------------------------------------------------------------------------------
-1140.006 | -1139.720 | -1106.099 |      0.799 |      0.712 | 0.305 | 0.139 | 6.207

However, the same model with brms is:

> brms::bayes_R2(m2)
   Estimate  Est.Error      Q2.5     Q97.5
R2 0.380742 0.02984203 0.3199481 0.4369323

Thus, results still look too optimistic.
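The Bayesian model m2 isn't shown in the thread; here is a minimal sketch of a comparable brms fit, assuming the same formula and data as the glmmTMB model, a beta likelihood, and default priors/sampler settings (illustrative, not necessarily the exact model used):

library(brms)
# Comparable Bayesian model for the same data (sketch)
m2 <- brms::brm(
  value ~ condition_a + condition_b + (1 | id),
  data = df_weird,
  family = brms::Beta()
)
brms::bayes_R2(m2)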
But that seems to be ok. glmmTMB beta and ordered beta are a bit higher than the Bayesian equivalent:

library(glmmTMB)
library(performance)
library(ordbetareg)
library(dplyr)

data("pew")
model_data <- select(pew, therm,
  education = "F_EDUCCAT2_FINAL",
  region = "F_CREGION_FINAL",
  income = "F_INCOME_FINAL"
)

data("ord_fit_mean")
bayes_R2(ord_fit_mean)
#>      Estimate   Est.Error       Q2.5      Q97.5
#> R2 0.04936185 0.007519559 0.03342839 0.06320446

model_data$therm2 <- datawizard::normalize(model_data$therm)
model_data$therm3 <- datawizard::normalize(model_data$therm, include_bounds = FALSE)

m1 <- glmmTMB(
  therm2 ~ education + income + (1 | region),
  data = model_data,
  family = glmmTMB::ordbeta()
)
m2 <- glmmTMB(
  therm3 ~ education + income + (1 | region),
  data = model_data,
  family = glmmTMB::beta_family()
)

r2_nakagawa(m1)
#> # R2 for Mixed Models
#>
#> Conditional R2: 0.103
#>    Marginal R2: 0.100

r2_nakagawa(m2, tolerance = 1e-10)
#> # R2 for Mixed Models
#>
#> Conditional R2: 0.092
#>    Marginal R2: 0.092

Created on 2024-06-12 with reprex v2.1.0
Absolutely, looks MUCH better! Looking forward to the release! :D

Yes, this issue will be closed automatically once I merge the PR into the main branch.

Sounds good. Well done, thanks for getting to the bottom of this!
Which version is this on? I just installed the dev version 0.12.0.2, and get very different results from what you had above. R2 also seems off, but it is in agreement with MuMIn. I now get:

> performance::model_performance(glmm_weird)
# Indices of model performance

AIC       | AICc      | BIC       | R2 (cond.) | R2 (marg.) |   ICC |  RMSE | Sigma
-----------------------------------------------------------------------------------
-1140.006 | -1139.720 | -1106.099 |      1.092 |      0.972 | 4.323 | 0.139 | 6.207

> MuMIn::r.squaredGLMM(glmm_weird)
          R2m      R2c
[1,] 0.972239 1.092239
Warning messages:
1: In r.squaredGLMM.glmmTMB(glmm_weird) :
  the effects of zero-inflation and dispersion model are ignored
2: Model's distribution-specific variance is negative. Results are not reliable.

I've put both the data and script, including …
I just bumped versions in order to avoid confusion. Try updating insight to 0.20.1.3 and performance to 0.12.0.3. I get "reasonable" results, as these are not above 1. However, a similar Bayesian model returns R2 ~ 0.40.

library(glmmTMB)
library(performance)

df_weird <- datawizard::data_read("c:/Users/mail/Downloads/minimal_data.csv")

# Fit model
glmm_weird <- glmmTMB::glmmTMB(value ~ condition_a + condition_b + (1 | id),
  data = df_weird,
  family = beta_family
)

performance::r2_nakagawa(glmm_weird)
#> # R2 for Mixed Models
#>
#> Conditional R2: 0.771
#>    Marginal R2: 0.686

MuMIn::r.squaredGLMM(glmm_weird)
#> Warning: 'r.squaredGLMM' now calculates a revised statistic. See the help page.
#> Warning in r.squaredGLMM.glmmTMB(glmm_weird): the effects of zero-inflation and
#> dispersion model are ignored
#>           R2m      R2c
#> [1,] 0.6860056 0.770677

packageVersion("insight")
#> [1] '0.20.1.3'

Created on 2024-06-14 with reprex v2.1.0
Sweet, I get the same now! Thanks! By the way, since you mention fitting a Bayesian model - is it possible to obtain an ICC for such a model? (I just fitted one with brms and found the same as you.)
Yes, although for Bayesian models, the …
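A minimal sketch of that, assuming the brms fit m2 from the sketch above: performance::icc() accepts brmsfit objects, and performance additionally provides variance_decomposition(), which is based on the posterior predictive distribution:

# ICC for a Bayesian (brms) model
performance::icc(m2)
# Alternative for brmsfit objects: decomposition based on the
# posterior predictive distribution
performance::variance_decomposition(m2)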
Cool, thanks a lot, I'll dig into that a bit more!
I'm currently performing repeatability analysis, mostly with the rptR package; however, my data are proportions (not just ones and zeros) and are best modelled with a beta family. So I've been using glmmTMB with a beta_family.

To verify my repeatability estimates, I tested the performance::icc() function. However, for certain datasets I get ICCs above 1 - which makes me a bit suspicious about using these ICCs for beta models in general.

I also get the warnings

1: mu of 1.1 is too close to zero, estimate of random effect variances may be unreliable.

and

2: Model's distribution-specific variance is negative. Results are not reliable.

The mu warning seems to show up for all the models I run using a beta family.

I managed to create an MRE:
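(The MRE itself is not reproduced here; the following is a hypothetical stand-in with simulated data, illustrating the general shape of such an example, not the original:)

# Hypothetical stand-in for an MRE: simulated clustered proportions
set.seed(1)
n_id <- 20   # number of individuals
n_rep <- 10  # repeated measures per individual
id <- rep(seq_len(n_id), each = n_rep)
re <- rnorm(n_id, sd = 1)[id]  # random intercept per individual
mu <- plogis(0.5 + re)         # means on (0, 1)
phi <- 5                       # beta precision parameter
value <- rbeta(n_id * n_rep, shape1 = mu * phi, shape2 = (1 - mu) * phi)
d <- data.frame(id = factor(id), value = value)
m <- glmmTMB::glmmTMB(value ~ 1 + (1 | id),
  data = d,
  family = glmmTMB::beta_family()
)
performance::icc(m)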
For my own analysis, I often get Adjusted ICC values in the 90s, though I'm a bit unsure of how to properly interpret them. I've read the relevant papers, but either I haven't found a concise formulation, or the values I'm getting are plain wrong and that's why things don't add up in my head. (I think the Adjusted ICC is the amount of residual variance explained by the random effects after correcting for the fixed effects - is that way off?)
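For reference, the performance documentation describes the two variants along those lines: the adjusted ICC relates only the random-effect and residual variances, while the unadjusted ICC also keeps the fixed-effects variance in the denominator. Schematically:

# Adjusted ICC:   var_random / (var_random + var_residual)
#   proportion of the variance left over after the fixed effects that is
#   attributable to the grouping structure
# Unadjusted ICC: var_random / (var_fixed + var_random + var_residual)
#   same numerator, but the fixed-effects variance stays in the denominator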
(PS: I'm posting the issue here as I think it might stem from the insight::compute_variances() function.)