ICC above 1 for glmmTMB with beta_family #664

Closed
roaldarbol opened this issue Oct 10, 2022 · 17 comments · Fixed by #883
Labels: 3 investigators ❔❓ Need to look further into this issue · Bug 🐛 Something isn't working · get_variance function specific labels

Comments

roaldarbol commented Oct 10, 2022

I'm currently performing repeatability analysis, mostly with the rptR package. However, my data are proportions (not just zeros and ones) and are best modelled with a beta family, so I've been using glmmTMB with beta_family.
To verify my repeatability estimates, I tried the performance::icc() function. However, for certain datasets I get ICCs above 1, which makes me a bit suspicious about using ICCs for beta models in general.

I also get two warnings: (1) "mu of 1.1 is too close to zero, estimate of random effect variances may be unreliable." and (2) "Model's distribution-specific variance is negative. Results are not reliable." The mu warning seems to show up for all the models I run with a beta family.

I managed to create an MRE:

set.seed(123)
a <- seq(from = 5, to = 95)
b1 <- jitter(a, factor = 20)
b2 <- jitter(a, factor = 20) + 30
b3 <- jitter(a, factor = 20) + 30
t_a <- rep('a', length(a))
t_b <- rep('b', length(a))

c <- factor(a)
d <- tibble::tibble(id = c(c, c, c, c),
                    value = c(a, b1, b2, b3),
                    treatment = c(t_a, t_a, t_b, t_b))
# rescale the response into the open unit interval for the beta family
d$value <- d$value / (max(d$value) + 0.1)
glmm_rpd <- glmmTMB::glmmTMB(value ~ treatment + (1 | id),
                             data = d,
                             family = glmmTMB::beta_family())
performance::icc(glmm_rpd)
# # Intraclass Correlation Coefficient
#
#    Adjusted ICC: 1.001
#  Unadjusted ICC: 0.746
# Warning messages:
# 1: mu of 1.1 is too close to zero, estimate of random effect variances may be unreliable. 
# 2: Model's distribution-specific variance is negative. Results are not reliable. 

For my own analysis, I often get adjusted ICC values in the 0.90s, though I'm a bit unsure how to interpret them properly. I've read the relevant papers, but either I haven't found a concise formulation, or the values I'm getting are plain wrong and that's why things don't add up in my head. (I think the adjusted ICC is the proportion of variance explained by the random effects after accounting for the fixed effects - is that way off?)
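To make that concrete, here is how I currently understand the two ICCs, computed by hand from the variance components that insight::get_variance() returns (purely illustrative, reusing the glmm_rpd model from the MRE above):

library(insight)

# variance components of the fitted model
vars <- insight::get_variance(glmm_rpd)

# my understanding: the adjusted ICC leaves the fixed-effects variance out of
# the denominator, while the unadjusted (conditional) ICC includes it
icc_adjusted   <- vars$var.random / (vars$var.random + vars$var.residual)
icc_unadjusted <- vars$var.random /
  (vars$var.fixed + vars$var.random + vars$var.residual)

c(adjusted = icc_adjusted, unadjusted = icc_unadjusted)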

(PS: I'm posting the issue here because I think it might stem from the insight::compute_variances() function.)

strengejacke added the Bug 🐛 Something isn't working and 3 investigators ❔❓ Need to look further into this issue labels on Oct 10, 2022

roaldarbol commented Oct 13, 2022

Just to add some information (in case it's helpful): if I fit the same model with a gaussian family, I don't seem to get the error. And it seems to be the same thing that affects the R2 (or at least both are affected).

glmm_gaussian <- glmmTMB::glmmTMB(total_scaled ~ time_of_day + sex + (1 | animal_id),
                                  data = paired_long_ld,
                                  family = gaussian)
glmm_beta <- glmmTMB::glmmTMB(total_scaled ~ time_of_day + sex + (1 | animal_id),
                              data = paired_long_ld,
                              family = glmmTMB::beta_family())
performance::compare_performance(glmm_gaussian, glmm_beta)
# Some of the nested models seem to be identical and probably only vary in their random effects.
# # Comparison of Model Performance Indices
#
# Name           |   Model | AIC (weights) | AICc (weights) | BIC (weights) | R2 (cond.) | R2 (marg.) |   ICC |  RMSE | Sigma
# --------------------------------------------------------------------------------------------------------------------------
# glmm_gaussian | glmmTMB | -51.4 (<.001) |  -50.8 (<.001) | -37.8 (<.001) |      0.658 |      0.134 | 0.605 | 0.128 | 0.144
# glmm_beta     | glmmTMB | -68.9 (>.999) |  -68.3 (>.999) | -55.3 (>.999) |      1.037 |      0.200 | 1.047 | 0.129 | 9.221

As this is not-yet-published data I can't share it publicly, but if one of the devs shoots me a message/email I'll be happy to share it along with some reproducible code.


roaldarbol commented Oct 13, 2022

Alright, I'll have to share some data (here). Now I'm getting negative ICCs and R2 values above 1. It's also worth noting that when I get these spurious values, the Homogeneity of Variance subplot in check_model() comes up empty, in case that helps triage the issue.

library(glmmTMB)
library(performance)
library(readr)
library(ggplot2)

# See distributions in image below
df_weird <- read_csv('minimal_data.csv')
ggplot(df_weird, aes(value)) +
  geom_histogram() +
  facet_grid(
    rows = vars(condition_b),
    cols = vars(condition_a)
  )

# Fit model
glmm_weird <- glmmTMB::glmmTMB(value ~ condition_a + condition_b + (1 | id),
              data = df_weird,
              family = beta_family)

# Check model
performance::check_model(glmm_weird)

# See model performance
performance::model_performance(glmm_weird)
# # Indices of model performance
#
# AIC       |      AICc |       BIC | R2 (cond.) | R2 (marg.) |    ICC |  RMSE | Sigma
# ------------------------------------------------------------------------------------
# -1140.006 | -1139.720 | -1106.099 |      1.230 |      1.095 | -1.419 | 0.139 | 6.207

The data doesn't look too crazy:
[screenshot: histograms of value, faceted by condition_a and condition_b]

And the model check doesn't look completely awful - though notice the missing subplot:
[screenshot: check_model() diagnostic panels; the Homogeneity of Variance panel is empty]

strengejacke added the get_variance function specific labels label on Nov 18, 2022
@roaldarbol

Hey there! Just wanted to check whether this is on the radar? :-) I know there's probably plenty of stuff to work on; I just think it would be worth some attention, as it currently seems to be producing genuinely wrong output.

@strengejacke

Yes, it is :-) But it does need some more thorough digging into the details.

@roaldarbol

Perfect, just wanted to make sure. :-) Do you have a rough idea of when it might be dug into? A month, a quarter, half a year?

@strengejacke

It seems that some issues could be fixed with #883; results look better now (though the beta family still isn't well tested, not even in other packages).

library(glmmTMB)
library(performance)

# See distributions in image below
df_weird <- readr::read_csv("c:/Users/Daniel/Downloads/minimal_data.csv")
#> Rows: 512 Columns: 4
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): condition_a, condition_b
#> dbl (2): id, value
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glmm_weird <- glmmTMB::glmmTMB(value ~ condition_a + condition_b + (1 | id),
                               data = df_weird,
                               family = beta_family
)

# Check model
performance::check_model(glmm_weird)
#> `check_outliers()` does not yet support models of class `glmmTMB`.

# See model performance
performance::model_performance(glmm_weird)
#> # Indices of model performance
#> 
#> AIC       |      AICc |       BIC | R2 (cond.) | R2 (marg.) |   ICC |  RMSE | Sigma
#> -----------------------------------------------------------------------------------
#> -1140.006 | -1139.720 | -1106.099 |      1.000 |      0.890 | 1.000 | 0.139 | 6.207

Created on 2024-06-12 with reprex v2.1.0

@roaldarbol

Sounds great, looking forward to testing it out! Are the R2 and ICC in your new run-through realistic? I.e. wouldn't an ICC of 1.0 still be unrealistic?

@strengejacke

Yes, I still found some minor issues. Now it's:

AIC       |      AICc |       BIC | R2 (cond.) | R2 (marg.) |   ICC |  RMSE | Sigma  
-----------------------------------------------------------------------------------  
-1140.006 | -1139.720 | -1106.099 |      0.799 |      0.712 | 0.305 | 0.139 | 6.207

However, the same model with brms is:

> brms::bayes_R2(m2)
   Estimate  Est.Error      Q2.5     Q97.5
R2 0.380742 0.02984203 0.3199481 0.4369323

Thus, results still look too optimistic.
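(The brms model m2 isn't shown above; presumably it was something along these lines - a sketch only, assuming the same formula and data as the glmmTMB fit:)

library(brms)

# hypothetical Bayesian counterpart of glmm_weird, used here for bayes_R2()
m2 <- brms::brm(value ~ condition_a + condition_b + (1 | id),
                data = df_weird,
                family = brms::Beta())
brms::bayes_R2(m2)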


strengejacke commented Jun 12, 2024

But that seems to be OK. The glmmTMB beta and ordered beta R2s are a bit higher than their Bayesian equivalents:

library(glmmTMB)
library(performance)
library(ordbetareg)
library(dplyr)
data("pew")

model_data <- select(pew, therm,
  education = "F_EDUCCAT2_FINAL",
  region = "F_CREGION_FINAL",
  income = "F_INCOME_FINAL"
)

data("ord_fit_mean")

bayes_R2(ord_fit_mean)
#>      Estimate   Est.Error       Q2.5      Q97.5
#> R2 0.04936185 0.007519559 0.03342839 0.06320446

model_data$therm2 <- datawizard::normalize(model_data$therm)
model_data$therm3 <- datawizard::normalize(model_data$therm, include_bounds = FALSE)

m1 <- glmmTMB(
  therm2 ~ education + income + (1 | region),
  data = model_data,
  family = glmmTMB::ordbeta()
)

m2 <- glmmTMB(
  therm3 ~ education + income + (1 | region),
  data = model_data,
  family = glmmTMB::beta_family()
)

r2_nakagawa(m1)
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.103
#>      Marginal R2: 0.100

r2_nakagawa(m2, tolerance = 1e-10)
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.092
#>      Marginal R2: 0.092

Created on 2024-06-12 with reprex v2.1.0

@roaldarbol

Absolutely, looks MUCH better! Looking forward to the release! :D
If you agree, I'd say we can close this issue now, as the completely unrealistic ICCs above 1 seem to be gone. We could always open a new issue to track how well the estimates compare to other packages such as brms or MuMIn.

@strengejacke

Yes, this issue will be closed automatically once I merge the PR into the main branch.

@roaldarbol

Sounds good. Well done, thanks for getting to the bottom of this!

@roaldarbol

Which version is this on? I just installed the dev version 0.12.0.2 and get very different results from what you had above. The R2 also seems off, but it agrees with MuMIn.

I now get:

> performance::model_performance(glmm_weird)
# Indices of model performance

AIC       |      AICc |       BIC | R2 (cond.) | R2 (marg.) |   ICC |  RMSE | Sigma
-----------------------------------------------------------------------------------
-1140.006 | -1139.720 | -1106.099 |      1.092 |      0.972 | 4.323 | 0.139 | 6.207
> MuMIn::r.squaredGLMM(glmm_weird)
          R2m      R2c
[1,] 0.972239 1.092239

MuMIn gives me these warnings:

Warning messages:
1: In r.squaredGLMM.glmmTMB(glmm_weird) :
  the effects of zero-inflation and dispersion model are ignored
2: Model's distribution-specific variance is negative. Results are not reliable. 

I've put both the data and the script, including renv for package versioning, in https://github.com/roaldarbol/minimal-data.

@strengejacke

I just bumped the versions in order to avoid confusion. Try updating insight to 0.20.1.3 and performance to 0.12.0.3. I get "reasonable" results, in the sense that they are no longer above 1. However, a similar Bayesian model returns an R2 of about 0.40.
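(One way to get the dev builds - assuming the easystats r-universe carries the current versions - is:)

# install the development versions from the easystats R-universe
install.packages(
  c("insight", "performance"),
  repos = c("https://easystats.r-universe.dev", getOption("repos"))
)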

library(glmmTMB)
library(performance)

df_weird <- datawizard::data_read("c:/Users/mail/Downloads/minimal_data.csv")

# Fit model
glmm_weird <- glmmTMB::glmmTMB(value ~ condition_a + condition_b + (1 | id),
  data = df_weird,
  family = beta_family
)

performance::r2_nakagawa(glmm_weird)
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.771
#>      Marginal R2: 0.686

MuMIn::r.squaredGLMM(glmm_weird)
#> Warning: 'r.squaredGLMM' now calculates a revised statistic. See the help page.
#> Warning in r.squaredGLMM.glmmTMB(glmm_weird): the effects of zero-inflation and
#> dispersion model are ignored
#>            R2m      R2c
#> [1,] 0.6860056 0.770677

packageVersion("insight")
#> [1] '0.20.1.3'

Created on 2024-06-14 with reprex v2.1.0

@roaldarbol

Sweet, I get the same now! Thanks! By the way, since you mention fitting a Bayesian model - is it possible to obtain an ICC for such a model? (I just fitted one with brms and get the same as you.)

@strengejacke

Yes, although for Bayesian models, variance_decomposition() is preferred (at least according to some people on the Stan team).
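For a brms fit it would look roughly like this (a minimal sketch; brm_weird stands in for your fitted brmsfit of the same model):

# 'brm_weird' is assumed to be the brms fit of the same beta model, e.g.
# brm_weird <- brms::brm(value ~ condition_a + condition_b + (1 | id),
#                        data = df_weird, family = brms::Beta())

# ICC via variance decomposition, based on the posterior predictive distribution
performance::variance_decomposition(brm_weird)

# the classical variance-ratio ICC also works for brmsfit objects
performance::icc(brm_weird)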

@roaldarbol

Cool, thanks a lot, I'll dig into that a bit more!
