ICC above 1 for glmmTMB with beta_family #664

Closed · roaldarbol opened this issue Oct 10, 2022 · 17 comments · Fixed by #883

Labels: 3 investigators ❔❓ Need to look further into this issue · bug 🐛 Something isn't working · get_variance function specific labels

Comments

roaldarbol commented Oct 10, 2022

I'm currently performing a repeatability analysis, mostly with the rptR package; however, my data are proportions (not just zeros and ones) and are best modelled with a beta family, so I've been using glmmTMB with beta_family.
To verify my repeatability estimates, I tested the performance::icc() function. However, for certain datasets I get ICCs above 1, which makes me a bit suspicious about using ICCs for beta models in general.

I also get two warnings: `mu of 1.1 is too close to zero, estimate of random effect variances may be unreliable.` and `Model's distribution-specific variance is negative. Results are not reliable.` The mu warning seems to show up for all the models I run with a beta family.

I managed to put together an MRE:

set.seed(123)

# simulate proportion-like data for two "treatments"
a  <- seq(from = 5, to = 95)
b1 <- jitter(a, factor = 20)
b2 <- jitter(a, factor = 20) + 30
b3 <- jitter(a, factor = 20) + 30
t_a <- rep("a", length(a))
t_b <- rep("b", length(a))

id <- factor(a)
d <- tibble::tibble(
  id        = c(id, id, id, id),
  value     = c(a, b1, b2, b3),
  treatment = c(t_a, t_a, t_b, t_b)
)

# rescale to the open interval (0, 1) for the beta family
d$value <- d$value / (max(d$value) + 0.1)

glmm_rpd <- glmmTMB::glmmTMB(
  value ~ treatment + (1 | id),
  data   = d,
  family = glmmTMB::beta_family()
)
performance::icc(glmm_rpd)
# # Intraclass Correlation Coefficient
#
#    Adjusted ICC: 1.001
#  Unadjusted ICC: 0.746
# Warning messages:
# 1: mu of 1.1 is too close to zero, estimate of random effect variances may be unreliable. 
# 2: Model's distribution-specific variance is negative. Results are not reliable. 

For my own analysis, I often get Adjusted ICC values in the .90s, though I'm a bit unsure how to interpret them properly. I've read the relevant papers, but either I haven't found a concise formulation, or the values I'm getting are plainly wrong, which is why things don't add up in my head. (I think the Adjusted ICC is the proportion of variance explained by the random effects after accounting for the fixed effects - is that way off?)

(PS: I'm posting the issue here because I think it might stem from the insight::compute_variances() function.)
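
A quick way to see where a suspicious ICC comes from is to pull out the variance components that performance::icc() builds on. A minimal sketch, assuming the glmm_rpd model from the MRE above; insight::get_variance() is the exported interface to those components, and the adjusted/unadjusted ICC formulas below follow the standard Nakagawa-style definitions, shown here only for illustration:

# Sketch: inspect the variance components behind the ICC
# (assumes glmm_rpd from the MRE above has been fitted)
vars <- insight::get_variance(glmm_rpd)

vars$var.fixed     # variance explained by the fixed effects
vars$var.random    # variance of the random effects
vars$var.residual  # residual variance (distribution-specific + dispersion)

# Adjusted ICC: random-effect variance relative to random + residual variance
vars$var.random / (vars$var.random + vars$var.residual)

# Unadjusted ICC: denominator additionally includes the fixed-effect variance
vars$var.random / (vars$var.fixed + vars$var.random + vars$var.residual)

A negative distribution-specific variance (the second warning above) shrinks var.residual and is what can push the adjusted ICC above 1.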

@strengejacke strengejacke added bug 🐛 Something isn't working 3 investigators ❔❓ Need to look further into this issue labels Oct 10, 2022
roaldarbol (Author) commented Oct 13, 2022

Just to add some information in case it's helpful: if I try the same thing with a Gaussian model, I don't seem to get the error. It also seems to be the same issue that affects the R2 (or at least both are affected).

glmm_gaussian <- glmmTMB::glmmTMB(total_scaled ~ time_of_day + sex + (1 | animal_id),
              data = paired_long_ld,
              family = gaussian)
glmm_beta <- glmmTMB::glmmTMB(total_scaled ~ time_of_day + sex + (1 | animal_id),
              data = paired_long_ld,
              family = glmmTMB::beta_family())
performance::compare_performance(glmm_gaussian, glmm_beta)
# Some of the nested models seem to be identical and probably only vary in their random effects.
# # Comparison of Model Performance Indices
#
# Name           |   Model | AIC (weights) | AICc (weights) | BIC (weights) | R2 (cond.) | R2 (marg.) |   ICC |  RMSE | Sigma
# --------------------------------------------------------------------------------------------------------------------------
# glmm_gaussian | glmmTMB | -51.4 (<.001) |  -50.8 (<.001) | -37.8 (<.001) |      0.658 |      0.134 | 0.605 | 0.128 | 0.144
# glmm_beta     | glmmTMB | -68.9 (>.999) |  -68.3 (>.999) | -55.3 (>.999) |      1.037 |      0.200 | 1.047 | 0.129 | 9.221

As these are not-yet-published data, I can't share them publicly, but if one of the devs shoots me a message/email, I'll be happy to share them along with some reproducible code.

roaldarbol (Author) commented Oct 13, 2022

Alright, I'll have to share some data (here). Now I'm getting negative ICCs and R2s above 1. It's also worth noting that when I get these spurious values, the Homogeneity of Variance subplot in check_model() comes up empty, in case that helps triage the issue.

library(glmmTMB)
library(performance)
library(readr)
library(ggplot2)

# See distributions in image below
df_weird <- read_csv("minimal_data.csv")
ggplot(df_weird, aes(value)) +
  geom_histogram() +
  facet_grid(
    rows = vars(condition_b),
    cols = vars(condition_a)
  )

# Fit model
glmm_weird <- glmmTMB::glmmTMB(value ~ condition_a + condition_b + (1 | id),
              data = df_weird,
              family = beta_family)

# Check model
performance::check_model(glmm_weird)

# See model performance
performance::model_performance(glmm_weird)
# # Indices of model performance
#
# AIC       |      AICc |       BIC | R2 (cond.) | R2 (marg.) |    ICC |  RMSE | Sigma
# ------------------------------------------------------------------------------------
# -1140.006 | -1139.720 | -1106.099 |      1.230 |      1.095 | -1.419 | 0.139 | 6.207

The data doesn't look too crazy:
[Screenshot: histograms of value, faceted by condition_a and condition_b]

And the model check doesn't look completely awful - though notice the missing subplot:
[Screenshot: check_model() diagnostic panels; the Homogeneity of Variance panel is empty]

@strengejacke strengejacke added the get_variance function specific labels label Nov 18, 2022
roaldarbol (Author):

Hey there! Just wanted to check whether this is on the radar? :-) I know there's probably plenty of other stuff to work on; I just think it's worth some attention, as it currently seems to be giving genuinely wrong output.

strengejacke (Member):

Yes, it is :-) But it does need some more thorough digging into the details.

roaldarbol (Author):

Perfect, just wanted to make sure. :-) Do you have a rough idea of when it will be dug into? A month, a quarter, half a year?

strengejacke (Member):

It seems that some of these issues could be fixed with #883; the results look better now (though the beta family still isn't well tested - nor is it in other packages).

library(glmmTMB)
library(performance)

# See distributions in image below
df_weird <- readr::read_csv("c:/Users/Daniel/Downloads/minimal_data.csv")
#> Rows: 512 Columns: 4
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): condition_a, condition_b
#> dbl (2): id, value
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glmm_weird <- glmmTMB::glmmTMB(value ~ condition_a + condition_b + (1 | id),
                               data = df_weird,
                               family = beta_family
)

# Check model
performance::check_model(glmm_weird)
#> `check_outliers()` does not yet support models of class `glmmTMB`.

# See model performance
performance::model_performance(glmm_weird)
#> # Indices of model performance
#> 
#> AIC       |      AICc |       BIC | R2 (cond.) | R2 (marg.) |   ICC |  RMSE | Sigma
#> -----------------------------------------------------------------------------------
#> -1140.006 | -1139.720 | -1106.099 |      1.000 |      0.890 | 1.000 | 0.139 | 6.207

Created on 2024-06-12 with reprex v2.1.0

roaldarbol (Author):

Sounds great, looking forward to testing it out! Are the R2 and ICC in your new run realistic? I.e., wouldn't an ICC of 1.0 still be unrealistic?

strengejacke (Member):

Yes, I still found some minor issues. Now it's:

AIC       |      AICc |       BIC | R2 (cond.) | R2 (marg.) |   ICC |  RMSE | Sigma  
-----------------------------------------------------------------------------------  
-1140.006 | -1139.720 | -1106.099 |      0.799 |      0.712 | 0.305 | 0.139 | 6.207

However, the same model with brms gives:

> brms::bayes_R2(m2)
   Estimate  Est.Error      Q2.5     Q97.5
R2 0.380742 0.02984203 0.3199481 0.4369323

Thus, the results still look too optimistic.
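
The exact Bayesian model (m2) used for this comparison isn't shown in the thread; a minimal sketch of what an equivalent brms fit might look like, assuming the same df_weird data and a Beta likelihood, is:

library(brms)

# Sketch only: an equivalent Bayesian beta mixed model. The formula mirrors
# the glmmTMB model above; priors are left at brms defaults (an assumption).
m2 <- brm(
  value ~ condition_a + condition_b + (1 | id),
  data   = df_weird,
  family = Beta(),
  cores  = 4
)

bayes_R2(m2)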

strengejacke (Member) commented Jun 12, 2024

But that seems to be OK. The glmmTMB beta and ordered-beta R2s are a bit higher than their Bayesian equivalents:

library(glmmTMB)
library(performance)
library(ordbetareg)
library(dplyr)
data("pew")

model_data <- select(pew, therm,
  education = "F_EDUCCAT2_FINAL",
  region = "F_CREGION_FINAL",
  income = "F_INCOME_FINAL"
)

data("ord_fit_mean")

bayes_R2(ord_fit_mean)
#>      Estimate   Est.Error       Q2.5      Q97.5
#> R2 0.04936185 0.007519559 0.03342839 0.06320446

model_data$therm2 <- datawizard::normalize(model_data$therm)
model_data$therm3 <- datawizard::normalize(model_data$therm, include_bounds = FALSE)

m1 <- glmmTMB(
  therm2 ~ education + income + (1 | region),
  data = model_data,
  family = glmmTMB::ordbeta()
)

m2 <- glmmTMB(
  therm3 ~ education + income + (1 | region),
  data = model_data,
  family = glmmTMB::beta_family()
)

r2_nakagawa(m1)
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.103
#>      Marginal R2: 0.100

r2_nakagawa(m2, tolerance = 1e-10)
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.092
#>      Marginal R2: 0.092

Created on 2024-06-12 with reprex v2.1.0

roaldarbol (Author):

Absolutely, this looks MUCH better! Looking forward to the release! :D
If you agree, I'd say we can close this issue now, as the completely unrealistic ICCs above 1 seem to be gone. We could always open a new issue to track how well the estimates match/compare with other packages such as brms or MuMIn.

strengejacke (Member):

Yes, this issue will be closed automatically once I merge the PR into the main branch.

roaldarbol (Author):

Sounds good. Well done, thanks for getting to the bottom of this!

roaldarbol (Author):

Which version is this in? I just installed the dev version 0.12.0.2 and get very different results from what you had above. The R2 also seems off, but it is in agreement with MuMIn.

I now get:

> performance::model_performance(glmm_weird)
# Indices of model performance

AIC       |      AICc |       BIC | R2 (cond.) | R2 (marg.) |   ICC |  RMSE | Sigma
-----------------------------------------------------------------------------------
-1140.006 | -1139.720 | -1106.099 |      1.092 |      0.972 | 4.323 | 0.139 | 6.207
> MuMIn::r.squaredGLMM(glmm_weird)
          R2m      R2c
[1,] 0.972239 1.092239

MuMIn gives me these warnings:

Warning messages:
1: In r.squaredGLMM.glmmTMB(glmm_weird) :
  the effects of zero-inflation and dispersion model are ignored
2: Model's distribution-specific variance is negative. Results are not reliable. 

I've put both the data and the script, including an renv lockfile for package versioning, at https://github.com/roaldarbol/minimal-data

strengejacke (Member):

I just bumped the versions to avoid confusion. Try updating insight to 0.20.1.3 and performance to 0.12.0.3. I get "reasonable" results, in the sense that they are no longer above 1. However, a similar Bayesian model returns R2 ≈ 0.40.

library(glmmTMB)
library(performance)

df_weird <- datawizard::data_read("c:/Users/mail/Downloads/minimal_data.csv")

# Fit model
glmm_weird <- glmmTMB::glmmTMB(value ~ condition_a + condition_b + (1 | id),
  data = df_weird,
  family = beta_family
)

performance::r2_nakagawa(glmm_weird)
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.771
#>      Marginal R2: 0.686

MuMIn::r.squaredGLMM(glmm_weird)
#> Warning: 'r.squaredGLMM' now calculates a revised statistic. See the help page.
#> Warning in r.squaredGLMM.glmmTMB(glmm_weird): the effects of zero-inflation and
#> dispersion model are ignored
#>            R2m      R2c
#> [1,] 0.6860056 0.770677

packageVersion("insight")
#> [1] '0.20.1.3'

Created on 2024-06-14 with reprex v2.1.0
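
For anyone wanting to reproduce this, one way to install development builds like the ones mentioned above is via the easystats r-universe repository; a sketch, with the caveat that you will get whatever versions are current there at install time:

# Sketch: install the latest dev builds of insight and performance
install.packages(
  c("insight", "performance"),
  repos = c("https://easystats.r-universe.dev", "https://cloud.r-project.org")
)

packageVersion("insight")
packageVersion("performance")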

roaldarbol (Author):

Sweet, I get the same now! Thanks! By the way, since you mention fitting a Bayesian model: is it possible to obtain an ICC for such a model? (I just fitted one with brms and get the same result as you.)

strengejacke (Member):

Yes, although for Bayesian models, variance_decomposition() is preferred (at least according to some people on the Stan team).
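
As a sketch of what that looks like in practice, assuming a brmsfit object such as the hypothetical beta model sketched earlier (variance_decomposition() is exported by performance for Bayesian models):

# Point-estimate ICC, analogous to the frequentist output above
performance::icc(m2)

# Preferred for Bayesian models: variance decomposition based on the
# posterior predictive distribution, reported with a credible interval
performance::variance_decomposition(m2, ci = 0.95)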

roaldarbol (Author):

Cool, thanks a lot, I'll dig into that a bit more!
