ICC for beta_family not accurate #742
The variance components for the beta model don't look that bad, actually.
library(glmmTMB)
library(performance)
df_ratio_episode <- data.frame(
animal_id = factor(
rep(
c(
"208", "209", "210", "214", "223", "228", "211", "213", "217", "222", "234",
"241", "216", "230", "231", "240", "242", "244", "218", "220", "225", "237",
"239", "219", "251", "252", "253", "254"
),
each = 2L
),
levels = c(
"200", "204", "205", "206", "215", "224", "208", "209", "210", "214", "223",
"228", "211", "213", "217", "222", "234", "241", "216", "230", "231", "240",
"242", "244", "218", "220", "225", "237", "239", "245", "219", "236", "251",
"252", "253", "254"
)
),
trial = rep(c(1, 2), 28),
activity_ratio = c(
0.1313027016689785, 0.08387917431645128, 0.1395420340967623,
0.09844057594710427, 0.19414443290359096, 0.16304581176275632,
0.17274983272168504, 0.17357956037939837, 0.09729583968716982,
0.05138063319955499, 0.14298075594540044, 0.10179701101266003,
0.09168390375802275, 0.11591243874797318, 0.2521345405747349,
0.16335726666875724, 0.13436311090275369, 0.12012636336085161,
0.13868852567209072, 0.12008249718946021, 0.27708418835127824,
0.22042035159734397, 0.2649703945513039, 0.22158610629846917,
0.2001770607989554, 0.2238562351804714, 0.1105503693420828,
0.08255349183783911, 0.21927303214082697, 0.22211274055043914,
0.10446530203550744, 0.11336175801811256, 0.0826812722435201,
0.09328851878674252, 0.13701773797551595, 0.1297098120849381,
0.05986226055235673, 0.14423247009476106, 0.19474645802355026,
0.1713563584485577, 0.25663498351317365, 0.30249307043720924,
0.09082761877930186, 0.10402396536249521, 0.21941679494558652,
0.28459112981037343, 0.11218161441362348, 0.12449715062493952,
0.18427917423975973, 0.14845015830783756, 0.19444224064643065,
0.13471565660441723, 0.11247341287367296, 0.08660523675310272,
0.1763980204528711, 0.1049572229068965
)
)
glmm_ratio_beta <- glmmTMB::glmmTMB(activity_ratio ~ trial + (1 | animal_id),
data = df_ratio_episode,
family = glmmTMB::beta_family
)
insight::get_variance(glmm_ratio_beta)
#> $var.fixed
#> [1] 0.003054337
#>
#> $var.random
#> [1] 0.160573
#>
#> $var.residual
#> [1] 1.92015
#>
#> $var.distribution
#> [1] 1.92015
#>
#> $var.dispersion
#> [1] 0
#>
#> $var.intercept
#> animal_id
#> 0.160573
glmm_ratio_gauss <- glmmTMB::glmmTMB(activity_ratio ~ trial + (1 | animal_id),
data = df_ratio_episode,
family = gaussian
)
insight::get_variance(glmm_ratio_gauss)
#> $var.fixed
#> [1] 4.886653e-05
#>
#> $var.random
#> [1] 0.00282474
#>
#> $var.residual
#> [1] 0.0007858734
#>
#> $var.distribution
#> [1] 0.0007858734
#>
#> $var.dispersion
#> [1] 0
#>
#> $var.intercept
#> animal_id
#> 0.00282474
Created on 2024-07-05 with reprex v2.1.0 |
The residual variance seems suspiciously high to me though. 🤔 |
We had this discussion before, but I can't find it right now. There seems to be a mismatch between the variance-function from
> family(mod1)$variance
function (mu)
{
mu * (1 - mu)
}
and what the docs suggest:
(which is
The code base in insight is inconclusive, as well:
# Get distributional variance for beta-family
# ----------------------------------------------
.variance_family_beta <- function(model, mu, phi) {
stats::family(model)$variance(mu)
# was:
# mu * (1 - mu) / (1 + phi)
# but that code is not what "glmmTMB" uses for the beta family
# mu * (1 - mu)
}
Tagging @bbolker |
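The two candidate formulas from the comment above can be compared against simulated draws. This is a minimal base-R sketch, not the insight or glmmTMB implementation: it assumes the common mean/precision parameterization of the beta distribution (shape1 = mu * phi, shape2 = (1 - mu) * phi), and the values of `mu` and `phi` are hypothetical illustration values.

```r
# Base-R sanity check; mu and phi are hypothetical illustration values.
set.seed(1)
mu  <- 0.2   # mean of the beta distribution
phi <- 10    # precision parameter (assumed mean/precision parameterization)

# Convert mean/precision to the shape parameters used by rbeta()
shape1 <- mu * phi
shape2 <- (1 - mu) * phi
y <- rbeta(1e5, shape1, shape2)

var_empirical <- var(y)                     # variance of the simulated draws
var_unscaled  <- mu * (1 - mu)              # variance function alone
var_with_phi  <- mu * (1 - mu) / (1 + phi)  # variance function scaled by 1 / (1 + phi)

round(c(empirical = var_empirical,
        unscaled  = var_unscaled,
        with_phi  = var_with_phi), 4)
```

Under this parameterization, the empirical variance should sit close to the version divided by `1 + phi`, and well below the unscaled variance function.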
It's hard to find something to validate against. The current implementation in insight and performance yields results similar to betareg for your example in easystats/insight#664:
library(glmmTMB)
set.seed(123)
a <- seq(from = 5, to = 95)
b1 <- jitter(a, factor = 20)
b2 <- jitter(a, factor = 20)
b2 <- b2 + 30
b3 <- jitter(a, factor = 20)
b3 <- b3 + 30
t_a <- rep('a', length(a))
t_b <- rep('b', length(a))
c <- as.factor(a)
d <- data.frame(id = c(c,c,c,c),
value = c(a,b1,b2,b3),
treatment = c(t_a, t_a, t_b, t_b))
d$value <- d$value / (max(d$value) + 0.1)
m <- glmmTMB::glmmTMB(value ~ treatment,
data = d,
family=beta_family)
performance::r2_nakagawa(m)
#> Random effect variances not available. Returned R2 does not account for random effects.
#> # R2 for Mixed Models
#>
#> Conditional R2: NA
#> Marginal R2: 0.289
m2 <- betareg::betareg(value ~ treatment, data = d)
summary(m2)$pseudo.r.squared
#> [1] 0.2434683
The next results (including
m <- glmmTMB::glmmTMB(value ~ treatment + (1 | id),
data = d,
family=beta_family)
performance::r2_nakagawa(m) # 0.685
m2 <- betareg::betareg(value ~ treatment + id, data = d)
summary(m2)$pseudo.r.squared # 0.940
But... for the example in this issue, using |
I actually think the betareg version is on point - it helps to plot it out, and here you can really see that the ICC should be super high for the first example!
library(glmmTMB)
#> Warning in checkDepPackageVersion(dep_pkg = "TMB"): Package version inconsistency detected.
#> glmmTMB was built with TMB version 1.9.11
#> Current TMB version is 1.9.14
#> Please re-install glmmTMB from source or restore original 'TMB' package (see '?reinstalling' for more information)
library(performance)
library(ggplot2)
library(paletteer)
set.seed(123)
a <- seq(from = 5, to = 95)
b1 <- jitter(a, factor = 20)
b2 <- jitter(a, factor = 20)
b2 <- b2 + 30
b3 <- jitter(a, factor = 20)
b3 <- b3 + 30
t_a <- rep('a', length(a))
t_b <- rep('b', length(a))
c <- as.factor(a)
d <- data.frame(id = c(c,c,c,c),
value = c(a,b1,b2,b3),
treatment = c(t_a, t_a, t_b, t_b))
d$value <- d$value / (max(d$value) + 0.1)
d |>
ggplot(aes(x = forcats::fct_reorder(id, value, .desc = TRUE),
y = value)) +
geom_line(aes(group = id), alpha = 0.5) +
geom_point() +
theme(panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank(),
axis.text.x = element_blank()
) +
facet_grid(~ treatment)
Created on 2024-07-05 with reprex v2.1.0 |
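As a rough cross-check of the visual impression, the ICC can also be approximated on the response scale with a classical ANOVA estimator. This is a swapped-in technique, not what `icc()` does for a beta model: it treats the response as Gaussian and relies on the balanced design (four observations per `id`, two per treatment). The data generation is copied from the reprex above, except that `id` is built with `rep()` instead of concatenating factors.

```r
# Rebuild the simulated data from the reprex above (base R only)
set.seed(123)
a  <- seq(from = 5, to = 95)
b1 <- jitter(a, factor = 20)
b2 <- jitter(a, factor = 20) + 30
b3 <- jitter(a, factor = 20) + 30
d <- data.frame(
  id = factor(rep(seq_along(a), times = 4)),
  value = c(a, b1, b2, b3),
  treatment = rep(c("a", "b"), each = 2 * length(a))
)
d$value <- d$value / (max(d$value) + 0.1)

# ANOVA table with treatment entered first, then id
tab    <- anova(lm(value ~ treatment + id, data = d))
ms_id  <- tab["id", "Mean Sq"]         # between-id mean square
ms_res <- tab["Residuals", "Mean Sq"]  # residual mean square
k      <- 4                            # observations per id (balanced design)

# Method-of-moments estimate of the between-id variance and the ICC
var_id    <- (ms_id - ms_res) / k
icc_anova <- var_id / (var_id + ms_res)
icc_anova
```

This only describes the response scale under a Gaussian approximation, so it is a plausibility check for the plot rather than a reference value for the beta model.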
Did some more digging. :-) It seems that for the Gaussian family, the conditional R2 of a model without random effects is pretty much equal to the adjusted ICC of a random effects model. For the beta family, that's not the case.
library(glmmTMB)
#> Warning in checkDepPackageVersion(dep_pkg = "TMB"): Package version inconsistency detected.
#> glmmTMB was built with TMB version 1.9.11
#> Current TMB version is 1.9.14
#> Please re-install glmmTMB from source or restore original 'TMB' package (see '?reinstalling' for more information)
library(performance)
library(brms)
#> Loading required package: Rcpp
#> Loading 'brms' package (version 2.21.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#>
#> Attaching package: 'brms'
#> The following object is masked from 'package:glmmTMB':
#>
#> lognormal
#> The following object is masked from 'package:stats':
#>
#> ar
library(betareg)
set.seed(123)
a <- seq(from = 5, to = 95)
b1 <- jitter(a, factor = 20)
b2 <- jitter(a, factor = 20)
b2 <- b2 + 30
b3 <- jitter(a, factor = 20)
b3 <- b3 + 30
t_a <- rep('a', length(a))
t_b <- rep('b', length(a))
c <- as.factor(a)
d <- data.frame(id = c(c,c,c,c),
value = c(a,b1,b2,b3),
treatment = c(t_a, t_a, t_b, t_b))
d$value <- d$value / (max(d$value) + 0.1)
# GLMMs and GLMs with beta and Gaussian families
glmm_beta <- glmmTMB::glmmTMB(value ~ treatment + (1|id),
data = d,
family=beta_family)
glm_beta <- betareg(value ~ treatment + id,
data = d)
glmm_gauss <- glmmTMB::glmmTMB(value ~ treatment + (1|id),
data = d,
family = gaussian)
glm_gauss <- glm(value ~ treatment + id,
data = d,
family = gaussian)
icc(glmm_beta)
#> # Intraclass Correlation Coefficient
#>
#> Adjusted ICC: 0.619
#> Unadjusted ICC: 0.511
r2(glmm_beta)
#> # R2 for Mixed Models
#>
#> Conditional R2: 0.685
#> Marginal R2: 0.174
r2(glm_beta)
#> # R2 for Beta Regression
#> Pseudo R2: 0.941
icc(glmm_gauss)
#> # Intraclass Correlation Coefficient
#>
#> Adjusted ICC: 0.995
#> Unadjusted ICC: 0.749
r2(glmm_gauss)
#> # R2 for Mixed Models
#>
#> Conditional R2: 0.996
#> Marginal R2: 0.247
r2(glm_gauss)
#> R2: 0.997
Created on 2024-07-05 with reprex v2.1.0
A side-note, might be for a different issue - it seems that
bayes_beta <- brms::brm(value ~ treatment + (1|id),
data = d,
family=Beta)
#> Compiling Stan program...
#> I just removed all the compiling for convenience....
bayes_beta_fe <- brms::brm(value ~ treatment + id,
data = d,
family = Beta)
#> Compiling Stan program...
#> I just removed all the compiling for convenience....
variance_decomposition(bayes_beta)
#> # Random Effect Variances and ICC
#>
#> Conditioned on: all random effects
#>
#> ## Variance Ratio (comparable to ICC)
#> Ratio: 0.61 CI 95%: [0.58 0.65]
#>
#> ## Variances of Posterior Predicted Distribution
#> Conditioned on fixed effects: 0.02 CI 95%: [0.02 0.03]
#> Conditioned on rand. effects: 0.06 CI 95%: [0.06 0.06]
#>
#> ## Difference in Variances
#> Difference: 0.04 CI 95%: [0.03 0.04]
r2(bayes_beta)
#> # Bayesian R2 with Compatibility Interval
#>
#> Conditional R2: 0.979 (95% CI [0.976, 0.981])
#> Marginal R2: 0.326 (95% CI [0.311, 0.341])
r2(bayes_beta_fe)
#> # Bayesian R2 with Compatibility Interval
#>
#> Conditional R2: 0.979 (95% CI [0.976, 0.981])
variance_decomposition(bayes_gauss)
#> # Random Effect Variances and ICC
#>
#> Conditioned on: all random effects
#>
#> ## Variance Ratio (comparable to ICC)
#> Ratio: 0.75 CI 95%: [0.74 0.76]
#>
#> ## Variances of Posterior Predicted Distribution
#> Conditioned on fixed effects: 0.01 CI 95%: [0.01 0.01]
#> Conditioned on rand. effects: 0.06 CI 95%: [0.06 0.06]
#>
#> ## Difference in Variances
#> Difference: 0.04 CI 95%: [0.04 0.04]
r2(bayes_gauss)
#> # Bayesian R2 with Compatibility Interval
#>
#> Conditional R2: 0.996 (95% CI [0.996, 0.996])
#> Marginal R2: 0.247 (95% CI [0.242, 0.252])
r2(bayes_gauss_fe)
#> # Bayesian R2 with Compatibility Interval
#>
#> Conditional R2: 0.996 (95% CI [0.996, 0.996]) |
betareg computes the R2 as the squared correlation between eta and linkfun(y) ( |
This is definitely a definition of R^2, closest to Efron's pseudo-R^2 (although Efron computes correlation on the response scale before squaring it ...) It would be interesting to see where |
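The link-scale definition discussed above can be sketched in base R. To keep the example self-contained, the fit below is a fractional-logit GLM (`quasibinomial`), which is a stand-in and NOT a beta regression; the simulated data and all coefficient values are hypothetical. Only the last step, the squared correlation between the linear predictor and the link-transformed response, is the point of the sketch.

```r
# Hypothetical simulated data: beta-distributed response with a logit-linear mean
set.seed(42)
n   <- 500
x   <- rnorm(n)
mu  <- plogis(-0.5 + 0.7 * x)  # true mean on the response scale
phi <- 20                      # hypothetical precision
y   <- rbeta(n, mu * phi, (1 - mu) * phi)

# Stand-in fit: fractional logit via a quasibinomial GLM (not a beta regression)
fit <- glm(y ~ x, family = quasibinomial(link = "logit"))
eta <- predict(fit, type = "link")  # linear predictor

# Squared correlation on the link scale: eta vs. linkfun(y)
r2_link <- cor(eta, qlogis(y))^2

# For comparison: squared correlation on the response scale
r2_response <- cor(plogis(eta), y)^2

c(link_scale = r2_link, response_scale = r2_response)
```

With a single predictor the choice of fitting method barely affects the link-scale value, since correlation is invariant to linear rescaling of `eta`; with several predictors the fitted coefficients matter.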
A quick search in the related publication (https://www.jstatsoft.org/article/view/v034i02) has only one match for "R2". Maybe it's described in more detail elsewhere. May I also point to this comment: #742 (comment) What's your opinion on this? |
But what would be "the linear predictor for the mean"? Maybe that would help improving |
@strengejacke For the uninitiated, could you just write a line or two about why we talk about R2 almost exclusively when the issue is on ICC - I imagine a mathematical connection is implicit, but it's not entirely clear. :-) Also, feel free to change the issue title to reflect that it's an issue that affects both ICC and R2 for the beta family if they're two parts of the same puzzle. 😊 |
A bit out of order in the thread, but responding to the comment above about the mismatch between the variance function (
This is for consistency with the way that |
Thanks for clarification!
Yes, definitely, it's just I couldn't find the thread/post. :-/ |
Sure. ICC is relevant for mixed models only. The ICC is calculated by dividing the random effect variance, $\sigma^2_i$, by the total variance, i.e. the sum of the random effect variance and the residual variance, $\sigma^2_\varepsilon$. The R2 for mixed models is calculated this way:
The main point is that for both ICC and R2 in mixed models, we need the different variance components. That's why both are related. |
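To make the shared dependence on variance components concrete, here is a small base-R sketch. The three inputs are copied from the `insight::get_variance()` output for the Gaussian model earlier in this thread. The formulas are the usual variance-ratio definitions (adjusted and unadjusted ICC, marginal and conditional R2 in the Nakagawa style) as I understand them, not a copy of the package code.

```r
# Variance components copied from the get_variance() output
# of the Gaussian model earlier in this thread
var_fixed    <- 4.886653e-05   # $var.fixed
var_random   <- 0.00282474     # $var.random
var_residual <- 0.0007858734   # $var.residual

var_total <- var_fixed + var_random + var_residual

# ICC: share of variance attributable to the random effects
icc_adjusted   <- var_random / (var_random + var_residual)  # ignores fixed-effects variance
icc_unadjusted <- var_random / var_total                    # includes fixed-effects variance

# R2 for mixed models: same components, different numerators
r2_marginal    <- var_fixed / var_total                 # fixed effects only
r2_conditional <- (var_fixed + var_random) / var_total  # fixed + random effects

round(c(icc_adjusted = icc_adjusted, icc_unadjusted = icc_unadjusted,
        r2_marginal = r2_marginal, r2_conditional = r2_conditional), 3)
```

Because every quantity is a ratio of the same three components, an error in the residual (distribution-specific) variance propagates to both the ICC and the R2, which is why the thread moves between the two.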
Thanks for the clarification, that helps. Just tested out the latest changes - Ferrari's R2 looks more in line with what I'd expect. Will this eventually change the ICC too? A few notes on consistency as of the latest commit:
library(glmmTMB)
#> Warning in checkDepPackageVersion(dep_pkg = "TMB"): Package version inconsistency detected.
#> glmmTMB was built with TMB version 1.9.11
#> Current TMB version is 1.9.14
#> Please re-install glmmTMB from source or restore original 'TMB' package (see '?reinstalling' for more information)
library(performance)
library(betareg)
set.seed(123)
a <- seq(from = 5, to = 95)
b1 <- jitter(a, factor = 20)
b2 <- jitter(a, factor = 20)
b2 <- b2 + 30
b3 <- jitter(a, factor = 20)
b3 <- b3 + 30
t_a <- rep('a', length(a))
t_b <- rep('b', length(a))
c <- as.factor(a)
d <- data.frame(id = c(c,c,c,c),
value = c(a,b1,b2,b3),
treatment = c(t_a, t_a, t_b, t_b))
d$value <- d$value / (max(d$value) + 0.1)
# Specify models
glmm_beta <- glmmTMB::glmmTMB(value ~ treatment + (1|id),
data = d,
family=beta_family)
glm_betareg <- betareg(value ~ treatment + id,
data = d)
glm_betafamily <- glm(value ~ treatment + id,
data = d,
family = beta_family)
glmm_gauss <- glmmTMB::glmmTMB(value ~ treatment + (1|id),
data = d,
family = gaussian)
glm_gauss <- glm(value ~ treatment + id,
data = d,
family = gaussian)
# ICC
icc(glmm_beta)
#> # Intraclass Correlation Coefficient
#>
#> Adjusted ICC: 0.619
#> Unadjusted ICC: 0.511
icc(glmm_gauss)
#> # Intraclass Correlation Coefficient
#>
#> Adjusted ICC: 0.995
#> Unadjusted ICC: 0.749
# ICC Bootstrapped
icc(glmm_beta, method = "boot", ci = 0.95)
#> # Intraclass Correlation Coefficient
#>
#> Adjusted ICC: 0.619 [1.000, 1.000]
#> Unadjusted ICC: 0.511 [0.660, 0.785]
icc(glmm_gauss, method = "boot", ci = 0.95)
#> # Intraclass Correlation Coefficient
#>
#> Adjusted ICC: 0.995 [0.993, 0.996]
#> Unadjusted ICC: 0.749 [0.693, 0.795]
# R2 - default
r2(glmm_beta)
#> # R2 for Mixed Models
#>
#> Conditional R2: 0.685
#> Marginal R2: 0.174
r2(glm_betareg)
#> # R2 for Beta Regression
#> Pseudo R2: 0.941
r2(glm_betafamily)
#> # R2 for Generalized Linear Regression
#> Nagelkerke's R2: NaN
r2(glmm_gauss)
#> # R2 for Mixed Models
#>
#> Conditional R2: 0.996
#> Marginal R2: 0.247
r2(glm_gauss)
#> R2: 0.997
# R2 - Ferrari
r2_ferrari(glmm_beta)
#> # R2 for Generalized Linear Regression
#> Ferrari's R2: 0.941
r2_ferrari(glm_betareg)
#> # R2 for Generalized Linear Regression
#> Ferrari's R2: 0.941
r2_ferrari(glm_betafamily)
#> # R2 for Generalized Linear Regression
#> Ferrari's R2: 0.929
# R2 Bootstrapped
r2(glmm_beta, method = "boot", ci = 0.95)
#> # R2 for Mixed Models
#>
#> Conditional R2: 0.685 [1.000, 1.000]
#> Marginal R2: 0.174 [0.214, 0.326]
r2(glm_betareg, method = "boot", ci = 0.95)
#> # R2 for Beta Regression
#> Pseudo R2: 0.941
r2(glm_betafamily, method = "boot", ci = 0.95)
#> Error in stats::integrate(.dRsq, i, j, R2_pop = dots$R2_pop, R2_obs = dots$R2_obs, : non-finite function value
r2(glmm_gauss, method = "boot", ci = 0.95)
#> # R2 for Mixed Models
#>
#> Conditional R2: 0.996 [0.995, 0.997]
#> Marginal R2: 0.247 [0.198, 0.302]
r2(glm_gauss, method = "boot", ci = 0.95)
#> R2: 0.997 [0.995, 0.997]
Created on 2024-07-08 with reprex v2.1.0 |
@bbolker: That means, in |
Yes, the conditional variance should definitely be divided by
library(glmmTMB)
set.seed(101)
dd <- data.frame(x=rnorm(100))
## (I was originally planning to use an x covariate, now ignored)
dd$y <- simulate_new(~ 1, family = beta_family, newdata = dd,
newparams = list(beta = c(1), betadisp = 2))[[1]]
m <- glmmTMB(y~1, family = beta_family, data = dd)
mu <- predict(m, type = "response")
var1 <- family(m)$var(mu)/(1+sigma(m)) ## 0.0187
var(dd$y) ## 0.0193 |
What about orderedbeta family? |
* Fix conditional distribution of Beta family easystats/performance#742 * remove test
It's now implemented for glm (see easystats/insight@887bb59). For glmmTMB,
Only if you fit a mixed model for glmmTMB. For non-mixed models, results are identical.
Yeah, not sure about this one. Will look into it.
Bootstrapping is not implemented for R2 for non-mixed models. You should be able to update insight, which now finally is supposed to return the correct ICC/R2 for mixed models from the beta-family. |
Actually the other way around - the mixed model matches the betareg model, the GLM is different. See the above (quoted below).
Works really well now!!! 🥳🥳🥳 |
Two questions: does this work for you? I get:
library(glmmTMB)
set.seed(101)
dd <- data.frame(x=rnorm(100))
## (I was originally planning to use an x covariate, now ignored)
dd$y <- simulate_new(~ 1, family = beta_family, newdata = dd,
newparams = list(beta = c(1), betadisp = 2))[[1]]
#> Error in eval(family$initialize): y values must be 0 < y < 1
Second is #742 (comment). |
For the confidence intervals, it seems that none of the
(Installed both {performance} and {insight} from the latest commits, 06ed6f1 and easystats/insight@d346f95) |
Thanks! Will open a new issue and close this as resolved for now. |
In case anyone tries icc() for the beta family and thinks the results are weird: I think so too. They are quite high (around 0.4-0.5), even when you draw two completely unrelated samples from a beta distribution. Would love to test it thoroughly, but don't have the time currently. But I posted an example that could be used to dig in here. |
The ICC generated from icc() for a glmmTMB model with a beta_family() family seems suspiciously low now, compared to either the variance decomposition of a similar Bayesian model or the same model specified with a gaussian() family (the Bayesian and Gaussian seem quite in agreement).
So this is kind of a continuation of easystats/insight#664, but I decided to create a new issue rather than re-open, as there seems to be an extra issue with the bootstrapping in icc(). Maybe the new {insight} developments have not made their way into the bootstrapping, IDK. Currently, the estimate is very low, but the CI is quite high, so the estimate falls well outside the CI.
The reprex is run on the latest commit into main in {performance}.
Here's a reprex:
Created on 2024-07-05 with reprex v2.1.0