
Test get_variance for zero-inflation models #893

Merged (22 commits) on Jun 20, 2024
Conversation

@strengejacke commented Jun 14, 2024

Related to #889

TL;DR

We have two questions:

  1. How should the distribution-specific variance be calculated for zero-inflated models? (Or at least: which approach is more accurate? See this comment.)
  2. Is it expected that the R2 is lower for zero-inflated models when the full model (zero-inflation and conditional parts) is taken into account, as opposed to only the conditional component of zero-inflated models? See the following example.

You can ignore the detailed changes in this PR; only the comments and examples shown here are relevant to the discussion!

What did this PR change?

In the current main branch, the dispersion parameter for both Poisson and zero-inflated (ZI) Poisson models is set to 1. This PR changes the behaviour for ZI Poisson, returning a different dispersion / distributional variance:

# For zero-inflated poisson models, the
# distributional variance is based on Zuur et al. 2012
# ----------------------------------------------
.variance_zip <- function(model, faminfo, family_var) {
  if (inherits(model, "glmmTMB")) {
    p <- stats::predict(model, type = "zprob")
    mu <- stats::predict(model, type = "conditional")
    pvar <- (1 - p) * (mu + p * mu^2)
  } else if (inherits(model, "MixMod")) {
    p <- stats::plogis(stats::predict(model, type_pred = "link", type = "zero_part"))
    mu <- suppressWarnings(stats::predict(model, type = "mean_subject"))
    pvar <- (1 - p) * (mu + p * mu^2)
  } else {
    pvar <- family_var
  }

  mean(pvar)
}
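The pointwise variance formula used in `.variance_zip()` can be illustrated outside R. The following Python sketch (a hypothetical standalone translation for illustration, not part of the PR) computes the Zuur et al. (2012) zero-inflated Poisson variance `(1 - p) * (mu + p * mu^2)` from vectors of predicted zero-inflation probabilities and conditional means, then averages it, mirroring the `mean(pvar)` step above:

```python
# Sketch of the distributional variance for a zero-inflated Poisson model,
# following Zuur et al. (2012): Var(Y) = (1 - p) * (mu + p * mu^2),
# where p is the zero-inflation probability and mu the conditional mean.
# Hypothetical Python translation, for illustration only.

def zip_distributional_variance(p, mu):
    """Average pointwise ZIP variance over all observations."""
    pointwise = [(1 - pi) * (mi + pi * mi**2) for pi, mi in zip(p, mu)]
    return sum(pointwise) / len(pointwise)

# Example with made-up predicted values:
p = [0.1, 0.3, 0.5]   # predicted zero-inflation probabilities
mu = [2.0, 1.5, 1.0]  # predicted conditional means
print(zip_distributional_variance(p, mu))
```

Note that with `p = 0` the formula reduces to `Var(Y) = mu`, i.e. the ordinary Poisson variance, which is why the plain-Poisson case keeps a dispersion of 1.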

Taking the following model as an example, this PR comes closer to the result of a Bayesian model than insight from the main branch.

m <- glmmTMB::glmmTMB(count ~ mined + cover + (1 + cover | site),
  ziformula = ~mined,
  family = poisson(), data = Salamanders
)
performance::r2_nakagawa(m)

Current main branch returns

# R2 for Mixed Models

  Conditional R2: 0.650
     Marginal R2: 0.525

This PR returns

# R2 for Mixed Models

  Conditional R2: 0.414
     Marginal R2: 0.334

brms returns (marginal R2 only)

library(brms)
m2 <- brms::brm(bf(count ~ mined + (1 | site), zi ~ mined),
  family = brms::zero_inflated_poisson(), data = Salamanders, backend = "rstan"
)
brms::bayes_R2(m2)
    Estimate  Est.Error      Q2.5     Q97.5
R2 0.1686378 0.01874732 0.1334217 0.2070608

Any ideas on how to validate these results? @bbolker @bwiernik?

@strengejacke commented

What R2 would be expected for zero-inflated models in comparison to non-zero-inflated ones? Would the R2 be higher or lower for a simple Poisson model as opposed to a zero-inflated Poisson, for example?

@strengejacke commented Jun 18, 2024

Revisiting this code, the question is whether the last line should use mean() or var():

# For zero-inflated poisson models, the
# distributional variance is based on Zuur et al. 2012
# ----------------------------------------------
.variance_zip <- function(model, faminfo, family_var) {
  if (inherits(model, "glmmTMB")) {
    p <- stats::predict(model, type = "zprob")
    mu <- stats::predict(model, type = "conditional")
    pvar <- (1 - p) * (mu + p * mu^2)
  }
  mean(pvar) # var() better here?
}
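To see what the two choices measure, here is a small Python sketch (with made-up pointwise variances, for illustration only): mean() of the pointwise variances gives the average distributional variance across observations, while var() gives the variability of those variances between observations, which is a different quantity on a different scale.

```python
# Compare mean() vs. var() applied to the pointwise ZIP variances.
# The values below are made-up pointwise variances for illustration.

pvar = [2.16, 1.5225, 0.75]

# mean(): average distributional variance across observations (a level)
mean_pvar = sum(pvar) / len(pvar)

# var(): sample variance of the pointwise variances (a spread, not a level)
m = mean_pvar
var_pvar = sum((x - m) ** 2 for x in pvar) / (len(pvar) - 1)

print(mean_pvar)  # how large the distributional variance is on average
print(var_pvar)   # how much it differs between observations
```

Since the Nakagawa R2 decomposition needs a single distributional-variance term on the same scale as the other variance components, the level (mean) rather than the spread (var) seems to be the relevant quantity here.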

Using mean()

# using mean()
data(fish, package = "insight")
model <- glmmTMB::glmmTMB(
  count ~ child + camper + (1 | persons),
  ziformula = ~ child + camper,
  data = fish,
  family = poisson()
)
insight::get_variance(model)
#> $var.fixed
#> [1] 1.141135
#> 
#> $var.random
#> [1] 1.133478
#> 
#> $var.residual
#> [1] 1.03198
#> 
#> $var.distribution
#> [1] 1.03198
#> 
#> $var.dispersion
#> [1] 0
#> 
#> $var.intercept
#>  persons 
#> 1.133478

performance::r2_nakagawa(model)
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.688
#>      Marginal R2: 0.345

Using var()

# using var()
data(fish, package = "insight")
model <- glmmTMB::glmmTMB(
  count ~ child + camper + (1 | persons),
  ziformula = ~ child + camper,
  data = fish,
  family = poisson()
)
insight::get_variance(model)
#> $var.fixed
#> [1] 1.141135
#> 
#> $var.random
#> [1] 1.133478
#> 
#> $var.residual
#> [1] 4.911707
#> 
#> $var.distribution
#> [1] 4.911707
#> 
#> $var.dispersion
#> [1] 0
#> 
#> $var.intercept
#>  persons 
#> 1.133478

performance::r2_nakagawa(model)
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.317
#>      Marginal R2: 0.159

The same model fitted with brms returns (using bayes_R2()) a conditional R2 of about 0.20 (and a marginal R2 below 0.1).

@strengejacke strengejacke added the 3 investigators ❔❓ Need to look further into this issue label Jun 18, 2024
@strengejacke commented

I think mean() is better. The important parameters are mu and p: averaging the predicted mu and p first and then applying the formula gives a result similar to taking mean() of the pointwise variances at the end.

p <- stats::predict(model, type = "zprob")
mu <- stats::predict(model, type = "conditional")
pvar <- (1 - p) * (mu + p * mu^2)
mean(pvar)

# similar to
p <- mean(stats::predict(model, type = "zprob"))
mu <- mean(stats::predict(model, type = "conditional"))
pvar <- (1 - p) * (mu + p * mu^2)
pvar
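This near-equivalence can be checked numerically. A Python sketch (with hypothetical predicted values, for illustration) compares the mean of the pointwise variances with the variance evaluated at the averaged predictions; the two agree exactly when the predictions are constant, and stay close when they vary moderately:

```python
# Compare "average the pointwise variances" with "apply the formula to
# averaged predictions" for the ZIP variance (1 - p) * (mu + p * mu^2).
# All input values are made up for illustration.

def pointwise_then_mean(p, mu):
    """Mean of (1 - p) * (mu + p * mu^2) over observations."""
    vals = [(1 - pi) * (mi + pi * mi**2) for pi, mi in zip(p, mu)]
    return sum(vals) / len(vals)

def mean_then_formula(p, mu):
    """Formula evaluated at the averaged p and mu."""
    pbar = sum(p) / len(p)
    mubar = sum(mu) / len(mu)
    return (1 - pbar) * (mubar + pbar * mubar**2)

p = [0.25, 0.30, 0.35]
mu = [1.8, 2.0, 2.2]
print(pointwise_then_mean(p, mu))
print(mean_then_formula(p, mu))
```

Because the formula is nonlinear in p and mu, the two quantities are only approximately equal for varying predictions; the gap grows with the spread of the predicted values.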

@strengejacke strengejacke merged commit 9759453 into main Jun 20, 2024
18 of 26 checks passed
@strengejacke strengejacke deleted the var_zi branch June 20, 2024 12:39