-
-
Notifications
You must be signed in to change notification settings - Fork 103
Description
With the binomial() family, the response can be specified in three ways (see ?family):
- Case/Factor: Rows indicating cases, response a factor indicator success/failure
- Frequency: Rows indicating trial arms, response numeric between 0 and 1 indicating proportion of success, weights indicating number of trials in the arm.
- Matrix: Rows indicating trial arms, response a two-column matrix, first column indicating number of successes, second column indicating number of failures.
These three specifications yield equivalent coefficients, standard errors, and log-likelihoods, but other overall model fit statistics (e.g., deviance, AIC, df.residual, etc.) aren't accurate because the sample size used for them is wrong. The correct sample sizes for the three cases would be:
nobs(model)sum(weights(model))sum(insight::get_response(model))
Because of this, the results for r2() (and insight::n_obs()) aren't accurate when the second or third specifications are used. r2() errors when the third specification is used because it doesn't expect a multi-column response, but it returns inaccurate results silently when the second or third option is used (r2_somers fails for both—frequency form because the Hmisc function rejects nonbinary values; matrix form because it's not expecting a multi-column response).
Would it be possible, when family(model)$family %in% c("binomial", "quasibinomial") to check whether the response is given in frequency or matrix form, then either return an error/warning or instead compute the statistics using appropriate methods?
Example showing errors with `r2_nagelkerke()`
contr <- rbinom(n = 100, size = 1, prob = .2)
treat <- rbinom(n = 100, size = 1, prob = .8)
dat_factor <- data.frame(
group = factor(c(rep("c", length(contr)), rep("t", length(treat)))),
outcome = c(contr, treat)
)
dat_proportion <- data.frame(
group = factor(c("c", "t")),
size = c(length(contr), length(treat)),
outcome_prop = c(sum(contr) / length(contr), sum(treat) / length(treat))
)
dat_matrix <- data.frame(
group = factor(c("c", "t")),
size = c(length(contr), length(treat)),
success = c(sum(contr), sum(treat)),
failure = c(length(contr) - sum(contr), length(treat) - sum(treat))
)
m_factor <-
glm(outcome ~ group,
family = binomial(link = "logit"),
data = dat_factor)
m_proportion <-
glm(outcome_prop ~ group,
weights = size,
family = binomial(link = "logit"),
data = dat_proportion)
m_matrix <-
glm(cbind(success, failure) ~ group,
family = binomial(link = "logit"),
data = dat_matrix)
performance::performance::r2_nagelkerke(m_factor)
performance::performance::r2_nagelkerke(m_proportion)
performance::performance::r2_nagelkerke(m_matrix)