Warnings for binomial stats using frequency format

With the `binomial()` family, the response can be specified in three ways (see `?family`):
  1. Case/Factor: Rows indicating cases, response a factor indicator success/failure
  2. Frequency: Rows indicating trial arms, response numeric between 0 and 1 indicating proportion of success, weights indicating number of trials in the arm.
  3. Matrix: Rows indicating trial arms, response a two-column matrix, first column indicating number of successes, second column indicating number of failures.

These three specifications yield equivalent coefficients, standard errors, and log-likelihoods, but other overall model fit statistics (e.g., deviance, AIC, df.residual, etc.) aren't accurate because the sample size used for them is wrong. The correct sample sizes for the three cases would be:
  1. `nobs(model)`
  2. `sum(weights(model))`
  3. `sum(insight::get_response(model))`

Because of this, the results for `r2()` (and `insight::n_obs()`) aren't accurate when the second or third specifications are used. `r2()` errors when the third specification is used because it doesn't expect a multi-column response, but it returns inaccurate results silently when the second or third option is used (`r2_somers` fails for both—frequency form because the Hmisc function rejects nonbinary values; matrix form because it's not expecting a multi-column response).

Would it be possible, when `family(model)$family %in% c("binomial", "quasibinomial")` to check whether the response is given in frequency or matrix form, then either return an error/warning or instead compute the statistics using appropriate methods?

<details>
  <summary>Example showing errors with `r2_nagelkerke()`</summary>

```
contr <- rbinom(n = 100, size = 1, prob = .2)
treat <- rbinom(n = 100, size = 1, prob = .8)

dat_factor <- data.frame(
  group = factor(c(rep("c", length(contr)), rep("t", length(treat)))),
  outcome = c(contr, treat)
  )
dat_proportion <- data.frame(
  group = factor(c("c", "t")),
  size = c(length(contr), length(treat)),
  outcome_prop = c(sum(contr) / length(contr), sum(treat) / length(treat))
)
dat_matrix <- data.frame(
  group = factor(c("c", "t")),
  size = c(length(contr), length(treat)),
  success = c(sum(contr), sum(treat)),
  failure = c(length(contr) - sum(contr), length(treat) - sum(treat))
)

m_factor <- 
  glm(outcome ~ group,
      family = binomial(link = "logit"),
      data = dat_factor)
m_proportion <- 
  glm(outcome_prop ~ group,
      weights = size,
      family = binomial(link = "logit"),
      data = dat_proportion)
m_matrix <- 
  glm(cbind(success, failure) ~ group,
      family = binomial(link = "logit"),
      data = dat_matrix)

performance::performance::r2_nagelkerke(m_factor)
performance::performance::r2_nagelkerke(m_proportion)
performance::performance::r2_nagelkerke(m_matrix)
```

</details>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Warnings for binomial stats using frequency format #165

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Warnings for binomial stats using frequency format #165

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions