posterior predictive check for binomial glm with matrix response #644

Closed
richardjtelford opened this issue Oct 28, 2023 · 5 comments · Fixed by #645
@richardjtelford

I've noticed that the posterior predictive check for a binomial glm is different depending on whether the response variable is a vector of proportions or a matrix of successes and failures.

Minimal reprex

tot <- rep(10, 100)
suc <- rbinom(100, prob = 0.9, size = tot)

df <- data.frame(tot, suc)

df$prop <- suc/tot

mod1 <- glm(cbind(suc, tot - suc) ~ 1,
            family = binomial,
            data = df)

# performance::check_model(mod1)
performance::check_posterior_predictions(mod1)

mod2 <- glm(prop ~ 1,
            family = binomial,
            data = df,
            weights = tot)

performance::check_posterior_predictions(mod2)

I was, perhaps mistakenly, expecting these plots to look the same.

In performance::pp_check.glm, the result is calculated as

sapply(matrix_sim, function(i) i[, 1]/i[, 2], simplify = TRUE)

This generates Inf values wherever the simulated number of failures is zero.

Should this code not be calculating the proportion instead, with something like

sapply(matrix_sim, function(i) i[, 1]/rowSums(i), simplify = TRUE)
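
To see the difference concretely, here is a minimal sketch with a hypothetical simulated draw (not the actual internals of pp_check.glm): dividing successes by failures blows up whenever the failure count is zero, whereas dividing by the row sums always gives a proportion between 0 and 1.

i <- cbind(successes = c(9, 10, 8), failures = c(1, 0, 2))

i[, 1] / i[, 2]      # current code:  9.0 Inf 4.0 -- Inf where failures == 0
i[, 1] / rowSums(i)  # suggested fix: 0.9 1.0 0.8 -- proportion of successes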
@strengejacke

This comment was marked as outdated.

@richardjtelford
Author

Thanks for having a look at this. It seems my suggestion was only a partial fix. Looking again, line 288

  out$y <- response[, 1] / response[, 2]

would also need to become

  out$y <- response[, 1] / rowSums(response)

so that the observed response is calculated in the same way as the simulations.
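
As a small illustration (assuming response is the cbind(successes, failures) matrix from the model, as in the reprex above), the two denominators give quite different observed values:

response <- cbind(suc, tot - suc)
response[, 1] / response[, 2]      # successes / failures: Inf wherever suc == tot
response[, 1] / rowSums(response)  # successes / trials: identical to df$prop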

@strengejacke

This comment was marked as outdated.

@richardjtelford
Author

Thank you. That looks like what I would expect.

It would be nice to improve the x-axis label as well, but I'm not sure what would work and still be easy to implement.

@strengejacke
Member

strengejacke commented Oct 29, 2023

After fixing a bug in insight, this is how it looks with the current implementation and with your suggested fix.

set.seed(1)
tot <- rep(10, 100)
suc <- rbinom(100, prob = 0.9, size = tot)
df <- data.frame(tot, suc)
df$prop <- suc / tot

mod1 <- glm(cbind(suc, tot - suc) ~ 1,
  family = binomial,
  data = df
)

mod2 <- glm(prop ~ 1,
  family = binomial,
  data = df,
  weights = tot
)

mod3 <- glm(cbind(suc, tot) ~ 1,
  family = binomial,
  data = df
)

mod4 <- glm(am ~ 1,
  family = binomial,
  data = mtcars
)
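
Each model can then be checked the same way as in the reprex above (a sketch; I'm assuming this is how the plots below were generated):

performance::check_posterior_predictions(mod1)
performance::check_posterior_predictions(mod2)
performance::check_posterior_predictions(mod3)
performance::check_posterior_predictions(mod4)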

[Posterior predictive check plots for mod1-mod4, each shown for the current implementation and for the new implementation with the suggested fix]
