Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GLM quasi-Poisson doesn't detect overdispersed data #20

Closed
aldomann opened this issue Oct 12, 2020 · 1 comment
Closed

GLM quasi-Poisson doesn't detect overdispersed data #20

aldomann opened this issue Oct 12, 2020 · 1 comment
Labels
bug Something isn't working

Comments

@aldomann
Copy link
Member

Bug report/feedback

  • Version: 3.4.0
  • Assessment: Dicentrics
  • Module: Fitting

We were testing Biodose Tools on overdispersed data and a linear model, choosing quasi-Poisson as the model family. See snippet below to reproduce the count data calculations.

library(dplyr)

count_data <- data.frame(
  D = c(0.02, 0.05, 0.1, 0.2, 0.3, 0.5, 0.7, 1),
  C0 = c(2000, 2043, 1475, 1494, 724, 463, 579, 203),
  C1 = c(8, 19, 24, 44, 37, 27, 74, 35),
  C2 = c(3, 4, 7, 15, 9, 9, 19, 7),
  C3 = c(0, 1, 2, 3, 3, 3, 9, 3),
  C4 = c(0, 0, 0, 1, 1, 1, 3, 2),
  C5 = c(0, 0, 0, 0, 0, 0, 2, 1)
) %>%
  dplyr::mutate(
    N = as.integer(rowSums(.[grep("C", names(.))])),
    X = biodosetools::calculate_aberr_power(., power = 1),
    X2 = biodosetools::calculate_aberr_power(., power = 2),
    mean = biodosetools::calculate_aberr_mean(X, N),
    var = biodosetools::calculate_aberr_var(X, X2, N),
    DI = biodosetools::calculate_aberr_disp_index(mean, var),
    u = biodosetools::calculate_aberr_u_value(X, N, mean, var, assessment_u = 1.00)
  )

count_data$DI
#>       V1       V2       V3       V4       V5       V6       V7       V8 
#> 1.422317 1.452856 1.562768 1.634828 1.620120 1.715689 1.811425 1.794704

The data is clearly overdispersed, and such a quasi-Poisson model should be used ("Automatic" should give the same results, as it runs the same routine as quasi-Poisson).

However, the glm model dispersion is lower than 1, and thus the Poisson family is used instead. Below we can see a snippet to replicate the calculations done when running biodosetools::get_fit_glm_method():

doses <- count_data[["D"]]
aberr <- count_data[["X"]]
cells <- count_data[["N"]]
disp <- count_data[["DI"]]
coeff_C <- cells
coeff_alpha <- cells * doses
coeff_beta <- cells * doses * doses
model_data <- list(coeff_C = coeff_C, coeff_alpha = coeff_alpha, coeff_beta = coeff_beta, aberr = aberr)
data_weights <- 1 / disp

fit_debug <- stats::glm(
  formula = as.formula("aberr ~ -1 + coeff_C + coeff_alpha"),
  family = stats::quasipoisson(link = "identity"),
  weights = data_weights,
  data = model_data
)

summary(fit_debug)$dispersion
#> [1] 0.652433

My impression is that using the weights parameter "corrects" the overdispersion (not using it reports a dispersion of 1.140056), and therefore the app will never choose a quasi-Poisson model.

Probably related to #14.

@aldomann aldomann added the bug Something isn't working label Oct 12, 2020
@aldomann
Copy link
Member Author

Note that to run the count_data snippet, {biodosetools} needs to installed from the master branch (at least 273ac40), as the function calculate_aberr_var() was not being exported to the namespace.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant