Return standard error of coefficient estimates #12
Last week, I implemented a …

```r
library(glmGamPoi)
data <- data.frame(fav_food = sample(c("apple", "banana", "cherry"), size = 50, replace = TRUE),
                   city = sample(c("heidelberg", "paris", "new york"), size = 50, replace = TRUE),
                   age = rnorm(n = 50, mean = 40, sd = 15))
y <- rnbinom(n = 50, mu = 3, size = 1/3.1)
fit <- glm_gp(y, design = ~ fav_food + city + age, col_data = data)
# The custom design matrix makes sure we get the standard error of the coefficients
identity_design_matrix <- diag(nrow = ncol(fit$Beta))
pred <- predict(fit, se.fit = TRUE, newdata = identity_design_matrix)
pred
#> $fit
#>            [,1]      [,2]       [,3]      [,4]     [,5]       [,6]
#> [1,] -0.2709198 -1.747597 -0.9085034 0.8920867 1.191104 0.02410875
#>
#> $se.fit
#>          [,1]      [,2]     [,3]      [,4]      [,5]       [,6]
#> [1,] 0.765845 0.5959845 0.540548 0.6145076 0.5942666 0.01663141
#>
#> $residual.scale
#> [1] 1
```

Created on 2021-01-11 by the reprex package (v0.3.0)

This does not yet provide the full covariance matrix, only its diagonal, but I hope it is still useful.
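Why the identity-matrix trick works: for a new design row x, the standard error of the linear predictor x'β is sqrt(x' Σ x), where Σ is the covariance matrix of the coefficients. Setting x to the j-th unit vector reduces this to sqrt(Σ_jj). The sketch below checks that identity with base R's `glm()` and `vcov()` on simulated Poisson data; the data and the Poisson model are assumptions chosen so it runs without glmGamPoi, and this is not glmGamPoi's internal code.

```r
# Hypothetical simulated data; a Poisson GLM stands in for glmGamPoi's model
set.seed(1)
n <- 50
x <- rnorm(n)
y <- rpois(n, lambda = exp(0.5 + 0.3 * x))
fit <- glm(y ~ x, family = poisson())

Sigma <- vcov(fit)                    # covariance matrix of the coefficients
X0 <- diag(nrow = length(coef(fit)))  # one "new sample" per coefficient

# Standard error of each linear predictor x0' beta is sqrt(x0' Sigma x0)
se_pred <- sqrt(diag(X0 %*% Sigma %*% t(X0)))
se_coef <- sqrt(diag(Sigma))
print(all.equal(unname(se_pred), unname(se_coef)))  # TRUE
```

The same formula shows why only the diagonal comes out: a row with two non-zero entries, x = e_j + e_k, gives Σ_jj + Σ_kk + 2Σ_jk, so recovering the off-diagonal terms would need such extra rows.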
This looks great, thank you so much! I'll try it out and see how well it works for my use case 👍
I am also interested in this! I would like to be able to do Wald tests with glmGamPoi. For now, I figured out a hack that others may find useful. Note that below, "i" indicates a particular gene.
Needless to say, being able to compute Wald statistics directly from glmGamPoi would be much better.
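For reference, a Wald test needs only a coefficient estimate and its standard error: z = β̂_j / SE(β̂_j), compared against a standard normal distribution. The sketch below applies this to the `$fit` and `$se.fit` values printed in the reprex at the top of this thread (natural-log scale coefficients). It is a generic sketch, not the hack described in this comment, and the standard-normal reference is an assumption; glmGamPoi itself uses a likelihood ratio / F test.

```r
# Coefficients and standard errors copied from the reprex output
beta <- c(-0.2709198, -1.747597, -0.9085034, 0.8920867, 1.191104, 0.02410875)
se   <- c(0.765845, 0.5959845, 0.540548, 0.6145076, 0.5942666, 0.01663141)

# Wald statistic and two-sided p-value under a standard normal reference
z <- beta / se
p_value <- 2 * pnorm(-abs(z))
print(round(cbind(beta, se, z, p_value), 4))
```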
That's quite clever :) Can you say a bit more about why you want to do a Wald test instead of a likelihood ratio test? Is it because of speed?
@const-ae thanks! First of all, I am actually doing this on neuroscience data, not genes, but I noticed many times on my data that the F-statistic from the …
Oh, that doesn't sound good. The last time I noticed negative F values, there were convergence problems for the full model, which meant that the reduced model reached a larger likelihood. Do you think you could provide an example that produces a negative F-statistic? (If yes, please open a new issue so that I can look into it.)
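The reasoning behind "negative means non-convergence" can be sketched with nested negative binomial fits: with the overdispersion held fixed, the deviance difference between the reduced and full model equals 2 × (log-likelihood of full − log-likelihood of reduced), and because the full model contains the reduced one it cannot be negative when both fits reach their maximum. The sketch uses base R `glm()` with `MASS::negative.binomial()` and hypothetical simulated data; it is an ordinary likelihood ratio statistic, not glmGamPoi's quasi-likelihood F statistic.

```r
library(MASS)  # negative.binomial() family for glm() with a fixed theta

# Hypothetical simulated data with a fixed overdispersion alpha = 3.1
set.seed(2)
n <- 50
alpha <- 3.1
group <- factor(sample(c("a", "b"), size = n, replace = TRUE))
y <- rnbinom(n, mu = 3, size = 1 / alpha)

fam <- negative.binomial(theta = 1 / alpha)
ctrl <- glm.control(epsilon = 1e-12, maxit = 100)
full <- glm(y ~ group, family = fam, control = ctrl)
reduced <- glm(y ~ 1, family = fam, control = ctrl)

# With theta fixed, the saturated log-likelihood cancels, so the deviance
# difference is 2 * (logLik(full) - logLik(reduced)); nesting makes it >= 0
# as long as both fits converged.
stopifnot(full$converged, reduced$converged)
lr_stat <- deviance(reduced) - deviance(full)
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
print(c(lr_stat = lr_stat, p_value = p_value))
```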
Yes, that sounds reasonable. I'll try to implement a Wald test, but to be honest, I am not sure when I will get around to doing this.
Sorry, I can't provide the example right now; it's not public data (yet), and unfortunately my coauthors aren't comfortable with me sharing it until after publication. But I can say that when I re-ran glm_gp on one of the individual rows that had a negative F value in the original "fit to whole matrix" run, it no longer had a negative F value (although df2 was Inf). This makes me wonder whether the problem has something to do with shrinkage across the different rows/genes?
Hello,
Hi @ajaynadig,
I wanted to point out a tricky footgun that arises when using the … Namely, you actually need to divide by … The reason is that … You can verify this by checking that the values returned by … This is a pretty easy error for the user to make, so it would be nice if …
Thanks. You are right that it is a subtle bug that is easy to miss. Thank you for commenting here and highlighting it! I amended the documentation of …
Would you be interested in drafting a PR that adds the functionality to …?
Closed by #63.
In thelovelab/DESeq2#29, @mschubert raised the point that glmGamPoi, unlike DESeq2, does not return the standard error for the LFC estimates. His use case for them is change point detection, where he plugs the LFC and the standard error into the ecp package to detect structural variants (e.g. stretches of the genome that are duplicated).
I have not returned the standard errors because they are mainly needed for the Wald test, whereas I do a likelihood ratio test.
If I want to return the covariance matrix of the coefficients, I would either

1. … `fisher_scoring_qr_step()` … to make sure they return the covariance matrix, or
2. … `t(design_matrix) %*% design_matrix / sqrt(n)` …

The second option seems the easier way, but I would need to think through the math and properly test the estimator.
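For the math, one standard route (not necessarily either option above) is the inverse Fisher information: for a negative binomial GLM with log link and overdispersion α (variance μ + αμ²), the information for β is X'WX with weights w_i = μ_i / (1 + α μ_i), and its inverse is the asymptotic covariance matrix of the coefficients. The sketch below computes this by hand and checks it against base R's `glm()` with `MASS::negative.binomial()` at a fixed α; the simulated data and the fixed α = 3.1 (matching the `rnbinom()` call in the reprex at the top of this thread) are assumptions, not glmGamPoi output.

```r
library(MASS)  # negative.binomial() family for glm() with a fixed theta

# Hypothetical simulated data; glmGamPoi's overdispersion alpha = 1 / theta
set.seed(1)
n <- 50
alpha <- 3.1
group <- factor(sample(c("a", "b", "c"), size = n, replace = TRUE))
age <- rnorm(n, mean = 40, sd = 15)
y <- rnbinom(n, mu = 3, size = 1 / alpha)

fit <- glm(y ~ group + age, family = negative.binomial(theta = 1 / alpha),
           control = glm.control(epsilon = 1e-12, maxit = 100))

# Fisher information X' W X with w_i = mu_i / (1 + alpha * mu_i) (log link)
X <- model.matrix(fit)
mu <- fitted(fit)
w <- mu / (1 + alpha * mu)
Sigma <- solve(crossprod(X, w * X))  # w * X scales row i of X by w_i
se <- sqrt(diag(Sigma))

# glm() stores (X' W X)^-1 from its final IRLS weights as cov.unscaled
Sigma_glm <- summary(fit)$cov.unscaled
print(all.equal(unname(Sigma), unname(Sigma_glm), tolerance = 1e-5))
```

Here `cov.unscaled` is used rather than `vcov(fit)` because `summary.glm()` would rescale the latter by an estimated dispersion for this family, whereas with α held fixed the dispersion is 1.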