Return standard error of coefficient estimates #12

const-ae · 2020-11-24T11:01:48Z

In thelovelab/DESeq2#29 @mschubert raised the point that glmGamPoi, unlike DESeq2 does not return the standard error for the LFC estimates.

His use-case for them is change point detection, where he plugs-in the LFC and the standard error into the ecp package to detect structural variances (e.g. stretches of the genome that are duplicated).

I have not returned the standard error, because they are mainly important for the Wald test, where-as I however, do a likelihood ratio test.

If I want to return the covariance matrix of the coefficients, I would either

need to rewrite the inner fitting routines (fisher_scoring_qr_step()) to make sure they return the covariance matrix or
Calculate the covariance matrix on demand. The formula would probably be something like: t(design_matrix) %*% design_matrix / sqrt(n)

The second option seems the easier way, but I would need to think through the math and properly test the estimator.

The text was updated successfully, but these errors were encountered:

const-ae · 2021-01-11T09:31:37Z

Last week, I implemented a predict() function for glmGamPoi. It is mainly for predicting the mu and the corresponding standard error, but you can also use it to get the standard error of the coefficients:

library(glmGamPoi)
data <- data.frame(fav_food = sample(c("apple", "banana", "cherry"), size = 50, replace = TRUE),
                   city = sample(c("heidelberg", "paris", "new york"), size = 50, replace = TRUE),
                   age = rnorm(n = 50, mean = 40, sd = 15))
y <- rnbinom(n = 50, mu = 3, size = 1/3.1)
fit <- glm_gp(y, design = ~ fav_food + city + age, col_data = data)
# The custom design matrix makes sure we get the standard errorr of the coefficients
identity_design_matrix <-  diag(nrow = ncol(fit$Beta))
pred <- predict(fit, se.fit = TRUE, newdata = identity_design_matrix)
pred
#> $fit
#>            [,1]      [,2]       [,3]      [,4]     [,5]       [,6]
#> [1,] -0.2709198 -1.747597 -0.9085034 0.8920867 1.191104 0.02410875
#> 
#> $se.fit
#>          [,1]      [,2]     [,3]      [,4]      [,5]       [,6]
#> [1,] 0.765845 0.5959845 0.540548 0.6145076 0.5942666 0.01663141
#> 
#> $residual.scale
#> [1] 1

^{Created on 2021-01-11 by the reprex package (v0.3.0)}

This does not yet provide the full covariance matrix, but only its diagonal, but I hope this can still be useful.

mschubert · 2021-01-11T11:51:28Z

This looks great, thank you so much!

I'll try it out and see how well it works for my use case 👍

willtownes · 2021-02-08T14:49:00Z

I am also interested in this! I would like to be able to do Wald tests with glmGamPoi. Currently I figured out a hack which others may find useful. Note that in below "i" indicates a particular gene.

Fit glmGamPoi to the data.
Extract overdispersion parameters and store the vector of thetas as 1/overdispersion.
fam<-MASS::negative.binomial(theta=thetas[i])
Assuming genes in rows: fit<-glm(Y[i,]~X+0, family=fam, offset=log(size_factors))
summary(fit,dispersion=1)
Note this is NOT calling glm.nb from MASS. It is the base glm where we force the dispersion parameter to not change during fitting. Also, it's essential to set dispersion=1 in calling summary since otherwise the base R glm thinks this is a linear regression fit (don't ask me why!) and tries to estimate a value for sigma2 on top of the theta that's already in there. It will also weirdly return t-statistics instead of the correct z-statistic unless dispersion is explicitly set in the call to summary.

Needless to say being able to just compute Wald statistics directly from glmGamPoi would be much better.

const-ae · 2021-02-08T14:52:33Z

Currently I figured out a hack which others may find useful.

That's quite clever :)

Can you say a bit more, why you want to do a Wald test instead of a likelihood ratio? Is it speed concerns?

willtownes · 2021-02-08T16:00:30Z

@const-ae thanks! So first of all I am actually doing this on neuroscience data not genes but I noticed many times on my data the F-statistic from the test_de function was negative and the p-value was just truncated to 1.000. Also, I like the convenience of pulling lots of Wald test results out of a single model fit rather than having to re-evaluate test_de with a bunch of separate reduced model matrices. But these are all pretty minor considerations, the main benefit I am getting from glmGamPoi is the stable estimation of the dispersion parameters which is awesome! Oh and speed is not a limitation for me at the moment, I'm not working with a very big dataset.

const-ae · 2021-02-08T16:08:45Z

So first of all I am actually doing this on neuroscience data not genes but I noticed many times on my data the F-statistic from the test_de function was negative and the p-value was just truncated to 1.000.

Oh, that doesn't sound good. The last time, I noticed negative F values, there were some convergence problems for the full model, which meant that the reduced model got a larger likelihood. Do you think you could provide an example that produces the negative F-statistic? (If yes, please open a new issue, so that I can look into this).

Also, I like the convenience of pulling lots of Wald test results out of a single model fit rather than having to re-evaluate test_de with a bunch of separate reduced model matrices.

Yes, that sounds reasonable. I'll try to implement a Wald test, but to be honest, I am not sure, when I will get around to do this.

willtownes · 2021-02-12T16:51:50Z

Sorry I can't provide the example right now, it's not public data (yet) and my coauthors aren't comfortable with me sharing it until after the publication unfortunately. But I can say, when I re-ran glm_gp on one of the individual rows that had a negative F value in the original "fit to whole matrix" run, it no longer had a negative F value (although the df2 was Inf). This makes me wonder if the problem is something to do with shrinkage across the different rows/ genes?

ajaynadig · 2023-02-04T04:44:35Z

Hello,
Thank you for your great improvements to the DESeq2 package over the years. I just wanted to add to this thread that another rationale for a standard error for glmGamPoi beyond Wald tests would be to run shrinkage using ashr or apeglm, as implemented in lfcShrink. These shrinkage functions requires point estimates and standard errors, and currently return an error when run on output from DESeq2 using glmGamPoi. Seeing as ashr and apeglm are now recommended over normal shrinkage within DESeq2, it would be great to be able to use these functions on the output from glmGamPoi.

const-ae · 2023-02-06T17:42:25Z

Hi @ajaynadig,
thanks for reaching out. Taking the output from glmGamPoi and running it through apeglm/ashr is an interesting use case that I haven't thought about so far. Would it be possible to call predict with se.fit = TRUE as described in the comment above (#12 (comment)) to produce the necessary input? (I guess one would have to replace the identity_design_matrix with the contrast vector).

jackkamm · 2024-05-25T00:26:33Z

I wanted to point out a tricky footgun that arises when using the predict() method to return the standard errors.

Namely, you actually need to divide by log(2) to get the standard errors for the lfc returned by test_de().

The reason is that test_de() returns log2-fold-changes, however the GLM link function is on natural log scale.

You can verify this by checking that the values returned by predict(offset=0)$fit are off by a factor of log(2) from the lfc column returned by test_de().

This is a pretty easy error for the user to make; so it would be nice if test_de() could also return the SE on the appropriate scale.

@jackkamm

Thanks to @jackkamm for highlighting the issue (#12)

const-ae · 2024-05-27T08:53:25Z

Thanks. You are right that it a subtle bug that is easy to miss. Thank you for commenting here and highlighting it!

I amended the documentation of predict.glmGamPoi to warn others.

This is a pretty easy error for the user to make; so it would be nice if test_de() could also return the SE on the appropriate scale.

Would you be interested in drafting a PR that adds the functionality to test_de? Maybe we could make it optional for now (i.e., add an argument to test_de that is called compute_lfc_se and set it to FALSE by default).

const-ae · 2024-05-29T16:03:57Z

Closed by #63.

const-ae mentioned this issue Nov 24, 2020

glmGamPoi fit fails if normMatrix is provided thelovelab/DESeq2#29

Closed

const-ae mentioned this issue Aug 6, 2021

determining the less trustworthy log2fc values #27

Open

const-ae added a commit that referenced this issue May 27, 2024

Add Note to documentation of predict regarding log vs log2 link

471e5d0

Thanks to @jackkamm for highlighting the issue (#12)

jackkamm mentioned this issue May 27, 2024

Add compute_lfc_se option to test_de #63

Merged

const-ae closed this as completed May 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Return standard error of coefficient estimates #12

Return standard error of coefficient estimates #12

const-ae commented Nov 24, 2020

const-ae commented Jan 11, 2021

mschubert commented Jan 11, 2021

willtownes commented Feb 8, 2021

const-ae commented Feb 8, 2021

willtownes commented Feb 8, 2021

const-ae commented Feb 8, 2021

willtownes commented Feb 12, 2021

ajaynadig commented Feb 4, 2023

const-ae commented Feb 6, 2023

jackkamm commented May 25, 2024

const-ae commented May 27, 2024

const-ae commented May 29, 2024

Return standard error of coefficient estimates #12

Return standard error of coefficient estimates #12

Comments

const-ae commented Nov 24, 2020

const-ae commented Jan 11, 2021

mschubert commented Jan 11, 2021

willtownes commented Feb 8, 2021

const-ae commented Feb 8, 2021

willtownes commented Feb 8, 2021

const-ae commented Feb 8, 2021

willtownes commented Feb 12, 2021

ajaynadig commented Feb 4, 2023

const-ae commented Feb 6, 2023

jackkamm commented May 25, 2024

const-ae commented May 27, 2024

const-ae commented May 29, 2024