-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Accessing DESEq2 results from HRSIP command #10
Comments
The output dataframe from the HRSIP command should contain most relevant data from the DESeq2 call (pvalue, padj, log2 fold change, etc). Any OTUs that are missing from that dataframe likely didn't pass the sparsity filtering. |
Thank you, @seb369. |
At this time the baseMean data are not maintained through the HRSIP function. This may be something I will add to an updated version. I would recommend in the mean time to follow the basic protocol for normalizing counts from DESeq2 separately. |
I have followed your advice in the meantime. Looking into DESeq2_l2fc.R function I found out that p values are not calculated in the same way as in the DESeq2 original function (negative binomial generalized linear models +Wald test). From what I have read, that modelisation step was introduced to increase statistical power of detection of genes (OTUs in this case). Also, what is the advantage of employing gm_mean instead of the default behaviour? Thank you very much in advance. I can make this a separate thread if you consider it is a really different topic; I just thought it was related to my original question. |
To your first question, HRSIP does calculate pvalue pretty much the same way, we just do it manually ourselves. The real difference is that our method does not do any outlier filtering based on cooks distances. This makes this analysis a bit less conservative than DESeq2. However there is discussion and even suggestion to use DESeq without this outlier filtering. For that, check out the DESeq2 tutorial. Essentially our pvalue calculation should be the same as running: DESeq2::results(dds, independentFiltering = FALSE, lfcThreshold = l2fc_threshold, altHypothesis="greater", cooksCutoff=FALSE) for most cases (following the other earlier filtering and DESeq steps). To answer your second question, the modified version of the geometric means for estimateSizeFactors is to take into account that your OTU read counts often have a lot of 0's, which the standard geometric means that DESeq usually uses for this step has a problem with. Check out the manual for estimateSizeFactors and a few of the posts online about this (e.g. joey711/phyloseq#445). |
Thank you very much again for your answer. As I said, by accessing all these HTSSIP/DESeq2 internal calculations, I was trying to track some OTUs that I expected to be labeled with H218O given some other results (quantification of 16S rRNA genes in gradient fractions) but are not caught as such by HTSSIP. I am afraid I still do not fully understand why these two approaches do not agree. Since you may have a better understanding of what’s going on and what to look at, I am posting the specifics. Quantification of 16S rRNA genes in gradient fractions shows that most of the sequences slightly shifted towards higher densities in the treated samples (see figure attached) (red and orange), as compared to the control (in blue), with most of them falling in the middle heavy region (MH), suggesting that almost the whole community started to incorporate the isotope. In this region, the curves for control and treated samples overlapped. When I run HTSSIP for detecting significantly labeled OTUs in this MH region, it only highlights one OTU as significantly labeled. The only thing I can think of is that, since the response is weak and we do not see a full separation of the curves of control and treated samples in the range considered (MH) and the whole community seems to shift, it is well possible that more or less the same bacterial community is similarly represented in the control and treated samples in this fraction, which may interfere with methods for detecting labeled OTUs. However, since HTSSIP (or DESeq2) does not directly work with relative abundances, I am not sure how well it handles this kind of situations. Any hint of what to look at to understand this and conciliate results would be highly appreciated. |
It is difficult to determine why you are not finding many incorporators with HTSSIP without really digging into the data, but here are some things to consider or look at. Many of these you might have already considered.
I hope these help. |
I was trying to inspect the results of DESeq2 call within HRSIP to track some OTUs that I expected to be labeled given some other results but are not caught as such by HTSSIP. Is there a direct way to retrieve those (I can't see it in the manual) or should I ran all commands in the function separately to access thsi result ("res")?
Thank you
The text was updated successfully, but these errors were encountered: