bamCoverage and RNA-seq data #401
Alright, I'll just keep on rambling further until someone stops me :)
If you're not filtering anything, then the counts are just reads per bin (i.e., the final scale factor is really 1). If you do any filtering, then use the ...
The most DESeq2-like normalization is probably SES, since it should, at least theoretically, be more robust to outliers. One could use ... Regarding ... The only real thing to keep in mind with RNA-seq is to not extend reads, since that really mucks up introns. BTW, if you specify a scale factor, then it's used directly unless you use the RPKM normalization or do some filtering. The final bin count is then ...
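To see why extending reads hurts RNA-seq coverage, here is a minimal Python sketch (not deepTools code). It compares the reference positions covered by the aligned blocks of a spliced read, where the SAM CIGAR operation N marks the skipped intron, with a naive extension to a fixed fragment length. The coordinates, the 500 bp intron and the 200 bp fragment length are made-up values, the read is assumed to be on the forward strand, and the helper names aligned_blocks, extended_fragment and covered are hypothetical.

```python
def aligned_blocks(start, cigar):
    """Half-open reference intervals actually covered by aligned bases.

    `cigar` is a list of (operation, length) pairs using SAM CIGAR letters.
    M, = and X cover the reference; N (spliced intron) and D (deletion) skip
    reference bases without covering them; I, S, H and P consume no reference.
    """
    blocks = []
    pos = start
    for op, length in cigar:
        if op in ("M", "=", "X"):
            blocks.append((pos, pos + length))
            pos += length
        elif op in ("N", "D"):
            pos += length
    return blocks


def extended_fragment(start, fragment_length):
    """Naive read extension (forward strand assumed): one contiguous interval
    from the read start, blind to any N operation in the alignment."""
    return [(start, start + fragment_length)]


def covered(intervals):
    """Set of reference positions covered by a list of half-open intervals."""
    return {p for s, e in intervals for p in range(s, e)}


# Hypothetical spliced read: 30 exonic bases, a 500 bp intron, 20 exonic bases.
read_start = 100
cigar = [("M", 30), ("N", 500), ("M", 20)]
intron = set(range(130, 630))

blocks = aligned_blocks(read_start, cigar)
print(blocks)                                                     # [(100, 130), (630, 650)]
print(len(covered(blocks) & intron))                              # 0
print(len(covered(extended_fragment(read_start, 200)) & intron))  # 170
```

Counting only the aligned blocks leaves the intron empty, whereas the extended fragment paints 170 intronic bases with signal from a single read.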
I suppose that --normalizeTo1x using the exome length as the effective genome size ... Regarding the SES normalization, I think the scaling factors would end ... Finally, the confusion about the scaling factors comes from the fact that if ...
On Thu, Aug 11, 2016 at 9:58 PM, Devon Ryan notifications@github.com
Fidel Ramirez
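For a feel of what using the exome length as the effective genome size would do, here is a small Python sketch of 1x normalization as I understand deepTools to define it: sequencing depth = mapped reads × fragment length / effective genome size, and bin values are divided by that depth. The read count, fragment length, exome size and genome size are hypothetical round numbers, and one_x_scale_factor is an illustrative name, not a deepTools function.

```python
def one_x_scale_factor(mapped_reads, fragment_length, effective_genome_size):
    """Multiplicative factor that scales coverage to 1x on average.

    Depth in genome equivalents is
    mapped_reads * fragment_length / effective_genome_size;
    multiplying bin values by its reciprocal equals dividing by it.
    """
    depth = mapped_reads * fragment_length / effective_genome_size
    return 1.0 / depth


# Hypothetical numbers, purely for illustration.
reads = 20_000_000
frag = 100
exome_size = 50_000_000        # assumed exome length used as "effective genome size"
genome_size = 2_700_000_000    # assumed effective genome size, for comparison

print(one_x_scale_factor(reads, frag, exome_size))  # 0.025
print(one_x_scale_factor(reads, frag, genome_size))
```

With these made-up numbers the exome-based factor is 54 times smaller than the genome-based one (2.7 Gb / 50 Mb), so the choice of "effective genome size" dominates the result; the fragment length is also ambiguous for RNA-seq, as the original post notes.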
Hi guys, thanks for the info.
Any opinions on the influence of the bin size? My remaining confusion stems from the observation that deepTools tells me it's normalizing for depth when I'm not specifying either of the normalization methods:
Oh, someone requested the "normalization: depth" output, though we should change the wording. It should probably be something like "percentage kept after filtering" or "None". I'd vote for "None", but have no strong opinion on it. For the bin size, I guess it depends on the goals. The default size is probably OK as long as people just want to look at things in IGV.
I would also vote for "normalization: none" (if nothing is done to the read counts except aggregating them over bins). The bins will be "drawn" regardless of the annotation, right? So, in principle, read counts that originated from exons could "bleed" into intronic regions. I'm not saying that's necessarily a huge concern, but maybe we should make a note of that in the documentation.
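A minimal Python sketch of the "bleeding" effect, assuming a bin's value is the number of reads overlapping it. Three hypothetical reads lie entirely inside an exon that ends at position 120, followed by intron. With 50 bp bins the first bin still reports all three reads although 30 of its 50 bases are intronic; with 1 bp bins the signal drops to zero exactly at the exon boundary. The coordinates and the reads_per_bin helper are made up for illustration.

```python
def reads_per_bin(reads, region_start, region_end, bin_size):
    """Count reads overlapping each fixed-size bin of [region_start, region_end).

    Reads are half-open (start, end) intervals; the last bin may be shorter.
    """
    result = []
    for bin_start in range(region_start, region_end, bin_size):
        bin_end = min(bin_start + bin_size, region_end)
        n = sum(1 for s, e in reads if s < bin_end and e > bin_start)
        result.append(((bin_start, bin_end), n))
    return result


# Exon spans [100, 120); everything from 120 to 300 is intron.
# All three reads fall entirely inside the exon.
reads = [(100, 115), (102, 117), (105, 120)]

coarse = reads_per_bin(reads, 100, 300, bin_size=50)
fine = reads_per_bin(reads, 100, 300, bin_size=1)
print(coarse[0])           # ((100, 150), 3)
print(fine[19], fine[20])  # ((119, 120), 1) ((120, 121), 0)
```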
Correct, bamCoverage doesn't accept a region of any sort. I should note that if someone wants to do a "metagene" plot at some point, then smaller bins probably make more sense, though 1-base bins are likely overkill (having said that, that's what I used in those cases).
Wait, I missed the discussion before, but why ... Plus, the size factor calculated by DESeq could be misleading, since it's based on the total annotated gene counts, and you don't always want to normalize by that.
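For comparison, the DESeq/DESeq2 size factor mentioned above is the median-of-ratios estimate computed over genes: for each sample, take the median across genes of that sample's count divided by the gene's geometric mean across samples, skipping genes with a zero in any sample. The Python sketch below implements that idea on a toy gene-by-sample table; it is not the DESeq2 code, and the function name is illustrative. It needs per-gene counts from an annotation, which bamCoverage's genome-wide bins don't provide.

```python
import math
from statistics import median


def median_of_ratios_size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    `counts` is a list of genes, each a list of per-sample read counts.
    Genes with a zero in any sample are skipped, since their geometric
    mean across samples would be zero.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of each gene's geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy table: sample 2 has exactly twice the counts of sample 1 for every gene;
# the last gene is skipped because it contains a zero.
genes = [[10, 20], [100, 200], [5, 10], [0, 7]]
print(median_of_ratios_size_factors(genes))  # approximately 1/sqrt(2) and sqrt(2)
```

DESeq divides counts by the size factor, so if such a factor were reused as a multiplicative scale factor for a coverage track, presumably its reciprocal would be the value to pass.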
RPKM is fine; it's just that people probably naively think it's doing something it's not. I would generally prefer using the scale factor, since it's more robust, but as long as rRNAs are blacklisted (or aren't an issue to begin with), RPKM is probably fine. Speaking of that, at least for the atypical kinds of RNA-seq (e.g., Ribo-seq), you end up needing to do a LOT of blacklisting of regions so they don't throw off the scaling. I expect that'll be the case for other RNA-seq variants, or even for normal RNA-seq if it wasn't polyA-selected.
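To illustrate how one rRNA pile-up can throw off depth scaling, here is a Python sketch of a per-million scale factor computed from bin counts, with an optional set of blacklisted bins left out of the total. The two toy libraries share the same mRNA signal and differ only in one hypothetical rRNA bin (index 3); the function name and numbers are made up for illustration.

```python
def per_million_scale(bin_counts, blacklisted_bins=()):
    """Factor converting raw bin counts to counts per million reads.

    Bins whose indices are in `blacklisted_bins` are excluded from the total.
    """
    skip = set(blacklisted_bins)
    total = sum(c for i, c in enumerate(bin_counts) if i not in skip)
    return 1_000_000 / total


# Same mRNA signal in bins 0-2; library B also has a huge rRNA pile-up in bin 3.
lib_a = [100, 200, 700, 0]
lib_b = [100, 200, 700, 9_000]

print(per_million_scale(lib_a), per_million_scale(lib_b))            # 1000.0 100.0
print(per_million_scale(lib_a, [3]), per_million_scale(lib_b, [3]))  # 1000.0 1000.0
```

Without blacklisting, library B's factor is 10 times smaller, so its identical gene bins would look 10 times lower; excluding the rRNA bin makes the two factors agree.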
Two points:
@friedue 1kb, not 1bp :)
Blacklisting rRNAs and tRNAs is a very good point!
Are there any remaining issues surrounding this, or should we close it?
Close. 👍
I haven't yet changed the documentation to reflect these recommendations. I can assign it to myself and aim to get it done this week.
Sounds good!
I assume that the same points we discussed for single files will also apply to bamCompare?
Yep, they apply to bamCompare too.
I rarely use deepTools for RNA-seq data, so apologies if that turns out to be a trivial issue.
I'm just wondering whether the documentation in regard to the RNA-seq handling should be improved.
Let's assume I have several RNA-seq data sets, just plain ol' single-end reads. Currently, we basically say "run bamCoverage in default":
bamCoverage -b reads.bam -o coverage.bw
If bamCoverage is used in default mode, I am assuming that the bigWig will simply contain the counts per bin? No normalization for sequencing depth or the like, right?
Should one not use --skipNonCoveredRegions with RNA-seq, since the vast majority of the genome is not expected to be covered? Will this influence the way the bins are made, i.e., will it force the bins to focus on transcribed regions? Or should we simply recommend a bin size of 1 bp for RNA-seq data?
I was asked whether there's a way to get a "normalized" RNA-seq coverage track using deepTools. I guess --normalizeTo1x should be avoided for RNA-seq, since the effective genome size is not really appropriate here (neither is the fragment size, particularly not for paired-end data!)? --normalizeUsingRPKM is also a bit misleading, since RPKMs in the RNA-seq world refer to the transcriptome length (which is a thorny issue in itself). With -bs 1, this option would basically result in a simple division by total counts, which is perhaps the best we can do?
How have you used bamCoverage with RNA-seq data?
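The first and third questions can be sketched in Python. Per the replies above, with no normalization and no filtering the track holds plain reads per bin (scale factor 1), and RPKM computed per bin uses the bin length as its "kilobase" term. Both helpers are hypothetical illustrations, not deepTools functions, and for brevity each read is assigned only to the bin containing its start, whereas bamCoverage, as I understand it, counts a read in every bin it overlaps.

```python
from collections import Counter


def bin_counts(read_starts, chrom_length, bin_size=50, scale_factor=1.0):
    """Reads per fixed-size bin, multiplied by scale_factor.

    Simplification: each read is counted only in the bin holding its start
    position. With the default scale_factor=1.0 these are raw counts.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    hits = Counter(p // bin_size for p in read_starts if 0 <= p < chrom_length)
    return [hits[i] * scale_factor for i in range(n_bins)]


def rpkm_per_bin(counts, bin_size, total_mapped_reads):
    """RPKM where the "kilobase" is the bin length, not a transcript length.

    RPKM = count / ((bin_size / 1e3) * (total_mapped_reads / 1e6))
         = count * 1e9 / (bin_size * total_mapped_reads)
    A shorter final bin is not corrected for in this sketch.
    """
    return [c * 1e9 / (bin_size * total_mapped_reads) for c in counts]


raw = bin_counts([0, 10, 60, 140], chrom_length=150, bin_size=50)
print(raw)  # [2.0, 1.0, 1.0]

per_base = bin_counts([3, 3, 7], chrom_length=10, bin_size=1)
print(rpkm_per_bin(per_base, bin_size=1, total_mapped_reads=1_000_000))
# [0.0, 0.0, 0.0, 2000.0, 0.0, 0.0, 0.0, 1000.0, 0.0, 0.0]
```

With bin_size=1 the formula reduces to count × 10^9 / total mapped reads, i.e., a division by total counts up to a constant, which matches the intuition in the post.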