Low mapping rate? #160
Hi @atasub, it's hard to say whether this mapping rate is much lower than expected. Many RNA-seq experiments end up with a mapping rate of 65-70%. One thing that might contribute to a lower mapping rate is short reads relative to the minimum required exact-match length (the k-mer length, default 31). If your reads are relatively short after trimming (which it looks like you are doing here), say ~50bp, you might try lowering the k value with which the index is built; this allows more sensitive mapping. The other thing to try is simply to align one of these samples to the genome with a tool like STAR or HISAT2 and look at its mapping rate to known features. If that rate is similar, the remaining reads could be accounted for by, e.g., intron retention or even contamination. Finally, @vals has an excellent series of blog posts on investigating and addressing low mapping rates (albeit in single-cell data) that you might find useful. Let me know what you find.
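A minimal sketch of rebuilding the index with a smaller k for short trimmed reads, as suggested above. It only assembles the `salmon index` command line (`-t` transcripts FASTA, `-i` output directory, `-k` k-mer length) and executes nothing. The file names are hypothetical, k=23 is an illustrative value rather than a recommendation from this thread, and the odd-k check reflects the assumption that Salmon expects an odd k-mer length (confirm with `salmon index --help` for your version).

```python
import shlex
from typing import List

DEFAULT_K = 31  # Salmon's default k-mer (minimum exact-match) length


def salmon_index_cmd(transcripts_fa: str, index_dir: str, k: int = DEFAULT_K) -> List[str]:
    """Build (but do not run) a `salmon index` argument list with k-mer length k."""
    # Assumption: Salmon requires an odd k, so reject even or non-positive values early.
    if k < 1 or k % 2 == 0:
        raise ValueError(f"k must be a positive odd integer, got {k}")
    return ["salmon", "index", "-t", transcripts_fa, "-i", index_dir, "-k", str(k)]


if __name__ == "__main__":
    # Hypothetical paths: a smaller k for reads trimmed to roughly 50bp.
    cmd = salmon_index_cmd("transcripts.fa", "salmon_index_k23", k=23)
    print(shlex.join(cmd))
    # To actually build the index: subprocess.run(cmd, check=True)
```

Keeping the default at 31 means the helper reproduces the standard index unless a smaller k is requested explicitly, which makes it easy to build both indices and compare mapping rates on the same sample.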
Hi @rob-p,
Almost every time I've seen a low mapping rate to the transcriptome together with a high mapping rate to the genome, the culprit was that the sample failed and had little to no RNA in it, so what actually got sequenced was DNA. I'm guessing you'll see a lot of intergenic reads in your HISAT2 alignments.
Hi @atasub, if you're using the same reference and gene annotation for HISAT2 and Salmon but getting a lower mapping rate with Salmon, you probably have some DNA contamination. You should get the same gene expression results from either strategy, because in the end the GTF file for the genome and the FASTA file for the transcriptome are equivalent.
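The reasoning in the two comments above can be turned into a rough check: compare how many reads land on the genome versus the transcriptome, and how many genome hits fall outside annotated genes. The sketch below takes counts you would collect yourself (e.g., from a HISAT2 run and Salmon's reported mapping rate). The class and function names and both 20% thresholds are hypothetical illustrations, not values from Salmon, HISAT2, or this thread.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class MappingSummary:
    total_reads: int           # reads (or fragments) given to both tools
    genome_mapped: int         # reads aligned to the genome, e.g. by HISAT2
    intergenic: int            # genome-aligned reads outside any annotated gene
    transcriptome_mapped: int  # reads mapped by Salmon


def dna_contamination_flags(s: MappingSummary,
                            max_rate_gap: float = 0.20,
                            max_intergenic: float = 0.20) -> Dict[str, Union[float, bool]]:
    """Compute mapping rates and flag patterns consistent with genomic DNA in the library.

    Both thresholds are hypothetical illustration values, not published cut-offs.
    """
    if s.total_reads <= 0:
        raise ValueError("total_reads must be positive")
    genome_rate = s.genome_mapped / s.total_reads
    tx_rate = s.transcriptome_mapped / s.total_reads
    intergenic_frac = s.intergenic / s.genome_mapped if s.genome_mapped else 0.0
    return {
        "genome_rate": genome_rate,
        "transcriptome_rate": tx_rate,
        "intergenic_fraction": intergenic_frac,
        # Many reads place on the genome but not on any transcript...
        "large_rate_gap": genome_rate - tx_rate > max_rate_gap,
        # ...and many genome hits fall between genes: the DNA-contamination pattern.
        "high_intergenic": intergenic_frac > max_intergenic,
    }


if __name__ == "__main__":
    # Made-up counts purely to show the calculation.
    example = MappingSummary(total_reads=1_000_000, genome_mapped=900_000,
                             intergenic=300_000, transcriptome_mapped=400_000)
    print(dna_contamination_flags(example))
```

If the rate gap is large but the intergenic fraction is low, the extra genome hits are more likely intronic, which points instead toward intron retention or unspliced pre-mRNA, as @rob-p mentions above.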
Hi @atasub,
Another red flag would be a high rRNA rate going along with it: rRNA depletion methods are not 100% effective, and if there is no mRNA in the sample, the rRNA rate will tend to be higher.
Yes, that is also a good explanation. I recommend adding human rRNA sequences to the Salmon index.
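One way to act on this suggestion, sketched under assumptions: concatenate the transcriptome FASTA with a FASTA of human rRNA sequences (obtained separately; all file names here are hypothetical) and build the Salmon index from the combined file, so that rRNA-derived reads are assigned to the rRNA entries instead of being counted as unmapped. The hypothetical helper `combine_fastas` also refuses duplicate record IDs, since two entries with the same name would be ambiguous in the quantification output.

```python
from pathlib import Path
from typing import Iterable


def combine_fastas(inputs: Iterable[Path], output: Path) -> int:
    """Concatenate FASTA files into `output`, refusing duplicate record IDs.

    Returns the number of records written.
    """
    seen = set()
    n_records = 0
    with output.open("w") as out:
        for path in inputs:
            with path.open() as fh:
                for line in fh:
                    if line.startswith(">"):
                        # The record ID is the first whitespace-delimited word of the header.
                        fields = line[1:].split()
                        record_id = fields[0] if fields else ""
                        if record_id in seen:
                            raise ValueError(f"duplicate FASTA ID {record_id!r} in {path}")
                        seen.add(record_id)
                        n_records += 1
                    # Make sure the last line of each file ends with a newline before the next file.
                    out.write(line if line.endswith("\n") else line + "\n")
    return n_records
```

Usage would be something like `combine_fastas([Path("transcripts.fa"), Path("human_rRNA.fa")], Path("transcripts_plus_rRNA.fa"))`, followed by `salmon index -t transcripts_plus_rRNA.fa -i <new index dir>`; the rRNA entries' read counts in `quant.sf` then give a direct estimate of the rRNA rate.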
@hiraksarkar Say I have paired-end data, and I run:
How can I specify selective alignment instead of quasi-mapping?
@InesdeSantiago
We strongly recommend these options when using selective alignment, as they almost always produce superior results (I am considering making them the default soon :) ). Please let me know if you face problems with any of the above steps, or if the results are not as expected.
@hiraksarkar
@InesdeSantiago
@hiraksarkar Yes, force of habit; I meant the transcriptome! ;-)
Hi, I have a similar case, with a 30-40% mapping rate from Salmon. When I tried HISAT2, the mapping rate went above 80%. I used samtools sort to convert the SAM files to BAM, and Qualimap 2 then gives me these QC results:

Exonic: 31,212,828 / 41.39%

There is not much DNA contamination, but a large portion of the reads map to introns.
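For anyone wanting to reproduce this check, the three steps described above can be written out as command lines. This sketch only builds argument lists and executes nothing; the index prefix, FASTQ, GTF, and output names are hypothetical placeholders, and the flags shown (`hisat2 -x/-1/-2/-S`, `samtools sort -o`, `qualimap rnaseq -bam/-gtf/-outdir/-pe`) should be checked against the versions you have installed. Using the same GTF that the Salmon transcriptome was derived from keeps the two comparisons consistent, as noted earlier in the thread.

```python
import shlex
from typing import List


def genome_qc_commands(hisat2_index: str, r1: str, r2: str,
                       gtf: str, prefix: str) -> List[List[str]]:
    """Return argv lists for: align to genome, sort to BAM, RNA-seq QC with Qualimap."""
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    return [
        # 1. Paired-end alignment to the genome index with HISAT2.
        ["hisat2", "-x", hisat2_index, "-1", r1, "-2", r2, "-S", sam],
        # 2. Coordinate-sort the SAM output into a BAM file.
        ["samtools", "sort", "-o", bam, sam],
        # 3. Exonic/intronic/intergenic breakdown against the annotation.
        ["qualimap", "rnaseq", "-bam", bam, "-gtf", gtf,
         "-outdir", f"{prefix}_qualimap", "-pe"],
    ]


if __name__ == "__main__":
    for cmd in genome_qc_commands("grch38/genome", "sample_R1.fq.gz",
                                  "sample_R2.fq.gz", "genes.gtf", "sample"):
        print(shlex.join(cmd))
        # To run each step in order: subprocess.run(cmd, check=True)
```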
I recently ran Salmon in quasi-mapping-based mode, and when I checked the salmon_quant.log files, I saw that the mapping rate was around 65-68% for all of the samples. Do you have any suggestions to improve the mapping rate? I used "--libType A" to infer the library type and got a warning that "Greater than 5% of the fragments disagreed with the provided library type", but I guess this is not an issue. This is an example of one of the "lib_format_counts.json" files:
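A sketch for pulling the relevant numbers out of each sample's output directory at once, rather than reading logs one by one. It assumes Salmon's usual output layout, with `aux_info/meta_info.json` containing `num_processed`, `num_mapped`, and `percent_mapped`, and `lib_format_counts.json` containing `expected_format` and `compatible_fragment_ratio`. These field names are stated from memory, so verify them against your own files. The 5% default mirrors the warning quoted above, read as "fewer than 95% of fragments compatible with the inferred library type"; `summarize_salmon_run` itself is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Dict


def summarize_salmon_run(quant_dir: Path, max_incompatible: float = 0.05) -> Dict[str, object]:
    """Collect mapping rate and library-type agreement from one Salmon output directory.

    Field names are assumed from Salmon's output files; check them for your version.
    """
    meta = json.loads((quant_dir / "aux_info" / "meta_info.json").read_text())
    lib = json.loads((quant_dir / "lib_format_counts.json").read_text())
    # Fraction of assigned fragments that disagreed with the (inferred) library type.
    incompatible = 1.0 - float(lib["compatible_fragment_ratio"])
    return {
        "sample": quant_dir.name,
        "processed": int(meta["num_processed"]),
        "mapped": int(meta["num_mapped"]),
        "percent_mapped": float(meta["percent_mapped"]),
        "expected_format": lib["expected_format"],
        "incompatible_fraction": incompatible,
        # Mirrors the warning about >5% of fragments disagreeing with the library type.
        "libtype_warning": incompatible > max_incompatible,
    }


if __name__ == "__main__":
    import sys
    # Usage: python summarize_salmon.py quants/sample1 quants/sample2 ...
    for arg in sys.argv[1:]:
        print(summarize_salmon_run(Path(arg)))
```

A consistently moderate incompatible fraction across all samples with the same `expected_format` usually matters less for the mapping rate than the issues discussed above (short reads, DNA, rRNA, intronic reads), since it describes fragments that did map.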