low mapping rate ? #160

atasub · 2017-10-06T10:18:10Z

I recently ran Salmon in quasi-mapping-based mode, and when I checked the salmon_quant.log file I saw that the mapping rate was around 65-68% for all of the samples. Do you have any suggestions to improve the mapping rate? I used "--libType A" to infer the library type and got a warning that "Greater than 5% of the fragments disagreed with the provided library type", but I guess this is not an issue. This is an example of one of the "lib_format_counts.json" files:

{
    "read_files": "( /mnt/dznehomes/homes/simonj/RNAseq_pipeline/frontal_data/samples/Trimmed_FASTQ_files/00116_GFM_R1_trimmed.fastq.gz, /mnt/dznehomes/homes/simonj/RNAseq_pipeline/frontal_data/samples/Trimmed_FASTQ_files/00116_GFM_R2_trimmed.fastq.gz )",
    "expected_format": "ISR",
    "compatible_fragment_ratio": 0.9241470144855659,
    "num_compatible_fragments": 34584460,
    "num_assigned_fragments": 37423115,
    "num_consistent_mappings": 334748580,
    "num_inconsistent_mappings": 28046150,
    "MSF": 0,
    "OSF": 32448,
    "ISF": 20518131,
    "MSR": 0,
    "OSR": 487250,
    "ISR": 334748580,
    "SF": 1833525,
    "SR": 5088606,
    "MU": 0,
    "OU": 0,
    "IU": 0,
    "U": 0
}
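
For readers wondering where the warning comes from, here is a minimal sketch that recomputes the ratio from the counts above. It assumes that compatible_fragment_ratio is simply num_compatible_fragments divided by num_assigned_fragments (the numbers in this file are consistent with that) and that the 5% in the warning refers to the remaining fraction. The function name is made up for illustration; it is not part of Salmon.

```python
import json

# Values copied from the lib_format_counts.json shown above
# (read_files and the per-orientation counts are omitted for brevity).
lib_format_counts = json.loads("""
{
    "expected_format": "ISR",
    "compatible_fragment_ratio": 0.9241470144855659,
    "num_compatible_fragments": 34584460,
    "num_assigned_fragments": 37423115
}
""")


def incompatible_fraction(counts):
    """Fraction of assigned fragments that disagree with the expected library type.

    Assumption: the reported ratio is compatible / assigned.
    """
    compatible = counts["num_compatible_fragments"]
    assigned = counts["num_assigned_fragments"]
    return 1.0 - compatible / assigned


frac = incompatible_fraction(lib_format_counts)
# Assumed threshold taken from the wording of the warning message.
exceeds_warning_threshold = frac > 0.05
print(f"incompatible fragments: {frac:.2%}")
```

Under these assumptions roughly 7.6% of the assigned fragments are incompatible with ISR, which is above 5% and would explain the warning even though more than 92% of fragments agree with the inferred type.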


rob-p · 2017-10-06T13:57:45Z

Hi @atasub,

It's hard to say exactly whether this mapping rate is much lower than expected or not. Many RNA-seq experiments do end up with a mapping rate of 65-70%. One thing that might contribute to a lower mapping rate is short reads relative to the minimum required exact match length (default of 31). If your reads are relatively short after trimming (which it looks like you are doing here), say ~50bp, then you might try lowering the k value with which the index is built. This will allow more sensitive mapping.
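
To illustrate why k matters for short reads, here is a toy sketch that counts how many k-mers of a read remain free of mismatches. It is only an illustration of the exact-match requirement, not Salmon's actual mapping algorithm, and the function name and the example mismatch position are hypothetical.

```python
def error_free_kmers(read_length, k, mismatch_positions):
    """Count k-mers of a read that do not overlap any mismatch position (0-based)."""
    if read_length < k:
        return 0
    count = 0
    for start in range(read_length - k + 1):
        end = start + k  # exclusive end of this k-mer
        if not any(start <= pos < end for pos in mismatch_positions):
            count += 1
    return count


# A 50bp read with a single mismatch in the middle (position 25, 0-based).
for k in (31, 25, 19):
    remaining = error_free_kmers(50, k, [25])
    print(f"k={k}: {remaining} exact-match k-mers remain")
```

With k=31, every k-mer of a 50bp read spans the middle base, so one mismatch there leaves no exact match of the required length. A smaller k leaves some k-mers intact, which is the intuition behind lowering k for short reads.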

However, the other thing to try is simply to align one of these samples to the genome with a tool like STAR or HISAT2 and look at their mapping rate to known features. If it's similar, then the other reads could be accounted for by e.g. intron retention or even contamination. Finally, @vals has an excellent series of blog posts on investigating and addressing low mapping rates (albeit in single-cell data) that you might find useful. Let me know what you find.

atasub · 2017-10-06T15:22:07Z

Hi @rob-p ,
thank you for your reply, it was very helpful.
I had used HISAT2 before and got an overall alignment rate of around 97-99%. In this case, would you recommend using alignment-based mode with the HISAT2-based BAM files?

roryk · 2017-10-06T15:27:43Z

Almost always when I've seen a low mapping rate to RNA together with a high genomic mapping rate, the culprit is that the sample failed and had little to no RNA in it, so what actually got sequenced was DNA. I'm guessing you'll see a lot of intergenic reads in your HISAT2 alignments.

vals · 2017-10-06T15:27:48Z

Hi @atasub

If you're using the same reference and gene annotation for HISAT2 and Salmon but getting lower mapping rate with Salmon, you probably have some DNA contamination.

You should get the same gene expression results from either strategy (because in the end the GTF file for the genome and the FASTA file for the transcriptome are equivalent).

hiraksarkar · 2017-10-06T15:28:17Z

Hi @atasub ,
This is interesting. Can you tell us a little about the read lengths? I am currently looking into RNA-seq files like this that have bad mapping rates, so I'm curious to know a little more. Also, can you try running salmon in selective alignment mode? I'm not sure whether that improves mapping rate or quantification, but it is worth a try: https://github.com/COMBINE-lab/salmon/tree/selective-alignment. There is also an associated pre-print.

roryk · 2017-10-06T15:28:46Z

Another red flag would be a high rRNA rate going along with it: the rRNA depletion methods don't work 100%, and if you have no mRNA then the rRNA rate will tend to be higher.

vals · 2017-10-06T15:29:50Z

Yes that is also a good explanation, I recommend putting human rRNA in the Salmon index.
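
Once rRNA sequences are in the index, the rRNA rate can be read off the quantification output. The sketch below assumes the standard quant.sf columns (Name, Length, EffectiveLength, TPM, NumReads). The transcript names and numbers are toy values made up for illustration, and the helper function is hypothetical, not part of Salmon.

```python
import csv
import io


def rrna_read_fraction(quant_sf_text, rrna_names):
    """Return the fraction of NumReads assigned to the given rRNA entries."""
    reader = csv.DictReader(io.StringIO(quant_sf_text), delimiter="\t")
    total = 0.0
    rrna = 0.0
    for row in reader:
        reads = float(row["NumReads"])
        total += reads
        if row["Name"] in rrna_names:
            rrna += reads
    return rrna / total if total > 0 else 0.0


# Toy quant.sf content; names and values are invented, not real results.
toy_quant_sf = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx_A\t1500\t1300.0\t400.0\t600.0\n"
    "tx_B\t2000\t1800.0\t100.0\t200.0\n"
    "rRNA_45S\t13000\t12800.0\t500.0\t200.0\n"
)

fraction = rrna_read_fraction(toy_quant_sf, {"rRNA_45S"})
print(f"rRNA fraction of assigned reads: {fraction:.1%}")
```

For a real run you would pass the contents of your own quant.sf and the names of whichever rRNA records you added to the transcriptome FASTA before indexing.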

InesdeSantiago · 2018-02-21T15:25:18Z

@hiraksarkar
How can I use selective mapping in Salmon? I don't see any info in the docs (http://salmon.readthedocs.io/en/latest/salmon.html?highlight=selective)

Say, I have paired-end data, I do:
./bin/salmon quant -i transcripts_index -l <LIBTYPE> -1 reads1.fq -2 reads2.fq -o transcripts_quant

How can I specify selective mapping over quasi-mapping?

hiraksarkar · 2018-02-21T15:52:26Z

@InesdeSantiago
Sorry that the docs are not updated yet; it's definitely on our to-do list. Selective alignment needs a separate index from the normal quasi-mapping index. The steps are as follows.
If you are in the salmon root folder, check out the most up-to-date branch that implements selective alignment, rebuild, and then index and quantify:

git checkout rescue-orphan
(re-build it)
build/src/salmon index -i selective_alignment_ind -t transcript.fa
build/src/salmon quant -i selective_alignment_ind -l A -1 reads1.fq -2 reads2.fq -o transcript_quant --softFilter --editDistance 4 --rangeFactorization 4

We strongly recommend these options when using selective alignment, as they almost always produce superior results (I am considering making them the default soon :) ).

Please let me know if you run into problems with any of the above steps, or if the results are not what you expect.
Thanks again for using selective alignment and Salmon.

InesdeSantiago · 2018-02-22T03:03:41Z

@hiraksarkar
Thanks!
So, apart from the extra options (softFilter, editDistance, rangeFactorization) the only difference is the indexed genome file?

hiraksarkar · 2018-02-22T03:05:54Z

@InesdeSantiago
That is correct. Just to be sure: Salmon is not designed for the genome, so you probably want to use it only with a transcriptome.

InesdeSantiago · 2018-02-22T10:13:38Z

@hiraksarkar. Yes, force of habit, I meant the transcriptome! ;-)

RaymondSHANG · 2019-11-06T18:49:02Z

Hi, I have a similar case, with a 30~40% mapping rate with Salmon. I tried HISAT2, and the mapping rate goes to >80%. I sorted the SAM files to BAM with samtools, and then Qualimap 2 gave me these QC results:

Exonic: | 31,212,828 / 41.39%
Intronic: | 39,191,136 / 51.97%
Intergenic: | 5,008,406 / 6.64%
Intronic/intergenic overlapping exon: | 6,243,753 / 8.28%

There is not too much DNA contamination, but a large portion of intronic mappings.
What can I do with these data? Any suggestions?
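
As a rough sanity check on these numbers, the sketch below recomputes the read-origin fractions from the counts above and multiplies the exonic fraction by the genome alignment rate. It assumes the three categories (exonic, intronic, intergenic) partition the aligned reads, uses 0.80 as a stand-in for the ">80%" HISAT2 rate, and assumes that only exonic reads can map to a spliced transcriptome. This is back-of-the-envelope arithmetic, not a description of how Salmon or Qualimap compute anything.

```python
# Counts copied from the Qualimap output above. The "overlapping exon" row is
# left out because the first three categories already sum to 100%.
qualimap_counts = {
    "exonic": 31_212_828,
    "intronic": 39_191_136,
    "intergenic": 5_008_406,
}


def origin_fractions(counts):
    """Convert per-category read counts into fractions of the total."""
    total = sum(counts.values())
    return {origin: n / total for origin, n in counts.items()}


fractions = origin_fractions(qualimap_counts)

# Assumption: ">80%" genome alignment rate, taken at its lower bound.
hisat2_rate = 0.80
expected_transcriptome_rate = hisat2_rate * fractions["exonic"]

for origin, frac in fractions.items():
    print(f"{origin}: {frac:.2%}")
print(f"rough expected transcriptome mapping rate: {expected_transcriptome_rate:.1%}")
```

Under these assumptions the estimate comes out at about a third of all reads, which is in the same range as the 30~40% Salmon mapping rate reported here. That suggests the gap between the two tools is mostly explained by the intronic reads rather than by a mapping problem.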

tamuanand mentioned this issue Mar 11, 2020

>50% of dovetail mapping in salmon 1.1.0 #485

Closed
