
Commit

[ci skip] pull from develop
Rob Patro committed Jul 7, 2021
1 parent b7238a9 commit 7e20085
Showing 1 changed file with 7 additions and 6 deletions.
13 changes: 7 additions & 6 deletions README.md
@@ -30,21 +30,22 @@ The documentation for Salmon is available on [ReadTheDocs](http://readthedocs.or
Salmon is, and will continue to be, [freely and actively supported on a best-effort basis](https://oceangenomics.com/about/#open).
If you need industrial-grade technical support, please consider the options at [oceangenomics.com/contact](http://oceangenomics.com/contact).

### Pre-computed decoy transcriptomes
Decoy sequences in transcriptomes
=================================

tl;dr: fast is good but fast and accurate is better!
Although the precomputed decoys (<=v.14.2) are still compatible with the latest major release (v1.5.1), we recommend updating your index using the full genome, as it can give significantly higher accuracy. For more information, please see our extensive benchmarking of different alignment methods and their performance on RNA-seq quantification in the latest revised preprint [manuscript](https://www.biorxiv.org/content/10.1101/657874v2).
Please use the [tutorial](https://combine-lab.github.io/alevin-tutorial/2019/selective-alignment/) for a step-by-step guide on how to efficiently index the reference transcriptome and genome for accurate gentrome-based RNA-seq quantification.

Specifically, there are 3 possible ways in which the salmon index can be created:
[Alignment and mapping methodology influence transcript abundance estimation](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02151-8), and [accounting for fragments of unexpected origin can improve transcript quantification](https://www.biorxiv.org/content/10.1101/2021.01.17.426996v1). To this end, salmon provides the ability to index both the transcriptome and decoy sequence that can be considered during mapping and quantification. The decoy sequence accounts for reads that might otherwise be (spuriously) attributed to some annotated transcript. This [tutorial](https://combine-lab.github.io/alevin-tutorial/2019/selective-alignment/) provides a step-by-step guide on how to efficiently index the reference transcriptome and genome to produce a decoy-aware index. Specifically, there are 3 possible ways in which the salmon index can be created:

* cDNA-only index : salmon_index - https://combine-lab.github.io/salmon/getting_started/. This method will result in the smallest index and require the least resources to build, but will be the most prone to possible spurious alignments.

* SA mashmap index: salmon_partial_sa_index - (regions of genome that have high sequence similarity to the transcriptome) - Details can be found in [this README](https://github.com/COMBINE-lab/SalmonTools/blob/master/README.md) and using [this script](https://raw.githubusercontent.com/COMBINE-lab/SalmonTools/master/scripts/generateDecoyTranscriptome.sh). While running mashmap can require considerable resources, the resulting decoy files are fairly small. This will result in an index bigger than the cDNA-only index, but still much smaller than the full genome index below. It will confer many, though not all, of the benefits of using the entire genome as a decoy sequence.

* SAF genome index: salmon_sa_index - (the full genome is used as decoy) - The tutorial for creating such an index can be found [here](https://combine-lab.github.io/alevin-tutorial/2019/selective-alignment/). This will result in the largest index, but likely does the best job in avoiding spurious alignments to annotated transcripts.
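
For the full-genome option above, the preparation step can be sketched in Python. This is a minimal sketch, not the project's own tooling: it assumes uncompressed FASTA inputs, and the function names, the `decoys.txt` and `gentrome.fa` file names, and the output layout are hypothetical choices made for illustration. It collects the genome sequence names as the decoy list, concatenates the transcriptome followed by the genome, and returns (without running) a `salmon index` command that uses the `-t`, `-d`, and `-i` options.

```python
from pathlib import Path
from typing import List


def decoy_names(genome_fasta: Path) -> List[str]:
    """Return FASTA sequence names (header text up to the first whitespace)."""
    names = []
    with open(genome_fasta) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:  # skip empty headers rather than failing
                    names.append(fields[0])
    return names


def build_gentrome(transcripts: Path, genome: Path, out_dir: Path) -> List[str]:
    """Write decoys.txt and gentrome.fa, then return a salmon index command.

    File names and the index directory name are illustrative assumptions.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    decoys = out_dir / "decoys.txt"
    gentrome = out_dir / "gentrome.fa"

    # One decoy (genome sequence) name per line.
    decoys.write_text("\n".join(decoy_names(genome)) + "\n")

    # Transcripts come first and the decoy sequences follow them.
    with open(gentrome, "w") as out:
        for source in (transcripts, genome):
            last = "\n"
            with open(source) as src:
                for line in src:
                    out.write(line)
                    last = line
            if not last.endswith("\n"):
                out.write("\n")  # keep records from running together

    # The command is only constructed here; nothing is executed.
    return [
        "salmon", "index",
        "-t", str(gentrome),
        "-d", str(decoys),
        "-i", str(out_dir / "salmon_sa_index"),
    ]
```

The returned list can be passed to `subprocess.run` once salmon is installed. Options such as the k-mer size or thread count are left at their defaults here and should be taken from the linked tutorial.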

**Facing problems with Indexing ?, Check if anyone else already had this problem in the issues section or fill the index generation [request form](https://forms.gle/3baJc5SYrkSWb1z48)**
**Facing problems with indexing?** Check whether anyone else has already reported the problem in the issues section, or fill out the index generation [request form](https://forms.gle/3baJc5SYrkSWb1z48).

### **NOTE**:
If you are generating an index to be used for single-cell or single-nucleus quantification with [alevin-fry](https://github.com/COMBINE-lab/alevin-fry), then we recommend you consider building a spliced+intron (_splici_) reference. This serves much of the purpose of a decoy-aware index when quantifying with alevin-fry, while also providing the capability to attribute splicing status to mapped fragments. More details about the _splici_ reference and the Unspliced/Spliced/Ambiguous quantification mode it enables can be found [here](https://combine-lab.github.io/alevin-fry-tutorials/2021/improving-txome-specificity/).

Chat live about Salmon
======================
