Merge pull request #723 from COMBINE-lab/doc_update
update single-cell protocol documentation
rob-p committed Nov 22, 2021
2 parents 6829e44 + 9fdee40 commit cefb91f
Showing 1 changed file with 17 additions and 1 deletion.
18 changes: 17 additions & 1 deletion doc/source/alevin.rst
@@ -1,10 +1,14 @@
Alevin
================

Alevin is a tool --- integrated with the salmon software --- that introduces a family of algorithms for quantification and analysis of 3' tagged-end single-cell sequencing data. Currently alevin supports the following two major droplet based single-cell protocols:
Alevin is a tool --- integrated with the salmon software --- that introduces a family of algorithms for quantification and analysis of 3' tagged-end single-cell sequencing data. Currently alevin supports the following single-cell protocols:

1. Drop-seq
2. 10x-Chromium v1/2/3
3. inDropV2
4. CELSeq 1/2
5. Quartz-Seq2
6. sci-RNA-seq3

Alevin works under the same indexing scheme (as salmon) for the reference, and consumes the set of FASTA/Q file(s) containing the Cellular Barcode (CB) + Unique Molecular Identifier (UMI) in one read file and the read sequence in the other. Given just the transcriptome and the raw read files, alevin generates a cell-by-gene count matrix (in a fraction of the time compared to other tools).

@@ -177,6 +181,18 @@ map end-to-end. Instead, the score of the mapping will be the position along th
highest score. This is the score which must reach the fraction threshold for the read to be considered
as valid.

Single-cell protocol specific notes
------------------------------------

In cases where a single-cell protocol supports variable-length cell barcodes, alevin adds nucleotide padding to make the lengths uniform.
Furthermore, the padding scheme ensures that no collisions are introduced in the process. The padding scheme is as follows:

1. sci-RNA-seq3: The barcode is composed of a 9-10 bp hairpin adaptor and a 10 bp reverse transcription index, making it 19-20 bp long. If
the barcode is 20 bp long, alevin adds `A`, and if it is 19 bp long, it adds `AC`. Thus, the length of the barcode in the output is 21 bp.
2. inDropV2: An 8-11 bp barcode1 along with an 8 bp barcode2 makes up the barcode. For barcode lengths of 16, 17, 18, and 19 bp, alevin adds
`AAAC`, `AAG`, `AT`, and `A` respectively. Thus, the length of the barcode in the output is 20 bp. Furthermore, the position of barcode1
depends on finding an exact match of the sequence `w1`. If no exact match is found, a search for `w1` is performed, allowing a maximum Hamming
distance of 2 between `w1` and each read2 substring of `w1` length within the required bounds; the first match is returned.
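
The two rules above can be illustrated with a minimal Python sketch. This is not alevin's implementation: the names `PADDING`, `pad_barcode`, `hamming`, and `find_w1` are hypothetical, the padding is assumed to be appended to the end of the barcode, and the "required bounds" for `w1` are assumed to be start offsets 8 to 11 (inferred from the 8-11 bp length of barcode1). The `w1` sequence is passed in as a parameter rather than hard-coded. Under the append assumption, every raw length within a protocol receives a suffix ending in a different nucleotide, which is one way to see why padded barcodes of different raw lengths cannot collide.

```python
# Suffixes taken from the notes above, keyed by protocol and raw barcode length.
# ASSUMPTION: the padding is appended to the end of the barcode.
PADDING = {
    "sci-RNA-seq3": {20: "A", 19: "AC"},
    "inDropV2": {16: "AAAC", 17: "AAG", 18: "AT", 19: "A"},
}


def pad_barcode(protocol: str, barcode: str) -> str:
    """Return the barcode with its length-dependent padding appended."""
    suffix = PADDING.get(protocol, {}).get(len(barcode))
    if suffix is None:
        raise ValueError(f"no padding rule for {protocol} at length {len(barcode)}")
    return barcode + suffix


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_w1(read: str, w1: str, min_start: int = 8, max_start: int = 11,
            max_dist: int = 2) -> int:
    """Return the start offset of w1 in the read, or -1 if it is not found.

    ASSUMPTION: the search bounds are start offsets 8..11; the bounds used
    by alevin itself may differ.
    """
    starts = range(min_start, max_start + 1)
    # Pass 1: look for an exact match of w1 within the bounds.
    for s in starts:
        if read[s:s + len(w1)] == w1:
            return s
    # Pass 2: return the first full-length window within the mismatch budget.
    for s in starts:
        window = read[s:s + len(w1)]
        if len(window) == len(w1) and hamming(window, w1) <= max_dist:
            return s
    return -1
```

For example, `pad_barcode("inDropV2", bc)` would return a 20 bp string for any `bc` of 16 to 19 bp, and `find_w1(read2, w1)` would give the offset at which barcode1 ends, so that `read2[:offset]` is barcode1 under these assumptions.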

Output
------