Merge pull request #723 from COMBINE-lab/doc_update

update single-cell protocol documentation
COMBINE-lab · Nov 22, 2021 · cefb91f
2 parents 6829e44 + 9fdee40
Showing 1 changed file with 17 additions and 1 deletion.
diff --git a/doc/source/alevin.rst b/doc/source/alevin.rst
@@ -1,10 +1,14 @@
 Alevin
 ================
 
-Alevin is a tool --- integrated with the salmon software --- that introduces a family of algorithms for quantification and analysis of 3' tagged-end single-cell sequencing data. Currently alevin supports the following two major droplet based single-cell protocols:
+Alevin is a tool --- integrated with the salmon software --- that introduces a family of algorithms for quantification and analysis of 3' tagged-end single-cell sequencing data. Currently alevin supports the following single-cell protocols:
 
 1. Drop-seq
 2. 10x-Chromium v1/2/3
+3. inDropV2
+4. CELSeq 1/2
+5. Quartz-Seq2
+6. sci-RNA-seq3
 
 Alevin works under the same indexing scheme (as salmon) for the reference, and consumes a set of FASTA/Q file(s) in which one read file contains the Cellular Barcode (CB) and Unique Molecular Identifier (UMI) and the other contains the read sequence.  Given just the transcriptome and the raw read files, alevin generates a cell-by-gene count matrix (in a fraction of the time required by other tools).
 
@@ -177,6 +181,18 @@ map end-to-end.  Instead, the score of the mapping will be the position along th
 highest score.  This is the score which must reach the fraction threshold for the read to be considered
 as valid.
 
+Single-cell protocol-specific notes
+------------------------------------
+
+For single-cell protocols that use variable-length cellular barcodes, alevin pads the barcodes with nucleotides so that they all have the same length.
+The padding scheme is designed so that no collisions are introduced, i.e., two distinct barcodes never become identical after padding. The padding scheme is as follows:
+
+1. sci-RNA-seq3: The barcode consists of a 9-10 bp hairpin adaptor and a 10 bp reverse transcription (RT) index, for a total length of 19-20 bp. If
+   the barcode is 20 bp long, alevin adds ``A``; if it is 19 bp long, alevin adds ``AC``. Thus, every output barcode is 21 bp long.
+2. inDropV2: The barcode consists of an 8-11 bp barcode1 and an 8 bp barcode2, for a total length of 16-19 bp. For barcode lengths of 16,
+   17, 18, and 19 bp, alevin adds ``AAAC``, ``AAG``, ``AT``, and ``A``, respectively. Thus, every output barcode is 20 bp long. The position
+   of barcode1 depends on finding an exact match of the ``w1`` sequence. If no exact match is found, alevin searches read2, within the required
+   bounds, for a substring of the same length as ``w1`` that differs from ``w1`` by a Hamming distance of at most 2; the first match is used.
 
 Output
 ------
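The padding rule described in the protocol-specific notes can be sketched in a few lines. This is a minimal Python sketch, not alevin's (C++) implementation: the two tables are transcribed from the notes, the helper names `pad_barcode` and `is_collision_free` are hypothetical, and it assumes the padding is appended to the end of the barcode. Under that assumption, collisions are ruled out because all padded barcodes have the same length and the padding for each original length ends in a different nucleotide; the second helper checks exactly that property.

```python
from typing import Dict

# Padding tables transcribed from the notes: original length -> padding string.
# Hypothetical sketch; assumes padding is appended to the end of the barcode.
SCI_RNA_SEQ3_PAD: Dict[int, str] = {19: "AC", 20: "A"}                     # output: 21 bp
INDROP_V2_PAD: Dict[int, str] = {16: "AAAC", 17: "AAG", 18: "AT", 19: "A"}  # output: 20 bp


def pad_barcode(barcode: str, pad_table: Dict[int, str]) -> str:
    """Append the padding that corresponds to this barcode's length."""
    pad = pad_table.get(len(barcode))
    if pad is None:
        raise ValueError(f"unsupported barcode length: {len(barcode)}")
    return barcode + pad


def is_collision_free(pad_table: Dict[int, str], out_len: int) -> bool:
    """Check that padding yields a uniform length and cannot merge barcodes.

    Barcodes of equal length receive identical padding, so distinct ones stay
    distinct. Barcodes of different lengths end in different pad bases, so
    they cannot become identical either.
    """
    same_length = all(n + len(pad) == out_len for n, pad in pad_table.items())
    last_bases = [pad[-1] for pad in pad_table.values()]
    return same_length and len(set(last_bases)) == len(last_bases)


if __name__ == "__main__":
    print(pad_barcode("ACGTACGTACGTACGT", INDROP_V2_PAD))  # ACGTACGTACGTACGTAAAC
    print(is_collision_free(SCI_RNA_SEQ3_PAD, 21))         # True
    print(is_collision_free(INDROP_V2_PAD, 20))            # True
```

Checking the invariant separately from the padding itself makes it easy to validate a new protocol's table before using it.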
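The inDropV2 `w1` lookup (exact match first, then a Hamming-distance fallback) can be sketched the same way. Everything here is an assumption for illustration: `find_w1` and `hamming` are hypothetical names, the "required bounds" are taken to mean that `w1` starts directly after an 8-11 bp barcode1 (offsets 8 to 11 of read2), the exact-match step searches the whole read, and the adaptor sequence in the example is illustrative only; pass the actual `w1` sequence for your library.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_w1(read2: str, w1: str, min_start: int = 8, max_start: int = 11,
            max_dist: int = 2) -> Optional[int]:
    """Return the 0-based offset of w1 in read2, or None if it cannot be placed.

    Hypothetical sketch: the default bounds assume w1 directly follows an
    8-11 bp barcode1.
    """
    # Step 1: exact match anywhere in the read.
    pos = read2.find(w1)
    if pos != -1:
        return pos
    # Step 2: scan the allowed start offsets in order and return the first
    # window of len(w1) within max_dist mismatches of w1.
    for start in range(min_start, max_start + 1):
        window = read2[start:start + len(w1)]
        if len(window) < len(w1):
            break  # read too short to hold w1 at this offset
        if hamming(window, w1) <= max_dist:
            return start
    return None


if __name__ == "__main__":
    w1 = "GAGTGATTGCTTGTGACGCCTT"        # illustrative adaptor sequence
    bc1, bc2 = "ACGTACGTA", "CCCCGGGG"   # made-up 9 bp barcode1, 8 bp barcode2
    read2 = bc1 + "CAGTGCTTGCTTGTGACGCCTT" + bc2  # w1 with 2 mismatches
    pos = find_w1(read2, w1)
    print(pos)  # 9
    # Concatenate barcode1 and barcode2 (17 bp in this example) before padding.
    print(read2[:pos] + read2[pos + len(w1):pos + len(w1) + 8])
```

Scanning offsets in increasing order mirrors the "first match is used" rule; a read whose best window has three or more mismatches yields `None`.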