Caution
This package is currently under development. Please use midsv until it is complete.
csvtag is a toolkit for the csv tag, a variant of the cs tag that also encodes inversions.
The encoding is essentially the same as the minimap2 cs tag, with one difference: lowercase letters represent inversions.
| Prefix | Sequence | Description |
|---|---|---|
| = | [ACGTN]+ | Identical sequence (long form) |
| : | [0-9]+ | Identical sequence length |
| * | [ACGTN][ACGTN] | Substitution: ref to query |
| + | [ACGTN]+ | Insertion to the reference |
| - | [ACGTN]+ | Deletion from the reference |
| ~ | [ACGTN]{2}[0-9]+[ACGTN]{2} | Intron length and splice signal |
| [=+*~-] | [acgtn] | Inversion (lowercase sequence after any of the prefixes above) |
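
To make the table concrete, here is a minimal sketch of a tokenizer that splits a csv tag string into its operations and marks lowercase tokens as inversions. The regular expression is derived only from the table above; `TOKEN_RE` and `tokenize` are hypothetical names for illustration and are not part of the csvtag API.

```python
import re

# Pattern built from the prefix/sequence table; each operation may be
# uppercase (normal) or lowercase (inversion). Not taken from the csvtag source.
TOKEN_RE = re.compile(
    r"=[ACGTN]+|=[acgtn]+"          # identical sequence (long form)
    r"|:[0-9]+"                     # identical sequence length
    r"|\*[ACGTN]{2}|\*[acgtn]{2}"   # substitution: ref to query
    r"|\+[ACGTN]+|\+[acgtn]+"       # insertion to the reference
    r"|-[ACGTN]+|-[acgtn]+"         # deletion from the reference
    r"|~[ACGTN]{2}[0-9]+[ACGTN]{2}|~[acgtn]{2}[0-9]+[acgtn]{2}"  # intron
)


def tokenize(tag):
    """Split a csv tag into (prefix, body, is_inversion) tuples."""
    tokens = []
    pos = 0
    while pos < len(tag):
        match = TOKEN_RE.match(tag, pos)
        if match is None:
            raise ValueError(f"unrecognized csv tag content at offset {pos}")
        text = match.group()
        body = text[1:]
        # str.islower() is False for digit-only bodies such as ":12"
        tokens.append((text[0], body, body.islower()))
        pos = match.end()
    return tokens
```

For example, `tokenize("=ACGT*AG=acgt")` yields three tokens, of which only the last is marked as an inversion.
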
Important
All csv tags are reported relative to the forward strand of the reference sequence (SAM FLAG 0). Alignments on the reverse strand are reverse complemented in full.
- Inversion detection uses RNAME, POS, and FLAG from SAM files.
- Sort alignments by RNAME and POS.
- If a QNAME has 2 or fewer alignments, no inversion can be called, so output the cs tag in uppercase.
- If a QNAME has 3 or more alignments, detect inversions.
  - Take three alignments in ascending POS order (first, second, third).
  - (1) If first, second, and third lie within 50 bp of each other and only the second is reverse-oriented, the second is called an inversion.
    - Reverse complement the cs tag of the second and output it as a csv tag in lowercase.
    - If there are gaps between first, second, and third, fill them with N.
- Apply the same process to any adjacent reads.
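
The detection steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the csvtag implementation: the `Alignment` record and `find_inversions` function are hypothetical, each alignment is assumed to carry a reference end coordinate (which in practice would be derived from the CIGAR string), and "within 50 bp" is read as the distance from the end of one alignment to the start of the next.

```python
from dataclasses import dataclass

REVERSE_FLAG = 0x10  # SAM FLAG bit set when the read maps to the reverse strand


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record holding the SAM fields the steps above rely on."""
    qname: str
    rname: str
    pos: int   # SAM POS: 1-based leftmost reference coordinate
    end: int   # assumed: last reference coordinate covered by the alignment
    flag: int


def is_reverse(aln):
    return bool(aln.flag & REVERSE_FLAG)


def find_inversions(alignments, max_gap=50):
    """Return the alignments of one QNAME that look like inversions."""
    # Sort by RNAME, then POS, as described in the list above.
    ordered = sorted(alignments, key=lambda a: (a.rname, a.pos))
    if len(ordered) < 3:
        return []  # 2 or fewer alignments: no inversion is called
    inversions = []
    # Slide a window of three adjacent alignments over the sorted list.
    for first, second, third in zip(ordered, ordered[1:], ordered[2:]):
        same_ref = first.rname == second.rname == third.rname
        adjacent = (second.pos - first.end <= max_gap
                    and third.pos - second.end <= max_gap)
        only_second_reversed = (not is_reverse(first)
                                and is_reverse(second)
                                and not is_reverse(third))
        if same_ref and adjacent and only_second_reversed:
            inversions.append(second)
    return inversions
```

Alignments returned by this sketch are the ones whose cs tag would then be reverse complemented and written in lowercase. Gap filling with N is not covered here.
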
- csvtag.call(): Generate a csv tag
- csvtag.to_sequence(): Reconstruct a query subsequence from the alignment