Gap attribute - CIGAR description - dead link #16

Closed
Juke34 opened this issue Oct 24, 2018 · 0 comments

Juke34 commented Oct 24, 2018

Hi,

First, the link to the Exonerate documentation in the Gap attribute paragraph is dead.
Second, the current Exonerate manual page does not describe exactly what was available in the past.
It describes the CIGAR format and explains the operators like this:

| Operator | Description |
|----------|-------------|
| M | Match |
| C | Codon |
| G | Gap |
| N | Non-equivalenced region |
| 5 | 5' splice site |
| 3 | 3' splice site |
| I | Intron |
| S | Split codon |
| F | Frameshift |

The Samtools-style CIGAR format, which is the one found almost everywhere online, looks like this:

| Operator | Description |
|----------|-------------|
| D | Deletion; the nucleotide is present in the reference but not in the read |
| H | Hard Clipping; the clipped nucleotides are not present in the read |
| I | Insertion; the nucleotide is present in the read but not in the reference |
| M | Match; can be either an alignment match or mismatch. The nucleotide is present in the reference |
| N | Skipped region; a region of nucleotides is not present in the read |
| P | Padding; padded area in the read and not in the reference |
| S | Soft Clipping; the clipped nucleotides are present in the read |
| X | Read Mismatch; the nucleotide is present in the reference |
| = | Read Match; the nucleotide is present in the reference |
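
To make the Samtools-style table concrete, here is a minimal Python sketch that splits such a CIGAR string into operations and sums the lengths on each side. The function names are hypothetical, and the sets saying which operators advance the reference or the read are an assumption taken from the SAM specification, not from the table above.

```python
import re

# Assumption (SAM specification, not stated in this issue):
# operators that advance the reference and the read, respectively.
CONSUMES_REFERENCE = set("MDN=X")
CONSUMES_READ = set("MIS=X")

# One or more <length><operator> tokens, using the operators from the table.
_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")
_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_sam_cigar(cigar):
    """Split a string such as '3M1I3M1D5M' into (length, operator) pairs."""
    if _FULL.fullmatch(cigar) is None:
        raise ValueError(f"not a Samtools-style CIGAR string: {cigar!r}")
    return [(int(length), op) for length, op in _TOKEN.findall(cigar)]


def reference_span(cigar):
    """Number of reference positions covered by the alignment."""
    return sum(n for n, op in parse_sam_cigar(cigar) if op in CONSUMES_REFERENCE)


def read_span(cigar):
    """Number of read bases accounted for (soft clips included)."""
    return sum(n for n, op in parse_sam_cigar(cigar) if op in CONSUMES_READ)
```

With the string `3M1I3M1D5M`, the insertion counts only toward the read and the deletion only toward the reference, which is the asymmetry the D and I rows describe.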

Meanwhile, older resources such as:
FlyBase (2004): http://rice.bio.indiana.edu:7082/annot/gff3.html
WormBase (2010): http://wiki.wormbase.org/index.php/GFF3specProposal
describe the format like this:

| Operator | Description |
|----------|-------------|
| M | match |
| I | insert a gap into the reference sequence |
| D | insert a gap into the target (delete from reference) |
| F | frameshift forward in the reference sequence |
| R | frameshift reverse in the reference sequence |
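
For comparison, a GFF3 Gap attribute using these five operators could be parsed as sketched below. The token layout (operator first, then length, separated by spaces, as in `M8 D3 M6 I1 M6`) is an assumption taken from the GFF3 specification rather than from this issue, and `parse_gap` is a hypothetical helper name.

```python
import re

# Operator meanings copied from the table above.
GAP_OPS = {
    "M": "match",
    "I": "insert a gap into the reference sequence",
    "D": "insert a gap into the target (delete from reference)",
    "F": "frameshift forward in the reference sequence",
    "R": "frameshift reverse in the reference sequence",
}

# Assumed token shape: one operator letter followed by a length, e.g. "M8".
_GAP_TOKEN = re.compile(r"([MIDFR])(\d+)")


def parse_gap(value):
    """Split a Gap value such as 'M8 D3 M6 I1 M6' into (operator, length) pairs."""
    ops = []
    for token in value.split():
        match = _GAP_TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"unexpected Gap token: {token!r}")
        ops.append((match.group(1), int(match.group(2))))
    return ops


if __name__ == "__main__":
    for op, length in parse_gap("M8 D3 M6 I1 M6"):
        print(f"{op}{length}: {GAP_OPS[op]}")
```

Note that the order is the reverse of the Samtools style, where the length comes before the operator, so the two notations cannot be parsed by the same pattern.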

To gather all the information in one place and not lose any of it, one solution would be to create your own page describing the CIGAR format in full.

Here is the union of the operators I have seen in the CIGAR formats above:

| Operator | Description |
|----------|-------------|
| M | Match; can be either an alignment match or mismatch. The nucleotide is present in the reference |
| C | Codon |
| G | Gap |
| N | Non-equivalenced region / Skipped region; a region of nucleotides is not present in the read |
| 5 | 5' splice site |
| 3 | 3' splice site |
| I | Intron / Insertion; the nucleotide is present in the read but not in the reference / insert a gap into the reference sequence |
| S | Split codon / Soft Clipping; the clipped nucleotides are present in the read |
| H | Hard Clipping; the clipped nucleotides are not present in the read |
| F | Frameshift / frameshift forward in the reference sequence |
| D | Deletion; the nucleotide is present in the reference but not in the read / insert a gap into the target (delete from reference) |
| P | Padding; padded area in the read and not in the reference |
| X | Read Mismatch; the nucleotide is present in the reference |
| = | Read Match; the nucleotide is present in the reference |
| R | frameshift reverse in the reference sequence |

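
One way to keep such a union unambiguous is to record which source each meaning comes from. The sketch below does this with plain dictionaries built from the three tables quoted earlier in this issue; the source labels (`exonerate`, `samtools`, `gff3`) and the function names are hypothetical, chosen only for illustration.

```python
# Meanings per source, shortened from the three tables quoted in this issue.
EXONERATE = {
    "M": "Match", "C": "Codon", "G": "Gap", "N": "Non-equivalenced region",
    "5": "5' splice site", "3": "3' splice site", "I": "Intron",
    "S": "Split codon", "F": "Frameshift",
}
SAMTOOLS = {
    "D": "Deletion", "H": "Hard Clipping", "I": "Insertion",
    "M": "Match (alignment match or mismatch)", "N": "Skipped region",
    "P": "Padding", "S": "Soft Clipping", "X": "Read Mismatch",
    "=": "Read Match",
}
GFF3 = {
    "M": "match",
    "I": "insert a gap into the reference sequence",
    "D": "insert a gap into the target (delete from reference)",
    "F": "frameshift forward in the reference sequence",
    "R": "frameshift reverse in the reference sequence",
}

SOURCES = {"exonerate": EXONERATE, "samtools": SAMTOOLS, "gff3": GFF3}


def meanings(op):
    """Return {source: description} for every source that defines `op`."""
    return {name: table[op] for name, table in SOURCES.items() if op in table}


def all_operators():
    """Every operator defined by at least one source."""
    return set().union(*SOURCES.values())


def shared_operators():
    """Operators defined by more than one source, i.e. the ambiguous ones."""
    return sorted(op for op in all_operators() if len(meanings(op)) > 1)


if __name__ == "__main__":
    for op in shared_operators():
        print(op, meanings(op))
```

Keeping the source next to each meaning makes it visible that a letter such as `I` or `S` cannot be interpreted without knowing which tool produced the string.
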
Juke34 added a commit to Juke34/Specifications that referenced this issue Nov 7, 2019