AS, ZS and NH tags: semantics? #49

Open
rschulzUK opened this issue May 28, 2016 · 3 comments

Comments

@rschulzUK

Dear Daehwan,

I came across two cases (see below) where I cannot tell whether I am misinterpreting the tags or whether they are indeed inconsistent. The Hisat2 version and invocation were:

@PG ID:hisat2   PN:hisat2   VN:2.0.3-beta   CL:"/home/rschulz/bin/hisat2-2.0.3-beta/hisat2-align-s --wrapper basic-0 -p 4 --rna-strandness RF -x /home/rschulz/research/data/genomes/GRCm38/hisat2/genome_tran -S ./output/E.sam -1 /tmp/31877.inpipe1 -2 /tmp/31877.inpipe2"

Based on the definition on the Hisat2 web site, "ZS:i:<N> Alignment score for the best-scoring alignment found other than the alignment reported. [...]", I understand ZS to be the score of the best-scoring alignment among the other alignments found, which could be greater than AS. However, I am confused by the additional sentence "Note that, when the read is part of a concordantly-aligned pair, this score could be greater than [AS:i]." Why can ZS be greater than AS only when the read is part of a concordantly-aligned pair?

NH is defined as "The number of mapped locations for the read or the pair." Does the use of "locations" instead of "alignments" imply that distinct alignments spanning the same coordinates in the target genome are not counted here? That could explain case 1 below, but not case 2.

Any help with understanding this would be much appreciated.

Case 1: ZS is present, suggesting that there is more than one alignment, but NH=1.

HWI-ST1037:275:C496DACXX:7:1206:15243:63664 163 1   4802273 255 7S10M43550N82M  =   4845943 43777   GTTTGGGTCCCCCCTCCCCTGTCTCGGAAACAAACAAACAAACAAACCGAAACACAGACATACAGTATTTCCAACCTAGGTAATATGAAAAGAAATCAA BBBFFFBFFBBFFFFIIIFFFBBFBFBFFFFFFI7BFF<BBF<BFFIF7B77<<<BBBBB<07B<0<<BB<B<B777'<<0<B<<<<BBFFFFFF<<BF AS:i:-9 ZS:i:-16    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:92 YS:i:0  YT:Z:CP XS:A:+  NH:i:1

Case 2: NH=2, but ZS is not present.

HWI-D00505:41:C437JACXX:5:1315:15701:47013  163 1   4807888 1   95M472N5M   =   4808454 666 CCGACGCACTGTCCGCCAGCCGGTGGATGTGCGGCAACAACATGTCCGCTCCGATGCCCGCCGTTGTGCCGGCCGCCCGGAAGGCCACCGCCGCGGTTAT    BBBFFFFFFFFFFIIIIIIIIIIFFIFFIFFIIIIIFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFBFFF<BFF    AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:100    YS:i:0  YT:Z:CP XS:A:+  NH:i:2
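
To find further records like these two in a SAM file, the check can be scripted. The following is a minimal sketch, not a statement of how Hisat2 assigns the tags. It assumes whitespace- or tab-delimited records, and it assumes (as this question does, without confirmation from the Hisat2 documentation) that ZS should be present exactly when NH > 1. `parse_sam_tags` and `classify_record` are hypothetical helper names. The example records are abbreviated from the two cases above, with the read name shortened, SEQ and QUAL replaced by `*`, and only some of the tags kept.

```python
def parse_sam_tags(line):
    """Return the optional tags (column 12 onward) of one SAM record as a dict.

    Integer-typed tags (TAG:i:VALUE) are converted to int; all other
    values are kept as strings.
    """
    fields = line.split()  # assumption: no field contains whitespace
    tags = {}
    for field in fields[11:]:
        parts = field.split(":", 2)
        if len(parts) != 3:
            continue  # not in TAG:TYPE:VALUE form, skip it
        name, tag_type, value = parts
        tags[name] = int(value) if tag_type == "i" else value
    return tags


def classify_record(line):
    """Label a record by whether ZS and NH agree under the assumption
    that ZS should be present exactly when NH > 1."""
    tags = parse_sam_tags(line)
    nh = tags.get("NH")
    has_zs = "ZS" in tags
    if nh is None:
        return "NH missing"
    if has_zs and nh == 1:
        return "ZS present but NH=1"
    if not has_zs and nh > 1:
        return "NH>1 but ZS absent"
    return "consistent"


# Abbreviated stand-ins for case 1 and case 2 (SEQ/QUAL replaced by '*').
case1 = "read1\t163\t1\t4802273\t255\t7S10M43550N82M\t=\t4845943\t43777\t*\t*\tAS:i:-9\tZS:i:-16\tYT:Z:CP\tNH:i:1"
case2 = "read2\t163\t1\t4807888\t1\t95M472N5M\t=\t4808454\t666\t*\t*\tAS:i:0\tYT:Z:CP\tNH:i:2"

print(classify_record(case1))  # ZS present but NH=1
print(classify_record(case2))  # NH>1 but ZS absent
```

Applied line by line to the non-header records of a SAM file, this would show how often each pattern occurs, which may help narrow down whether it depends on pairing (the YT tag) or on spliced alignments.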

Best, Reiner

@ghost

ghost commented Oct 19, 2017

Reiner, was this ever addressed?

@rschulzUK
Author

Hi Rick,

I do not know. I have not used Hisat2 recently, so I have not checked whether Daehwan addressed this.

Best, Reiner

@ghost

ghost commented Oct 20, 2017 via email
