Handle no-ops more intelligently when creating MD tags #392

Closed
kristalcurtis opened this Issue Sep 22, 2014 · 5 comments

@kristalcurtis

Currently, ADAM creates very verbose MD tags when it encounters no-ops, e.g., 2G0A0T0G0A2G2A0T0G0T1G0T0C3T0G0A0G0T1G0T0C0A1G0T0C0T0G0A0T0G1A3G0A0C0A0T0C0A0T1G2C0A0G0A0T0G0C0T0G0A0G4C0A0G1C0A2C0A0T0A0T1G0T0G9.

fnothaft Sep 22, 2014

Member

D'oh! That doesn't look correct... Which piece of code is emitting this?

fnothaft Sep 22, 2014

Member

Er, actually, if those all are mismatches, that could be correct. Are you able to share the read/alignment publicly?
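
For context on why such a tag could be correct: the SAM specification defines the MD tag by the pattern `[0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)*`, so a match count (possibly `0`) is required between every pair of mismatch or deletion entries. A run of adjacent mismatches therefore has to be written as `G0A0T...`. Below is a minimal sketch of a tokenizer that makes this visible. `MdTagSketch` and all names in it are hypothetical helpers for illustration, not ADAM's implementation, and the parser does not validate its input.

```scala
// Hypothetical helper, not ADAM's code: split a SAM MD tag into tokens.
object MdTagSketch {
  sealed trait MdToken
  // Run of bases matching the reference; the length may be 0.
  final case class MatchRun(length: Int) extends MdToken
  // A single reference base that the read does not match.
  final case class Mismatch(refBase: Char) extends MdToken
  // Reference bases that are deleted from the read (written as ^BASES).
  final case class Deletion(refBases: String) extends MdToken

  // One alternative per token kind: a number, a ^-prefixed deletion, a single base.
  private val tokenPattern = """(\d+)|\^([A-Za-z]+)|([A-Za-z])""".r

  // Tokenize left to right; characters matching no alternative are skipped.
  def parse(md: String): List[MdToken] =
    tokenPattern.findAllMatchIn(md).map { m =>
      if (m.group(1) != null) MatchRun(m.group(1).toInt)
      else if (m.group(2) != null) Deletion(m.group(2))
      else Mismatch(m.group(3).head)
    }.toList

  // Number of single-base mismatches recorded in the tag.
  def mismatchCount(md: String): Int =
    parse(md).count(_.isInstanceOf[Mismatch])

  // Number of reference bases the tag describes (insertions are not in MD).
  def referenceSpan(md: String): Int =
    parse(md).map {
      case MatchRun(n)     => n
      case Mismatch(_)     => 1
      case Deletion(bases) => bases.length
    }.sum
}
```

Calling `MdTagSketch.mismatchCount` on the tag from the issue description would show how many of its entries are single-base mismatches. If most are, the zero-length match runs are what the format requires rather than redundant output.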

kristalcurtis Sep 22, 2014

Sure, I can share; it's just from the SMaSH Venter reads.

Here's the original read:
@chr22_42898209_42898675_?:?:?_?:?:?_MATERNAL_42933514_42933981_649d432/2
GGGGGGGGGGGCACCATATGGGTGGCTGGGGGCTCAGCATCTGGGCCATGATGTCCCCTTCATCAGACCTGACCACTCAAAAGACCACATTTCCCTCATCC
+
CGFGGG?FGGCGFGGAGG4GFGGG6FFAGGGGDGEFFGFGGGDB@GFGGGGG?GGGGGFFEEFFGG@>EGGBGGGGG@G16?DGB4GDGFBEGFEGFFGF>

It aligns to chr22, 42898574 (if 0-indexed), in the reverse direction. Here's the CIGAR string & MD tag I get from ADAM:
cigar: 10M1I1M2I1M1I3M1D3M1D5M1D4M1I4M1I4M2I1M2I16M4D4M1D5M1D2M1I1M1I9M1I3M1D1M2D1M1D9M
mdTag: 3A0T1A3A0A0A0T0G0^T2T0^C1T0T0T0G0^A1T2T0C0A0G0G0T0C0T1A0T0G0A1G1G0G0A1A1C0A0T0^GGCC0C0A0G0A0^T0G1T0G0A0^G0C1C0C2G0C0C0A0C1C0A0T0^A1^GG1^G0C3C1C0C1
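
One way to sanity-check a pair like this is to compute the lengths the CIGAR string implies. Per the SAM specification, the operators M, I, S, = and X consume read bases (their lengths should sum to the read length), and M, D, N, = and X consume reference bases. The sketch below is a hypothetical helper (`CigarSketch` is not part of ADAM) that assumes a well-formed CIGAR string and does no validation.

```scala
// Hypothetical helper, not ADAM's code: lengths implied by a CIGAR string.
object CigarSketch {
  // A CIGAR string is a sequence of <length><operator> pairs.
  private val opPattern = """(\d+)([MIDNSHP=X])""".r

  // Operators that consume read (query) bases and reference bases, per the SAM spec.
  private val consumesRead: Set[Char] = Set('M', 'I', 'S', '=', 'X')
  private val consumesReference: Set[Char] = Set('M', 'D', 'N', '=', 'X')

  // Split the string into (length, operator) pairs.
  def parse(cigar: String): List[(Int, Char)] =
    opPattern.findAllMatchIn(cigar).map(m => (m.group(1).toInt, m.group(2).head)).toList

  // Number of read bases covered; should equal the length of the read sequence.
  def readLength(cigar: String): Int =
    parse(cigar).collect { case (n, op) if consumesRead(op) => n }.sum

  // Number of reference bases covered by the alignment.
  def referenceSpan(cigar: String): Int =
    parse(cigar).collect { case (n, op) if consumesReference(op) => n }.sum
}
```

`CigarSketch.readLength` on the CIGAR above can be compared with the length of the read sequence, and `CigarSketch.referenceSpan` with the number of reference bases the MD tag describes (match counts plus mismatches plus deleted bases). A disagreement in either comparison would point at the alignment or the tag generation.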

It looks like there's an indel at the beginning of the read:
scala> genome.substring(42898574, 42898574 + 101)
res6: String = GGGATGAGGGAAATGTGGTCTTTTGAGTGGTCAGGTCTGATGAAGGGGACATCATGGCCCAGATGCTGAGCCCCCAGCCACCCATATGGTGCCCCCCCCCC

Maybe that's what is causing the hiccup? I realize this MD tag is different from the one I posted in the issue description... maybe the code changed between when I got that result (about a week ago) and now?
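
Because the read aligns in the reverse direction, it has to be reverse-complemented before it can be compared base by base with the forward-strand substring shown above (the reference window ends in a run of C's, while the read begins with a run of G's). A minimal sketch of that step follows. `ReverseComplementSketch` is a hypothetical helper, and mapping any unrecognized base to `N` is an assumption made for illustration.

```scala
// Hypothetical helper: reverse-complement a DNA sequence.
object ReverseComplementSketch {
  // Watson-Crick complements; N stays N.
  private val complement: Map[Char, Char] =
    Map('A' -> 'T', 'C' -> 'G', 'G' -> 'C', 'T' -> 'A', 'N' -> 'N')

  // Reverse the sequence, then complement each base.
  // Assumption: bases outside A/C/G/T/N are reported as N.
  def reverseComplement(seq: String): String =
    seq.reverse.map(b => complement.getOrElse(b.toUpper, 'N'))
}
```

For example, `ReverseComplementSketch.reverseComplement("GGGATGAGGG")` returns `"CCCTCATCCC"`.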

fnothaft Sep 22, 2014

Member

Hmmm, that's a messy alignment. I'll look into this...

@fnothaft fnothaft added the bug label Sep 22, 2014

@fnothaft fnothaft added the wontfix label Jul 20, 2016

fnothaft Jul 20, 2016

Member

Closing as won't fix.

@fnothaft fnothaft closed this Jul 20, 2016
