Want an engine feature to convert XT tagged base qualities to two #3540

sooheelee · 2017-09-01T17:15:09Z

Base qualities of two (# in Phred+33 encoding) are handled specially by BWA and our tools and are typically used to indicate adapter sequence. See the reply to jhess in https://gatkforums.broadinstitute.org/gatk/discussion/comment/35120#Comment_35120:
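For reference, the # character is simply how Q2 appears in the Phred+33 encoding used by the SAM QUAL field and by standard FASTQ files. Each quality is stored as the ASCII character with code Q + 33, so Q2 becomes ASCII 35, which is #. A minimal Python sketch of the conversion (the helper names are illustrative only):

```python
PHRED_OFFSET = 33  # Phred+33 offset used by SAM QUAL and standard FASTQ


def encode_quals(quals):
    """Convert a list of integer Phred qualities to a Phred+33 string."""
    return "".join(chr(q + PHRED_OFFSET) for q in quals)


def decode_quals(qual_string):
    """Convert a Phred+33 quality string back to integer qualities."""
    return [ord(c) - PHRED_OFFSET for c in qual_string]


print(repr(encode_quals([2])))  # '#'
print(decode_quals("II#"))      # [40, 40, 2]
```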

That's correct, Q2 bases are considered to be special and left untouched by BQSR.

Currently, there is no easy way to convert base qualities to two. The only tools I am aware of that do this are (i) SamToFastq, whose FASTQ output is necessarily unaligned, and (ii) MergeBamAlignment, which is not necessarily part of everyone's workflow. Moreover, MergeBamAlignment's CLIP_ADAPTERS option soft-clips XT-tagged sequence, and soft-clipped sequence is fair game for our assembly-based callers.

MarkIlluminaAdapters uses aligned reads to mark reads containing 3' adapter sequence with the XT tag. The XT tag value gives the position in the read at which the 3' adapter sequence starts. During MergeBamAlignment, one must explicitly request that the XT tag be retained in the merged output. When our assembly-based callers reassemble reads, they discard the aligner's CIGAR strings so that soft-clipped sequence, which may contain true variants we wish to resolve, can be used. As a consequence, adapter sequence can be incorporated into the assembly graph. This is not an issue for germline calling or for libraries with low levels of adapter read-through, because we prune graph nodes supported by fewer than two reads.
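To illustrate why low levels of read-through are harmless under this pruning, here is a deliberately simplified Python sketch. It builds edges between consecutive k-mers, counts how many distinct reads support each edge, and keeps only edges supported by at least two reads. This is a toy model, not the HaplotypeCaller or Mutect2 implementation; the k-mer size, the MIN_SUPPORT name, and the toy reads (including the Illumina-like adapter prefix) are assumptions for illustration.

```python
from collections import defaultdict

MIN_SUPPORT = 2  # hypothetical name for the "at least two supporting reads" threshold


def kmer_edges(seq, k):
    """Yield edges (kmer_i, kmer_i+1) between consecutive k-mers of a sequence."""
    for i in range(len(seq) - k):
        yield seq[i:i + k], seq[i + 1:i + 1 + k]


def build_pruned_graph(reads, k, min_support=MIN_SUPPORT):
    """Return the edges supported by at least `min_support` distinct reads."""
    support = defaultdict(set)
    for read_index, seq in enumerate(reads):
        for edge in kmer_edges(seq, k):
            support[edge].add(read_index)
    return {edge for edge, read_ids in support.items() if len(read_ids) >= min_support}


genomic = "ACGTACGGTCA"
adapter = "AGATCGGAAG"  # illustrative Illumina-like adapter prefix
reads = [genomic, genomic, genomic[:6] + adapter]  # one read with adapter read-through

adapter_edges = set(kmer_edges(reads[2], 4)) - set(kmer_edges(genomic, 4))
kept = build_pruned_graph(reads, k=4)
print(len(kept), kept.isdisjoint(adapter_edges))  # 7 True

# With a second read-through read, the adapter edges reach two reads and survive.
heavy = build_pruned_graph(reads + [reads[2]], k=4)
print(heavy.isdisjoint(adapter_edges))  # False
```

The second call mirrors the high read-through case: once the same adapter k-mers recur in multiple reads, support-based pruning no longer removes them.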

However, for somatic calling and for libraries with considerable adapter read-through, the current solution is to hard-clip adapter sequence from reads or to discard these reads altogether, to avoid spurious calls.

The problem with hard-clipping is that the reads become malformed: the CIGAR string no longer matches the sequence length, and the GATK engine filters such reads out. The options are then to correct the CIGAR strings, to realign the clipped reads, or, again, to discard the reads.
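The malformation is easy to detect: the lengths of the query-consuming CIGAR operations (M, I, S, =, X) must add up to the length of SEQ. The Python sketch below shows that check, plus a hypothetical helper that hard-clips the right end of the stored read while keeping the CIGAR consistent. It is an illustration only, not a GATK or Picard API. It refuses to clip through D, N, P, or existing H operations, and it does not handle reverse-strand reads, whose 3' adapter sits at the left end of the stored sequence (clipping there would also shift the alignment start).

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_CONSUMING = set("MIS=X")  # CIGAR operations that consume SEQ bases


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def is_consistent(cigar, seq):
    """True if the query-consuming CIGAR operations account for every SEQ base."""
    return sum(n for n, op in parse_cigar(cigar) if op in QUERY_CONSUMING) == len(seq)


def hard_clip_right(cigar, seq, n_clip):
    """Hypothetical helper: drop the last n_clip stored bases and record them as 'H'."""
    if not 0 <= n_clip <= len(seq):
        raise ValueError("n_clip must be between 0 and the read length")
    if not is_consistent(cigar, seq):
        raise ValueError("CIGAR and SEQ lengths already disagree")
    if n_clip == 0:
        return cigar, seq
    ops = parse_cigar(cigar)
    remaining = n_clip
    while remaining > 0:
        length, op = ops.pop()
        if op not in QUERY_CONSUMING:
            raise ValueError(f"cannot clip through CIGAR operation {op}")
        taken = min(length, remaining)
        remaining -= taken
        if length > taken:
            ops.append((length - taken, op))  # keep the unclipped part of this operation
    ops.append((n_clip, "H"))
    return "".join(f"{n}{op}" for n, op in ops), seq[:-n_clip]


read_seq = "ACGTACGTAGATCG"  # 14 bases; assume the last 5 are adapter
naive = read_seq[:-5]
print(is_consistent("14M", naive))  # False: 14M describes 14 bases, SEQ has 9

new_cigar, clipped = hard_clip_right("14M", read_seq, 5)
print(new_cigar, is_consistent(new_cigar, clipped))  # 9M5H True
```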

It would be great not to have to throw out reads that contain some adapter sequence in somatic workflows that call variants down to the lowest allele fractions. This would seem to require only (i) a tool or engine feature that sets the base qualities of XT-marked adapter sequence to two and (ii) special handling of Q2 bases by our callers.
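Part (i) could be as small as the following Python sketch. It sets the qualities of XT-marked adapter bases to 2 and leaves sequence, CIGAR, and alignment untouched, so no read needs to be clipped, realigned, or discarded. The function name and arguments are hypothetical, not an existing GATK or Picard API. The XT convention (1-based start position in the read as sequenced) and the strand handling are assumptions. Part (ii), special handling of Q2 bases by the callers, is not shown.

```python
ADAPTER_QUAL = 2  # the special Q2 value discussed in this issue


def mask_adapter_quals(quals, xt, is_reverse_strand):
    """Hypothetical helper: return a copy of `quals` with XT-marked adapter bases set to Q2.

    Assumptions, not confirmed by the issue: `xt` is the 1-based position in the
    read as sequenced where the 3' adapter starts, and `quals` is in SAM storage
    order, which is reversed relative to sequencing for reverse-strand reads.
    """
    n = len(quals)
    adapter_len = min(n, max(0, n - (xt - 1)))  # bases from position xt to the 3' end
    masked = list(quals)
    if is_reverse_strand:
        # The sequenced 3' end is stored first for reverse-strand reads.
        masked[:adapter_len] = [ADAPTER_QUAL] * adapter_len
    else:
        masked[n - adapter_len:] = [ADAPTER_QUAL] * adapter_len
    return masked


quals = [30] * 8
print(mask_adapter_quals(quals, xt=6, is_reverse_strand=False))  # [30, 30, 30, 30, 30, 2, 2, 2]
print(mask_adapter_quals(quals, xt=6, is_reverse_strand=True))   # [2, 2, 2, 30, 30, 30, 30, 30]
```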
