Please check indel normalisation works appropriately for reverse orientation reads. #10

mdkeehan · 2020-11-30T21:02:56Z

First thank you for writing a gpu aligner that actually offers appreciable benefits. (Even without nvlink).
I have been checking how the alignments compare to bwa2 alignments and noticed the following difference in detected variant calls.

The top alignment is from arioc the bottom alignment is from bwa2. The insertion event is from a single G to two G's.
It looks like arioc has left normalized the reverse reads differently to the forward aligned reads.

mdkeehan · 2020-12-10T01:10:27Z

A hopefully clearer image from shareable test data showing the homopolymer indel alignment problem based on the S_cerevisiae test data supplied with Arioc.
The location is chromosome IV:561156-561223

The variant is from a 7 long sequence of T to an 8 long sequence of T.
The top bam alignment is from arioc
The bottom alignment is from bwa.
reads have been colored by read strand to show the problem.

RWilton · 2020-12-10T15:27:01Z

Thank you for looking so carefully at the alignments here. From your IGV snapshots it certainly does appear that Arioc has mapped these short repeating indels differently at the same reference location for the forward and reverse-complement strands.

I have a "theory" about why Arioc is doing this, but rather than speculate, I'd prefer first to reproduce the problem in a debugging environment. I will have a look and follow up as soon as I can.

mdkeehan · 2020-12-10T20:04:18Z

Thank you Richard for having a look at it. Let me know if there is something I can do to help. I'd really like to be able to recommend Arioc to my colleagues as a faster replacement for bwa.

RWilton · 2020-12-17T15:10:08Z

Ok, here is what I think is going on: BWA and Bowtie 2 (BT2) compute reverse-complement alignments by aligning the reverse complement of the read sequence to the forward strand of the reference genome; Arioc does this by aligning the read sequence to the reverse complement of the reference. There is no notion of "indel normalization" in the Smith-Waterman algorithm the aligners are using to compute gapped alignments.

I can see how what you've observed would be unexpected behavior if you're accustomed to looking at mappings produced by BWA or BT2. On the other hand, it should not affect variant calls downstream unless your toolchain relies on the read aligner to produce consistently normalized variant mappings -- something that neither BWA, BT2, nor Arioc can promise. (The additional computation involved would be a tangible performance hit in a general-purpose read aligner.) The fact that BWA and BT2 report indel mappings that are consistent with regard to forward versus reverse complement strands is an artifact of how those aligners are implemented, and not because they explicitly perform variant normalization.

If this is a problem in your work -- that is, if you rely on the read aligner to produce identical forward and reverse complement mappings around indels -- then please give me some details! I'd want to consider how the problem might be addressed in Arioc without going down the rabbit hole of implementing some kind of variant-normalization postprocessing.

mdkeehan · 2020-12-20T22:39:29Z

By default freebayes includes an alignment indel normalisation step. My example difference in variant calling goes away...
Solution use a variant caller with alignment indel normalisation step built in.

The example with the problem was generated using bcftools mpileup|call.

RWilton closed this as completed Feb 1, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Please check indel normalisation works appropriately for reverse orientation reads. #10

Please check indel normalisation works appropriately for reverse orientation reads. #10

mdkeehan commented Nov 30, 2020

mdkeehan commented Dec 10, 2020

RWilton commented Dec 10, 2020

mdkeehan commented Dec 10, 2020

RWilton commented Dec 17, 2020

mdkeehan commented Dec 20, 2020

Please check indel normalisation works appropriately for reverse orientation reads. #10

Please check indel normalisation works appropriately for reverse orientation reads. #10

Comments

mdkeehan commented Nov 30, 2020

mdkeehan commented Dec 10, 2020

RWilton commented Dec 10, 2020

mdkeehan commented Dec 10, 2020

RWilton commented Dec 17, 2020

mdkeehan commented Dec 20, 2020