Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Please check indel normalisation works appropriately for reverse orientation reads. #10

Closed
mdkeehan opened this issue Nov 30, 2020 · 5 comments

Comments

@mdkeehan
Copy link

First thank you for writing a gpu aligner that actually offers appreciable benefits. (Even without nvlink).
I have been checking how the alignments compare to bwa2 alignments and noticed the following difference in detected variant calls.
arioc_alignment
The top alignment is from arioc the bottom alignment is from bwa2. The insertion event is from a single G to two G's.
It looks like arioc has left normalized the reverse reads differently to the forward aligned reads.

@mdkeehan
Copy link
Author

A hopefully clearer image from shareable test data showing the homopolymer indel alignment problem based on the S_cerevisiae test data supplied with Arioc.
The location is chromosome IV:561156-561223

The variant is from a 7 long sequence of T to an 8 long sequence of T.
The top bam alignment is from arioc
The bottom alignment is from bwa.
reads have been colored by read strand to show the problem.

arioc_s_c_alignment

@RWilton
Copy link
Owner

RWilton commented Dec 10, 2020

Thank you for looking so carefully at the alignments here. From your IGV snapshots it certainly does appear that Arioc has mapped these short repeating indels differently at the same reference location for the forward and reverse-complement strands.

I have a "theory" about why Arioc is doing this, but rather than speculate, I'd prefer first to reproduce the problem in a debugging environment. I will have a look and follow up as soon as I can.

@mdkeehan
Copy link
Author

Thank you Richard for having a look at it. Let me know if there is something I can do to help. I'd really like to be able to recommend Arioc to my colleagues as a faster replacement for bwa.

@RWilton
Copy link
Owner

RWilton commented Dec 17, 2020

Ok, here is what I think is going on: BWA and Bowtie 2 (BT2) compute reverse-complement alignments by aligning the reverse complement of the read sequence to the forward strand of the reference genome; Arioc does this by aligning the read sequence to the reverse complement of the reference. There is no notion of "indel normalization" in the Smith-Waterman algorithm the aligners are using to compute gapped alignments.

I can see how what you've observed would be unexpected behavior if you're accustomed to looking at mappings produced by BWA or BT2. On the other hand, it should not affect variant calls downstream unless your toolchain relies on the read aligner to produce consistently normalized variant mappings -- something that neither BWA, BT2, nor Arioc can promise. (The additional computation involved would be a tangible performance hit in a general-purpose read aligner.) The fact that BWA and BT2 report indel mappings that are consistent with regard to forward versus reverse complement strands is an artifact of how those aligners are implemented, and not because they explicitly perform variant normalization.

If this is a problem in your work -- that is, if you rely on the read aligner to produce identical forward and reverse complement mappings around indels -- then please give me some details! I'd want to consider how the problem might be addressed in Arioc without going down the rabbit hole of implementing some kind of variant-normalization postprocessing.

@mdkeehan
Copy link
Author

By default freebayes includes an alignment indel normalisation step. My example difference in variant calling goes away...
Solution use a variant caller with alignment indel normalisation step built in.

The example with the problem was generated using bcftools mpileup|call.

@RWilton RWilton closed this as completed Feb 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants