-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Please check indel normalisation works appropriately for reverse orientation reads. #10
Comments
A hopefully clearer image from shareable test data showing the homopolymer indel alignment problem based on the S_cerevisiae test data supplied with Arioc. The variant is from a 7 long sequence of T to an 8 long sequence of T. |
Thank you for looking so carefully at the alignments here. From your IGV snapshots it certainly does appear that Arioc has mapped these short repeating indels differently at the same reference location for the forward and reverse-complement strands. I have a "theory" about why Arioc is doing this, but rather than speculate, I'd prefer first to reproduce the problem in a debugging environment. I will have a look and follow up as soon as I can. |
Thank you Richard for having a look at it. Let me know if there is something I can do to help. I'd really like to be able to recommend Arioc to my colleagues as a faster replacement for bwa. |
Ok, here is what I think is going on: BWA and Bowtie 2 (BT2) compute reverse-complement alignments by aligning the reverse complement of the read sequence to the forward strand of the reference genome; Arioc does this by aligning the read sequence to the reverse complement of the reference. There is no notion of "indel normalization" in the Smith-Waterman algorithm the aligners are using to compute gapped alignments. I can see how what you've observed would be unexpected behavior if you're accustomed to looking at mappings produced by BWA or BT2. On the other hand, it should not affect variant calls downstream unless your toolchain relies on the read aligner to produce consistently normalized variant mappings -- something that neither BWA, BT2, nor Arioc can promise. (The additional computation involved would be a tangible performance hit in a general-purpose read aligner.) The fact that BWA and BT2 report indel mappings that are consistent with regard to forward versus reverse complement strands is an artifact of how those aligners are implemented, and not because they explicitly perform variant normalization. If this is a problem in your work -- that is, if you rely on the read aligner to produce identical forward and reverse complement mappings around indels -- then please give me some details! I'd want to consider how the problem might be addressed in Arioc without going down the rabbit hole of implementing some kind of variant-normalization postprocessing. |
By default freebayes includes an alignment indel normalisation step. My example difference in variant calling goes away... The example with the problem was generated using bcftools mpileup|call. |
First thank you for writing a gpu aligner that actually offers appreciable benefits. (Even without nvlink).
I have been checking how the alignments compare to bwa2 alignments and noticed the following difference in detected variant calls.
The top alignment is from arioc the bottom alignment is from bwa2. The insertion event is from a single G to two G's.
It looks like arioc has left normalized the reverse reads differently to the forward aligned reads.
The text was updated successfully, but these errors were encountered: