Read marked as ambiguous but bowtie2 only returned a single hit #108
Comments
Hi Martin, I have tried to recreate what you described on our cluster, and indeed the read is considered ambiguous here as well. The important part in the alignment to the OT strand is: From the Bowtie2 manual:
To illustrate this a bit more I have then checked the Bowtie 2 results against the CT and GA genome indexes. While the read doesn't align to the GA converted genome at all, there are several possible alignments for the CT converted version:
So as you can see, the read aligns perfectly to chromosomes 2 and 3, while the other three alignments have quite poor alignment scores. In conclusion, it looks to me like the read is correctly identified as ambiguous and is therefore not present in the final BAM file. I hope this clears things up a bit? |
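To make the reasoning above concrete, here is a minimal Python sketch of how the best (`AS:i`) and second-best (`XS:i`) alignment scores that Bowtie 2 writes into a SAM record could be compared. It is not Bismark's actual code (Bismark is written in Perl); the function names, the tie rule, and the example record are all hypothetical and only illustrate the idea that two equally good hits make a read ambiguous.

```python
def sam_tags(sam_line):
    """Parse the optional TAG:TYPE:VALUE fields of one SAM record into a dict."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # the 11 mandatory SAM columns come first
        tag, value_type, value = field.split(":", 2)
        tags[tag] = int(value) if value_type == "i" else value
    return tags


def ambiguous_within_index(sam_line):
    """Hypothetical rule: ambiguous if the second-best score (XS) ties the best (AS)."""
    tags = sam_tags(sam_line)
    if "AS" not in tags:  # Bowtie 2 reports AS:i only for aligned reads
        return False
    return tags.get("XS") == tags["AS"]


# Made-up record: a perfect hit (AS:i:0) with an equally good second hit (XS:i:0).
example = "\t".join(
    ["read1", "0", "2", "100", "1", "10M", "*", "0", "0",
     "ACGTACGTAC", "IIIIIIIIII", "AS:i:0", "XS:i:0"]
)
print(ambiguous_within_index(example))
```

A record without an `XS:i` field, or with `XS` lower than `AS`, would be treated as a unique hit within that index under this rule.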
Hi Felix, thank you for your swift response. Does this only apply to the CT_converted reference? Since I have the opposite case, where bismark maps the following read to the GA_converted reference (the CT is unmapped); Read
Bismark result
but when you look at the bowtie2 run; Bowtie2
I get a different returned hit (position 13592231 rather than 13591519 in the same Chromosome) with AS -18 XS 18. I can see why when I run bowtie -a, but that's not what bismark is running right? Isn't bismark calling bowtie2 with the following parameters ( Bowtie2 -a
So in this case, wouldn't this be ambiguous? |
Hi Martin, I think you have found a special case here. The read does indeed show 5 alignments with the command you used, even with the latest version of Bowtie2 (v2.3.2). If you used the Bowtie2 defaults (which are `--score-min L,-0.6,-0.6`), you would find that the read does in fact yield 306 ambiguous alignments! And I think exactly this is the problem here: the read aligns to so many different repetitive loci that Bowtie2 actually gives up trying any further, but in this case it hadn't found a good second alignment yet. So the command:
produces one alignment ( If you increase the
Here, both
I suppose the design decision of the default Bowtie2 alignment mode is a trade-off between overall speed and sensitivity.... |
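For readers unfamiliar with the `--score-min` setting mentioned in this comment: with function type `L`, Bowtie 2 computes the minimum acceptable alignment score as a linear function of read length, f(x) = constant + coefficient * x. The sketch below compares the Bowtie 2 default quoted above (`L,-0.6,-0.6`) with `L,0,-0.2`, which I believe is the stricter setting Bismark passes by default; that value is not shown in this thread, so treat it as an assumption. The read length of 100 is also hypothetical.

```python
def min_score_linear(constant, coefficient, read_length):
    """Bowtie 2 linear function type 'L': f(x) = constant + coefficient * x."""
    return constant + coefficient * read_length


def passes(alignment_score, constant, coefficient, read_length):
    """An alignment is valid if its score is no lower than the minimum score."""
    return alignment_score >= min_score_linear(constant, coefficient, read_length)


READ_LENGTH = 100  # hypothetical read length, not taken from the thread

# Bowtie 2 end-to-end default quoted in the comment above.
print(min_score_linear(-0.6, -0.6, READ_LENGTH))

# Stricter setting assumed here to be Bismark's default.
print(min_score_linear(0, -0.2, READ_LENGTH))

# An alignment with AS:i:-18 would be valid under both thresholds at this length.
print(passes(-18, -0.6, -0.6, READ_LENGTH), passes(-18, 0, -0.2, READ_LENGTH))
```

The looser default threshold admits many more low-scoring alignments, which is consistent with the much larger number of hits reported for the default settings.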
Hi Felix, thanks for taking the time to chat about it. I'm confused how you're not getting the XS -18 in your example, if I run the same parameters as yours (with 2.3.2) I get; bowtie2 2.3.2
I can understand why we get different alignments depending on which got returned first or different seed etc, but not why it's missing the XS flag. The part that I'm more baffled about is when I run bismark on this same machine I don't get the XS flag. Surely these should be identical if I'm sat at the same machine with the same version of bowtie2. Running bismark using bowtie2-2.3.2
When bismark says it is running with options |
Hi Martin, This is indeed a surprising effect, which I cannot reproduce on our cluster over here (I consistently get the same first alignment; I have tried some 30 times for both Bismark and Bowtie2 on its own). In your case, Bowtie2 obviously picks a different alignment as the first random hit and the 15 subsequent tries, depending on whether you launch it on its own or from within Bismark. I know that a number of factors play into the random and seeding heuristics of Bowtie2, such as the read position within the file, possibly some parameters specific to your operating system and platform, and sometimes even the read name! (I recall we had a conversation with Ben Langmead at some point about the same read giving an alignment under one condition but not aligning if the read name was prepended with an incrementing number!) How this works exactly is probably a question you would have to ask the Bowtie2 developers directly, I am afraid. On a different note, I think your original question raises a very valid point: Is there a way we can discriminate between reads that have only a single unique alignment (and thus no |
Yeah I understand the difficulties in reproducing the results from these kinds of programs. I've seen the read position one too, where if you give a single read you get something like AS:i:0 XS:i:-14 but within the full fastq file you get AS:i:-14 AS:i:-14. When seeing the behaviour we've been discussing above my thought was that bowtie2 called by bismark isn't just using those |
No I think the command is using the exact same parameters in both circumstances. Just for the record I have raised an issue on the Bowtie2 project for this as well: BenLangmead/bowtie2#114 |
Thanks for raising the issue with Bowtie2. FYI: I've been able to work out the differences between our bowtie2 and bismark runs. It's because you're using the CT converted result that bismark created with bowtie2 which renamed the read name from;
to
and I was using the original read but converting the C's to T's using |
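As an illustration of the in-silico conversion being discussed, here is a minimal Python sketch that converts a read's sequence C-to-T (or G-to-A) while leaving the read name, separator line, and qualities untouched. The function names and the tuple layout are hypothetical, and this is not Bismark's own conversion code. Because the name is left unchanged here, and the thread found that a changed read name was enough to alter which alignment Bowtie2 reported, output from a sketch like this is not guaranteed to reproduce Bismark's alignments.

```python
def convert_read(seq, conversion="CT"):
    """Fully convert a read sequence in silico: C->T ("CT") or G->A ("GA")."""
    seq = seq.upper()
    if conversion == "CT":
        return seq.replace("C", "T")
    if conversion == "GA":
        return seq.replace("G", "A")
    raise ValueError("conversion must be 'CT' or 'GA'")


def convert_fastq_record(record, conversion="CT"):
    """record is a (name, sequence, plus, qualities) tuple; only the sequence changes."""
    name, seq, plus, qual = record
    return (name, convert_read(seq, conversion), plus, qual)


# Made-up FASTQ record for illustration only.
record = ("@read1", "ACGTCCGA", "+", "IIIIIIII")
print(convert_fastq_record(record, "CT"))
print(convert_fastq_record(record, "GA"))
```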
Excellent, good that we got to the bottom of this as well! It appears that a future Bowtie2 release might get an option that helps eliminate these repetitive corner cases, so yes, well spotted! |
Me too, I honestly thought I was going mad! Yeah I read Ben's comment, looks promising. Glad I could help. |
I have another odd example where bismark is marking a read as ambiguous but I'm not sure it should be. The raw read is this one and it's mapped against TAIR10.
I get one hit back from the CT converted reference with
Wouldn't this mean that there is one good unique mapping result to the GA converted reference and several inferior ones to the CT converted reference rather than it being an ambiguous read? |
Hi Martin, I'll try to take a look at this tomorrow. Thanks, Felix |
Hi Martin, Apologies for taking so long. I have now had time to look at this issue, and you were indeed right that the second alignment should trump the first one. I have changed the timing of when the "ambiguous within same thread" status is reset, and it now appears to work fine. Could you please get the latest development version to see if it works on your end? Fixed in this commit: 40da666. Many thanks for spotting this, Felix |
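The intended behaviour discussed above (a single, clearly better alignment to one converted index should win over inferior alignments to the other, even if those inferior alignments tie with each other) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the logic of commit 40da666: the `Hit` type, the `classify` function, and the example scores and positions are hypothetical.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    index: str             # which converted index produced the hit, e.g. "CT" or "GA"
    chrom: str
    pos: int
    best: int              # AS:i of the reported alignment
    second: Optional[int]  # XS:i, or None if Bowtie 2 reported no second-best hit


def classify(hits):
    """Return 'unmapped', 'unique' or 'ambiguous' for one read's hits across indexes."""
    if not hits:
        return "unmapped"
    top = max(h.best for h in hits)
    winners = [h for h in hits if h.best == top]
    # Equally good hits at different positions (e.g. one per index) are ambiguous.
    if len({(w.chrom, w.pos) for w in winners}) > 1:
        return "ambiguous"
    # A second-best hit that ties the overall best score is ambiguous too.
    if any(w.second is not None and w.second == top for w in winners):
        return "ambiguous"
    # Ties among inferior hits (lower AS) do not affect the outcome.
    return "unique"


# Made-up example: inferior, internally tied CT hit plus one perfect GA hit.
hits = [Hit("CT", "1", 100, -18, -18), Hit("GA", "1", 500, 0, None)]
print(classify(hits))
```

The key design point is that the ambiguity check is applied only to the hits that share the overall best score, so a tie among poorer alignments in one index cannot mark the read as ambiguous.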
Hi Felix, That's okay, your commit fixed the issue perfectly. Thank you for looking at that. I was hoping to bump into you at yesterday's BBQ at Babraham to say hi in person, but I didn't spot you. Cheers, Martin |
Hi Martin, Great that this now looks fine; I think it warrants a new release soon. You should have said you were here yesterday, we were cycling home before the BBQ started to pick the kids up… Cheers, Felix
|
TL;DR
I have the following read that Bismark won't map against TAIR10, although I think it should:
The reference is TAIR10 with ChrM and ChrC. To rule out anything odd on my desktop and to make this reproducible, I've run it on an AWS Ubuntu instance and put the finer details in this gist:
https://gist.github.com/martinjvickers/b987d076aaad5f066b6afb5885a3bb8d
Bit more detail
If you run bowtie2 on this directional single end read (I know I'd have to convert the C's to T's but since there are none in this read I've not) you get a single hit from the CT-CT run and an unmapped result for the CT-GA run.
So I'd have thought that this would be a hit.
I forked Bismark, un-commented all of the print statements in the check_bowtie_results_single_end_bowtie2 function (https://github.com/martinjvickers/Bismark) and reran it; the read is being marked as ambiguous. Even more strangely, there appears to be a third, unmapped hit which bowtie2 never returned when run on its own;
Is this a bug or am I misunderstanding something about how the reads are determined as ambiguous or how bowtie2 is used?