New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix to handle overlapped read clipping in MergeBamAlignment when clipping already present #709
Fix to handle overlapped read clipping in MergeBamAlignment when clipping already present #709
Conversation
…-clipping is already present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a few minor comments. I think other folks should be aware of this bug and it's fix.
@@ -590,11 +592,10 @@ private void transferAlignmentInfoToPairedRead(final SAMRecord firstUnaligned, f | |||
* Checks to see whether the ends of the reads overlap and soft clips reads | |||
* them if necessary. | |||
*/ | |||
protected void clipForOverlappingReads(final SAMRecord read1, final SAMRecord read2) { | |||
protected static void clipForOverlappingReads(final SAMRecord read1, final SAMRecord read2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while we're here, can this be private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it needs to be protected or package level for the tests to be able to access it.
int clipped = 0; | ||
while (iterator.hasNext()) { | ||
final CigarElement elem = iterator.next(); | ||
if (elem.getOperator() != CigarOperator.SOFT_CLIP) break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this will do the wrong thing if we have hard clips (ex. 120M12S8H
), namely break too early. I think that's the only one though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good catch. I've fixed that. The only remaining issue is that it will lose the hard-clipping due to the issue here:
https://github.com/samtools/htsjdk/blob/master/src/main/java/htsjdk/samtools/util/CigarUtil.java#L46
The cigar will be correctly soft-clipped, but the hard-clip will get lost. I'm not inclined to try and fix that at the moment unless you think it's critical.
Original bug report: #708 |
I can confirm this fixes the test case on #708 |
@nh13 Second commit pushed to handle the hard-clipping. It's not ideal though, so do you want to take a second look? |
if (elem.getOperator() != CigarOperator.SOFT_CLIP) break; | ||
clipped = elem.getLength(); | ||
if (elem.getOperator() != CigarOperator.SOFT_CLIP && elem.getOperator() != CigarOperator.HARD_CLIP) break; | ||
if (elem.getOperator() == CigarOperator.SOFT_CLIP) clipped = elem.getLength(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine that we will lose the hard clip, since methods in CigarUtil
also note this. Perhaps add a TODO here and where this function is used that it doesn't handle hard clips?
@tfenne one minor comment about adding a note about losing the hard clips and then merge at will. |
@ktibbett Do you want to take a peek at this at all since it affects |
Description
Fixed the clipping of overlapping reads which is currently broken in the case where the reads already contain soft-clipping. If the reads contain softclipping at their 3' ends the overlapping read clipping should add to that soft-clipping in order to stop the reads alignment's extending past each other's 5' ends. Prior to the change, the existing soft-clipping was ignored, and hence the result was incorrect.
Checklist (never delete this)
Never delete this, it is our record that procedure was followed. If you find that for whatever reason one of the checklist points doesn't apply to your PR, you can leave it unchecked but please add an explanation below.
Content
Review