Hotfix for various TagPileup bugs #77
Merged
WilliamKMLai merged 4 commits into master on Jan 3, 2022
Bug discovered with test data showing a 1 bp downstream shift (ticket #75).
The getUnclippedEnd() function returns a 1-indexed, inclusive coordinate, which must be decremented to be 0-indexed when it is assigned to the FivePrime mark variable. The assignment happens in two places (the paired-end and non-paired-end code blocks).
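To illustrate the off-by-one, here is a minimal sketch that does not depend on htsjdk. The class and method names are hypothetical, the arguments stand in for the 1-indexed, inclusive values that getUnclippedStart() and getUnclippedEnd() report, and the forward-strand branch is an assumption, since the commit message only describes the getUnclippedEnd() case.

```java
final class FivePrimeSketch {
    /**
     * Returns a 0-indexed 5' position for a read.
     * unclippedStart and unclippedEnd are 1-indexed and inclusive.
     */
    static int fivePrime(int unclippedStart, int unclippedEnd, boolean negativeStrand) {
        if (negativeStrand) {
            // The 5' end of a reverse-strand read is its rightmost base.
            // Dropping the "- 1" leaves the mark off by one position.
            return unclippedEnd - 1;
        }
        // Assumption: forward-strand reads need the same 1-indexed to 0-indexed shift.
        return unclippedStart - 1;
    }
}
```

For a read covering 1-indexed bases 101 to 150, `fivePrime(101, 150, true)` gives 149, the 0-indexed position of the last base, whereas using the end coordinate unchanged would give 150.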
Removing the code block that shifted the BED coordinates, which appears to have been some sort of strand-specific correction, restores the data to the correct composite (checked with BEDcoord entries on different strands and with both sense and antisense composites). The removed block was probably compensating for code that is no longer in place. Relates to issue #75.
The rationale is outlined in issue #76. PileupExtract is updated for pileups that require proper pairs. The midpoint is now computed by taking the leftmost coordinate of the insert (getAlignmentStart or getMateAlignmentStart, depending on whether R1 or R2 is the leftmost read) and adding half of the insert size. Note the correction for when the BED interval has an even length and is on the negative strand. This ensures that, even though we use floor integer division, the distance from the 5' end of the BED interval is consistent between BED intervals. Odd-length intervals have consistent 5' distances regardless of direction. The insert-size filter now uses the built-in SAMRecord function getInferredInsertSize(), checking whether its absolute value is above or below the limits specified by the PileupParameters object. I also switched the filter to use a continue statement instead of setting FivePrime to an invalid position, which saves a little downstream computation.
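The logic described above can be sketched roughly as follows. This is not the code from the commit: `Pair`, `minInsert`, and `maxInsert` are hypothetical stand-ins for the SAMRecord fields and the PileupParameters limits, the inclusive bounds are an assumption, and `fivePrimeDistance` only illustrates why an even-length, negative-strand interval needs a correction under floor division (the direction of the actual correction in the patch is not stated here).

```java
import java.util.ArrayList;
import java.util.List;

final class MidpointSketch {
    /** Hypothetical stand-in for the SAMRecord values used below. */
    static final class Pair {
        final int alignmentStart, mateAlignmentStart, inferredInsertSize;
        Pair(int alignmentStart, int mateAlignmentStart, int inferredInsertSize) {
            this.alignmentStart = alignmentStart;
            this.mateAlignmentStart = mateAlignmentStart;
            this.inferredInsertSize = inferredInsertSize;
        }
    }

    /** Leftmost coordinate of the insert plus half the insert size (floor division). */
    static int midpoint(Pair p) {
        int leftmost = Math.min(p.alignmentStart, p.mateAlignmentStart);
        return leftmost + Math.abs(p.inferredInsertSize) / 2;
    }

    /** Collects midpoints, skipping pairs whose insert size is outside the limits. */
    static List<Integer> midpoints(List<Pair> pairs, int minInsert, int maxInsert) {
        List<Integer> out = new ArrayList<>();
        for (Pair p : pairs) {
            // The inferred insert size is signed, so compare its absolute value.
            int size = Math.abs(p.inferredInsertSize);
            if (size < minInsert || size > maxInsert) {
                continue; // skip early instead of marking an invalid position
            }
            out.add(midpoint(p));
        }
        return out;
    }

    /**
     * Distance from the 5' end of a 0-indexed, half-open interval [start, end)
     * to its floor-division center, optionally with an even-length,
     * negative-strand correction applied (illustrative only).
     */
    static int fivePrimeDistance(int start, int end, boolean negativeStrand, boolean correct) {
        int length = end - start;
        int center = start + length / 2;
        if (correct && negativeStrand && length % 2 == 0) {
            center -= 1;
        }
        return negativeStrand ? (end - 1) - center : center - start;
    }
}
```

With a 10 bp interval [0, 10), the uncorrected distance is 5 on the positive strand but 4 on the negative strand; with the correction both are 5. An 11 bp interval gives 5 on either strand without any correction, which matches the remark about odd intervals.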
Invert the if statement in TagPileup so that, when parsing BED coordinates, unexpected strand characters default to the positive strand ("+").
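A minimal sketch of the inverted check, assuming the strand arrives as a plain string field; `StrandSketch` and `parseStrand` are hypothetical names and not the TagPileup code itself.

```java
final class StrandSketch {
    /** Returns "-" only for an explicit minus strand; anything else becomes "+". */
    static String parseStrand(String field) {
        // Testing for "-" (rather than for "+") means values such as "."
        // or a missing field fall through to the positive strand.
        if ("-".equals(field)) {
            return "-";
        }
        return "+";
    }
}
```

Testing for "+" first and treating everything else as "-" would instead send an unexpected character such as "." to the negative strand.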
This pull request includes fixes for minor shifts in tag pileup under specific conditions. More specifically, fixes for antisense 1bp shift, strand-specific shift correction, and adjustment to insert size determination (#75 & #76).