-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AEJ-1617 Prevent Smith-Waterman from integer overflow #142
Conversation
public static final int MAX_SW_SEQUENCE_LENGTH = 32*1024-1; //2^15 - 1 | ||
public static final int MAX_SW_MATCH_VALUE = 64*1024; // 2^16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not used here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you reference this in tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be match this name to that used in SW java class MAX_SEQUENCE_LENGTH
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the TestingUtils
class to hold all common functions and variables across all tests. #134 tests also use this class. I wanted to show that these values are specific to SmithWaterman only. I didn't want to make any confusion if someone would write new tests for PairHMM or Deflater/Inflater in the future.
@@ -0,0 +1,24 @@ | |||
package com.intel.gkl.testing_utils; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use camel format for pkg name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The package names should be snake_cased but I can concatenate it and make testingutils
instead.
byte[] align = Arrays.copyOf(ref, ref.length); | ||
|
||
SWParameters SWparameters = new SWParameters(matchValue, -5, -10, -10); | ||
SWNativeAlignerResult result = smithWaterman.align(ref, align, SWparameters, SWOverhangStrategy.IGNORE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to consider other SW Overhang strategies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overhanging strategies won't make a difference in this case. These strategies determine the way misaligned sequence ends are handled. In this test, we expect a full alignment. For instance, in the SOFTCLIP
strategy there would be nothing to truncate.
private final int MAX_SEQUENCE_LENGTH = 32*1024-1; // 2^15 - 1 | ||
private final int MAXIMUM_MATCH_VALUE = 64*1024; // 2^16 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good to add note here why these values, and for match values, but I think we should be adding checks where we compute the matrix score, th8s?
As a result of the current implementation limitations (integer data types), the maximum sequence length is 2^15 -1 ~ 32k bytes
Thus, the maximum matching value is ~ 2^16 + 2
(max_sequence_length * match_value <= int32_max_value)
Parameter validation was added to the SmithWaterman Java class:
If any of those conditions is not satisfied
IllegalArgumentException
is raisedThe worst-case scenario is tested: