-
Notifications
You must be signed in to change notification settings - Fork 577
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More fix enum hashCode #4623
More fix enum hashCode #4623
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4623 +/- ##
===============================================
- Coverage 79.821% 79.813% -0.008%
Complexity 17152 17152
===============================================
Files 1067 1067
Lines 62405 62415 +10
Branches 10126 10130 +4
===============================================
+ Hits 49812 49815 +3
- Misses 8648 8655 +7
Partials 3945 3945
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a question and a refactoring suggestion...
@@ -10,4 +34,213 @@ | |||
public abstract class AlignedContigGenerator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this class should really just be an interface? The two subclasses could be top-level classes that just implement the interface. It's a bit weird to have this be an abstract class when it really only defines the signature of one method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. But in 2nd thought, this is not related to this fix, so I'll do the refactor and the move in another PR.
I'm trying to put the implementations in one place and see if I can have one concrete class, just to avoid the two code path that exist mostly for historical reasons.
@@ -207,11 +204,15 @@ private SvDiscoveryInputData getSvDiscoveryInputData(final JavaSparkContext ctx, | |||
final String outputPrefixWithSampleName = variantsOutDir + (variantsOutDir.endsWith("/") ? "" : "/") | |||
+ SVUtils.getSampleId(headerForReads) + "_"; | |||
|
|||
final JavaRDD<GATKRead> reloadedContigAlignments = new ReadsSparkSource(ctx, readArguments.getReadValidationStringency()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how moving the two classes to live in the abstract class would change anything, so maybe this is what actually resolved the behavior for you? Before, how was getReads
not returning the raw reads in the original bam file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That getReads()
call was not right, and luckily it was actually not used so I'm going to fix it in a followup cleanup and test coverage bump PR.
for (final Strand strand : dupSeqStrandOnCtg) { | ||
result = 31 * result + Objects.hashCode( strand.ordinal() ); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the actual fix, I believe. Chris and I missed the list of enums. (A List's hashCode combines the hashCode's of its elements, and that's no good on Spark.)
03b460f
to
e947f5e
Compare
result = 31 * result + Objects.hashCode( strand.ordinal() ); | ||
} | ||
for (final Strand strand : dupSeqStrandOnCtg) { | ||
result = 31 * result + Objects.hashCode( strand.ordinal() ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Objects.hashCode(strand.ordinal()) is a very expensive way of calculating strand.ordinal(). Maybe just use strand.ordinal() here, and on lines 1081 and 1086 below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, you're right, that was dumb of me. @SHuang-Broad, can you change that to just ordinal()
instead of Objects.hashCode(strandSwitch.ordinal())
in the other places I put it in as well (AlignmentInterval
, CpxVariantInducingAssemblyContig
, ChimericAlignment
, and NovelAdjacencyAndAltHaplotype
)? Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should have noticed this on Chris' earlier check-in, too, but didn't. It also has this unnecessary indirection through Objects.hashCode. Sorry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you both! It's been done now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than the suggestion for simplifying the calculation of the hashCode of a Strand, this is fine with me.
Looks like you might have a failing test, too.
e947f5e
to
bf0b479
Compare
bf0b479
to
344db59
Compare
Verified to be running correctly and not generating duplicates anymore. |
More fix on hashcode functions of structs that contain enums
in
BreakpointComplications
After this fix, the duplicated records disappear.