Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More fix enum hashCode #4623

Merged
merged 1 commit into from
Apr 7, 2018
Merged

More fix enum hashCode #4623

merged 1 commit into from
Apr 7, 2018

Conversation

SHuang-Broad
Copy link
Contributor

in BreakpointComplications

After this fix, the duplicated records disappear.

@codecov-io
Copy link

codecov-io commented Apr 3, 2018

Codecov Report

Merging #4623 into master will decrease coverage by 0.008%.
The diff coverage is 16.667%.

@@               Coverage Diff               @@
##              master     #4623       +/-   ##
===============================================
- Coverage     79.821%   79.813%   -0.008%     
  Complexity     17152     17152               
===============================================
  Files           1067      1067               
  Lines          62405     62415       +10     
  Branches       10126     10130        +4     
===============================================
+ Hits           49812     49815        +3     
- Misses          8648      8655        +7     
  Partials        3945      3945
Impacted Files Coverage Δ Complexity Δ
...v/discovery/inference/BreakpointComplications.java 59.351% <0%> (-1.155%) 20 <0> (ø)
...ry/inference/CpxVariantInducingAssemblyContig.java 83.178% <0%> (ø) 24 <0> (ø) ⬇️
...overy/inference/NovelAdjacencyAndAltHaplotype.java 63.393% <100%> (ø) 18 <0> (ø) ⬇️
...park/sv/discovery/inference/ChimericAlignment.java 74.627% <100%> (ø) 33 <0> (ø) ⬇️
...park/sv/discovery/alignment/AlignmentInterval.java 92.803% <100%> (ø) 83 <0> (ø) ⬇️
...oadinstitute/hellbender/utils/gcs/BucketUtils.java 80% <0%> (+1.29%) 39% <0%> (ø) ⬇️
...utils/smithwaterman/SmithWatermanIntelAligner.java 90% <0%> (+10%) 3% <0%> (ø) ⬇️

Copy link
Member

@cwhelan cwhelan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a question and a refactoring suggestion...

@@ -10,4 +34,213 @@
public abstract class AlignedContigGenerator {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this class should really just be an interface? The two subclasses could be top-level classes that just implement the interface. It's a bit weird to have this be an abstract class when it really only defines the signature of one method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. But in 2nd thought, this is not related to this fix, so I'll do the refactor and the move in another PR.
I'm trying to put the implementations in one place and see if I can have one concrete class, just to avoid the two code path that exist mostly for historical reasons.

@@ -207,11 +204,15 @@ private SvDiscoveryInputData getSvDiscoveryInputData(final JavaSparkContext ctx,
final String outputPrefixWithSampleName = variantsOutDir + (variantsOutDir.endsWith("/") ? "" : "/")
+ SVUtils.getSampleId(headerForReads) + "_";

final JavaRDD<GATKRead> reloadedContigAlignments = new ReadsSparkSource(ctx, readArguments.getReadValidationStringency())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how moving the two classes to live in the abstract class would change anything, so maybe this is what actually resolved the behavior for you? Before, how was getReads not returning the raw reads in the original bam file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That getReads() call was not right, and luckily it was actually not used so I'm going to fix it in a followup cleanup and test coverage bump PR.

for (final Strand strand : dupSeqStrandOnCtg) {
result = 31 * result + Objects.hashCode( strand.ordinal() );
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the actual fix, I believe. Chris and I missed the list of enums. (A List's hashCode combines the hashCode's of its elements, and that's no good on Spark.)

result = 31 * result + Objects.hashCode( strand.ordinal() );
}
for (final Strand strand : dupSeqStrandOnCtg) {
result = 31 * result + Objects.hashCode( strand.ordinal() );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Objects.hashCode(strand.ordinal()) is a very expensive way of calculating strand.ordinal(). Maybe just use strand.ordinal() here, and on lines 1081 and 1086 below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you're right, that was dumb of me. @SHuang-Broad, can you change that to just ordinal() instead of Objects.hashCode(strandSwitch.ordinal()) in the other places I put it in as well (AlignmentInterval, CpxVariantInducingAssemblyContig, ChimericAlignment, and NovelAdjacencyAndAltHaplotype)? Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have noticed this on Chris' earlier check-in, too, but didn't. It also has this unnecessary indirection through Objects.hashCode. Sorry.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you both! It's been done now.

Copy link
Contributor

@tedsharpe tedsharpe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the suggestion for simplifying the calculation of the hashCode of a Strand, this is fine with me.
Looks like you might have a failing test, too.

@SHuang-Broad
Copy link
Contributor Author

Verified to be running correctly and not generating duplicates anymore.

@SHuang-Broad SHuang-Broad merged commit d48895f into master Apr 7, 2018
@SHuang-Broad SHuang-Broad deleted the sh_more_fix_enum_hascodes branch April 7, 2018 16:45
cwhelan pushed a commit to cwhelan/gatk-linked-reads that referenced this pull request May 25, 2018
More fix on hashcode functions of structs that contain enums
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants