More fix enum hashCode #4623

SHuang-Broad · 2018-04-03T06:48:43Z

Fixes more enum-based hashCode implementations in BreakpointComplications.

After this fix, the duplicated records disappear.

codecov-io · 2018-04-03T07:44:06Z

Codecov Report

Merging #4623 into master will decrease coverage by 0.008%.
The diff coverage is 16.667%.

@@               Coverage Diff               @@
##              master     #4623       +/-   ##
===============================================
- Coverage     79.821%   79.813%   -0.008%     
  Complexity     17152     17152               
===============================================
  Files           1067      1067               
  Lines          62405     62415       +10     
  Branches       10126     10130        +4     
===============================================
+ Hits           49812     49815        +3     
- Misses          8648      8655        +7     
  Partials        3945      3945

Impacted Files	Coverage Δ	Complexity Δ
...v/discovery/inference/BreakpointComplications.java	`59.351% <0%> (-1.155%)`	`20 <0> (ø)`
...ry/inference/CpxVariantInducingAssemblyContig.java	`83.178% <0%> (ø)`	`24 <0> (ø)`	⬇️
...overy/inference/NovelAdjacencyAndAltHaplotype.java	`63.393% <100%> (ø)`	`18 <0> (ø)`	⬇️
...park/sv/discovery/inference/ChimericAlignment.java	`74.627% <100%> (ø)`	`33 <0> (ø)`	⬇️
...park/sv/discovery/alignment/AlignmentInterval.java	`92.803% <100%> (ø)`	`83 <0> (ø)`	⬇️
...oadinstitute/hellbender/utils/gcs/BucketUtils.java	`80% <0%> (+1.29%)`	`39% <0%> (ø)`	⬇️
...utils/smithwaterman/SmithWatermanIntelAligner.java	`90% <0%> (+10%)`	`3% <0%> (ø)`	⬇️

cwhelan

I have a question and a refactoring suggestion...

cwhelan · 2018-04-03T13:57:18Z

...org/broadinstitute/hellbender/tools/spark/sv/discovery/alignment/AlignedContigGenerator.java

@@ -10,4 +34,213 @@
 public abstract class AlignedContigGenerator {


It seems like this class should really just be an interface? The two subclasses could be top-level classes that just implement the interface. It's a bit weird to have this be an abstract class when it really only defines the signature of one method.

Agreed. But on second thought, this isn't related to this fix, so I'll do the refactor and the move in another PR.
I'm trying to put the implementations in one place to see whether I can have a single concrete class, just to avoid the two code paths that exist mostly for historical reasons.
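A minimal sketch of the reviewer's suggestion, assuming the abstract class defines a single method. Apart from `AlignedContigGenerator`, every name here (`getAlignedContigs`, `InMemoryGenerator`, `RecordParsingGenerator`, the `List<String>` return type) is a hypothetical simplification; the real classes work with Spark RDDs and aligned-contig objects. The implementations are nested only so the sketch fits in one file. The suggestion is for them to be top-level classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ContigGeneratorSketch {

    // The one-method contract expressed as an interface instead of an abstract class.
    // Hypothetical signature: contigs are reduced to their names for illustration.
    interface AlignedContigGenerator {
        List<String> getAlignedContigs();
    }

    // Hypothetical implementation backed by alignments already held in memory.
    static final class InMemoryGenerator implements AlignedContigGenerator {
        private final List<String> contigs;

        InMemoryGenerator(final List<String> contigs) {
            this.contigs = new ArrayList<>(contigs);
        }

        @Override
        public List<String> getAlignedContigs() {
            return new ArrayList<>(contigs);
        }
    }

    // Hypothetical implementation that parses tab-separated records,
    // e.g. alignments reloaded from a SAM/BAM file.
    static final class RecordParsingGenerator implements AlignedContigGenerator {
        private final List<String> rawRecords;

        RecordParsingGenerator(final List<String> rawRecords) {
            this.rawRecords = rawRecords;
        }

        @Override
        public List<String> getAlignedContigs() {
            final List<String> out = new ArrayList<>();
            for (final String rec : rawRecords) {
                out.add(rec.split("\t")[0]); // keep only the first field (contig name)
            }
            return out;
        }
    }

    public static void main(final String[] args) {
        final List<AlignedContigGenerator> generators = Arrays.asList(
                new InMemoryGenerator(Arrays.asList("contig-1")),
                new RecordParsingGenerator(Arrays.asList("contig-1\t16\tchr1\t100")));
        for (final AlignedContigGenerator g : generators) {
            System.out.println(g.getAlignedContigs()); // prints [contig-1]
        }
    }
}
```

Because callers depend only on the interface method, collapsing the two implementations into one concrete class later, as proposed in the reply, would not force changes on the calling code.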

cwhelan · 2018-04-03T13:59:43Z

.../org/broadinstitute/hellbender/tools/spark/sv/StructuralVariationDiscoveryPipelineSpark.java

@@ -207,11 +204,15 @@ private SvDiscoveryInputData getSvDiscoveryInputData(final JavaSparkContext ctx,
        final String outputPrefixWithSampleName = variantsOutDir + (variantsOutDir.endsWith("/") ? "" : "/")
                                                    + SVUtils.getSampleId(headerForReads) + "_";

+        final JavaRDD<GATKRead> reloadedContigAlignments = new ReadsSparkSource(ctx, readArguments.getReadValidationStringency())


I don't see how moving the two classes to live in the abstract class would change anything, so maybe this is what actually resolved the behavior for you? Before, how was getReads not returning the raw reads in the original BAM file?

That getReads() call was not right; luckily it wasn't actually used, so I'll fix it in a follow-up PR that does some cleanup and bumps test coverage.

tedsharpe · 2018-04-03T15:39:04Z

...rg/broadinstitute/hellbender/tools/spark/sv/discovery/inference/BreakpointComplications.java

+                for (final Strand strand : dupSeqStrandOnCtg) {
+                    result = 31 * result + Objects.hashCode( strand.ordinal() );
+                }
+            }


This is the actual fix, I believe. Chris and I missed the list of enums. (A List's hashCode combines the hashCodes of its elements, and since an enum's hashCode is an identity hash that can differ from one JVM to the next, that's no good on Spark.)
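To illustrate the point about lists of enums, here is a minimal sketch, assuming a simplified value class. `DupStrandInfo` and its `contigName` field are hypothetical; only the `dupSeqStrandOnCtg` loop shape mirrors the diff above. `Enum.hashCode()` is inherited from `Object`, so hashing the `List<Strand>` directly yields a JVM-dependent value, while folding in each `ordinal()` does not.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class EnumListHashSketch {

    // Hypothetical stand-in for the Strand enum referenced in the review.
    enum Strand { POSITIVE, NEGATIVE }

    // Hypothetical value class holding a list of enums.
    static final class DupStrandInfo {
        final String contigName;
        final List<Strand> dupSeqStrandOnCtg;

        DupStrandInfo(final String contigName, final List<Strand> dupSeqStrandOnCtg) {
            this.contigName = contigName;
            this.dupSeqStrandOnCtg = dupSeqStrandOnCtg;
        }

        @Override
        public boolean equals(final Object o) {
            if (this == o) return true;
            if (!(o instanceof DupStrandInfo)) return false;
            final DupStrandInfo other = (DupStrandInfo) o;
            return contigName.equals(other.contigName)
                    && dupSeqStrandOnCtg.equals(other.dupSeqStrandOnCtg);
        }

        // NOT stable across JVMs: List.hashCode() combines each element's hashCode(),
        // and Enum.hashCode() is the identity hash inherited from Object.
        int jvmDependentHashCode() {
            return Objects.hash(contigName, dupSeqStrandOnCtg);
        }

        // Stable across JVMs: String.hashCode() is specified by the JDK, and each
        // enum contributes its ordinal rather than its identity hash.
        @Override
        public int hashCode() {
            int result = contigName.hashCode();
            for (final Strand strand : dupSeqStrandOnCtg) {
                result = 31 * result + strand.ordinal();
            }
            return result;
        }
    }

    public static void main(final String[] args) {
        final DupStrandInfo a = new DupStrandInfo("contig-1", Arrays.asList(Strand.POSITIVE, Strand.NEGATIVE));
        final DupStrandInfo b = new DupStrandInfo("contig-1", Arrays.asList(Strand.POSITIVE, Strand.NEGATIVE));
        // Equal objects get equal hash codes, computed without any identity hash.
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode()); // prints true
    }
}
```

An ordinal only changes if the enum's declaration order changes, which does not happen between executors running the same compiled job.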

tedsharpe · 2018-04-04T16:15:44Z

...rg/broadinstitute/hellbender/tools/spark/sv/discovery/inference/BreakpointComplications.java

+                result = 31 * result + Objects.hashCode( strand.ordinal() );
+            }
+            for (final Strand strand : dupSeqStrandOnCtg) {
+                result = 31 * result + Objects.hashCode( strand.ordinal() );


Objects.hashCode(strand.ordinal()) is a very expensive way of calculating strand.ordinal(). Maybe just use strand.ordinal() here, and on lines 1081 and 1086 below.
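A small sketch of why the two forms are interchangeable, using a hypothetical `Strand` enum: `Objects.hashCode(strand.ordinal())` autoboxes the `int` into an `Integer` and then calls `Integer.hashCode()`, which returns the same `int` value, so the boxing and the extra call add nothing.

```java
import java.util.Objects;

public class OrdinalHashSketch {

    // Hypothetical stand-in for the Strand enum.
    enum Strand { POSITIVE, NEGATIVE }

    // Form from the diff: boxes the ordinal into an Integer, then calls
    // Integer.hashCode(), which returns the boxed int unchanged.
    static int viaObjectsHashCode(final Strand s) {
        return Objects.hashCode(s.ordinal());
    }

    // Simplified form suggested in review: same value, no boxing.
    static int viaOrdinal(final Strand s) {
        return s.ordinal();
    }

    public static void main(final String[] args) {
        for (final Strand s : Strand.values()) {
            // prints "POSITIVE 0 0", then "NEGATIVE 1 1"
            System.out.println(s + " " + viaObjectsHashCode(s) + " " + viaOrdinal(s));
        }
    }
}
```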

Oh, you're right, that was dumb of me. @SHuang-Broad, can you change that to just ordinal() instead of Objects.hashCode(strandSwitch.ordinal()) in the other places I put it in as well (AlignmentInterval, CpxVariantInducingAssemblyContig, ChimericAlignment, and NovelAdjacencyAndAltHaplotype)? Thanks.

I should have noticed this on Chris' earlier check-in, too, but didn't. It also has this unnecessary indirection through Objects.hashCode. Sorry.

Thank you both! It's been done now.

tedsharpe

Other than the suggestion for simplifying the calculation of the hashCode of a Strand, this is fine with me.
Looks like you might have a failing test, too.

SHuang-Broad · 2018-04-07T16:44:10Z

Verified to be running correctly and not generating duplicates anymore.

More fix on hashcode functions of structs that contain enums

SHuang-Broad added the SV label Apr 3, 2018

SHuang-Broad mentioned this pull request Apr 3, 2018

switch away from calling hashCode directly on enums #4621

Merged

cwhelan approved these changes Apr 3, 2018

tedsharpe reviewed Apr 3, 2018

SHuang-Broad force-pushed the sh_more_fix_enum_hascodes branch from 03b460f to e947f5e April 4, 2018 07:04

tedsharpe reviewed Apr 4, 2018

tedsharpe approved these changes Apr 4, 2018

SHuang-Broad force-pushed the sh_more_fix_enum_hascodes branch from e947f5e to bf0b479 April 7, 2018 09:14

(SV) more hashcode fixes

344db59

SHuang-Broad force-pushed the sh_more_fix_enum_hascodes branch from bf0b479 to 344db59 April 7, 2018 13:50

SHuang-Broad merged commit d48895f into master Apr 7, 2018

SHuang-Broad deleted the sh_more_fix_enum_hascodes branch April 7, 2018 16:45

cwhelan pushed a commit to cwhelan/gatk-linked-reads that referenced this pull request May 25, 2018

(SV) more hashcode fixes (broadinstitute#4623)

65add25
