Right and Left Outer Shuffle Region Join don't match #1813

Closed
fnothaft opened this Issue Dec 1, 2017 · 0 comments


fnothaft commented Dec 1, 2017

E.g., the matched-pair counts below should all agree, but the right outer join returns far fewer:

scala> snps.leftOuterShuffleRegionJoin(filteredSnps).transform(_.filter(r => !r._2.isEmpty)).rdd.count
res8: Long = 774212                                                             

scala> filteredSnps.rightOuterShuffleRegionJoin(snps).transform(_.filter(r => !r._1.isEmpty)).rdd.count
res9: Long = 197826                                                             

scala> filteredSnps.shuffleRegionJoin(snps).rdd.count
res10: Long = 774212                                                            

scala> snps.shuffleRegionJoin(filteredSnps).rdd.count
res11: Long = 774212

Historically, rightOuterShuffleRegionJoin just called leftOuterShuffleRegionJoin (at the sort/merge join level, see here) and then swapped the elements of each output tuple. ad5ae6d introduced a new implementation of the right outer shuffle region join that has correctness issues: in the example above it returns 197826 matched pairs, where the left outer and inner joins return 774212.

@fnothaft fnothaft added the bug label Dec 1, 2017

@fnothaft fnothaft added this to the 0.23.0 milestone Dec 1, 2017

@fnothaft fnothaft self-assigned this Dec 1, 2017

fnothaft added a commit to fnothaft/adam that referenced this issue Dec 1, 2017

[ADAM-1813] Delegate right outer shuffle region join to left OSRJ implementation.

Left and right outer joins are symmetric: that is to say, a right outer join
can be rewritten as a left outer join by swapping the two input tables, and by
modifying the layout of the output. To resolve the mismatch between the left and
right outer joins, this PR deletes the right outer join implementation and
delegates back to the left outer join + tuple order swap. Resolves bigdatagenomics#1813.
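
The symmetry this commit message relies on can be sketched with plain Scala collections. This is only an illustration under stated assumptions: `Region`, `OuterJoinSketch`, `leftOuterJoin` and `rightOuterJoin` are hypothetical names, the join is a naive nested loop rather than a shuffle or sort/merge join, and none of it is ADAM's actual API or implementation.

```scala
// Hypothetical stand-in for a genomic region; not ADAM's ReferenceRegion.
final case class Region(name: String, start: Long, end: Long) {
  // Half-open intervals [start, end) overlap if they are on the same
  // contig and their coordinate ranges intersect.
  def overlaps(that: Region): Boolean =
    name == that.name && start < that.end && that.start < end
}

object OuterJoinSketch {
  // Naive left outer region join: every left record is kept, paired with
  // each overlapping right record, or with None when nothing overlaps.
  def leftOuterJoin[T, U](left: Seq[(Region, T)],
                          right: Seq[(Region, U)]): Seq[(T, Option[U])] =
    left.flatMap { case (lr, l) =>
      val hits = right.collect { case (rr, r) if lr.overlaps(rr) => r }
      if (hits.isEmpty) Seq((l, Option.empty[U]))
      else hits.map(r => (l, Option(r)))
    }

  // Right outer join by delegation: swap the inputs, run the left outer
  // join, then swap each output tuple back into (left, right) order.
  def rightOuterJoin[T, U](left: Seq[(Region, T)],
                           right: Seq[(Region, U)]): Seq[(Option[T], U)] =
    leftOuterJoin(right, left).map(_.swap)
}
```

Because `rightOuterJoin` has no join logic of its own, `a.rightOuterJoin(b)` and `b.leftOuterJoin(a)` cannot disagree on which pairs match, which is the property the counts in the issue showed to be violated.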

@heuermh heuermh closed this in #1814 Dec 1, 2017

heuermh added a commit that referenced this issue Dec 1, 2017

[ADAM-1813] Delegate right outer shuffle region join to left OSRJ implementation.

Left and right outer joins are symmetric: that is to say, a right outer join
can be rewritten as a left outer join by swapping the two input tables, and by
modifying the layout of the output. To resolve the mismatch between the left and
right outer joins, this PR deletes the right outer join implementation and
delegates back to the left outer join + tuple order swap. Resolves #1813.

@heuermh heuermh added this to Completed in Release 0.23.0 Jan 4, 2018
