GenomicRDD shuffle region join passes partition count to partition size #1220

Closed
fnothaft opened this issue Oct 22, 2016 · 0 comments

fnothaft commented Oct 22, 2016

The shuffle region join in GenomicRDD passes a partition count as the partition size, which leads to joined RDDs having an excessive number of partitions. See https://github.com/bigdatagenomics/adam/blob/master/adam-core/src/main/scala/org/bdgenomics/adam/rdd/ShuffleRegionJoin.scala#L125 vs https://github.com/bigdatagenomics/adam/blob/master/adam-core/src/main/scala/org/bdgenomics/adam/rdd/GenomicRDD.scala#L362. Interestingly, the join still works, just with bad performance. Fortunately, there is a workaround: you can override the value passed to partitionLength through the optPartitions parameter. This will be a simple fix.
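
To make the failure mode concrete, here is a minimal sketch of the arithmetic, not ADAM's actual code. It assumes the join covers the genome with fixed-length bins, so that the number of partitions is roughly the genome length divided by the bin size. The object `PartitionMixup`, the helpers `binCount` and `binSizeFor`, and the genome length are all hypothetical and chosen only for illustration.

```scala
// Hypothetical illustration only: none of these names come from ADAM.
object PartitionMixup {

  /** Number of bins needed to cover `genomeLength` bases when each bin spans `binSize` bases. */
  def binCount(genomeLength: Long, binSize: Long): Long = {
    require(binSize > 0, "bin size must be positive")
    (genomeLength + binSize - 1) / binSize // ceiling division
  }

  /** Bin size that yields roughly `partitions` bins over `genomeLength` bases. */
  def binSizeFor(genomeLength: Long, partitions: Int): Long = {
    require(partitions > 0, "partition count must be positive")
    (genomeLength + partitions - 1) / partitions // ceiling division
  }

  def main(args: Array[String]): Unit = {
    val genomeLength = 3000000000L // assumed value, roughly human-genome scale
    val requestedPartitions = 1000

    // Bug pattern: the partition count is handed over where a bin size is expected.
    val buggy = binCount(genomeLength, requestedPartitions.toLong)

    // Intended pattern: convert the count to a bin size first.
    val intended = binCount(genomeLength, binSizeFor(genomeLength, requestedPartitions))

    println(s"count used as size: $buggy bins")
    println(s"count converted to size: $intended bins")
  }
}
```

Under these assumptions, asking for 1000 partitions yields 3,000,000 bins when the count is mistaken for a size, versus 1000 when it is converted first. That matches the symptom described above: the join still produces results, but over far too many partitions.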

fnothaft added the bug label Oct 22, 2016
fnothaft added this to the 0.21.0 milestone Oct 22, 2016
fnothaft self-assigned this Oct 22, 2016
fnothaft added a commit to fnothaft/adam that referenced this issue Nov 8, 2016
…in GenomicRDD.

Resolves bigdatagenomics#1220. Adds a function called in each of the shuffle join implementations
that calculates the sequence dictionary after the join, as well as the partition sizes
to request.
fnothaft added a commit to fnothaft/adam that referenced this issue Dec 1, 2016
heuermh closed this in #1253 Dec 6, 2016
heuermh added a commit that referenced this issue Dec 6, 2016
…in GenomicRDD.

Resolves #1220. Adds a function called in each of the shuffle join implementations
that calculates the sequence dictionary after the join, as well as the partition sizes
to request.