GenomicRDD shuffle region join passes partition count to partition size #1220

Closed
fnothaft opened this Issue Oct 22, 2016 · 0 comments

fnothaft commented Oct 22, 2016

The shuffle region join hooks in GenomicRDD pass the partition count where ShuffleRegionJoin expects the partition size. This leads to joined RDDs having a hilarious number of partitions. See https://github.com/bigdatagenomics/adam/blob/master/adam-core/src/main/scala/org/bdgenomics/adam/rdd/ShuffleRegionJoin.scala#L125 vs https://github.com/bigdatagenomics/adam/blob/master/adam-core/src/main/scala/org/bdgenomics/adam/rdd/GenomicRDD.scala#L362. Interestingly, this seems to work, but with bad performance. Fortunately, there is a workaround (WAR), since you can override the value passed to partitionLength through the optPartitions parameter. This'll be a simple fix.
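
To make the failure mode concrete, here is a minimal sketch of the arithmetic, assuming the join bins the genome into fixed-size windows of `partitionSize` bases. The object `PartitionSizeMixup`, the function `binCount`, and the genome length are hypothetical and are not taken from the ADAM source; only the count-versus-size confusion comes from the report above.

```scala
// Hypothetical sketch of the count/size mix-up; not ADAM's actual code.
object PartitionSizeMixup {
  // Number of bins created when each bin covers `partitionSize` bases
  // of a genome that is `genomeLength` bases long (ceiling division).
  def binCount(genomeLength: Long, partitionSize: Long): Long = {
    require(partitionSize > 0, "partition size must be positive")
    (genomeLength + partitionSize - 1) / partitionSize
  }

  def main(args: Array[String]): Unit = {
    val genomeLength = 3000000000L // assumed round figure, roughly human scale
    val desiredPartitions = 200L   // what the caller actually wants

    // Bug: the partition count is passed where a size in bases is expected,
    // so every bin is only `desiredPartitions` bases wide.
    val buggyBins = binCount(genomeLength, desiredPartitions)

    // Intended: convert the count into a size first, then bin.
    val partitionSize = genomeLength / desiredPartitions
    val intendedBins = binCount(genomeLength, partitionSize)

    println(s"buggy bins: $buggyBins, intended bins: $intendedBins")
  }
}
```

Under these assumptions the mix-up still yields a valid binning, which would be consistent with the join working but performing badly: the result is simply far more, far smaller partitions than requested.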

@fnothaft fnothaft added the bug label Oct 22, 2016

@fnothaft fnothaft added this to the 0.21.0 milestone Oct 22, 2016

@fnothaft fnothaft self-assigned this Oct 22, 2016

fnothaft added a commit to fnothaft/adam that referenced this issue Nov 8, 2016

[ADAM-1220] Fix optPartitions parameter in shuffle region join hooks in GenomicRDD.

Resolves #1220. Adds a function called in each of the shuffle join implementations
that calculates the sequence dictionary after the join, as well as the partition sizes
to request.
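
The shape of such a helper might look roughly like the sketch below. This is an illustration under stated assumptions, not the code from the commit: `Dict` is a hypothetical stand-in for a sequence dictionary (contig name to length), and `mergeDicts`, `joinPlan`, and `defaultPartitions` are invented names. The choice to keep the longer length when both sides name the same contig is also an assumption.

```scala
// Hypothetical sketch of a shared helper; not the code from the commit.
object JoinPartitioning {
  // Stand-in for a sequence dictionary: contig name -> length in bases.
  type Dict = Map[String, Long]

  // Union of both dictionaries. If both sides name the same contig,
  // keep the longer length (an assumption made for this sketch).
  def mergeDicts(left: Dict, right: Dict): Dict =
    (left.keySet ++ right.keySet).map { name =>
      name -> math.max(left.getOrElse(name, 0L), right.getOrElse(name, 0L))
    }.toMap

  // Returns the post-join dictionary and the partition size (in bases)
  // to request, derived from a partition *count*.
  def joinPlan(left: Dict,
               right: Dict,
               optPartitions: Option[Int],
               defaultPartitions: Int): (Dict, Long) = {
    val merged = mergeDicts(left, right)
    val partitions = optPartitions.getOrElse(defaultPartitions)
    require(partitions > 0, "partition count must be positive")
    val totalLength = merged.values.sum
    // Convert the count into a size; never request a zero-length bin.
    val partitionSize = math.max(1L, totalLength / partitions)
    (merged, partitionSize)
  }
}
```

Calling one helper from every join variant keeps the count-to-size conversion in a single place, so the variants cannot disagree about which quantity they pass on.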

fnothaft added a commit to fnothaft/adam that referenced this issue Dec 1, 2016

[ADAM-1220] Fix optPartitions parameter in shuffle region join hooks in GenomicRDD.

Resolves #1220. Adds a function called in each of the shuffle join implementations
that calculates the sequence dictionary after the join, as well as the partition sizes
to request.

@heuermh heuermh closed this in #1253 Dec 6, 2016

heuermh added a commit that referenced this issue Dec 6, 2016

[ADAM-1220] Fix optPartitions parameter in shuffle region join hooks in GenomicRDD.

Resolves #1220. Adds a function called in each of the shuffle join implementations
that calculates the sequence dictionary after the join, as well as the partition sizes
to request.