[ADAM-864] Don't force shuffle if reducing partition count. #866

Merged
heuermh merged 1 commit into bigdatagenomics:master from fnothaft/check-partition-count on Nov 4, 2015

3 participants
@fnothaft
Member

fnothaft commented Oct 13, 2015

Resolves #864. In Spark, coalescing will reduce the number of partitions in an
RDD without performing a shuffle, but coalescing will only increase the number
of partitions if a shuffle is performed. This PR modifies Transform and Vcf2ADAM
to check whether the coalesce option will increase or decrease the partition
count. Additionally, it adds a flag that allows the user to force a shuffle;
this may be desirable as this causes a HashPartitioned shuffle, which may
improve the balance of records across partitions.
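
The decision described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `CoalesceDecision`, `shouldShuffle`, and the parameter names are hypothetical. The only Spark API assumed is `RDD.coalesce(numPartitions, shuffle)`, which appears in a comment so that the snippet runs without Spark on the classpath.

```scala
// Sketch only: CoalesceDecision and shouldShuffle are hypothetical names,
// not identifiers taken from the ADAM code base.
object CoalesceDecision {

  /**
   * Decide whether a coalesce to `targetPartitions` should shuffle.
   *
   * - Reducing (or keeping) the partition count needs no shuffle.
   * - Increasing the partition count only takes effect with a shuffle.
   * - The user may force a shuffle to rebalance records across partitions.
   */
  def shouldShuffle(currentPartitions: Int,
                    targetPartitions: Int,
                    forceShuffle: Boolean): Boolean = {
    require(currentPartitions > 0 && targetPartitions > 0,
      "partition counts must be positive")
    forceShuffle || targetPartitions > currentPartitions
  }
}

// With a Spark RDD in scope, the result would be passed to RDD.coalesce:
//   val shuffle = CoalesceDecision.shouldShuffle(rdd.partitions.length, target, force)
//   rdd.coalesce(target, shuffle = shuffle)
```

Under these assumptions, going from 100 partitions to 10 skips the shuffle unless the force flag is set, while going from 10 to 100 always shuffles.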

AmplabJenkins commented Oct 13, 2015

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/1000/
Test PASSed.

Member

heuermh commented Oct 13, 2015

Could you add ADAM2Fasta in this pull request? It uses -partitions and -shuffle as argument names instead of -coalesce and -force_shuffle_coalesce, respectively.

Member

fnothaft commented Oct 13, 2015

@heuermh Done!

AmplabJenkins commented Oct 13, 2015

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/1001/
Test PASSed.

Member

heuermh commented Oct 13, 2015

LGTM, thanks.

@heuermh heuermh added this to the 0.18.2 milestone Nov 2, 2015

Member

heuermh commented Nov 4, 2015

Can I merge this? If so, is the Merge github button still ok? If I remember correctly, the commit-pr.sh script won't work for me because I'm using git over https.

Member

fnothaft commented Nov 4, 2015

I thought I'd pushed a fix for the commit-pr.sh issue? Let me rebase this first... or, do you just want me to merge it?

Member

heuermh commented Nov 4, 2015

commit-pr.sh hasn't been revised yet.

Rebase and I'll hit the button, no need to bend the don't-merge-your-own-pull-requests rule. :)

[ADAM-864] Don't force shuffle if reducing partition count.
Resolves #864. In Spark, coalescing will reduce the number of partitions in an
RDD without performing a shuffle, but coalescing will only increase the number
of partitions if a shuffle is performed. This PR modifies Transform and Vcf2ADAM
to check whether the coalesce option will increase or decrease the partition
count. Additionally, it adds a flag that allows the user to force a shuffle;
this may be desirable as this causes a HashPartitioned shuffle, which may
improve the balance of records across partitions. Additionally, we modify
ADAM2Fasta to support similar options.

Member

fnothaft commented Nov 4, 2015

Rebased!

AmplabJenkins commented Nov 4, 2015

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/1010/
Test PASSed.

heuermh added a commit that referenced this pull request Nov 4, 2015

Merge pull request #866 from fnothaft/check-partition-count
[ADAM-864] Don't force shuffle if reducing partition count.

@heuermh heuermh merged commit b91188b into bigdatagenomics:master Nov 4, 2015

1 check passed

default Merged build finished.

Member

heuermh commented Nov 4, 2015

Thanks!
