[SPARK-16817][CORE][WIP] Use Alluxio to improve stability of shuffle by replication of shuffle data #22005

Closed
wants to merge 4 commits into from

@Chopinxb Chopinxb commented Aug 6, 2018

What changes were proposed in this pull request?

This PR proposes using Alluxio to store a replica of shuffle data, in order to improve the stability of complex OLAP jobs.
Motivation
Currently, when a shuffle fetch fails (for example, because the NodeManager hosting the shuffle service crashed), Spark reruns the previous stage to regenerate the shuffle data. This works well, but in some cases the cost of recomputation is unacceptable.
With this PR, when a shuffle fetch fails, the reducer retries the fetch from Alluxio, which avoids the recomputation.
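The fallback behaviour described above can be illustrated with a small, self-contained sketch. This is not the code from this PR and uses neither the Spark nor the Alluxio API: `FetchFailedError`, `make_store`, and `fetch_with_fallback` are hypothetical names, and plain dictionaries stand in for the shuffle service and the Alluxio replica.

```python
from typing import Callable, Dict


class FetchFailedError(Exception):
    """Hypothetical error raised when a block cannot be read from a store."""


def make_store(blocks: Dict[str, bytes]) -> Callable[[str], bytes]:
    """Build a fetch function backed by an in-memory dict (a stand-in store)."""
    def fetch(block_id: str) -> bytes:
        if block_id not in blocks:
            raise FetchFailedError(block_id)
        return blocks[block_id]
    return fetch


def fetch_with_fallback(
    block_id: str,
    fetch_primary: Callable[[str], bytes],
    fetch_replica: Callable[[str], bytes],
) -> bytes:
    """Try the primary source first; on failure, retry against the replica.

    If the replica also fails, FetchFailedError propagates and the caller
    is left with the original remedy of recomputing the data.
    """
    try:
        return fetch_primary(block_id)
    except FetchFailedError:
        return fetch_replica(block_id)


if __name__ == "__main__":
    # The primary store has lost the block; the replica still holds it.
    primary = make_store({})
    replica = make_store({"shuffle_0_1_2": b"rows"})
    print(fetch_with_fallback("shuffle_0_1_2", primary, replica))
```

In this sketch the recomputation path is only reached when both sources fail, which matches the motivation stated above.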
Usage

  1. Enable this feature in spark-defaults.conf.
    spark.alluxio.shuffle.enabled true

How was this patch tested?

manual tests

@AmplabJenkins

Can one of the admins verify this patch?

@jerryshao
Contributor

I believe this kind of PR requires an SPIP and community discussion first.

@srowen srowen mentioned this pull request Nov 10, 2018
@asfgit asfgit closed this in a3ba3a8 Nov 11, 2018
3 participants