
[SPARK-29153][CORE]Add ability to merge resource profiles within a stage with Stage Level Scheduling #28053

Closed
wants to merge 7 commits

Conversation

tgravescs
Contributor

What changes were proposed in this pull request?

For the stage level scheduling feature, add the ability to optionally merge resource profiles if they were specified on multiple RDDs within a stage. There is a config to enable this feature; it is off by default (spark.scheduler.resourceProfile.mergeConflicts). When the config is set to true, Spark will merge the profiles, selecting the max value of each resource (cores, memory, GPU, etc.). Further documentation will be added with SPARK-30322.

This also adds the ability to check whether an equivalent resource profile already exists. This is so that if a user repeatedly runs stages that combine the same profiles, we don't get an explosion in the number of profiles.
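The merge semantics described above (take the max of each resource across conflicting profiles, plus an equivalence check so repeated merges don't multiply profiles) can be sketched as follows. This is an illustrative Python sketch with hypothetical names, not Spark's actual Scala implementation:

```python
def merge_resource_profiles(*profiles):
    """Merge dicts like {'cores': 2, 'memory': 4096, 'gpu': 1} by taking
    the maximum value of each resource across all given profiles."""
    merged = {}
    for prof in profiles:
        for resource, amount in prof.items():
            merged[resource] = max(merged.get(resource, 0), amount)
    return merged


def find_equivalent(existing, candidate):
    """Return an already-registered profile equal to `candidate`, if any,
    so combining the same profiles over and over reuses one profile
    instead of creating a new one each time."""
    for prof in existing:
        if prof == candidate:
            return prof
    return None


merged = merge_resource_profiles({"cores": 2, "gpu": 1},
                                 {"cores": 4, "memory": 4096})
# merged == {"cores": 4, "gpu": 1, "memory": 4096}
```

Spark's real ResourceProfile carries separate executor and task requests, but the per-resource max rule shown here is the behavior the PR describes.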

Why are the changes needed?

To allow users to specify resources on multiple RDDs without worrying as much about whether those RDDs end up in the same stage and fail.

Does this PR introduce any user-facing change?

Yes, when the config is turned on, Spark now merges the profiles instead of erroring out.
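For example, the behavior can be enabled through the config named in this PR (shown here as a spark-defaults.conf entry; it defaults to false):

```
spark.scheduler.resourceProfile.mergeConflicts  true
```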

How was this patch tested?

Unit tests

@SparkQA

SparkQA commented Mar 27, 2020

Test build #120493 has finished for PR 28053 at commit 3783a37.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 30, 2020

Test build #120607 has finished for PR 28053 at commit 3730e07.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Contributor Author

test this please

@SparkQA

SparkQA commented Mar 30, 2020

Test build #120606 has finished for PR 28053 at commit aab2964.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 30, 2020

Test build #120611 has finished for PR 28053 at commit 3730e07.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

} else {
  throw new IllegalArgumentException("Multiple ResourceProfiles specified in the RDDs for " +
    "this stage, either resolve the conflicting ResourceProfile's yourself or enable " +
    "spark.scheduler.resourceProfile.mergeConflicts and understand how Spark handles " +
Member


Could we refer to the conf variable instead hard-coded?
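The suggestion is to build the error message from the config constant rather than repeating the key as a hard-coded string. A minimal sketch of the idea, using hypothetical names in Python rather than Spark's Scala config entries:

```python
# Hypothetical constant standing in for Spark's config entry; the real code
# would reference the existing config definition rather than a new literal.
RESOURCE_PROFILE_MERGE_CONFLICTS = "spark.scheduler.resourceProfile.mergeConflicts"


def conflict_error_message():
    # Interpolating the constant keeps the message from drifting out of
    # sync with the actual config key if it is ever renamed.
    return (
        "Multiple ResourceProfiles specified in the RDDs for this stage, "
        "either resolve the conflicting ResourceProfiles yourself or enable "
        f"{RESOURCE_PROFILE_MERGE_CONFLICTS} and understand how Spark "
        "handles merging them."
    )
```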

@Ngone51
Member

Ngone51 commented Apr 1, 2020

LGTM, except one minor comment.

@tgravescs
Contributor Author

thanks for the reviews, @dongjoon-hyun did you have any further comments?

@SparkQA

SparkQA commented Apr 1, 2020

Test build #120678 has finished for PR 28053 at commit c3c0598.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


@dongjoon-hyun left a comment


+1, LGTM.

@SparkQA

SparkQA commented Apr 1, 2020

Test build #120688 has finished for PR 28053 at commit 452378b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Contributor Author

Merged to master. Thanks @dongjoon-hyun @Ngone51

@asfgit asfgit closed this in 55dea9b Apr 2, 2020
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…tage with Stage Level Scheduling

Closes apache#28053 from tgravescs/SPARK-29153.

Lead-authored-by: Thomas Graves <tgraves@apache.org>
Co-authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>