[SPARK-29153][CORE] Add ability to merge resource profiles within a stage with Stage Level Scheduling #28053
Conversation
Test build #120493 has finished for PR 28053 at commit
Test build #120607 has finished for PR 28053 at commit
test this please
Test build #120606 has finished for PR 28053 at commit
Test build #120611 has finished for PR 28053 at commit
} else {
  throw new IllegalArgumentException("Multiple ResourceProfiles specified in the RDDs for " +
    "this stage, either resolve the conflicting ResourceProfile's yourself or enable " +
    "spark.scheduler.resourceProfile.mergeConflicts and understand how Spark handles " +
Could we refer to the conf variable here instead of hard-coding the string?
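The suggestion above could look roughly like the following sketch. The constant and object names here are hypothetical, not Spark's actual internal config objects; the point is only to keep the config key in one place and interpolate it into the message.

```scala
// Hedged sketch of the reviewer's suggestion: define the config key once
// and build the error message from that constant, rather than hard-coding
// the same string in multiple places. Names are hypothetical.
object SketchConfig {
  val RESOURCE_PROFILE_MERGE_CONFLICTS = "spark.scheduler.resourceProfile.mergeConflicts"
}

object MergeError {
  // Build the conflict error from the single config constant above.
  def conflictError: IllegalArgumentException =
    new IllegalArgumentException(
      "Multiple ResourceProfiles specified in the RDDs for this stage; " +
      "either resolve the conflict yourself or enable " +
      s"${SketchConfig.RESOURCE_PROFILE_MERGE_CONFLICTS}.")
}
```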
LGTM, except one minor comment.
thanks for the reviews, @dongjoon-hyun did you have any further comments?
Test build #120678 has finished for PR 28053 at commit
+1, LGTM.
Test build #120688 has finished for PR 28053 at commit
merged to master. Thanks @dongjoon-hyun @Ngone51
Closes apache#28053 from tgravescs/SPARK-29153.
Lead-authored-by: Thomas Graves <tgraves@apache.org>
Co-authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
What changes were proposed in this pull request?
For the stage level scheduling feature, add the ability to optionally merge resource profiles if they were specified on multiple RDDs within a stage. There is a config to enable this feature; it is off by default (spark.scheduler.resourceProfile.mergeConflicts). When the config is set to true, Spark will merge the profiles, selecting the max value of each resource (cores, memory, GPUs, etc.). Further documentation will be added with SPARK-30322.
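The merge-by-max semantics described above can be sketched as follows. This is a simplified illustration, not the actual ResourceProfile API: a profile here is just a map from resource name to requested amount.

```scala
// Hedged sketch of the merge behavior this PR describes: when multiple
// profiles conflict within a stage, take the maximum request per resource.
// The real Spark ResourceProfile type is richer than a plain Map.
object MergeSketch {
  type Profile = Map[String, Long]

  // Flatten all (resource, amount) pairs and keep the max per resource.
  def merge(profiles: Seq[Profile]): Profile =
    profiles.flatten.groupMapReduce(_._1)(_._2)(math.max)
}
```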
This also adds the ability to check whether an equivalent resource profile already exists, so that if a user is running stages and combining the same profiles over and over we don't get an explosion in the number of profiles.
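The equivalent-profile check can be sketched as a tiny registry that reuses an existing profile instead of registering a duplicate. This is a hypothetical simplification; the actual ResourceProfileManager tracks real ResourceProfile objects with ids.

```scala
// Hedged sketch: reuse an equivalent profile rather than registering a new
// one each time, so repeated merges of the same profiles don't grow the
// registry unboundedly. Not the actual ResourceProfileManager.
class ProfileRegistry {
  private var registered = Vector.empty[Map[String, Long]]

  def size: Int = registered.size

  // Return the already-registered equivalent profile if one exists,
  // otherwise register and return this one.
  def getOrRegister(p: Map[String, Long]): Map[String, Long] =
    registered.find(_ == p).getOrElse { registered :+= p; p }
}
```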
Why are the changes needed?
To allow users to specify resources on multiple RDDs without worrying as much about whether those RDDs end up in the same stage and fail.
Does this PR introduce any user-facing change?
Yes, when the config is turned on it now merges the profiles instead of erroring out.
How was this patch tested?
Unit tests