[SPARK-29153][CORE] Add ability to merge resource profiles within a stage with Stage Level Scheduling #28053
Conversation
Test build #120493 has finished for PR 28053 at commit
Test build #120607 has finished for PR 28053 at commit
test this please
Test build #120606 has finished for PR 28053 at commit
Test build #120611 has finished for PR 28053 at commit
} else {
  throw new IllegalArgumentException("Multiple ResourceProfiles specified in the RDDs for " +
    "this stage, either resolve the conflicting ResourceProfile's yourself or enable " +
    "spark.scheduler.resourceProfile.mergeConflicts and understand how Spark handles " +
Could we refer to the conf variable here instead of hard-coding the string?
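The suggestion above could look roughly like the following sketch. The constant and object names here are hypothetical, not Spark's actual internal config objects; the point is only to keep the config key in one place and interpolate it into the message.

```scala
// Hedged sketch of the reviewer's suggestion: define the config key once
// and build the error message from that constant, rather than hard-coding
// the same string in multiple places. Names are hypothetical.
object SketchConfig {
  val RESOURCE_PROFILE_MERGE_CONFLICTS = "spark.scheduler.resourceProfile.mergeConflicts"
}

object MergeError {
  // Build the conflict error from the single config constant above.
  def conflictError: IllegalArgumentException =
    new IllegalArgumentException(
      "Multiple ResourceProfiles specified in the RDDs for this stage; " +
      "either resolve the conflict yourself or enable " +
      s"${SketchConfig.RESOURCE_PROFILE_MERGE_CONFLICTS}.")
}
```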
LGTM, except one minor comment.
thanks for the reviews, @dongjoon-hyun did you have any further comments?
Test build #120678 has finished for PR 28053 at commit
+1, LGTM.
Test build #120688 has finished for PR 28053 at commit
merged to master. Thanks @dongjoon-hyun @Ngone51
Closes apache#28053 from tgravescs/SPARK-29153.
Lead-authored-by: Thomas Graves <tgraves@apache.org>
Co-authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
What changes were proposed in this pull request?
For the stage level scheduling feature, add the ability to optionally merge resource profiles if they were specified on multiple RDDs within a stage. There is a config to enable this feature; it is off by default (spark.scheduler.resourceProfile.mergeConflicts). When the config is set to true, Spark will merge the profiles, selecting the max value of each resource (cores, memory, GPUs, etc.). Further documentation will be added with SPARK-30322.
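The merge-by-max semantics described above can be sketched as follows. This is a simplified illustration, not the actual ResourceProfile API: a profile here is just a map from resource name to requested amount.

```scala
// Hedged sketch of the merge behavior this PR describes: when multiple
// profiles conflict within a stage, take the maximum request per resource.
// The real Spark ResourceProfile type is richer than a plain Map.
object MergeSketch {
  type Profile = Map[String, Long]

  // Flatten all (resource, amount) pairs and keep the max per resource.
  def merge(profiles: Seq[Profile]): Profile =
    profiles.flatten.groupMapReduce(_._1)(_._2)(math.max)
}
```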
This also adds the ability to check whether an equivalent resource profile already exists, so that if a user is running stages and combining the same profiles over and over we don't get an explosion in the number of profiles.
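The equivalent-profile check can be sketched as a tiny registry that reuses an existing profile instead of registering a duplicate. This is a hypothetical simplification; the actual ResourceProfileManager tracks real ResourceProfile objects with ids.

```scala
// Hedged sketch: reuse an equivalent profile rather than registering a new
// one each time, so repeated merges of the same profiles don't grow the
// registry unboundedly. Not the actual ResourceProfileManager.
class ProfileRegistry {
  private var registered = Vector.empty[Map[String, Long]]

  def size: Int = registered.size

  // Return the already-registered equivalent profile if one exists,
  // otherwise register and return this one.
  def getOrRegister(p: Map[String, Long]): Map[String, Long] =
    registered.find(_ == p).getOrElse { registered :+= p; p }
}
```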
Why are the changes needed?
To allow users to specify resources on multiple RDDs without worrying as much about whether those RDDs end up in the same stage and fail.
Does this PR introduce any user-facing change?
Yes, when the config is turned on it now merges the profiles instead of erroring out.
How was this patch tested?
Unit tests