
[WIP][SPARK-35677][Core][SQL] Support dynamic range of executor numbers for dynamic allocation #32819

Closed
wants to merge 3 commits into from

Conversation

yaooqinn
Member

@yaooqinn yaooqinn commented Jun 8, 2021

What changes were proposed in this pull request?

Currently, Spark allows users to bound the scalability of a Spark application through dynamic allocation: spark.dynamicAllocation.minExecutors and spark.dynamicAllocation.maxExecutors set the range for scaling up and down. Within an application, Spark uses this range to request executors from the cluster manager according to the real-time workload. Once set, the range is fixed for the whole application lifecycle. This is inconvenient for long-running applications where the range should be adjustable in cases such as:

  1. the cluster manager itself, or the queue the application runs in, may scale up and down, which is quite likely on modern cloud platforms
  2. the application is long-running, but its timeliness, priority, etc. are determined not only by its own workload, but also by the traffic across the cluster manager, or simply by the time of day
  3. etc.
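For reference, today this range can only be fixed once, when the application is configured, and restarting is the only way to change it. A minimal sketch of the current behavior in Scala (the app name and values are illustrative, not taken from this PR):

import org.apache.spark.sql.SparkSession

// The allocation range is fixed here, at startup, for the whole application lifecycle.
val spark = SparkSession.builder()
  .appName("long-running-service")                      // illustrative
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.minExecutors", "2")  // illustrative values
  .config("spark.dynamicAllocation.maxExecutors", "50")
  .getOrCreate()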

Why are the changes needed?

Make dynamic allocation adjustable for long-running Spark applications.

Does this PR introduce any user-facing change?

Yes. The configs below become changeable at runtime:

spark.dynamicAllocation.maxExecutors
spark.dynamicAllocation.minExecutors
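Based on the SetCommand change reviewed below, the new range appears to become adjustable at runtime with a plain SET statement. A hedged sketch, using a SparkSession like the one sketched above (the values are illustrative, and the exact user-facing surface may differ):

// Hedged sketch: adjusting the allocation range of a running application via SET.
spark.sql("SET spark.dynamicAllocation.minExecutors=5")
spark.sql("SET spark.dynamicAllocation.maxExecutors=100")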

How was this patch tested?

new tests

@SparkQA

SparkQA commented Jun 8, 2021

Test build #139470 has finished for PR 32819 at commit 2975fb6.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class SparkListenerExecutorAllocatorRangeUpdate(

@SparkQA

SparkQA commented Jun 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43993/

@SparkQA

SparkQA commented Jun 8, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43993/

@yaooqinn yaooqinn changed the title [WIP][SPARK-35677][Core][SQL] Support dynamic executor range for dynamic allocation [WIP][SPARK-35677][Core][SQL] Support dynamic range of executor numbers for dynamic allocation Jun 8, 2021
@SparkQA

SparkQA commented Jun 8, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43997/

@SparkQA

SparkQA commented Jun 8, 2021

Test build #139474 has finished for PR 32819 at commit 0f94be6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 8, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44020/

@SparkQA

SparkQA commented Jun 8, 2021

Test build #139496 has finished for PR 32819 at commit 4078349.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Member Author

yaooqinn commented Jun 9, 2021

cc @cloud-fan @wangyum thanks

@yaooqinn
Member Author

yaooqinn commented Jun 9, 2021

retest this please

@SparkQA

SparkQA commented Jun 9, 2021

Test build #139519 has finished for PR 32819 at commit 4078349.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 9, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44044/

@SparkQA

SparkQA commented Jun 9, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44044/

@@ -80,6 +82,22 @@ case class SetCommand(kv: Option[(String, Option[String])])
}
(keyValueOutput, runFunc)

case Some((DYN_ALLOCATION_MIN_EXECUTORS.key, Some(value))) =>
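For illustration only, here is a hedged sketch of how such a branch could be wired up, following the (keyValueOutput, runFunc) pattern of the surrounding cases. The SparkListenerExecutorAllocatorRangeUpdate class and its lower/upper fields are inferred from other snippets in this thread; the actual patch may differ:

// Hedged sketch, not the exact patch: forward the new minimum to the
// ExecutorAllocationManager by posting a listener event.
case Some((DYN_ALLOCATION_MIN_EXECUTORS.key, Some(value))) =>
  val runFunc = (sparkSession: SparkSession) => {
    sparkSession.sparkContext.listenerBus.post(
      SparkListenerExecutorAllocatorRangeUpdate(lower = Some(value.toInt), upper = None))
    Seq(Row(DYN_ALLOCATION_MIN_EXECUTORS.key, value))
  }
  (keyValueOutput, runFunc)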
Contributor

This is an administrative operation. I am not sure it is a good idea to control it by simply running a SET command: for a long-running service such as the Thrift server, any user can easily run this command and misuse the feature.

How about exposing this via the REST API instead?

event: SparkListenerExecutorAllocatorRangeUpdate): Unit = {
val newLower = event.lower.getOrElse(_minNumExecutors)
val newUpper = event.upper.getOrElse(_maxNumExecutors)
validateSettings(newLower, newUpper)
Contributor

I believe this should be something like:

try {
  validateSettings(newLower, newUpper)
  _minNumExecutors = newLower
  _maxNumExecutors = newUpper
} catch {
  case NonFatal(e) =>
    // e.g. log a warning about the invalid range instead of rethrowing
}

Otherwise the exception is thrown inside this listener.

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Dec 18, 2021
@github-actions github-actions bot closed this Dec 19, 2021