[SPARK-24954][Core] Fail fast on job submit if run a barrier stage with dynamic resource allocation enabled #21915

Closed
wants to merge 3 commits into from

jiangxb1987
Contributor

What changes were proposed in this pull request?

Running a barrier stage with dynamic resource allocation enabled is not supported, because it can lead to confusing behavior. For example, with dynamic resource allocation enabled, Spark may acquire some executors (but not enough to launch all the tasks in a barrier stage), release them when the executor idle timeout expires, and then acquire them again.

We perform the check at job submission and fail fast if the job runs a barrier stage while dynamic resource allocation is enabled.
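
The fail-fast idea can be illustrated without a Spark dependency. The following is a minimal sketch, not the code in this patch: `StageSketch`, `BarrierJobRejected`, and `BarrierSubmitCheck` are hypothetical names, the configuration is modeled as a plain `Map`, and only the `spark.dynamicAllocation.enabled` key is consulted (the real check may take other conditions into account).

```scala
// Hypothetical stand-ins for illustration; these are not Spark classes.
final case class StageSketch(name: String, isBarrier: Boolean)

final class BarrierJobRejected(message: String) extends RuntimeException(message)

object BarrierSubmitCheck {
  val DynamicAllocationKey = "spark.dynamicAllocation.enabled"

  // Simplified assumption: only the literal value "true" (any case) counts as enabled.
  def dynamicAllocationEnabled(conf: Map[String, String]): Boolean =
    conf.get(DynamicAllocationKey).exists(_.equalsIgnoreCase("true"))

  // Meant to run at submission time, before any task is scheduled. Rejects the
  // whole job if any stage is a barrier stage while dynamic allocation is on.
  def checkOnSubmit(stages: Seq[StageSketch], conf: Map[String, String]): Unit = {
    if (dynamicAllocationEnabled(conf)) {
      stages.find(_.isBarrier).foreach { stage =>
        throw new BarrierJobRejected(
          s"Stage '${stage.name}' is a barrier stage, but $DynamicAllocationKey is true")
      }
    }
  }
}
```

In this sketch, a job whose stage list contains at least one barrier stage is rejected as soon as `checkOnSubmit` is called with dynamic allocation turned on, so the caller sees the error immediately instead of watching executors being acquired and released repeatedly.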

How was this patch tested?

Added a new test suite, BarrierStageOnSubmittedSuite, to cover the fail-fast cases in which a submitted job contains one or more barrier stages.

@holdensmagicalunicorn

@jiangxb1987, thanks! I am a bot who has found some folks who might be able to help with the review: @mateiz, @rxin and @kayousterhout

@jiangxb1987
Contributor Author

cc @squito

@SparkQA

SparkQA commented Jul 30, 2018

Test build #93791 has finished for PR 21915 at commit 2ffa2b7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

*/
private def checkBarrierStageWithDynamicAllocation(rdd: RDD[_]): Unit = {
  if (rdd.isBarrier() && Utils.isDynamicAllocationEnabled(sc.getConf)) {
    throw new SparkException("Don't support run a barrier stage with dynamic resource " +

  • [SPARK-24942]: Barrier execution mode does not support dynamic resource allocation for now. You can disable dynamic resource allocation by setting Spark conf "spark.dynamicAllocation.enabled" to "false".
  • Make the error message a constant to simplify test.
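
The second suggestion can be sketched as follows. This is only an illustration of the pattern, assuming hypothetical names (`BarrierMessageSketch`, `DynamicAllocationNotSupported`, `submit`, `failureMessage`) and a plain `IllegalStateException` in place of Spark's own exception type; the message text is the one proposed in the first bullet.

```scala
object BarrierMessageSketch {
  // Message text copied from the review suggestion; the constant name is hypothetical.
  val DynamicAllocationNotSupported: String =
    "[SPARK-24942]: Barrier execution mode does not support dynamic resource allocation " +
      "for now. You can disable dynamic resource allocation by setting Spark conf " +
      "\"spark.dynamicAllocation.enabled\" to \"false\"."

  // Hypothetical stand-in for the submit-time check.
  def submit(hasBarrierStage: Boolean, dynamicAllocation: Boolean): Unit =
    if (hasBarrierStage && dynamicAllocation) {
      throw new IllegalStateException(DynamicAllocationNotSupported)
    }

  // Returns the failure message if the submission was rejected, otherwise None.
  def failureMessage(hasBarrierStage: Boolean, dynamicAllocation: Boolean): Option[String] =
    scala.util.Try(submit(hasBarrierStage, dynamicAllocation)).failed.toOption.map(_.getMessage)
}
```

With the message held in one constant, a test can compare against `BarrierMessageSketch.DynamicAllocationNotSupported` directly rather than repeating the string, so rewording the message later does not require touching every test case.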


test("submit a barrier ResultStage with dynamic resource allocation enabled") {
  val conf = new SparkConf()
    .set("spark.dynamicAllocation.enabled", "true")
Contributor

@mengxr mengxr Jul 31, 2018

Does it work if we use LocalSparkContext? nvm, I was thinking MLlibTestSparkContext ...

@squito
Contributor

squito commented Jul 31, 2018

thanks @jiangxb1987, lgtm aside from @mengxr 's comments

@SparkQA

SparkQA commented Aug 1, 2018

Test build #93877 has finished for PR 21915 at commit 663b900.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Contributor Author

retest this please

@SparkQA

SparkQA commented Aug 1, 2018

Test build #93889 has finished for PR 21915 at commit 663b900.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

retest this please

@SparkQA

SparkQA commented Aug 2, 2018

Test build #93910 has finished for PR 21915 at commit 663b900.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Contributor Author

retest this please

@mengxr
Contributor

mengxr commented Aug 2, 2018

@SparkQA

SparkQA commented Aug 2, 2018

Test build #93931 has finished for PR 21915 at commit 663b900.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Contributor Author

retest this please

@SparkQA

SparkQA commented Aug 2, 2018

Test build #93971 has finished for PR 21915 at commit 663b900.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Contributor Author

retest this please

@SparkQA

SparkQA commented Aug 2, 2018

Test build #94006 has finished for PR 21915 at commit 663b900.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 3, 2018

Test build #94089 has finished for PR 21915 at commit f3ea9c6.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 3, 2018

Test build #94090 has finished for PR 21915 at commit 0796f76.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Contributor

mengxr commented Aug 3, 2018

test this please

@SparkQA

SparkQA commented Aug 3, 2018

Test build #94115 has finished for PR 21915 at commit 0796f76.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Contributor Author

retest this please

@SparkQA

SparkQA commented Aug 3, 2018

Test build #94131 has finished for PR 21915 at commit 0796f76.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Contributor

mengxr commented Aug 3, 2018

LGTM. Merged into master. Thanks!

@asfgit asfgit closed this in 92b4884 Aug 3, 2018