Skip to content

[SPARK-22870][CORE] Dynamic allocation should allow 0 idle time#20080

Closed
wangyum wants to merge 2 commits intoapache:masterfrom
wangyum:SPARK-22870
Closed

[SPARK-22870][CORE] Dynamic allocation should allow 0 idle time#20080
wangyum wants to merge 2 commits intoapache:masterfrom
wangyum:SPARK-22870

Conversation

@wangyum
Copy link
Member

@wangyum wangyum commented Dec 26, 2017

What changes were proposed in this pull request?

This pr to make 0 as a valid value for spark.dynamicAllocation.executorIdleTimeout.
For details, see the jira description: https://issues.apache.org/jira/browse/SPARK-22870.

How was this patch tested?

N/A

@SparkQA
Copy link

SparkQA commented Dec 26, 2017

Test build #85390 has finished for PR 20080 at commit 1dcec41.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Copy link
Member Author

wangyum commented Dec 26, 2017

cc @srowen

@srowen
Copy link
Member

srowen commented Dec 26, 2017

I wonder if we should do the same for cachedExecutorIdleTimeoutS ? it isn't checked for being nonnegative but should be, I suppose. I think this is fine otherwise. I don't see any reason not to allow it (see JIRA) and the logic is compatible with a value of 0 already.

Copy link
Member

@felixcheung felixcheung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, the goal isn't just to allow the value of 0, but also to ensure the executor are removed immediately, or close to. From what I can see we need to make some changes to support that?

For instance, schedule() is called every intervalMillis.

@jiangxb1987
Copy link
Contributor

@wangyum have you already run any jobs on a cluster with this patch?

@srowen
Copy link
Member

srowen commented Jan 5, 2018

Good point @felixcheung though the polling is hard-coded to 100ms, so will be close enough to "immediately". Allowing a value of 0 on both timeouts seems OK, but yeah not clear whether it does have the desired effect.

@srowen
Copy link
Member

srowen commented Jan 8, 2018

@wangyum any comments on the follow-up here? if you've observed it gets the desired behavior, that's pretty convincing. At least we should treat both timeouts consistently.

@wangyum
Copy link
Member Author

wangyum commented Jan 9, 2018

@srowen @jiangxb1987 I have test on my cluster with this patch.

bin/spark-sql --master yarn --conf spark.dynamicAllocation.enabled=true --conf spark.shuffle.service.enabled=true --conf spark.dynamicAllocation.executorIdleTimeout=0
18/01/09 05:49:03.452 INFO DAGScheduler: Job 0 finished: processCmd at CliDriver.java:376, took 26.196061 s
750000000
Time taken: 26.383 seconds, Fetched 1 row(s)
18/01/09 05:49:03.455 INFO SparkSQLCLIDriver: Time taken: 26.383 seconds, Fetched 1 row(s)
spark-sql> 18/01/09 05:49:03.479 INFO ExecutorAllocationManager: Request to remove executorIds: 972

05:49:03.479 - 05:49:03.455 = 24 ms

@srowen
Copy link
Member

srowen commented Jan 9, 2018

This change alone is pretty harmless. If it seems to work as intended, seems OK? Again, just wondering if cachedExecutorIdleTimeoutS should be the same, and check for nonnegative values?

@srowen
Copy link
Member

srowen commented Jan 11, 2018

Ping @wangyum

@SparkQA
Copy link

SparkQA commented Jan 13, 2018

Test build #86083 has finished for PR 20080 at commit b03a496.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Copy link
Member

Jenkins, retest this please

@SparkQA
Copy link

SparkQA commented Jan 13, 2018

Test build #86090 has finished for PR 20080 at commit b03a496.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in fc6fe8a Jan 13, 2018
asfgit pushed a commit that referenced this pull request Jan 13, 2018
## What changes were proposed in this pull request?

This pr to make `0` as a valid value for `spark.dynamicAllocation.executorIdleTimeout`.
For details, see the jira description: https://issues.apache.org/jira/browse/SPARK-22870.

## How was this patch tested?

N/A

Author: Yuming Wang <yumwang@ebay.com>
Author: Yuming Wang <wgyumg@gmail.com>

Closes #20080 from wangyum/SPARK-22870.

(cherry picked from commit fc6fe8a)
Signed-off-by: Sean Owen <sowen@cloudera.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants