[SPARK-22683][CORE] Add an executorAllocationRatio parameter to throttle the parallelism of dynamic allocation #19881
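The new setting plugs into the existing dynamic-allocation configuration. A minimal sketch of how it might be set (the value 0.5 is purely illustrative, not a recommendation): with a ratio of 0.5, the manager requests roughly half the executors needed to run all pending tasks at once.

```scala
import org.apache.spark.SparkConf

// Illustrative configuration exercising the new setting from this PR.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.executorAllocationRatio", "0.5")
```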
From `ExecutorAllocationManagerSuite` (hunk `@@ -145,6 +145,39 @@`):

```scala
    assert(numExecutorsToAdd(manager) === 1)
  }

  def testAllocationRatio(cores: Int, divisor: Double, expected: Int): Unit = {
    val conf = new SparkConf()
      .setMaster("myDummyLocalExternalClusterManager")
      .setAppName("test-executor-allocation-manager")
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.dynamicAllocation.testing", "true")
      .set("spark.dynamicAllocation.maxExecutors", "15")
      .set("spark.dynamicAllocation.minExecutors", "3")
      .set("spark.dynamicAllocation.executorAllocationRatio", divisor.toString)
      .set("spark.executor.cores", cores.toString)
    val sc = new SparkContext(conf)
    contexts += sc
    var manager = sc.executorAllocationManager.get
    post(sc.listenerBus, SparkListenerStageSubmitted(createStageInfo(0, 20)))
    for (i <- 0 to 5) {
      addExecutors(manager)
    }
    assert(numExecutorsTarget(manager) === expected)
    sc.stop()
  }

  test("executionAllocationRatio is correctly handled") {
    testAllocationRatio(1, 0.5, 10)
    testAllocationRatio(1, 1.0 / 3.0, 7)
    testAllocationRatio(2, 1.0 / 3.0, 4)
    testAllocationRatio(1, 0.385, 8)

    // max/min executors capping
    testAllocationRatio(1, 1.0, 15)      // should be 20 but capped by max
    testAllocationRatio(4, 1.0 / 3.0, 3) // should be 2 but elevated by min
  }

  test("add executors capped by num pending tasks") {
    sc = createSparkContext(0, 10, 0)
    val manager = sc.executorAllocationManager.get
```

Review comments on the `for (i <- 0 to 5)` loop:

> This loop isn't really needed, right? All we are checking is the target, not the number to add?

> If we want to check the capping by max/min executors, we need to actually try to add executors. The max/min capping does not occur during the computation of the target number of executors, but at the time they are added.

> ok
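For reference, a minimal sketch of the arithmetic behind the expected values in the test above. It assumes the manager targets `ceil(pendingTasks * ratio / tasksPerExecutor)` executors and then clamps to `[minExecutors, maxExecutors]`; the names and the standalone `target` helper are illustrative, not the actual `ExecutorAllocationManager` code:

```scala
// Hypothetical re-derivation of the test's expected values (20 pending tasks,
// minExecutors = 3, maxExecutors = 15, one CPU per task, as configured above).
object AllocationRatioSketch {
  val minExecutors = 3
  val maxExecutors = 15
  val taskCpus = 1

  def target(executorCores: Int, ratio: Double, pendingTasks: Int): Int = {
    // Tasks one executor can run concurrently at full parallelism.
    val tasksPerExecutor = executorCores / taskCpus
    val uncapped = math.ceil(pendingTasks * ratio / tasksPerExecutor).toInt
    math.min(maxExecutors, math.max(minExecutors, uncapped))
  }

  def main(args: Array[String]): Unit = {
    assert(target(1, 0.5, 20) == 10)
    assert(target(1, 1.0 / 3.0, 20) == 7)
    assert(target(2, 1.0 / 3.0, 20) == 4)
    assert(target(1, 0.385, 20) == 8)
    assert(target(1, 1.0, 20) == 15)      // 20 uncapped, capped by max
    assert(target(4, 1.0 / 3.0, 20) == 3) // 2 uncapped, elevated by min
    println("all expectations match")
  }
}
```

The sketch applies the min/max clamp inside one function for brevity; as the review exchange above notes, the real manager applies those bounds when executors are added, not when the target is computed, which is why the test has to call `addExecutors` in a loop.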
Review discussion on the `tasksPerExecutorForFullParallelism` variable:

> We don't really need this variable now, can we just remove it?

> It is used in two places: one to validate arguments and the other to actually compute the target number of executors. If I remove this variable, I will need to either store `spark.executor.cores` and `spark.task.cpus` instead, or fetch them each time we do a validation or a computation of the target number of executors.

> @jiangxb1987, do you agree with my comment, or do you still want me to remove the variable?

> I was originally thinking we might avoid introducing the concept of `tasksPerExecutorForFullParallelism` and rather only have `executorCores` and `taskCPUs`, but I don't have a strong opinion on that.

> This is not exposed; it is merely a more precise description of the actual computation. I just wanted to state more clearly that the existing default behavior maximizes parallelism.

> ok
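To make the exchange concrete, a sketch of what the variable denotes, assuming the name from the patch; the surrounding setup is illustrative:

```scala
import org.apache.spark.SparkConf

object TasksPerExecutorSketch {
  def main(args: Array[String]): Unit = {
    // How many tasks one executor runs concurrently at full parallelism:
    // executor cores divided by the CPUs each task reserves.
    val conf = new SparkConf()
    val executorCores = conf.getInt("spark.executor.cores", 1)
    val taskCpus = conf.getInt("spark.task.cpus", 1)
    val tasksPerExecutorForFullParallelism = executorCores / taskCpus
    println(s"tasks per executor at full parallelism: $tasksPerExecutorForFullParallelism")
  }
}
```

Under the default ratio of 1.0, the requested executor count is the pending task count divided by this value (rounded up), which is the "maximizing the parallelism" default behavior referred to above.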