[SPARK-31105][CORE]Respect sql execution id for FIFO scheduling mode#27871
liupc wants to merge 1 commit into apache:master from
Conversation
ok to test
```scala
var priority = taskSet.priority
var stageId = taskSet.stageId
val priority = if (taskSet.properties != null) {
  taskSet.properties.getProperty("spark.sql.execution.id", "0").toLong
```
Although we have a similar one inside AppStatusListener already, this means another implicit dependency on the sql module.
BTW, you may want to define private val SQL_EXECUTION_ID_KEY = "spark.sql.execution.id" like AppStatusListener does. Or, if we need this, we may need to define it in a higher, common place.
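The reviewer's suggestion might look roughly like the sketch below. The object name `SchedulingKeys` and the helper `executionIdPriority` are illustrative stand-ins, not the actual Spark code; only the key string and the null-check-with-default pattern come from the diff above.

```scala
import java.util.Properties

// Sketch only: the key string hoisted into one shared place, mirroring the
// constant already defined in AppStatusListener (names here are hypothetical).
object SchedulingKeys {
  val SQL_EXECUTION_ID_KEY = "spark.sql.execution.id"

  // Returns the SQL execution id to use as a FIFO priority, defaulting to 0
  // when the task set carries no properties (e.g. non-SQL jobs).
  def executionIdPriority(properties: Properties): Long =
    if (properties != null)
      properties.getProperty(SQL_EXECUTION_ID_KEY, "0").toLong
    else 0L
}
```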
```scala
val taskScheduler = new TaskSchedulerImpl(sc)
val rootPool = new Pool("", FIFO, 0, 0)
val schedulableBuilder = new FairSchedulableBuilder(rootPool, conf)
```
FairSchedulableBuilder -> FIFOSchedulableBuilder.
What do you think about this, @gatorsmile and @cloud-fan?
Test build #119893 has finished for PR 27871 at commit
IIUC, Spark hasn't optimized the workload toward minimizing per-query delay. Actually, I think scheduling stages from the same SQL execution together would make the few nodes holding that execution's shuffle output files very hot, leading to worse performance for the whole cluster compared to the current approach.
Also, please consider the case where you submit a SQL query that requires so many slots that it would block all the other queries from being executed. With the current approach, other small queries still get a chance to run between two huge stages; after your change, every small query has to wait until the first big query finishes.
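The head-of-line concern can be illustrated with a toy schedule (hypothetical stages and ids, not Spark's actual data structures): a "big" execution submits two stages, and a one-stage "small" query arrives between them.

```scala
// Toy model: execId stands in for spark.sql.execution.id, jobId for
// submission order. Labels are illustrative only.
case class Stage(execId: Long, jobId: Int, label: String)

val stages = Seq(
  Stage(1, 0, "big-stage-1"),
  Stage(2, 1, "small"),
  Stage(1, 2, "big-stage-2"))

// Plain FIFO by jobId: the small query runs between the two big stages.
val plainFifo = stages.sortBy(_.jobId).map(_.label)

// Grouping by execution id first pushes the small query behind both
// big stages, which is the starvation risk raised above.
val executionGrouped = stages.sortBy(s => (s.execId, s.jobId)).map(_.label)
```

Under plain FIFO the order is big-stage-1, small, big-stage-2; with execution grouping it becomes big-stage-1, big-stage-2, small.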
+1 for @jiangxb1987 's comments. BTW, I guess that's the main reason @liupc tried this at
For small queries, usually they won't hit this problem. For big queries, the query latency shouldn't matter too much? @liupc, have you tried this in real-world workloads?
Ping, @liupc . If there is no other reason, shall we close this PR according to the review comments? |
Yes, we tried it in real workloads, and it does better, especially when there are lots of taskSets to schedule in one scheduling round. This is obvious with adaptive execution. Also, I think this is what FIFO should do.
In real clusters, resources are more important than locality, and users expect FIFO to behave like this.
Hi, @liupc.
Given that, this PR will not be considered mergeable. If you want to proceed, please split the scopes. First, you may need to focus on adding a new option to respect it. Thanks!
Thanks @dongjoon-hyun, let's split the scopes and add an option to respect it.
Thank you, @liupc. I hope Apache Spark can improve your use cases in your environment in any way. Please move forward. I'll close this PR (AS-IS). You can reopen this later after the
What changes were proposed in this pull request?
Currently, Spark sorts taskSets by jobId and stageId and then schedules them in order for the FIFO scheduling mode. In OLAP scenarios, especially under high concurrency, the taskSets usually come from different SQL queries, and several jobs can be submitted for execution at one time for a single query, especially with adaptive execution. But now we order those taskSets without considering the execution group, which may cause a query to be delayed.
So I propose to consider the sql execution id when scheduling jobs.
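The proposed ordering can be sketched as a comparator that compares the SQL execution id before falling back to the existing (priority, stageId) FIFO comparison. This is a simplified, self-contained model, not the actual patch: `SimpleTaskSet` and `ExecutionAwareFifo` are hypothetical names, and Spark's real logic lives in `FIFOSchedulingAlgorithm` operating on `Schedulable`s.

```scala
// Toy schedulable: executionId stands in for the value parsed from
// spark.sql.execution.id, priority for the jobId.
case class SimpleTaskSet(executionId: Long, priority: Int, stageId: Int)

// Sketch of the proposed ordering: group taskSets by SQL execution first,
// then fall back to Spark's existing jobId/stageId FIFO comparison.
object ExecutionAwareFifo extends Ordering[SimpleTaskSet] {
  def compare(a: SimpleTaskSet, b: SimpleTaskSet): Int = {
    val byExecution = java.lang.Long.compare(a.executionId, b.executionId)
    if (byExecution != 0) byExecution
    else {
      val byPriority = Integer.compare(a.priority, b.priority)
      if (byPriority != 0) byPriority
      else Integer.compare(a.stageId, b.stageId)
    }
  }
}
```

With this ordering, all taskSets from execution 1 are drained before any from execution 2, even if execution 2 submitted a job with a smaller jobId in between.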
Why are the changes needed?
Improvements
Does this PR introduce any user-facing change?
No
How was this patch tested?
existing UT & added UT