[SPARK-31105][CORE] Respect sql execution id for FIFO scheduling mode #27871

Closed
liupc wants to merge 1 commit into apache:master from liupc:SPARK-31105

Conversation

@liupc

@liupc liupc commented Mar 11, 2020

What changes were proposed in this pull request?

Currently, Spark sorts taskSets by jobId and stageId and then schedules them in that order under the FIFO scheduling mode. In OLAP scenarios, especially under high concurrency, the taskSets usually come from different sql queries, and several jobs can be submitted for execution at one time for a single query, especially with adaptive execution. But we currently order those taskSets without considering the execution group, which may cause the query to be delayed.

So I propose to consider the sql execution id when scheduling jobs.
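As a rough illustration of the proposal (not the actual patch; `TaskSetInfo` and `ExecutionAwareFifo` are hypothetical names, not Spark's classes), a FIFO ordering that groups task sets by sql execution id before falling back to jobId and stageId could look like:

```scala
// Hypothetical sketch: compare task sets by SQL execution id first, then by
// jobId, then stageId, so all stages of an earlier query are scheduled
// before a later query's stages.
case class TaskSetInfo(executionId: Long, jobId: Int, stageId: Int)

object ExecutionAwareFifo {
  // Tuple ordering gives lexicographic comparison across the three fields.
  val ordering: Ordering[TaskSetInfo] =
    Ordering.by((ts: TaskSetInfo) => (ts.executionId, ts.jobId, ts.stageId))
}
```

With this ordering, a task set from execution 1 always sorts before one from execution 2, even if the latter has a smaller jobId.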

Why are the changes needed?

Improvements

Does this PR introduce any user-facing change?

No

How was this patch tested?

existing UT & added UT

@dongjoon-hyun
Member

ok to test

    var priority = taskSet.priority
    var stageId = taskSet.stageId
    val priority = if (taskSet.properties != null) {
      taskSet.properties.getProperty("spark.sql.execution.id", "0").toLong

Although we have a similar one inside AppStatusListener already, this means another implicit dependency on the sql module.


BTW, you may want to define private val SQL_EXECUTION_ID_KEY = "spark.sql.execution.id" like AppStatusListener does. Or, if we need this, we may need to define it at a higher, common place.
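The suggestion could look roughly like this (the `SqlExecutionId` object and its `from` helper are hypothetical; only the property key string itself comes from Spark):

```scala
import java.util.Properties

// Sketch: hoist the property key into a named constant instead of
// repeating the string literal at each use site.
object SqlExecutionId {
  private val SQL_EXECUTION_ID_KEY = "spark.sql.execution.id"

  // Default to 0 when the job was not submitted as part of a SQL query
  // (or the task set carries no properties at all).
  def from(props: Properties): Long =
    if (props == null) 0L
    else props.getProperty(SQL_EXECUTION_ID_KEY, "0").toLong
}
```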

    val taskScheduler = new TaskSchedulerImpl(sc)

    val rootPool = new Pool("", FIFO, 0, 0)
    val schedulableBuilder = new FairSchedulableBuilder(rootPool, conf)

FairSchedulableBuilder -> FIFOSchedulableBuilder.

@dongjoon-hyun
Member

How do you think about this, @gatorsmile and @cloud-fan ?

@SparkQA

SparkQA commented Mar 16, 2020

Test build #119893 has finished for PR 27871 at commit 001b234.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Contributor

IIUC Spark doesn't optimize the workload toward minimizing query delay time. Actually, I think scheduling stages from the same sql execution together would make the few nodes holding the shuffle output files very hot, thus leading to worse performance for the whole cluster compared to the current approach.

@jiangxb1987
Contributor

Also, please consider the case where you submit a sql query that requires so many slots that it blocks all other queries from executing. With the current approach, other small queries still get a chance to run between two huge stages; after your change, every small query has to wait until the first big query finishes.

@dongjoon-hyun
Member

+1 for @jiangxb1987 's comments. BTW, I guess that's the main reason @liupc tried this on the FIFO scheduler.

@cloud-fan
Contributor

For small queries, usually they won't hit this problem. For big queries, the query latency shouldn't matter too much?

@liupc have you tried this in real-world workloads?

@dongjoon-hyun
Member

Ping, @liupc . If there is no other reason, shall we close this PR according to the review comments?

@liupc
Author

liupc commented Mar 20, 2020

For small queries, usually they won't hit this problem. For big queries, the query latency shouldn't matter too much?

@liupc have you tried this in real-world workloads?

Yes, we tried this in real workloads, and it performs better, especially when there are lots of taskSets to schedule in one scheduling round. This is obvious with adaptive execution. Also, I think this is what FIFO should do.
A query usually maps to several jobs, and if several jobs are delayed for this reason, the total delay is significant. Suppose each job takes 2 minutes; if there are 10 jobs ahead of ours and the cores are fully used, then our job waits 20 minutes to be scheduled. What's worse, with adaptive execution, when the next batch of jobs is submitted it may hit this issue again, which can greatly hurt the query duration. And since each query may have the same issue, they all slow down.
Also, users will see lots of running jobs for later-coming queries in the Spark UI, which is confusing.
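The back-of-the-envelope arithmetic above can be checked directly (all figures are the commenter's hypothetical numbers, not measurements):

```scala
// 10 jobs of ~2 minutes each queued ahead of ours, cluster fully utilized:
// under plain FIFO each queued job runs to completion before ours starts.
val jobDurationMin = 2
val jobsAhead = 10
val waitMin = jobsAhead * jobDurationMin  // minutes before our job is scheduled
```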

@liupc
Author

liupc commented Mar 20, 2020

IIUC Spark doesn't optimize the workload toward minimizing query delay time. Actually, I think scheduling stages from the same sql execution together would make the few nodes holding the shuffle output files very hot, thus leading to worse performance for the whole cluster compared to the current approach.

In real clusters, resources are more important than locality. And users expect FIFO to behave like this.

@dongjoon-hyun
Member

Hi, @liupc .

  • Currently, Core module's FIFO concept doesn't mean SQL level execution. The following is not Apache Spark's contract.

    And users expect the FIFO behave like this.

  • Also, this is a trade-off. And, in general, this hurts the global throughput (as @jiangxb1987 mentioned already). Apache Spark cannot accept that kind of general performance degradation for those rare use cases. Please note that not every user is a SQL user.

  • Lastly, I'm -1 for the current architectural design which makes a dependency from core module's TaskSetManager to the external sql module.

Given that, this PR will not be considered mergeable. If you want to proceed, please split the scopes. First, focus on adding a new option in the core module to respect job-level priority. The configuration should be false by default. After that suggestion is accepted into the master branch, you can make a second PR adding another option for the whole SQL query optimization.

Thanks!

@liupc
Author

liupc commented Mar 23, 2020

Thanks @dongjoon-hyun , let's split the scopes and add an option to respect jobGroup-level priority in the core module.
And I think even with the current approach the congestion issue is serious, so this PR is not meant to solve it; I proposed another PR for that: #27862
I really think this is helpful for OLAP scenarios, and we tested this on real workloads at Xiaomi.

@dongjoon-hyun
Member

Thank you, @liupc . I hope Apache Spark can improve your use cases in your environment in some way. Please move forward. I'll close this PR (AS-IS). You can reopen it later, after the core module part is ready independently.
