[SPARK-44485][SQL] Optimize TreeNode.generateTreeString by liuzqt · Pull Request #42095 · apache/spark

liuzqt · 2023-07-20T18:40:08Z

What changes were proposed in this pull request?

Optimize several critical code paths in TreeNode.generateTreeString.

Why are the changes needed?

In TreeNode.generateTreeString, we observed inefficiencies in Scala collection operations and virtual function calls in our internal workload.

This inefficiency becomes significant for large plans (we hit an example with more than 1000 nodes), so it is worth optimizing this hot code path. By rewriting it in plain Java-style code (not as sweet as Scala's syntactic sugar), we should be able to eliminate most of the overhead.

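Hot spots seen in the profiling flamegraphs (screenshots not reproduced here):

- ArrayBuffer.append
- Seq.last
- SeqLike.$colon$plus (i.e. :+)
- StringOps.$times (i.e. *)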
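To make the contrast concrete, here is a minimal sketch of the two styles on a toy tree. `Node`, `TreeStringSketch`, `renderSeq`, `renderList`, `treeStringSeq` and `treeStringList` are hypothetical names, not Spark APIs, and the prefix strings (`"   "`, `":  "`, `"+- "`, `":- "`) are assumed to mirror the explain output format. The first version copies the flag list with `:+` and reads it with `.last` on every call; the second shares one `java.util.ArrayList` as a stack and walks it with indexed loops.

```scala
import java.util.ArrayList

// Hypothetical toy tree for illustration; not Spark's TreeNode.
final case class Node(name: String, children: Seq[Node] = Nil)

object TreeStringSketch {

  // "Before" style (assumed): each recursive call builds a new Seq with `:+`
  // and reads the current node's flag with `Seq.last`.
  private def renderSeq(node: Node, lastChildren: Seq[Boolean], sb: StringBuilder): Unit = {
    if (lastChildren.nonEmpty) {
      lastChildren.init.foreach(isLast => sb.append(if (isLast) "   " else ":  "))
      sb.append(if (lastChildren.last) "+- " else ":- ")
    }
    sb.append(node.name).append('\n')
    if (node.children.nonEmpty) {
      node.children.init.foreach(child => renderSeq(child, lastChildren :+ false, sb))
      renderSeq(node.children.last, lastChildren :+ true, sb)
    }
  }

  // "After" style (assumed): one shared ArrayList used as a stack of flags.
  // Push before visiting a child, pop after it returns; read by index.
  private def renderList(node: Node, lastChildren: ArrayList[Boolean], sb: StringBuilder): Unit = {
    val n = lastChildren.size()
    var i = 0
    while (i < n - 1) {
      sb.append(if (lastChildren.get(i)) "   " else ":  ")
      i += 1
    }
    if (n > 0) sb.append(if (lastChildren.get(n - 1)) "+- " else ":- ")
    sb.append(node.name).append('\n')
    val it = node.children.iterator
    while (it.hasNext) {
      val child = it.next()
      lastChildren.add(!it.hasNext) // true only for the last child
      renderList(child, lastChildren, sb)
      lastChildren.remove(lastChildren.size() - 1) // pop by index (remove(int) overload)
    }
  }

  def treeStringSeq(root: Node): String = {
    val sb = new StringBuilder
    renderSeq(root, Nil, sb)
    sb.toString
  }

  def treeStringList(root: Node): String = {
    val sb = new StringBuilder
    renderList(root, new ArrayList[Boolean](), sb)
    sb.toString
  }
}
```

Both functions return the same text; the second avoids allocating a new flag collection for every node it visits.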

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing UTs

HyukjinKwon · 2023-07-21T00:38:59Z

So how much does it improve in general? cc @MaxGekk FYI

liuzqt · 2023-07-21T01:06:53Z

> So how much does it improve in general? cc @MaxGekk FYI

In our use case with a plan of more than 1000 nodes, the whole explain invocation is ~50% faster. Since this happens on every AQE cycle (AQE has to push each plan change to the Spark UI), the improvement is significant.

LuciferYang · 2023-07-21T04:48:00Z

@liuzqt Can you retry the failed GA task?

On the other hand, does this optimization have the same effect for Scala 2.13 as well?

ulysses-you · 2023-07-21T05:50:51Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala

      printNodeId: Boolean,
      indent: Int = 0): Unit = {
-    append("   " * indent)
+    (0 until indent).foreach(_ => append(" "))


Should it be append(" ") -> append("   ")?
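A quick check of the point raised here, as a sketch assuming the indent unit is three spaces (as the removed `"   " * indent` line suggests). `IndentSketch`, `withTimes` and `withLoop` are hypothetical helpers; the local `append` stands in for the callback passed to generateTreeString.

```scala
object IndentSketch {
  // Hypothetical helpers, not Spark code.

  // Removed line: build `indent` copies of the unit with StringOps.* and append once.
  def withTimes(indent: Int): String = {
    val sb = new StringBuilder
    val append: String => Unit = s => sb.append(s)
    append("   " * indent)
    sb.toString
  }

  // Added line: append the unit once per indent level, with no intermediate string.
  def withLoop(indent: Int, unit: String): String = {
    val sb = new StringBuilder
    val append: String => Unit = s => sb.append(s)
    (0 until indent).foreach(_ => append(unit))
    sb.toString
  }
}
```

With a one-space unit, each indent level shrinks from three columns to one, so the loop only reproduces the old output when it appends three spaces per level.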

LuciferYang · 2023-07-21T08:53:19Z

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala

      maxFields: Int,
      printNodeId: Boolean,
      indent: Int = 0): Unit = {
+    lastChildren.addLast(true)


Why did this addLast move before super.generateTreeString?

Good catch, fixed.
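One plausible reason the order matters, sketched under assumptions: with a shared mutable list, the node's own line (the role of super.generateTreeString) reads the current flags to draw its prefix, so the child's flag must be pushed only after that line is written and popped after the child returns. `WrapperOrderSketch`, `printSelf` and the `Stage`/`Child` labels are hypothetical.

```scala
import java.util.ArrayList

object WrapperOrderSketch {
  // Hypothetical stand-in for the node's own line (the role super.generateTreeString
  // plays): it draws its prefix from the flags currently on the shared stack.
  private def printSelf(name: String, lastChildren: ArrayList[Boolean], sb: StringBuilder): Unit = {
    val n = lastChildren.size()
    var i = 0
    while (i < n - 1) {
      sb.append(if (lastChildren.get(i)) "   " else ":  ")
      i += 1
    }
    if (n > 0) sb.append(if (lastChildren.get(n - 1)) "+- " else ":- ")
    sb.append(name).append('\n')
  }

  // A wrapper node with one wrapped child, itself sitting under a parent
  // where it is not the last sibling (hence the initial `false`).
  def render(pushFirst: Boolean): String = {
    val sb = new StringBuilder
    val lastChildren = new ArrayList[Boolean]()
    lastChildren.add(false)
    if (pushFirst) {
      lastChildren.add(true) // too early: the wrapper's own line now sees the child's flag
      printSelf("Stage", lastChildren, sb)
    } else {
      printSelf("Stage", lastChildren, sb)
      lastChildren.add(true) // push only after the wrapper's own line is written
    }
    printSelf("Child", lastChildren, sb)
    lastChildren.remove(lastChildren.size() - 1) // pop so the caller's stack is unchanged
    sb.toString
  }
}
```

Pushing first shifts the wrapper's own line one level to the right, while the child's line is unaffected, which is the kind of silent output change a reordering like this can introduce.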

liuzqt · 2023-07-21T20:36:03Z

> @liuzqt Can you retry the failed GA task?
>
> On the other hand, does this optimization have the same effect for Scala 2.13 as well?

According to the profiling flamegraphs, most of the overhead comes from "itable stubs", the JVM's dispatch mechanism for interface method calls, which includes calls made through Scala traits. After switching to a Java-style implementation, that overhead is gone.

We don't have an environment to test against Scala 2.13, but I expect the improvement to apply there as well.

liuzqt · 2023-07-21T20:37:08Z

Changed LinkedList to ArrayList: in our benchmarks we observed more GC pressure with LinkedList, while ArrayList gave a more consistent performance improvement.
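A sketch of the ArrayList-as-stack pattern, assuming the flags are only ever pushed and popped at the tail. `FlagStack` is a hypothetical name, not a Spark class. One likely contributor to the GC difference: `java.util.LinkedList` allocates a node object for every `add`, whereas `ArrayList` writes into its backing array and only allocates when it grows; Scala boxes `Boolean` values to the cached `java.lang.Boolean.TRUE`/`FALSE` instances, so the elements themselves add no garbage.

```scala
import java.util.ArrayList

// Hypothetical helper, not a Spark class: a stack of "is last child" flags
// kept in a single ArrayList and only touched at its tail.
final class FlagStack(initialCapacity: Int = 16) {
  private val flags = new ArrayList[Boolean](initialCapacity)

  // Amortized O(1); allocates only when the backing array has to grow.
  def push(isLast: Boolean): Unit = flags.add(isLast)

  // remove(int index) overload: removes and returns the tail element in O(1).
  def pop(): Boolean = flags.remove(flags.size() - 1)

  // O(1) indexed read, the counterpart of Seq.last.
  def last: Boolean = flags.get(flags.size() - 1)

  def size: Int = flags.size()
}
```

Because every push is matched by a pop at the same end, the list never needs the O(1) head operations that LinkedList offers, so ArrayList gives up nothing here.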

HyukjinKwon · 2023-07-22T07:43:24Z

Merged to master and branch-3.5.

Closes #42095 from liuzqt/SPARK-44485.

Authored-by: Ziqi Liu <ziqi.liu@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 09c44fd)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

github-actions bot added the SQL label Jul 20, 2023

optimize TreeNode.generateTreeString

6ce8281

liuzqt force-pushed the SPARK-44485 branch from 55c8f05 to 6ce8281 July 20, 2023 18:50

liuzqt changed the title from [SPARK-44485][CORE][SQL] Optimize TreeNode.generateTreeString to [SPARK-44485][SQL] Optimize TreeNode.generateTreeString Jul 20, 2023

liuzqt force-pushed the SPARK-44485 branch from 4995b97 to 3beb7ad July 20, 2023 21:33

remove unused import

35b002f

liuzqt force-pushed the SPARK-44485 branch from 3beb7ad to 35b002f July 21, 2023 00:06

HyukjinKwon approved these changes Jul 21, 2023

cloud-fan approved these changes Jul 21, 2023

ulysses-you reviewed Jul 21, 2023

LuciferYang reviewed Jul 21, 2023

liuzqt added 2 commits July 21, 2023 13:09

change to ArrayList

6787dce

fix

b8d2c82

HyukjinKwon approved these changes Jul 22, 2023

HyukjinKwon closed this in 09c44fd Jul 22, 2023

liuzqt wants to merge 4 commits into apache:master from liuzqt:SPARK-44485
5 participants