Skip to content

Comments

[SPARK-44485][SQL] Optimize TreeNode.generateTreeString#42095

Closed
liuzqt wants to merge 4 commits intoapache:masterfrom
liuzqt:SPARK-44485
Closed

[SPARK-44485][SQL] Optimize TreeNode.generateTreeString#42095
liuzqt wants to merge 4 commits intoapache:masterfrom
liuzqt:SPARK-44485

Conversation

@liuzqt
Copy link
Contributor

@liuzqt liuzqt commented Jul 20, 2023

What changes were proposed in this pull request?

Optimize several critical code path in TreeNode.generateTreeString

Why are the changes needed?

In TreeNode.generateTreeString, we observed inefficiency in scala collection operations and virtual function call in our internal workload.

This inefficiency become significant in large plan (we hit a example of more than 1000 nodes). So it’s worth optimizing the super hot code path. By rewriting into native Java code(not so sweet as scala syntax sugar though), we should be able to get rid of most of the overhead.

  • ArrayBuffer.append
itable1
  • Seq.last
itable2
  • SeqLike.$colon$plus
itable3
  • StringOps.$times
itable4

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing UTs

@github-actions github-actions bot added the SQL label Jul 20, 2023
@liuzqt liuzqt changed the title [SPARK-44485][CORE][SQL] Optimize TreeNode.generateTreeString [SPARK-44485][SQL] Optimize TreeNode.generateTreeString Jul 20, 2023
@HyukjinKwon
Copy link
Member

So how much does it improve in general? cc @MaxGekk FYI

@liuzqt
Copy link
Contributor Author

liuzqt commented Jul 21, 2023

So how much does it improve in general? cc @MaxGekk FYI

In our use case with >1000 nodes plan, the whole explain invocation is ~50% faster, considering this happen on each AQE cycle (where AQE have to update plan change to Spark UI), the improvement is significant.

@LuciferYang
Copy link
Contributor

@liuzqt Can you retry the failed GA task?

On the other hand, does this optimization have the same effect for Scala 2.13 as well?

printNodeId: Boolean,
indent: Int = 0): Unit = {
append(" " * indent)
(0 until indent).foreach(_ => append(" "))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it be append(" ") -> append(" ") ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

maxFields: Int,
printNodeId: Boolean,
indent: Int = 0): Unit = {
lastChildren.addLast(true)
Copy link
Contributor

@LuciferYang LuciferYang Jul 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this addLast move before super.generateTreeString?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, fixed

@liuzqt
Copy link
Contributor Author

liuzqt commented Jul 21, 2023

@liuzqt Can you retry the failed GA task?

On the other hand, does this optimization have the same effect for Scala 2.13 as well?

According to the profiling flamegraphs, most overhead comes "itable stub", which refers to the internal method dispatch mechanism used by the Scala compiler for virtual method calls. After changing to Java implementation, those overhead has gone.

We haven't got environment to test agains scala 2.13, but I suppose the improvement should apply.

@liuzqt
Copy link
Contributor Author

liuzqt commented Jul 21, 2023

Changed LinkedList to ArrayList, in our benchmarks we observed more GC pressure with LinkedList, while ArrayList provided more robust performance boost.

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Jul 22, 2023

Merged to master, and branch-3.5.

HyukjinKwon pushed a commit that referenced this pull request Jul 22, 2023
### What changes were proposed in this pull request?

Optimize several critical code path in `TreeNode.generateTreeString`

### Why are the changes needed?

In `TreeNode.generateTreeString`, we observed inefficiency in scala collection operations and virtual function call in our internal workload.

This inefficiency become significant in large plan (we hit a example of more than 1000 nodes). So it’s worth optimizing the super hot code path. By rewriting into native Java code(not so sweet as scala syntax sugar though), we should be able to get rid of most of the overhead.

- `ArrayBuffer.append`
<img width="440" alt="itable1" src="https://github.com/apache/spark/assets/22358241/3e1d2e5e-1eeb-46ef-ab7a-20f4cb75f602">

- `Seq.last`
<img width="302" alt="itable2" src="https://github.com/apache/spark/assets/22358241/23f29695-8a01-4c8e-b75a-148a92278c2b">

- `SeqLike.$colon$plus`
<img width="281" alt="itable3" src="https://github.com/apache/spark/assets/22358241/f0526746-62d0-4556-99be-04a24ab805d2">

- `StringOps.$times`
<img width="334" alt="itable4" src="https://github.com/apache/spark/assets/22358241/3a46f18e-7027-421e-aa5a-130d02e1c19c">

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing UTs

Closes #42095 from liuzqt/SPARK-44485.

Authored-by: Ziqi Liu <ziqi.liu@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 09c44fd)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants