[SPARK-44485][SQL] Optimize TreeNode.generateTreeString #42095

liuzqt wants to merge 4 commits into apache:master
Conversation
So how much does it improve in general? cc @MaxGekk FYI
In our use case with a >1000-node plan, the whole explain invocation is ~50% faster. Considering this happens on every AQE cycle (where AQE has to push each plan change to the Spark UI), the improvement is significant.
@liuzqt Can you retry the failed GA task? On the other hand, does this optimization have the same effect for Scala 2.13 as well? |
```diff
     printNodeId: Boolean,
     indent: Int = 0): Unit = {
-    append(" " * indent)
+    (0 until indent).foreach(_ => append(" "))
```
Should it be `append(" ")` -> `append(" ")`?
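The indentation change discussed in this thread can be sketched in isolation (a minimal standalone example, not the actual Spark code): `" " * indent` goes through `StringOps.$times` and allocates an intermediate `String` on every call, while the explicit loop appends the same characters directly into the builder.

```scala
// Minimal standalone sketch of the indentation rewrite; helper names
// are illustrative, not Spark's. Both variants produce identical output,
// the loop just skips the temporary String that `" " * indent` builds.
def indentOld(sb: StringBuilder, indent: Int): Unit =
  sb.append(" " * indent) // StringOps.$times: allocates a temp String

def indentNew(sb: StringBuilder, indent: Int): Unit =
  (0 until indent).foreach(_ => sb.append(" ")) // appends in place

val a = new StringBuilder; indentOld(a, 4)
val b = new StringBuilder; indentNew(b, 4)
```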
```scala
    maxFields: Int,
    printNodeId: Boolean,
    indent: Int = 0): Unit = {
  lastChildren.addLast(true)
```
Why did this addLast move before super.generateTreeString?
According to the profiling flamegraphs, most of the overhead comes from the "itable stub", the internal dispatch mechanism the JVM uses for interface (virtual) method calls, which Scala collection code hits heavily. After changing to a Java-style implementation, that overhead is gone. We don't have an environment to test against Scala 2.13, but I suppose the improvement should apply there as well.
Changed LinkedList to ArrayList: in our benchmarks we observed more GC pressure with LinkedList, while ArrayList provided a more consistent performance boost.
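As a hedged sketch (illustrative names, not the exact Spark code), the `lastChildren` flags can be kept in a single `java.util.ArrayList` used as a stack, pushed before recursing into a child and popped afterwards, instead of copying an immutable `Seq` with `:+` on every call:

```scala
import java.util.ArrayList

// One mutable list reused across the whole recursion. Push/pop at the
// tail are O(1) amortized on a single backing array, so there is no
// per-call Seq copy and no per-element node allocation (hence less GC
// pressure than LinkedList, whose every element is a separate node).
val lastChildren = new ArrayList[java.lang.Boolean]()

lastChildren.add(true)                               // push before recursing
val isLast = lastChildren.get(lastChildren.size - 1) // replaces Seq.last
lastChildren.remove(lastChildren.size - 1)           // pop after recursion
```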
Merged to master and branch-3.5.
### What changes were proposed in this pull request?

Optimize several critical code paths in `TreeNode.generateTreeString`.

### Why are the changes needed?

In `TreeNode.generateTreeString`, we observed inefficiency from Scala collection operations and virtual function calls in our internal workload. This inefficiency becomes significant in large plans (we hit an example with more than 1000 nodes), so it is worth optimizing this super-hot code path. By rewriting it in plain Java-style code (not as sweet as Scala syntax sugar, though), we can get rid of most of the overhead:

- `ArrayBuffer.append`
  <img width="440" alt="itable1" src="https://github.com/apache/spark/assets/22358241/3e1d2e5e-1eeb-46ef-ab7a-20f4cb75f602">
- `Seq.last`
  <img width="302" alt="itable2" src="https://github.com/apache/spark/assets/22358241/23f29695-8a01-4c8e-b75a-148a92278c2b">
- `SeqLike.$colon$plus`
  <img width="281" alt="itable3" src="https://github.com/apache/spark/assets/22358241/f0526746-62d0-4556-99be-04a24ab805d2">
- `StringOps.$times`
  <img width="334" alt="itable4" src="https://github.com/apache/spark/assets/22358241/3a46f18e-7027-421e-aa5a-130d02e1c19c">

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing UTs

Closes #42095 from liuzqt/SPARK-44485.

Authored-by: Ziqi Liu <ziqi.liu@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 09c44fd)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
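The hot spots listed above can be seen together in a condensed, hypothetical sketch of the recursion (heavily simplified; the real `generateTreeString` takes more parameters and uses different prefix characters). The shape below reuses one mutable `java.util.ArrayList` instead of copying a `Seq` with `:+` and calling `Seq.last`, and appends indent chunks in a loop instead of multiplying strings:

```scala
import java.util.ArrayList

// Hypothetical simplified tree node, not Spark's TreeNode.
case class N(name: String, children: Seq[N] = Nil)

// Simplified rendering in the PR's style: one shared mutable flag list,
// pushed/popped around each child instead of building `flags :+ isLast`
// on every recursive call.
def render(node: N, flags: ArrayList[java.lang.Boolean], sb: StringBuilder): Unit = {
  var i = 0
  while (i < flags.size) {            // loop instead of string `*`
    sb.append(if (flags.get(i)) "   " else ":  ")
    i += 1
  }
  sb.append(node.name).append('\n')
  val n = node.children.size
  var j = 0
  node.children.foreach { c =>
    flags.add(j == n - 1)             // push (replaces `:+`)
    render(c, flags, sb)
    flags.remove(flags.size - 1)      // pop
    j += 1
  }
}

val sb = new StringBuilder
render(N("a", Seq(N("b"), N("c"))), new ArrayList[java.lang.Boolean](), sb)
```

For the two-level tree above this yields `a`, then `b` prefixed with `:  ` (it has a following sibling) and `c` prefixed with `   ` (it is the last child).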