[SPARK-28127][SQL] Micro optimization on TreeNode's mapChildren method

## What changes were proposed in this pull request?

The `mapChildren` method in the `TreeNode` class is used throughout the Spark SQL codebase. It contains an `if` statement that checks whether `children` is non-empty. However, `TreeNode` also has a cached lazy val, `containsChild`, which is used by other methods and is therefore constructed anyway. Checking `containsChild` instead avoids unnecessary computation.

A benchmark showed that this optimization improves overall TPC-DS planning time by 6.8%, with no regression on any TPC-DS query.

## How was this patch tested?

Existing unit tests.

Closes #24925 from yeshengm/treenode-children.

Authored-by: Yesheng Ma <kimi.ysma@gmail.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
apache · Jun 21, 2019 · 54da3bb
1 parent 47f54b1
Showing 1 changed file with 1 addition and 1 deletion.
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
@@ -319,7 +319,7 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
    * Returns a copy of this node where `f` has been applied to all the nodes in `children`.
    */
   def mapChildren(f: BaseType => BaseType): BaseType = {
-    if (children.nonEmpty) {
+    if (containsChild.nonEmpty) {
       mapChildren(f, forceCopy = false)
     } else {
       this
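
The idea behind the one-line change can be illustrated with a minimal, self-contained sketch. It is not the Spark implementation: `Node`, `Leaf`, `Add`, and `withNewChildren` are hypothetical stand-ins, and `containsChild` is assumed here to be a lazily built set of the node's children. The point it shows is that a cached `lazy val` is computed once per node, so testing it for emptiness costs little when other methods already need it.

```scala
// Simplified, hypothetical stand-in for a Catalyst-style tree node (not Spark's TreeNode).
sealed trait Node {
  def children: Seq[Node]

  // Assumption for this sketch: a set of the children, built once on first
  // access and reused by every later call on the same node.
  lazy val containsChild: Set[Node] = children.toSet

  // Hypothetical helper that rebuilds this node with new children.
  protected def withNewChildren(newChildren: Seq[Node]): Node

  def mapChildren(f: Node => Node): Node = {
    if (containsChild.nonEmpty) {
      val newChildren = children.map(f)
      // Return the original instance when no child was replaced.
      if (newChildren.zip(children).forall { case (a, b) => a eq b }) this
      else withNewChildren(newChildren)
    } else {
      // Leaf nodes have nothing to map over.
      this
    }
  }
}

final case class Leaf(value: Int) extends Node {
  val children: Seq[Node] = Nil
  protected def withNewChildren(newChildren: Seq[Node]): Node = this
}

final case class Add(left: Node, right: Node) extends Node {
  def children: Seq[Node] = Seq(left, right)
  protected def withNewChildren(newChildren: Seq[Node]): Node =
    Add(newChildren(0), newChildren(1))
}
```

In this sketch, calling `mapChildren` on a `Leaf` returns the same instance without invoking `f`, while calling it on an `Add` applies `f` to both operands. Whether the cached check pays off in practice depends on how often `containsChild` is already materialized, which is the premise stated in the commit message.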