[SPARK-35567][SQL] Fix: Explain cost is not showing statistics for all the nodes #32704
shahidki31 wants to merge 3 commits into apache:master
Conversation

cc @HyukjinKwon @maropu @cloud-fan @srowen Kindly review.

I don't know enough to review this, sorry.

Kubernetes integration test starting

Kubernetes integration test status failure
sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala

Test build #139086 has finished for PR 32704 at commit

Does this fix nested subqueries?

@cloud-fan Yes,
sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala
sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala

Test build #139103 has finished for PR 32704 at commit

Retest this please

Kubernetes integration test starting

Kubernetes integration test unable to build dist. exiting with code: 1

Kubernetes integration test status failure

Test build #139107 has finished for PR 32704 at commit

thanks, merging to master!

Thanks a lot @cloud-fan

What changes were proposed in this pull request?
The EXPLAIN COST command in Spark currently doesn't show statistics for all the plan nodes. Statistics are missing for some nodes in almost all of the TPC-DS queries.
In this PR, we collect all the plan nodes, including subqueries, and compute the statistics for each node if they don't already exist in the stats cache.
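The approach described above can be illustrated with a small, self-contained Scala sketch. It does not use Spark: `Plan`, `stats`, `hasStats`, `allNodesWithSubqueries` and `computeAllStats` are hypothetical stand-ins. The sketch assumes a plan whose per-node statistics are computed lazily and cached, and whose estimator only visits child nodes, so subquery plans never receive statistics unless they are visited explicitly. It is a simplified model under those assumptions, not the code changed in QueryExecution.scala.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified stand-in for a logical plan node. `subqueries` models
// plans referenced from expressions (e.g. a scalar subquery in a filter condition).
final class Plan(
    val name: String,
    val children: Seq[Plan] = Nil,
    val subqueries: Seq[Plan] = Nil,
    val leafRows: Long = 0L) {

  // Per-node stats cache, filled the first time `stats` is requested.
  private var statsCache: Option[Long] = None

  def hasStats: Boolean = statsCache.isDefined

  // Returns the cached row-count estimate, computing and caching it if absent.
  // Assumption: the estimator only looks at children, never at subquery plans.
  def stats: Long = statsCache.getOrElse {
    val computed = if (children.isEmpty) leafRows else children.map(_.stats).sum
    statsCache = Some(computed)
    computed
  }

  // Pre-order walk over this node, its children and its subquery plans.
  // The recursion also reaches subqueries nested inside other subqueries.
  def allNodesWithSubqueries: Seq[Plan] = {
    val out = ArrayBuffer.empty[Plan]
    def visit(p: Plan): Unit = {
      out += p
      p.children.foreach(visit)
      p.subqueries.foreach(visit)
    }
    visit(this)
    out.toList
  }
}

object ExplainCostSketch {
  // Touch `stats` on every node, including subqueries. Nodes that already have
  // cached stats just return them; the rest are computed and cached.
  def computeAllStats(root: Plan): Unit =
    root.allNodesWithSubqueries.foreach(_.stats)

  def main(args: Array[String]): Unit = {
    val sub = new Plan("SubqueryScan", leafRows = 10L)
    val scan = new Plan("Scan", leafRows = 100L)
    val filter = new Plan("Filter", children = Seq(scan), subqueries = Seq(sub))
    val root = new Plan("Project", children = Seq(filter))

    // Asking only the root for stats fills the cache along the children path.
    val rootRows = root.stats
    println(s"root rows: $rootRows") // root rows: 100
    println(sub.hasStats) // false: the subquery plan was never visited

    computeAllStats(root)
    println(root.allNodesWithSubqueries.forall(_.hasStats)) // true
  }
}
```

In this model, requesting statistics only for the root leaves the subquery node without a cached estimate, while walking every node (children and subqueries) and reading `stats` gives each one an estimate to print. Because the walk goes through the same cached `stats` accessor, nodes that were already estimated are not recomputed.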
Why are the changes needed?
Before Fix:

For example, in Query 1 the Project node has no statistics, and in Query 15 the Aggregate node has no statistics.
After Fix:


Query 1:
Query 15:
Does this PR introduce any user-facing change?
No
How was this patch tested?
Manual testing