Skip to content

[SPARK-32984][TESTS][SQL] Improve showing the differences between approved and actual plans of PlanStabilitySuite#29860

Closed
Ngone51 wants to merge 2 commits intoapache:masterfrom
Ngone51:show-difference-of-planstability
Closed

[SPARK-32984][TESTS][SQL] Improve showing the differences between approved and actual plans of PlanStabilitySuite#29860
Ngone51 wants to merge 2 commits intoapache:masterfrom
Ngone51:show-difference-of-planstability

Conversation

@Ngone51
Copy link
Member

@Ngone51 Ngone51 commented Sep 24, 2020

What changes were proposed in this pull request?

This PR proposes to add the caret hints, e.g., ^^^^^, to the approved and actual plans where they first become different.

Why are the changes needed?

It's hard to find the difference between the approved and actual plan since the plans of TPC-DS queries are often huge. Adding the hints would help developers to locate the plan differences quickly.

Does this PR introduce any user-facing change?

Yes, after this change, there're hits added to the plans to highlight the differences. For example,

[info]   last approved simplified plan: /Users/yi.wu/IdeaProjects/spark/sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q41.sf100/simplified.txt
[info]   last approved explain plan: /Users/yi.wu/IdeaProjects/spark/sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q41.sf100/explain.txt
[info]   
[info]   TakeOrderedAndProject [i_product_name]
[info]     WholeStageCodegen (4)
[info]       HashAggregate [i_product_name]
[info]         InputAdapter
[info]           Exchange [i_product_name] #1
[info]             WholeStageCodegen (3)
[info]               HashAggregate [i_product_name]
[info]                 Project [i_product_name]
[info]                   BroadcastHashJoin [i_manufact,i_manufact]
[info]                     Project [i_manufact,i_product_name]
[info]                       Filter [i_manufact_id,i_manufact]
[info]                         ColumnarToRow
[info]                           InputAdapter
[info]                             Scan parquet default.item [i_manufact_id,i_manufact,i_product_name]
[info]                     InputAdapter
[info]                       BroadcastExchange #2
[info]                         WholeStageCodegen (2)
[info]                           Project [i_manufact]
[info]                             Filter [item_cnt]
[info]                               HashAggregate [i_manufact,count] [count(1),item_cnt,i_manufact,count]
[info]                                                                                              ^^^^^^
[info]                                 InputAdapter
[info]                                   Exchange [i_manufact] #3
[info]                                     WholeStageCodegen (1)
[info]                                       HashAggregate [i_manufact] [count,count]
[info]                                         Project [i_manufact]
[info]                                           Filter [i_category,i_color,i_units,i_size,i_manufact]
[info]                                             ColumnarToRow
[info]                                               InputAdapter
[info]                                                 Scan parquet default.item [i_category,i_manufact,i_size,i_color,i_units]
[info]   
[info]   actual simplified plan: /Users/yi.wu/IdeaProjects/spark/target/tmp/q41.sf100.actual.simplified.txt
[info]   actual explain plan: /Users/yi.wu/IdeaProjects/spark/target/tmp/q41.sf100.actual.explain.txt
[info]   
[info]   TakeOrderedAndProject [i_product_name]
[info]     WholeStageCodegen (4)
[info]       HashAggregate [i_product_name]
[info]         InputAdapter
[info]           Exchange [i_product_name] #1
[info]             WholeStageCodegen (3)
[info]               HashAggregate [i_product_name]
[info]                 Project [i_product_name]
[info]                   BroadcastHashJoin [i_manufact,i_manufact]
[info]                     Project [i_manufact,i_product_name]
[info]                       Filter [i_manufact_id,i_manufact]
[info]                         ColumnarToRow
[info]                           InputAdapter
[info]                             Scan parquet default.item [i_manufact_id,i_manufact,i_product_name]
[info]                     InputAdapter
[info]                       BroadcastExchange #2
[info]                         WholeStageCodegen (2)
[info]                           Project [i_manufact]
[info]                             Filter [alwaysTrue,item_cnt]
[info]                               HashAggregate [i_manufact,count] [count(1),item_cnt,i_manufact,alwaysTrue,count]
[info]                                                                                              ^^^^^^
[info]                                 InputAdapter
[info]                                   Exchange [i_manufact] #3
[info]                                     WholeStageCodegen (1)
[info]                                       HashAggregate [i_manufact] [count,count]
[info]                                         Project [i_manufact]
[info]                                           Filter [i_category,i_color,i_units,i_size,i_manufact]
[info]                                             ColumnarToRow
[info]                                               InputAdapter
[info]                                                 Scan parquet default.item [i_category,i_manufact,i_size,i_color,i_units] 

To be honest, the result is still not perfect when plans are super huge and your monitor can not hold it. But at least now we can have a way to search the differences.

How was this patch tested?

Tested manually.

@Ngone51
Copy link
Member Author

Ngone51 commented Sep 24, 2020

cc @cloud-fan @maropu Please take a look, thanks!

@maropu
Copy link
Member

maropu commented Sep 24, 2020

Looks sideBySide has the same purpose (showing plan differences);
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala#L61
Is that method not appropriate for showing plan differences in this case?

@Ngone51
Copy link
Member Author

Ngone51 commented Sep 24, 2020

I remember we changed to showing plans one by one instead of side by side because of the TPCDS plans are usually too big to be shown in a good shape using side by side.

We want to first keep the plan shape and then highlight the difference.

@SparkQA
Copy link

SparkQA commented Sep 24, 2020

Test build #129060 has finished for PR 29860 at commit 826a2e9.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Sep 25, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33720/

@SparkQA
Copy link

SparkQA commented Sep 25, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33720/

@SparkQA
Copy link

SparkQA commented Sep 25, 2020

Test build #129099 has finished for PR 29860 at commit 826a2e9.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

*/
private def addDiffHint(approvedSimplified: String, actualSimplified: String)
: (String, String) = {
// reverse the plan so we can compare the node from the bottom to top
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One hard problem is how to match the lines from both sides. It's possible that the left side has one more node in the middle, so simply matching lines bottom-up may not work. It's like git diff, we should do the match w.r.t. the content, which can be very complicated.

Maybe we should just recommend some online text diff tools in the comment and ask people to use.

@github-actions
Copy link

github-actions bot commented Jan 4, 2021

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jan 4, 2021
@github-actions github-actions bot closed this Jan 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants