Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-28716][SQL] Add id to Exchange and Subquery's stringArgs method for easier identifying their reuses in query plans #25434

Conversation

dbaliafroozeh
Copy link
Contributor

@dbaliafroozeh dbaliafroozeh commented Aug 13, 2019

What changes were proposed in this pull request?

Add id to Exchange and Subquery's stringArgs method for easier identifying their reuses in query plans, for example:

ReusedExchange d_date_sk#827, BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))) [id=#2710] 

Where 2710 is the id of the reused exchange.

How was this patch tested?

Passes existing tests

@hvanhovell
Copy link
Contributor

ok to test

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-28716] Add id to Exchange and Subquery's stringArgs method for easier identifying their reuses in query plans [SPARK-28716][SQL] Add id to Exchange and Subquery's stringArgs method for easier identifying their reuses in query plans Aug 13, 2019
@dongjoon-hyun
Copy link
Member

ok to test

@SparkQA
Copy link

SparkQA commented Aug 13, 2019

Test build #109045 has finished for PR 25434 at commit ca0f975.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 14, 2019

Test build #109067 has finished for PR 25434 at commit 0ddfeb8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Aug 15, 2019

Can you put explain output differences with/without this pr in the PR description? Also, It'd be better to put the simple query to reproduce the case.

assert(output.contains(
"""== BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false])) ==
assert(output.replaceAll("\\[id=#\\d+\\]", "[id=#x]").contains(
"""== BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false])), [id=#x] ==
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to always print this id? only for reuse cases?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you need to make two passes over the tree. The first one to figure out which exchanges are reused and the second to print the tree. This seems a bit excessive for a debugging tool.

@hvanhovell
Copy link
Contributor

Merging to master.

rshkv pushed a commit to palantir/spark that referenced this pull request Jan 29, 2021
…d for easier identifying their reuses in query plans

Add id to Exchange and Subquery's stringArgs method for easier identifying their reuses in query plans, for example:
```
ReusedExchange d_date_sk#827, BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))) [id=apache#2710]
```
Where `2710` is the id of the reused exchange.

Passes existing tests

Closes apache#25434 from dbaliafroozeh/ImplementStringArgsExchangeSubqueryExec.

Authored-by: Ali Afroozeh <ali.afroozeh@databricks.com>
Signed-off-by: herman <herman@databricks.com>
rshkv pushed a commit to palantir/spark that referenced this pull request Jan 29, 2021
…d for easier identifying their reuses in query plans

Add id to Exchange and Subquery's stringArgs method for easier identifying their reuses in query plans, for example:
```
ReusedExchange d_date_sk#827, BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))) [id=apache#2710]
```
Where `2710` is the id of the reused exchange.

Passes existing tests

Closes apache#25434 from dbaliafroozeh/ImplementStringArgsExchangeSubqueryExec.

Authored-by: Ali Afroozeh <ali.afroozeh@databricks.com>
Signed-off-by: herman <herman@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants