Conversation

mihailotim-db
Contributor

@mihailotim-db mihailotim-db commented Feb 19, 2025

What changes were proposed in this pull request?

Refactor the key computation for natural and using joins into a separate component so that it can be reused in the single-pass resolver.

Why are the changes needed?

To reuse the code in the single-pass resolver.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Feb 19, 2025
@mihailotim-db mihailotim-db force-pushed the mihailotim-db/join_refactor branch 4 times, most recently from f668314 to 9d86dc7 Compare February 19, 2025 14:35
@mihailotim-db mihailotim-db changed the title [SPARK-51081][SQL][FOLLOWUP] Refactor natural and using join keys computation [SPARK-51081][SQL][FOLLOWUP] Refactor and improve performance for natural and using join keys computation Feb 19, 2025
@mihailotim-db mihailotim-db changed the title [SPARK-51081][SQL][FOLLOWUP] Refactor and improve performance for natural and using join keys computation [SPARK-51259][SQL] Refactor and improve performance for natural and using join keys computation Feb 19, 2025
@mihailotim-db mihailotim-db force-pushed the mihailotim-db/join_refactor branch from 9d86dc7 to c1266e8 Compare February 19, 2025 14:45
Comment on lines -3584 to -3585
val lUniqueOutput = left.output.filterNot(att => leftKeys.contains(att))
val rUniqueOutput = right.output.filterNot(att => rightKeys.contains(att))
Contributor Author

I changed the return type of computeKeysForNaturalOrUsingJoin from Seq to AttributeSet to avoid quadratic lookups here. I think it makes sense to make this change in this PR, but it can be moved to a followup. Wdyt @cloud-fan @vladimirg-db ?
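To illustrate the quadratic lookup mentioned above, here is a minimal sketch on plain Scala collections. It is not the Spark code: `Attr` is a hypothetical stand-in for a Catalyst attribute, `Set[Attr]` stands in for `AttributeSet`, and the object and method names are made up for this example.

```scala
// Hypothetical stand-in for a Catalyst attribute (not Spark's AttributeReference).
final case class Attr(name: String, exprId: Long)

object JoinKeyLookupSketch {
  // Mirrors the shape of the removed lines: Seq.contains is a linear scan,
  // so filtering n output attributes against k keys costs O(n * k) comparisons.
  def uniqueOutputSeq(output: Seq[Attr], keys: Seq[Attr]): Seq[Attr] =
    output.filterNot(att => keys.contains(att))

  // Set-based variant: each membership test is a hash lookup (expected
  // constant time), so the filter costs roughly O(n) once the set exists.
  def uniqueOutputSet(output: Seq[Attr], keys: Set[Attr]): Seq[Attr] =
    output.filterNot(keys.contains)

  def main(args: Array[String]): Unit = {
    val output = Seq(Attr("id", 1L), Attr("name", 2L), Attr("age", 3L))
    val keys = Seq(Attr("id", 1L))
    // Both variants return the same attributes; only the lookup cost differs.
    println(uniqueOutputSeq(output, keys))
    println(uniqueOutputSet(output, keys.toSet))
  }
}
```

Both variants keep the original output order. The difference only matters when the key list is large, since each `Seq.contains` call scans the whole key list.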

Contributor Author

@mihailotim-db mihailotim-db Feb 19, 2025

Moved to a separate PR: #50010

@mihailotim-db mihailotim-db force-pushed the mihailotim-db/join_refactor branch 3 times, most recently from d0cafda to 5f74bc8 Compare February 19, 2025 14:54
@mihailotim-db mihailotim-db changed the title [SPARK-51259][SQL] Refactor and improve performance for natural and using join keys computation [SPARK-51259][SQL] Refactor natural and using join keys computation Feb 19, 2025
@mihailotim-db mihailotim-db force-pushed the mihailotim-db/join_refactor branch from 5f74bc8 to fb354ed Compare February 19, 2025 16:08
@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in a661f9f Feb 20, 2025
Pajaraja pushed a commit to Pajaraja/spark that referenced this pull request Mar 6, 2025
Closes apache#50009 from mihailotim-db/mihailotim-db/join_refactor.

Authored-by: Mihailo Timotic <mihailo.timotic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>