mihailotim-db
Contributor

What changes were proposed in this pull request?

This PR replaces the HashSet currently used in DeduplicateRelations with a HashMap to improve the rule's performance.
Additionally, this PR reverts #48053, as that change is no longer needed.

Why are the changes needed?

The current implementation doesn't use the HashSet properly: instead of relying on hash-based lookups, it performs multiple linear searches over the set, which results in O(n^2) complexity.
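
To illustrate the difference described above, here is a minimal sketch. It is not the code from this PR: `Relation`, `DedupSketch`, and both helper methods are hypothetical names, and keying the map by a numeric id is an assumption made only for the example. It contrasts a predicate scan over a `HashSet`, which visits elements one by one, with a keyed `HashMap` lookup.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a relation; not Spark's actual plan classes.
final case class Relation(id: Long, name: String)

object DedupSketch {
  // `find` iterates over the elements until the predicate matches, so the
  // hash structure of the set is never used: O(n) per call, and O(n^2)
  // when it is called once for each of n relations.
  def findBySetScan(seen: mutable.HashSet[Relation], id: Long): Option[Relation] =
    seen.find(_.id == id)

  // `get` hashes the key directly: expected O(1) per call.
  def findByMapLookup(seen: mutable.HashMap[Long, Relation], id: Long): Option[Relation] =
    seen.get(id)
}
```

Both helpers return the same result for the same input; only the cost per lookup differs, which is what matters when the lookup runs once per relation in a large plan.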

Does this PR introduce any user-facing change?

How was this patch tested?

Existing tests

Was this patch authored or co-authored using generative AI tooling?

@github-actions github-actions bot added the SQL label Oct 9, 2024
@HyukjinKwon
Member

Merged to master.

himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024
…ove performance of `DeduplicateRelations`

Closes apache#48392 from mihailotim-db/mihailotim-db/master.

Authored-by: Mihailo Timotic <mihailo.timotic@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>