Skip to content

Conversation

@berkaysynnada
Copy link
Contributor

@berkaysynnada berkaysynnada commented Jul 5, 2024

Which issue does this PR close?

Closes #.

Rationale for this change

HashJoins can preserve the order of the right input when the join type is right.

What changes are included in this PR?

HashJoin and NestedLoopJoin uses append_right_indices() method to combine matched and unmatched right results. This method simply appends the unmatched ones on matched ones. However, if we know that right is ordered, we can merge these indices in order to preserve order.

Are these changes tested?

Yes, with an .slt test. Added test would result as:

1 1 3 1
NULL NULL 1 0

before, which breaks the order.

Are there any user-facing changes?

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jul 5, 2024
Copy link
Contributor

@ozankabak ozankabak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you.

Edit: After discussing with @mustafasrepo and @berkaysynnada, I think we can add one more small optimization to append_right_indices to leverage existing probe side ordering via merging. We will send another commit to incorporate that in this PR as well

@berkaysynnada berkaysynnada marked this pull request as draft July 5, 2024 10:26
@berkaysynnada berkaysynnada marked this pull request as ready for review July 5, 2024 10:46
Copy link
Contributor

@ozankabak ozankabak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We added the mini optimization and this PR is ready from my perspective. Let's wait a couple days in case anyone wants to check it out and suggest improvements

@alamb
Copy link
Contributor

alamb commented Jul 6, 2024

This makes sense to me -- perhaps someone else who is familiar with the join code like @viirya @comphead or @korowa could give this a double check to verify that hash join does preserve the order of the right ordering (it makes sense to me)

@ozankabak
Copy link
Contributor

Seems like we have incorporated all the feedback. I will merge this later on today unless we get more feedback for improvements. Thanks everyone

@ozankabak ozankabak merged commit 8b84522 into apache:main Jul 8, 2024
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
…che#11276)

* Preserve right hash order

* Update joins.slt

* better sorting alg

* remove result

* Review

* add plan test

* Address review feedback

* Expand test coverage and update docstring

---------

Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>
xinlifoobar pushed a commit to xinlifoobar/datafusion that referenced this pull request Jul 18, 2024
…che#11276)

* Preserve right hash order

* Update joins.slt

* better sorting alg

* remove result

* Review

* add plan test

* Address review feedback

* Expand test coverage and update docstring

---------

Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants