[FEA] For hash joins where the build side can change use the smaller table for the build side #9442

Closed
revans2 opened this issue Oct 13, 2023 · 1 comment · Fixed by #10272
Assignees
Labels
performance A performance related task/issue

Comments

@revans2
Collaborator

revans2 commented Oct 13, 2023

Is your feature request related to a problem? Please describe.
In some joins, such as an inner join, either the left or the right table can serve as the build side. Right now we always pick the right table. That is not a good solution, but oddly it has worked out really well in most cases. The join code should instead work like this: pull from the side we guessed to be the build side, and if it turns out to be larger than a single batch, pull from the other side to see whether it is smaller.

This is not really ideal, because after getting a single batch we don't know whether it is the last batch. We have some optimized code that in some cases can tag a batch as the final batch, and it would be good to at least file a follow-on issue to try to add in that same kind of functionality.
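To make the proposal concrete, here is a minimal sketch of the heuristic in plain Python. It is not the plugin's code: `peek_batches`, `choose_build_side`, and the list-of-rows `Batch` type are hypothetical names used only for illustration, and it assumes a join type (such as an inner join) where either side may be the build side. Because a single batch cannot tell us whether it is the last one, the sketch tries to pull a second batch before deciding, and it replays whatever it pulled so no data is lost.

```python
from itertools import chain
from typing import Iterable, Iterator, List, Tuple

Batch = List[tuple]  # hypothetical stand-in for a columnar batch


def peek_batches(it: Iterator[Batch], limit: int) -> Tuple[List[Batch], bool]:
    """Pull up to `limit` batches; the flag is True only if the iterator ran out."""
    pulled: List[Batch] = []
    for _ in range(limit):
        try:
            pulled.append(next(it))
        except StopIteration:
            return pulled, True
    return pulled, False


def choose_build_side(
    left: Iterable[Batch], right: Iterable[Batch]
) -> Tuple[str, Iterator[Batch], Iterator[Batch]]:
    """Return (side, build_batches, stream_batches) for a join where either side can build."""
    left_it, right_it = iter(left), iter(right)

    # Start with the current default guess: the right side.
    # Asking for two batches tells us whether it fits in a single batch.
    right_head, right_fits = peek_batches(right_it, 2)
    right_all = chain(right_head, right_it)  # replay what was already pulled
    if right_fits:
        return "right", right_all, left_it

    # The guess was larger than one batch, so check the other side.
    left_head, left_fits = peek_batches(left_it, 2)
    left_all = chain(left_head, left_it)
    if left_fits:
        return "left", left_all, right_all

    # Both sides exceed one batch: keep the existing default.
    return "right", right_all, left_all
```

With a one-batch left side and a three-batch right side, `choose_build_side` returns `"left"`. If batches could be tagged as final, as mentioned above, the second pull per side would be unnecessary.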

@revans2 revans2 added feature request New feature or request ? - Needs Triage Need team to review and classify labels Oct 13, 2023
@mattahrens mattahrens added performance A performance related task/issue and removed feature request New feature or request labels Oct 13, 2023
@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Oct 26, 2023
@jlowe jlowe self-assigned this Oct 27, 2023
@viadea
Collaborator

viadea commented Dec 15, 2023

The ask from the user is, at a minimum, to make Full Outer Join choose the correct build side.
Of course, it would be better if both inner join and Full Outer Join made the correct decision.

4 participants