-
Notifications
You must be signed in to change notification settings - Fork 1.8k
perf: Improve NLJ for very small right side case #17562
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| // Construct the Cartesian product between the specified range of left rows and the entire right_batch. | ||
| // Do not apply filters or update any bitmaps here. | ||
| let right_rows = right_batch.num_rows(); | ||
| if l_row_count == 0 || right_rows == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can probably use this check even before calling process_left_range_join ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've checked and this case is not possible. I removed this condition and added an assertion outside.
| return Ok(None); | ||
| } | ||
|
|
||
| let total_rows = l_row_count * right_rows; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like cartesian?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Added more comments to make it clear
|
@comphead thanks for the review, comments are addressed. I triggered extended tests locally, and it has passed: https://github.com/2010YOUY01/arrow-datafusion/actions/runs/17754340789 |
|
Is this ready for review again ? @comphead |
|
I think we slightly lost this PR, @2010YOUY01 are you okay to rebase the PR? |
|
close/re-open to rerun CI |
Rebased 👌🏼 |
| ); | ||
|
|
||
| let l_row_cnt_ratio = self.batch_size / right_batch.num_rows(); | ||
| if l_row_cnt_ratio > 10 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be follow up PR to make this fine tuning param be configurable by user?
comphead
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @2010YOUY01 I think it is great.
Which issue does this PR close?
Rationale for this change
See the issue for the background. If the optimizer made the wrong join order decision, and put a very small input at the probe side of NLJ, the NLJ operator now can handle it much faster than before.
For implementation, before it's always handling
(one_left_row X right_batch)in the inner loop, this PR do join multiple left rows at once with the right batch, if the right batch is very small.The NLJ microbench result, only Q13 is for this workload:
In DF49 it's around 4ms.
What changes are included in this PR?
Are these changes tested?
This can be covered by existing tests: this additional path is not only triggered if the entire right input is small. For regular workloads, the final input batch can be also very small, so this new path can be triggered and tested.
Are there any user-facing changes?