More fine-grained control over hash join output block size #87913
Conversation
    DECLARE(UInt64, min_joined_block_size_bytes, 512 * 1024, R"(
    Minimum block size in bytes for JOIN input and output blocks (if join algorithm supports it). Small blocks will be squashed. 0 means unlimited.
    )", 0) \
    DECLARE(Bool, joined_block_split_single_row, false, R"(
Do we have perf test results with this setting enabled?
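For context, below is a minimal sketch of the squashing behaviour described by `min_joined_block_size_bytes`: small blocks are buffered until the accumulated size reaches the threshold, and `0` disables squashing. This is an illustration under assumptions, not the actual ClickHouse code; `Block` and `Squasher` are hypothetical stand-ins.

```cpp
// Illustrative sketch only, not the ClickHouse implementation.
#include <cstdint>
#include <optional>

struct Block
{
    uint64_t bytes = 0; // real blocks carry columns; only the size matters here
};

class Squasher
{
public:
    explicit Squasher(uint64_t min_block_size_bytes_) : min_block_size_bytes(min_block_size_bytes_) {}

    /// Buffer incoming blocks until the accumulated size reaches the threshold.
    /// Returns a block to emit, or std::nullopt if more input is needed.
    std::optional<Block> add(Block block)
    {
        if (min_block_size_bytes == 0)
            return block; // assumed meaning of "0": no squashing, pass blocks through

        buffered.bytes += block.bytes;
        if (buffered.bytes < min_block_size_bytes)
            return std::nullopt;

        Block ready = buffered;
        buffered = {};
        return ready;
    }

private:
    uint64_t min_block_size_bytes;
    Block buffered;
};
```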
    }

    void setFilter(IColumn::Filter && filter_) { filter = std::move(filter_); }
    void setFilter(const IColumn::Filter * filter_) { filter = std::cref(*filter_); }
Can we limit ourselves to just this one overload?
There are two branches, for which I created an owning and a non-owning member, but it seems we can move in both cases, since the branch that uses the filter directly is supposed to be the last call of `next`.
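To illustrate the owning vs. non-owning distinction being discussed, here is a minimal sketch under assumptions, not the PR's actual member layout; `Filter` and `FilterHolder` are hypothetical stand-ins for `IColumn::Filter` and the real class.

```cpp
// Sketch only: one member that can hold either an owned filter (moved in)
// or a non-owning reference to a filter kept alive elsewhere, matching the
// two setFilter overloads quoted above.
#include <functional>
#include <utility>
#include <variant>
#include <vector>

using Filter = std::vector<unsigned char>; // stand-in for IColumn::Filter

struct FilterHolder
{
    void setFilter(Filter && filter_) { filter = std::move(filter_); }       // owning branch
    void setFilter(const Filter * filter_) { filter = std::cref(*filter_); } // non-owning branch

    const Filter & getFilter() const
    {
        if (const auto * owned = std::get_if<Filter>(&filter))
            return *owned;
        return std::get<std::reference_wrapper<const Filter>>(filter).get();
    }

    std::variant<Filter, std::reference_wrapper<const Filter>> filter;
};
```

If the non-owning branch can also move, as suggested above, the variant collapses to a single owned `Filter` member and the pointer overload can be dropped.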
    UInt64 out = 0;
    UInt64 prev = 0;

    for (size_t i = 0, n = offsets.size(); i < n; ++i)
    {
        const UInt64 curr = offsets[i];

        const UInt64 start = std::max(prev, shift);
        const UInt64 stop = std::min(curr, end);

        if (start < stop)
            out += (stop - start);

        out_offsets[i] = out;
        prev = curr;
    }
As far as I can see, it is equivalent to

    out_offsets[i] = std::min(offsets[i], end);
    out_offsets[i] = std::max<int64_t>(0, out_offsets[i] - shift);

If so, the version above might be easier to read, IMO.
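A quick way to sanity-check the claimed equivalence, as a standalone sketch with assumed sample values for `offsets`, `shift`, and `end` (not code from the PR):

```cpp
// Compares the loop from the diff with the shorter min/max formulation
// on one sample offsets array; prints MATCH if both produce the same result.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

using UInt64 = uint64_t;

int main()
{
    const std::vector<UInt64> offsets = {3, 3, 7, 12, 20}; // cumulative, monotonic
    const UInt64 shift = 5;                                 // assumed window start
    const UInt64 end = 15;                                  // assumed window end

    std::vector<UInt64> a(offsets.size()), b(offsets.size());

    // Version from the diff: accumulate clipped interval lengths.
    UInt64 out = 0, prev = 0;
    for (size_t i = 0, n = offsets.size(); i < n; ++i)
    {
        const UInt64 curr = offsets[i];
        const UInt64 start = std::max(prev, shift);
        const UInt64 stop = std::min(curr, end);
        if (start < stop)
            out += (stop - start);
        a[i] = out;
        prev = curr;
    }

    // Suggested shorter version: clamp the cumulative offset directly.
    // (Relies on the unsigned wrap of b[i] - shift converting to a negative
    //  int64_t, which std::max<int64_t> then clamps to 0.)
    for (size_t i = 0, n = offsets.size(); i < n; ++i)
    {
        b[i] = std::min(offsets[i], end);
        b[i] = std::max<int64_t>(0, b[i] - shift);
    }

    std::cout << (a == b ? "MATCH" : "MISMATCH") << '\n';
}
```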
    DECLARE(UInt64, min_joined_block_size_bytes, 512 * 1024, R"(
    Minimum block size in bytes for JOIN input and output blocks (if join algorithm supports it). Small blocks will be squashed. 0 means unlimited.
    )", 0) \
    DECLARE(Bool, joined_block_split_single_row, false, R"(
Let's please add a dedicated perf test where each left row has a good number of matches on the right side, and run it with and without the new logic.
I've added the stateless test 03633_join_many_matches_limit.sql.j2, which verifies that the setting correctly limits the result block size. Do you think a performance test is still necessary? I believe the main issue this addresses is preventing memory usage from exploding.
> Do you think a performance test is still necessary?

Yes, we should understand the performance implications of enabling this setting.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Added the `joined_block_split_single_row` setting to reduce memory usage in hash joins with many matches per key. It allows hash join results to be chunked even within the matches for a single left-table row, which is particularly useful when one row from the left table matches thousands or millions of rows from the right table. Previously, all matches had to be materialized at once in memory. This reduces peak memory usage but may increase CPU usage.

Additionally, this PR fixes #87974, a bug that was discovered during implementation and reproduces on master.
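To make the entry concrete, here is an illustrative sketch (hypothetical, not the PR's implementation): the matches of a single left-side row are emitted as several bounded chunks instead of one huge block. The function name and the chunk size of 65,536 rows are assumptions for illustration only.

```cpp
// Illustration only: splitting one left row's matches into bounded chunks,
// which is the behaviour joined_block_split_single_row enables at block level.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

std::vector<uint64_t> splitMatches(uint64_t total_matches, uint64_t max_rows_per_block)
{
    std::vector<uint64_t> chunk_sizes;
    for (uint64_t emitted = 0; emitted < total_matches; emitted += max_rows_per_block)
        chunk_sizes.push_back(std::min(max_rows_per_block, total_matches - emitted));
    return chunk_sizes;
}

int main()
{
    // One left row matching 1'000'000 right rows, emitted in blocks of at most
    // 65'536 rows instead of being materialized all at once.
    const auto chunks = splitMatches(1'000'000, 65'536);
    std::cout << chunks.size() << " blocks, last one has " << chunks.back() << " rows\n";
}
```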
Documentation entry for user-facing changes