
Conversation

@vdimir (Member) commented Sep 30, 2025

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

  • Add new joined_block_split_single_row setting to reduce memory usage in hash joins with many matches per key. This allows hash join results to be chunked even within matches for a single left table row, which is particularly useful when one row from the left table matches thousands or millions of rows from the right table. Previously, all matches had to be materialized at once in memory. This reduces peak memory usage but may increase CPU usage.
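The core idea behind the setting can be illustrated with a minimal sketch (not ClickHouse code; the function name and signature are made up for illustration): when one left row has many matches, emit the result in chunks of at most `max_rows` rows instead of materializing all matches in one block.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical sketch of chunked emission: one left-table row with `matches`
// matching right-table rows is split into output chunks of at most `max_rows`
// rows each, instead of being materialized as a single huge block.
std::vector<uint64_t> chunkMatches(uint64_t matches, uint64_t max_rows)
{
    std::vector<uint64_t> chunk_sizes;
    for (uint64_t emitted = 0; emitted < matches; emitted += max_rows)
        chunk_sizes.push_back(std::min(max_rows, matches - emitted));
    return chunk_sizes;
}
```

Peak memory is then bounded by the chunk size rather than by the number of matches, at the cost of extra bookkeeping per chunk (hence the possible CPU overhead mentioned above).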

Additionally, this PR fixes bug #87974, which was discovered during implementation and reproduces on master.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@clickhouse-gh bot commented Sep 30, 2025

Workflow [PR], commit [cd8f877]

Summary:

Integration tests (amd_asan, old analyzer, 2/6): error

  • test_restore_replica/test.py::test_fix_metadata_version_on_attach_part_after_restore: FAIL
  • test_lost_part/test.py::test_lost_part_other_replica: FAIL
  • test_concurrent_ttl_merges/test.py::test_limited_ttl_merges_in_empty_pool: FAIL
  • OOM in dmesg: FAIL

@clickhouse-gh clickhouse-gh bot added the pr-improvement Pull request with some product improvements label Sep 30, 2025
@vdimir vdimir force-pushed the vdimir/hash_join_granular_output branch from 17f6a9b to d5f70dd Compare September 30, 2025 13:43
@vdimir vdimir force-pushed the vdimir/hash_join_granular_output branch from d5f70dd to 28079bd Compare September 30, 2025 14:44
@vdimir vdimir force-pushed the vdimir/hash_join_granular_output branch from f9261db to e4f84ea Compare September 30, 2025 17:42
@vdimir vdimir marked this pull request as ready for review October 2, 2025 08:10
@nickitat nickitat self-assigned this Oct 2, 2025
DECLARE(UInt64, min_joined_block_size_bytes, 512 * 1024, R"(
Minimum block size in bytes for JOIN input and output blocks (if join algorithm supports it). Small blocks will be squashed. 0 means unlimited.
)", 0) \
DECLARE(Bool, joined_block_split_single_row, false, R"(
Member:
Do we have perf test results with this setting enabled?

}

void setFilter(IColumn::Filter && filter_) { filter = std::move(filter_); }
void setFilter(const IColumn::Filter * filter_) { filter = std::cref(*filter_); }
Member:

Can we limit ourselves to just this one overload?

Member Author:

There are two branches, for which I created an owning and a non-owning member, but it seems we can move in both cases, since the branch that uses the filter directly is supposed to be the last call of next.
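For context, a member that accepts both overloads above can be sketched like this (a minimal illustration with hypothetical names, not the actual ClickHouse class): a `std::variant` holds the filter either by value (owning) or via `std::reference_wrapper` (non-owning), matching the `std::move` and `std::cref` assignments in the two `setFilter` overloads.

```cpp
#include <cstdint>
#include <functional>
#include <variant>
#include <vector>

// Hypothetical filter type; ClickHouse's IColumn::Filter is similar in spirit.
using Filter = std::vector<uint8_t>;

struct FilterHolder
{
    // Owning alternative (Filter) or non-owning view (reference_wrapper).
    std::variant<Filter, std::reference_wrapper<const Filter>> filter;

    void setFilter(Filter && filter_) { filter = std::move(filter_); }       // owning
    void setFilter(const Filter * filter_) { filter = std::cref(*filter_); } // non-owning

    const Filter & get() const
    {
        if (const auto * owned = std::get_if<Filter>(&filter))
            return *owned;
        return std::get<std::reference_wrapper<const Filter>>(filter).get();
    }
};
```

The reviewer's point is that if the non-owning branch is always the last use, the view alternative can be dropped and everything can be moved into the owning member, leaving a single overload.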

Comment on lines 221 to 236
UInt64 out = 0;
UInt64 prev = 0;

for (size_t i = 0, n = offsets.size(); i < n; ++i)
{
const UInt64 curr = offsets[i];

const UInt64 start = std::max(prev, shift);
const UInt64 stop = std::min(curr, end);

if (start < stop)
out += (stop - start);

out_offsets[i] = out;
prev = curr;
}
Member:

As far as I can see, it is equivalent to

out_offsets[i] = std::min(offsets[i], end);
out_offsets[i] = std::max<int64_t>(0, out_offsets[i] - shift);

If so, the version above might be easier to read IMO.
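The equivalence holds because `offsets` is a non-decreasing cumulative-offsets column, so the clipped segment lengths telescope. A quick sketch checking both variants against each other (function names and the free-standing form are made up for this check; the bodies mirror the two snippets above):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Original loop: accumulate the lengths of [prev, curr) clipped to [shift, end).
std::vector<uint64_t> clampLoop(const std::vector<uint64_t> & offsets, uint64_t shift, uint64_t end)
{
    std::vector<uint64_t> out_offsets(offsets.size());
    uint64_t out = 0;
    uint64_t prev = 0;
    for (size_t i = 0, n = offsets.size(); i < n; ++i)
    {
        const uint64_t curr = offsets[i];
        const uint64_t start = std::max(prev, shift);
        const uint64_t stop = std::min(curr, end);
        if (start < stop)
            out += stop - start;
        out_offsets[i] = out;
        prev = curr;
    }
    return out_offsets;
}

// Suggested direct form: clamp each cumulative offset to [shift, end], then shift.
std::vector<uint64_t> clampDirect(const std::vector<uint64_t> & offsets, uint64_t shift, uint64_t end)
{
    std::vector<uint64_t> out_offsets(offsets.size());
    for (size_t i = 0; i < offsets.size(); ++i)
    {
        out_offsets[i] = std::min(offsets[i], end);
        out_offsets[i] = std::max<int64_t>(0, static_cast<int64_t>(out_offsets[i]) - static_cast<int64_t>(shift));
    }
    return out_offsets;
}
```

Both produce `max(0, min(offsets[i], end) - shift)`, provided the offsets are monotone and `shift <= end`.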

DECLARE(UInt64, min_joined_block_size_bytes, 512 * 1024, R"(
Minimum block size in bytes for JOIN input and output blocks (if join algorithm supports it). Small blocks will be squashed. 0 means unlimited.
)", 0) \
DECLARE(Bool, joined_block_split_single_row, false, R"(
Member:

Let's please add a dedicated perf test where each left row has a good number of matches on the right side, and run it with and without the new logic.

Member Author:

I've added the stateless test 03633_join_many_matches_limit.sql.j2, which verifies that the setting correctly limits the result block size. Do you think a performance test is still necessary? I believe the main issue this addresses is preventing memory usage from exploding.

Member:

Do you think a performance test is still necessary?

Yes, we should understand the performance implications of enabling this setting.

@vdimir vdimir enabled auto-merge October 9, 2025 16:44
@vdimir vdimir added this pull request to the merge queue Oct 10, 2025
Merged via the queue into master with commit 2fec69d Oct 10, 2025
239 of 242 checks passed
@vdimir vdimir deleted the vdimir/hash_join_granular_output branch October 10, 2025 08:48
@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-synced-to-cloud The PR is synced to the cloud repo label Oct 10, 2025

Labels

  • pr-improvement: Pull request with some product improvements
  • pr-synced-to-cloud: The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Wrong join output with allow_experimental_join_right_table_sorting

4 participants