Optimize hash join probe with software prefetch #102444
wudidapaopao wants to merge 18 commits into ClickHouse:master
Add `enable_software_prefetch_in_join` setting (default true) to enable software prefetch during hash join probe, following the same pattern as `enable_software_prefetch_in_aggregation`. When the hash table is large enough (>4x L2 cache), prefetch future rows' hash table slots to hide memory access latency. Uses adaptive look-ahead via `PrefetchingHelper`. Applied to all three probe functions: `joinRightColumns` (single map), `joinRightColumns` (multiple maps), and `joinRightColumnsWithAdditionalFilter`.
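To illustrate the idea in the description, here is a minimal sketch of look-ahead prefetching during a probe loop. It is not the code from this PR: `ToyHashSet`, `countMatches`, and the fixed `look_ahead` parameter are hypothetical names for illustration, and `__builtin_prefetch` assumes GCC or Clang.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical open-addressing hash set with linear probing.
// Key 0 marks an empty slot, so 0 cannot be stored. The table must never be
// filled completely, otherwise the probe loops below would not terminate.
struct ToyHashSet
{
    std::vector<uint64_t> slots;

    explicit ToyHashSet(size_t capacity_pow2) : slots(capacity_pow2, 0)
    {
        assert(capacity_pow2 != 0 && (capacity_pow2 & (capacity_pow2 - 1)) == 0);
    }

    // Multiplicative hash, masked to the (power-of-two) table size.
    size_t place(uint64_t key) const
    {
        return ((key * 0x9E3779B97F4A7C15ULL) >> 32) & (slots.size() - 1);
    }

    void insert(uint64_t key)
    {
        assert(key != 0);
        size_t pos = place(key);
        while (slots[pos] != 0 && slots[pos] != key)
            pos = (pos + 1) & (slots.size() - 1);
        slots[pos] = key;
    }

    bool contains(uint64_t key) const
    {
        size_t pos = place(key);
        while (slots[pos] != 0)
        {
            if (slots[pos] == key)
                return true;
            pos = (pos + 1) & (slots.size() - 1);
        }
        return false;
    }

    // Hint the CPU to load the first slot this key would touch.
    void prefetch(uint64_t key) const
    {
        __builtin_prefetch(&slots[place(key)]);
    }
};

// Probe every key; while handling row i, prefetch the slot for row i + look_ahead
// so its cache line is (hopefully) resident by the time the loop reaches it.
size_t countMatches(const ToyHashSet & table, const std::vector<uint64_t> & probe_keys, size_t look_ahead)
{
    size_t matches = 0;
    for (size_t i = 0; i < probe_keys.size(); ++i)
    {
        if (i + look_ahead < probe_keys.size())
            table.prefetch(probe_keys[i + look_ahead]);
        matches += table.contains(probe_keys[i]);
    }
    return matches;
}
```

The prefetch only changes timing, never results, so the match count is the same for any `look_ahead`. The benefit is expected only when the table is much larger than the cache; on a small table the extra hashing is pure overhead.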
Workflow [PR], commit [206eee5]. Summary: ✅
AI Review
Summary: This PR adds adaptive software prefetching for hash JOIN build/probe paths and wires the new …
…ed `sysconf` calls in join probe hot path
…uce header fan-out
682fc4d to 8aa2547
Hi @Fgrtue, this is ready for review when you have time. The remaining CI failures are unrelated to this PR. Thanks!

@nickitat FYI^^

I will have a final look after @Fgrtue. At first glance it looks fine.
Fgrtue left a comment
It seems that the same prefetching technique could be used in the insertFromBlockImplTypeCase function on the build side. Did you consider adding it?
Other than that, it looks good!

    if constexpr (can_prefetch)
    {
        if (use_prefetch)
        {
            if (row_idx == PrefetchingHelper::iterationsToMeasure())
                prefetch_look_ahead = prefetching.calcPrefetchLookAhead();

            if (row_idx + prefetch_look_ahead < selector_size)
            {
                size_t prefetch_ind = selector[row_idx + prefetch_look_ahead];
                auto key_holder = key_getter_vector[0].getKeyHolder(prefetch_ind, *pool);
                mapv[0]->prefetch(std::move(key_holder));
            }
        }
    }

It seems that the same logic repeats in all three cases. What do you think about putting it into a separate function and calling it from the three places?
Good point, done. Extracted the shared logic into helpers (shouldUseJoinPrefetch, JoinPrefetcher, makeJoinPrefetcher).
Also extracted some redundant code into shared helper functions.
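For readers following along, this is roughly what such an extraction could look like. It is a sketch under assumptions, not the helpers added in this PR: `ToyLookAhead` and `ToyJoinPrefetcher` are hypothetical names, and the constants (100 measured iterations, look-ahead clamped to 4..32, 100 ns assumed load latency) are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <utility>

// Hypothetical adaptive look-ahead: time the first rows, then choose how many
// rows ahead to prefetch so that one assumed memory load fits in that window.
struct ToyLookAhead
{
    static constexpr size_t iterations_to_measure = 100;
    static constexpr size_t min_look_ahead = 4;
    static constexpr size_t max_look_ahead = 32;

    // Pure function of the measured time, which keeps it easy to test.
    static size_t fromElapsed(double elapsed_ns, double assumed_load_latency_ns = 100.0)
    {
        assert(elapsed_ns >= 0);
        if (elapsed_ns == 0)
            return max_look_ahead;
        const double ns_per_row = elapsed_ns / static_cast<double>(iterations_to_measure);
        const double rows = std::ceil(assumed_load_latency_ns / ns_per_row);
        const double lo = static_cast<double>(min_look_ahead);
        const double hi = static_cast<double>(max_look_ahead);
        return static_cast<size_t>(std::max(lo, std::min(hi, rows)));
    }
};

// Hypothetical wrapper that owns the look-ahead state, so each probe loop
// makes a single onRow() call instead of repeating the whole block.
template <typename PrefetchFn>
class ToyJoinPrefetcher
{
    using Clock = std::chrono::steady_clock;

public:
    ToyJoinPrefetcher(bool enabled_, PrefetchFn fn_)
        : enabled(enabled_), fn(std::move(fn_)), start(Clock::now())
    {
    }

    // Call once per row; `fn` receives the index of the row to prefetch.
    void onRow(size_t row_idx, size_t rows)
    {
        if (!enabled)
            return;

        // After the measurement window, switch from the default to a measured look-ahead.
        if (row_idx == ToyLookAhead::iterations_to_measure)
        {
            const double elapsed_ns = std::chrono::duration<double, std::nano>(Clock::now() - start).count();
            look_ahead = ToyLookAhead::fromElapsed(elapsed_ns);
        }

        if (row_idx + look_ahead < rows)
            fn(row_idx + look_ahead);
    }

    size_t lookAhead() const { return look_ahead; }

private:
    bool enabled;
    PrefetchFn fn;
    Clock::time_point start;
    size_t look_ahead = ToyLookAhead::min_look_ahead;
};
```

Passing the prefetch action as a callable is one way to let the single-map, multiple-map, and additional-filter loops share the same control flow while each supplies its own key getter and map.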

    auto ind = selector[row_idx];
    KnownRowsHolder<true> all_flag_known_rows;
    KnownRowsHolder<false> single_flag_know_rows;

Typo: single_flag_know_rows looks like it should be single_flag_known_rows for readability/consistency with all_flag_known_rows.
This typo is still present in the current head: single_flag_know_rows.
Please rename it to single_flag_known_rows for consistency with all_flag_known_rows.
This still looks unresolved in the current head: the variable is still spelled single_flag_know_rows at this location (and at the corresponding use site below).
Please rename it to single_flag_known_rows to match all_flag_known_rows and avoid carrying the typo forward.
That makes sense.

LLVM Coverage Report
Changed lines: 88.12% (141/160)
Add software prefetch during hash join probe to hide memory access latency on large hash tables. Reuses the same `PrefetchingHelper` infrastructure already used by aggregation. Controlled by a new setting `enable_software_prefetch_in_join` (default: `true`).

Benchmark (TPC-H SF100, AWS r6i.4xlarge, 8C/16T, 128 GB, Ubuntu 24.04)
5 runs, median, default settings. OFF = `enable_software_prefetch_in_join = false`.

Q04 (−17.6%) and Q22 (−15.6%) show the largest gains. Both are join-heavy queries where the right-side hash table is large enough for prefetch to effectively hide cache miss latency. Most other queries show modest 1–5% improvements; no meaningful regressions observed.

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Add software prefetch in hash join probe phase to reduce memory access latency for large hash tables, controlled by setting `enable_software_prefetch_in_join`.

Documentation entry for user-facing changes
New setting `enable_software_prefetch_in_join` (Bool, default `true`): enables software prefetch during hash join probe to hide memory latency when the hash table exceeds L2 cache size.
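The size-based gate can be pictured with the sketch below. It is an assumption-laden illustration rather than the PR's code: `cachedL2Bytes` and `shouldPrefetch` are hypothetical names, the 1 MiB fallback is a guess, `_SC_LEVEL2_CACHE_SIZE` is a glibc extension that may be unavailable elsewhere, and the default factor of 4 follows the ">4x L2 cache" wording in the PR description.

```cpp
#include <cassert>
#include <cstddef>
#include <unistd.h>

// Query the L2 cache size once and cache the result, so that no sysconf call
// ends up on the per-row or per-block hot path.
inline size_t cachedL2Bytes()
{
    static const size_t value = []() -> size_t
    {
#ifdef _SC_LEVEL2_CACHE_SIZE
        const long reported = sysconf(_SC_LEVEL2_CACHE_SIZE);
        if (reported > 0)
            return static_cast<size_t>(reported);
#endif
        // Fallback guess (assumption) when the platform reports nothing useful.
        return static_cast<size_t>(1) << 20;
    }();
    return value;
}

// Prefetch only when the setting is on and the table is clearly larger than L2.
inline bool shouldPrefetch(bool setting_enabled, size_t table_bytes, size_t l2_bytes, size_t factor = 4)
{
    assert(factor != 0);
    return setting_enabled && table_bytes > factor * l2_bytes;
}
```

A caller would evaluate `shouldPrefetch(enabled, table_bytes, cachedL2Bytes())` once before the probe loop and reuse the boolean for every row.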