Conversation
|
Workflow [PR], commit [f2f010a] Summary: ❌
AI ReviewSummaryThis PR adds per-block virtual-row emission for Findings💡 Nits
ClickHouse Rules
Final Verdict
|
77e79fe to
9443fbc
Compare
9443fbc to
ddc8174
Compare
This reverts commit 8761691.
34188b3 to
339a4ce
Compare
Replace `bool apply_virtual_row_conversions` with a three-state `VirtualRowAction` enum (`Skip`, `AsIs`, `Convert`) in `MergingSortedAlgorithm` for explicit control of virtual row handling. The `Skip` mode had two bugs where 0-row virtual row chunks created invalid cursors, permanently dropping sources from the merge queue: - In `initialize`: a virtual row as the init chunk produced a 0-row cursor that was never added to the priority queue, so the source was silently abandoned. - In `consume`: same issue — the 0-row cursor made the source disappear from the queue, and `merge` never requested more data. Fix both by returning early without consuming the chunk and adding the source to `sources_pending_data`, so the next `merge` call immediately requests fresh data from that source. The `SortingStep` merge now uses `Skip` (instead of the old `false`) because its header has expression-aliased column names (e.g. `__table1.k`) that don't match the PK column names in virtual rows (`k`), making `AsIs` produce wrong sort keys (default values). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit f837892.
1e67550 to
1c23f29
Compare
1c23f29 to
b039597
Compare
| min_free_disk_space = settings[Setting::min_free_disk_space_for_temporary_data]; | ||
| max_block_bytes = settings[Setting::prefer_external_sort_block_bytes]; | ||
| read_in_order_use_buffering = settings[Setting::read_in_order_use_buffering]; | ||
| read_in_order_use_buffering = settings[Setting::read_in_order_use_buffering] && !settings[Setting::read_in_order_use_virtual_row_per_block]; |
There was a problem hiding this comment.
read_in_order_use_virtual_row_per_block currently disables buffering globally, even when read_in_order_use_virtual_row is off (or virtual-row conversion is not applicable for this query).
That broad coupling can cause an unintended performance regression for unrelated read_in_order_use_buffering cases. The description says this setting should only matter "together with read_in_order_use_virtual_row".
Could we gate this only when virtual-row conversion is actually enabled for the SortingStep (e.g. apply_virtual_row_conversions path), instead of disabling buffering at settings-construction time?
| index_granularity_bytes = 1024, | ||
| min_index_granularity_bytes = 1024; | ||
|
|
||
| --- Write overlapping data to multiple parts, only one part matches the filter |
There was a problem hiding this comment.
Typo in comment: use -- for SQL comments instead of --- (currently it starts with three dashes).
test always fails with flaky check #101482 and passes in regular runs cidb
|
|
The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix. |
Fixing in #102022
... |
|
Overall looks Ok to me. |
4733194 to
f2f010a
Compare
|
|
||
| /// MergingSortedTransform supposed to consume virtual row | ||
| /// When there is no merging (only one stream) and virtual row conversions are enabled, we need to remove virtual row before output, | ||
| /// otherwise it can reach downstream steps and cause issues because of conversions are valid only for current step. |
There was a problem hiding this comment.
Minor typo in the comment: because of conversions are valid only for current step reads awkwardly.
Suggested wording:
because conversions are valid only for the current step.
LLVM Coverage Report
Changed lines: 91.58% (185/202) | lost baseline coverage: 7 line(s) · Uncovered code |
These tests seem to pass the flaky check when run individually, but they become too heavy when executed together with other tests (checked in #102530) We can either exclude them from the flaky check or reduce the level of parallelism there / run them in smaller batches (perhaps the slowdown is caused by the thread fuzzer). |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
read_in_order_use_virtual_rowis enabled together with the newread_in_order_use_virtual_row_per_blocksetting, virtual row boundary information is now emitted after each block read from MergeTree, allowing the merge to reprioritize sources mid-stream for parts whose data is fully filtered out by WHERE/PREWHERE/JOIN. Close Compute virtual row (read_in_order_use_virtual_row) for each block #99945Documentation entry for user-facing changes