Emit virtual row from read step by vdimir · Pull Request #100603 · ClickHouse/ClickHouse

vdimir · 2026-03-24T13:55:16Z

Changelog category (leave one):

Performance Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

When read_in_order_use_virtual_row is enabled together with the new read_in_order_use_virtual_row_per_block setting, virtual row boundary information is now emitted after each block read from MergeTree, allowing the merge to reprioritize sources mid-stream for parts whose data is fully filtered out by WHERE/PREWHERE/JOIN. Closes #99945 (Compute virtual row (read_in_order_use_virtual_row) for each block).
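To make the effect of per-block boundary information concrete, here is a minimal, self-contained sketch of a k-way merge with a LIMIT. It is not the ClickHouse implementation: `Block`, `Part`, `Entry`, and `mergeWithLimit` are hypothetical names invented for this illustration, and the model assumes that the lower bound of a block (`first_key`) is known without reading the block, while reading the block itself is the cost being counted.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical model of one block of a part.
struct Block
{
    int first_key;          // lower bound of the block, assumed known without reading data
    std::vector<int> rows;  // sorted keys that survive the filter; may be empty
};
using Part = std::vector<Block>;

struct MergeResult
{
    std::vector<int> rows;        // merged output, at most `limit` rows
    std::size_t blocks_read = 0;  // how many blocks had to be read
};

// A queue entry is either a real row or a zero-data "virtual row" carrying only a bound.
struct Entry
{
    int key;
    std::size_t part;
    std::size_t block;
    std::size_t row;
    bool is_virtual;
};

struct EntryGreater
{
    bool operator()(const Entry & a, const Entry & b) const
    {
        if (a.key != b.key)
            return a.key > b.key;
        return a.is_virtual && !b.is_virtual;  // on ties, prefer real rows (no read needed)
    }
};

MergeResult mergeWithLimit(const std::vector<Part> & parts, std::size_t limit, bool per_block_virtual_row)
{
    MergeResult result;
    std::priority_queue<Entry, std::vector<Entry>, EntryGreater> queue;

    // Queue the next entry of `part`, starting at `block`. If a virtual row is allowed,
    // only the bound is queued; otherwise blocks are read until one yields a row.
    auto advance = [&](std::size_t part, std::size_t block, bool allow_virtual)
    {
        while (block < parts[part].size())
        {
            const Block & b = parts[part][block];
            if (allow_virtual)
            {
                queue.push({b.first_key, part, block, 0, true});
                return;
            }
            ++result.blocks_read;
            if (!b.rows.empty())
            {
                queue.push({b.rows[0], part, block, 0, false});
                return;
            }
            ++block;                                // block was fully filtered out
            allow_virtual = per_block_virtual_row;  // per-block mode: report the next bound instead
        }
    };

    // Every part starts with a virtual row for its first block.
    for (std::size_t p = 0; p < parts.size(); ++p)
        advance(p, 0, true);

    while (!queue.empty() && result.rows.size() < limit)
    {
        Entry top = queue.top();
        queue.pop();
        if (top.is_virtual)
        {
            advance(top.part, top.block, false);  // the bound is the smallest: now read real data
            continue;
        }
        result.rows.push_back(top.key);
        if (result.rows.size() == limit)
            break;
        const std::vector<int> & rows = parts[top.part][top.block].rows;
        if (top.row + 1 < rows.size())
            queue.push({rows[top.row + 1], top.part, top.block, top.row + 1, false});
        else
            advance(top.part, top.block + 1, per_block_virtual_row);
    }
    return result;
}
```

In this toy model, take one part whose four blocks are all filtered out (bounds 1, 10, 20, 30) and another part with rows 2, 3, 4 in its first block. With `limit = 3`, the per-block mode reads 2 blocks, while the mode that emits a virtual row only at the start of each part reads 5, because the merge has to drain the filtered-out part before it can continue.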

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

clickhouse-gh · 2026-03-24T13:55:57Z

Workflow [PR], commit [f2f010a]

Summary: ❌

job_name	test_name	status	info	comment
Stateless tests (amd_binary, flaky check)		failure
	03037_dynamic_merges_1_horizontal_compact_wide_tree	FAIL	cidb	IGNORED
	03037_dynamic_merges_1_horizontal_compact_merge_tree	FAIL	cidb	IGNORED
	03037_dynamic_merges_1_vertical_compact_merge_tree	FAIL	cidb	IGNORED
	03037_dynamic_merges_1_vertical_wide_merge_tree	FAIL	cidb	IGNORED
	03037_dynamic_merges_1_horizontal_compact_wide_tree	FAIL	cidb	IGNORED

AI Review

Summary

This PR adds per-block virtual-row emission for read_in_order to let MergingSortedTransform reprioritize sources more frequently under heavy filtering, and wires the behavior through MergeTreeSelectProcessor, ReadFromMergeTree, and sorting/merging pipeline pieces. I found one actionable issue in this revision: a typo/grammar problem in a newly added C++ comment, and posted it inline. Other previously raised concerns from clickhouse-gh[bot] about gating and docs appear already addressed in the current head.

Findings

💡 Nits

[src/Processors/QueryPlan/SortingStep.cpp:46] New inline comment has awkward grammar: "because of conversions are valid only for current step".
Suggested fix: "because conversions are valid only for the current step".

ClickHouse Rules

Item	Status	Notes
Deletion logging	➖
Serialization versioning	➖
Core-area scrutiny	✅
No test removal	✅
Experimental gate	➖
No magic constants	✅
Backward compatibility	✅
`SettingsChangesHistory.cpp`	✅
PR metadata quality	⚠️	PR marks `Documentation is written`, but this review did not find docs changes for new setting `read_in_order_use_virtual_row_per_block` in the diff.
Safe rollout	✅
Compilation time	✅
No large/binary files	✅

Final Verdict

Status: ⚠️ Request changes
Minimum required actions:
- Fix the typo/grammar in src/Processors/QueryPlan/SortingStep.cpp comment.
- Ensure the checked documentation claim is satisfied for new user-facing setting read_in_order_use_virtual_row_per_block (or update PR metadata accordingly).

This reverts commit 8761691.

Replace `bool apply_virtual_row_conversions` with a three-state `VirtualRowAction` enum (`Skip`, `AsIs`, `Convert`) in `MergingSortedAlgorithm` for explicit control of virtual row handling.

The `Skip` mode had two bugs where 0-row virtual row chunks created invalid cursors, permanently dropping sources from the merge queue:
- In `initialize`: a virtual row as the init chunk produced a 0-row cursor that was never added to the priority queue, so the source was silently abandoned.
- In `consume`: same issue: the 0-row cursor made the source disappear from the queue, and `merge` never requested more data.

Fix both by returning early without consuming the chunk and adding the source to `sources_pending_data`, so the next `merge` call immediately requests fresh data from that source.

The `SortingStep` merge now uses `Skip` (instead of the old `false`) because its header has expression-aliased column names (e.g. `__table1.k`) that don't match the PK column names in virtual rows (`k`), making `AsIs` produce wrong sort keys (default values).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
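The early-return fix described in this commit message can be sketched as follows. This is only an illustration under stated assumptions, not the code of `MergingSortedAlgorithm`: the `VirtualRowAction` values and the name `sources_pending_data` come from the message above, while `Chunk`, `MergeStateSketch`, `pendingSources`, and `hasCursor` are hypothetical. The `Convert` branch is left as a plain copy because the actual conversion is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <vector>

enum class VirtualRowAction { Skip, AsIs, Convert };

// Hypothetical chunk: sort keys plus a flag marking it as a virtual row.
struct Chunk
{
    std::vector<int> keys;
    bool is_virtual_row = false;
};

class MergeStateSketch
{
public:
    MergeStateSketch(std::size_t num_sources, VirtualRowAction action_)
        : action(action_), cursors(num_sources)
    {
    }

    void consume(const Chunk & chunk, std::size_t source)
    {
        if (chunk.is_virtual_row && action == VirtualRowAction::Skip)
        {
            // Do not build a cursor from the virtual row. Remember that this source
            // still owes real data, so the next merge step asks it for more.
            sources_pending_data.insert(source);
            return;
        }
        sources_pending_data.erase(source);
        cursors[source] = chunk.keys;  // Convert would rewrite the keys first; omitted here
    }

    // Sources the next merge step has to pull from before it can make progress.
    std::vector<std::size_t> pendingSources() const
    {
        return std::vector<std::size_t>(sources_pending_data.begin(), sources_pending_data.end());
    }

    bool hasCursor(std::size_t source) const { return cursors[source].has_value(); }

private:
    VirtualRowAction action;
    std::vector<std::optional<std::vector<int>>> cursors;
    std::set<std::size_t> sources_pending_data;
};
```

The point of the sketch is that a skipped virtual row neither becomes a cursor nor makes the source vanish: the source stays visible through `pendingSources()` until a real chunk arrives.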

This reverts commit f837892.

clickhouse-gh · 2026-04-01T10:29:46Z

    min_free_disk_space = settings[Setting::min_free_disk_space_for_temporary_data];
    max_block_bytes = settings[Setting::prefer_external_sort_block_bytes];
-    read_in_order_use_buffering = settings[Setting::read_in_order_use_buffering];
+    read_in_order_use_buffering = settings[Setting::read_in_order_use_buffering] && !settings[Setting::read_in_order_use_virtual_row_per_block];


read_in_order_use_virtual_row_per_block currently disables buffering globally, even when read_in_order_use_virtual_row is off (or virtual-row conversion is not applicable for this query).

That broad coupling can cause an unintended performance regression for unrelated read_in_order_use_buffering cases. The description says this setting should only matter "together with read_in_order_use_virtual_row".

Could we gate this only when virtual-row conversion is actually enabled for the SortingStep (e.g. apply_virtual_row_conversions path), instead of disabling buffering at settings-construction time?
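The narrower gating the reviewer asks for could look roughly like the sketch below. The struct, its field names, and `shouldUseBuffering` are hypothetical and exist only to show the boolean logic; how ClickHouse actually determines that virtual-row conversion applies to a given `SortingStep` is not shown in this thread, so it is passed in as a plain flag.

```cpp
#include <cassert>

// Hypothetical snapshot of the three settings discussed in this comment.
struct ReadInOrderSettingsSketch
{
    bool use_buffering;              // read_in_order_use_buffering
    bool use_virtual_row;            // read_in_order_use_virtual_row
    bool use_virtual_row_per_block;  // read_in_order_use_virtual_row_per_block
};

// Buffering is turned off only when per-block virtual rows are really in effect
// for this step, not merely because the per-block setting is switched on.
bool shouldUseBuffering(const ReadInOrderSettingsSketch & settings, bool virtual_row_conversions_applied)
{
    const bool per_block_active = settings.use_virtual_row
        && settings.use_virtual_row_per_block
        && virtual_row_conversions_applied;
    return settings.use_buffering && !per_block_active;
}
```

With this shape, a query that sets `read_in_order_use_virtual_row_per_block` but leaves `read_in_order_use_virtual_row` off keeps its buffering, which is the case the comment is worried about.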

clickhouse-gh · 2026-04-01T12:30:19Z

+    index_granularity_bytes = 1024,
+    min_index_granularity_bytes = 1024;
+
+--- Write overlapping data to multiple parts, only one part matches the filter


Typo in comment: use -- for SQL comments instead of --- (currently it starts with three dashes).

vdimir · 2026-04-01T14:02:34Z

Stateless tests (amd_asan_ubsan, flaky check) failure
03582_pr_read_in_order_hits FAIL cidb
Stateless tests (amd_tsan, flaky check) failure
03582_pr_read_in_order_hits FAIL cidb

test always fails with flaky check #101482 and passes in regular runs cidb

Integration tests (amd_llvm_coverage, 5/5) failure
test_overcommit_tracker/test.py::test_user_overcommit FAIL cidb
Stress test (arm_msan) failure

#101381

Server died FAIL cidb
MemorySanitizer: use-of-uninitialized-value (STID: 1003-358c) FAIL cidb, issue

ClickHouse/SimSIMD#18

alexey-milovidov · 2026-04-07T00:27:00Z

The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix.

vdimir · 2026-04-08T15:23:14Z

Stateless tests (amd_tsan, parallel, 2/2) failure
04075_analyzer_inline_view_subcolumns FAIL cidb

Fixing in #102022

BuzzHouse (amd_msan) failure
MemorySanitizer: use-of-uninitialized-value (STID: 1288-3e88) FAIL cidb

...

davenger · 2026-04-08T16:31:01Z

Overall looks Ok to me.
Maybe add a test with WHERE or PREWHERE that fully filters out some of the blocks?

clickhouse-gh · 2026-04-11T09:30:38Z

+
+/// MergingSortedTransform supposed to consume virtual row
+/// When there is no merging (only one stream) and virtual row conversions are enabled, we need to remove virtual row before output,
+/// otherwise it can reach downstream steps and cause issues because of conversions are valid only for current step.


Minor typo in the comment: "because of conversions are valid only for current step" reads awkwardly.

Suggested wording:
because conversions are valid only for the current step.

clickhouse-gh · 2026-04-11T12:06:39Z

LLVM Coverage Report

Metric	Baseline	Current	Δ
Lines	84.00%	84.10%	+0.10%
Functions	90.90%	90.90%	+0.00%
Branches	76.50%	76.60%	+0.10%

Changed lines: 91.58% (185/202) | lost baseline coverage: 7 line(s) · Uncovered code

Full report · Diff report

vdimir · 2026-04-13T17:30:33Z

Stateless tests (amd_binary, flaky check) failure
03037_dynamic_merges_1_horizontal_compact_wide_tree FAIL cidb
03037_dynamic_merges_1_horizontal_compact_merge_tree FAIL cidb

These tests seem to pass the flaky check when run individually, but they become too heavy when executed together with other tests (checked in #102530).

We can either exclude them from the flaky check or reduce the level of parallelism there / run them in smaller batches (perhaps the slowdown is caused by the thread fuzzer).

clickhouse-gh Bot added the pr-performance (Pull request with some performance improvements) label Mar 24, 2026

clickhouse-gh Bot reviewed Mar 24, 2026

Comment thread tests/queries/0_stateless/04049_virtual_row_overlap.sql Outdated

clickhouse-gh Bot reviewed Mar 24, 2026

Comment thread tests/queries/0_stateless/04049_virtual_row_overlap.sql Outdated

vdimir force-pushed the vdimir/virtual_row_per_granule branch from 77e79fe to 9443fbc Compare March 24, 2026 16:09

vdimir added 4 commits March 24, 2026 16:56

Emit virtual row from read step

edc6421

polish code

e06c486

skip optimize table in 04049_virtual_row_overlap

0aa9fdb

fix virtual row handling in BufferChunksTransform

ddc8174

vdimir force-pushed the vdimir/virtual_row_per_granule branch from 9443fbc to ddc8174 Compare March 26, 2026 16:46

vdimir marked this pull request as ready for review March 26, 2026 18:03

vdimir added 2 commits March 30, 2026 17:24

fix missing virtual row conversions

8761691

Revert "fix missing virtual row conversions"

e1c101c

This reverts commit 8761691.

vdimir force-pushed the vdimir/virtual_row_per_granule branch from 34188b3 to 339a4ce Compare March 31, 2026 16:06

vdimir and others added 2 commits March 31, 2026 17:58

Revert "fix VirtualRowAction::Skip dropping sources from merge queue"

341978b

This reverts commit f837892.

vdimir force-pushed the vdimir/virtual_row_per_granule branch from 1e67550 to 1c23f29 Compare March 31, 2026 18:04

fix virtual row when MergingSortedTransform ommited for single stream

b039597

vdimir force-pushed the vdimir/virtual_row_per_granule branch from 1c23f29 to b039597 Compare March 31, 2026 18:48

vdimir added 2 commits April 1, 2026 10:18

add setting read_in_order_use_virtual_row_per_block

7b0c19b

Merge branch 'master' into vdimir/virtual_row_per_granule

1eece61

clickhouse-gh Bot reviewed Apr 1, 2026

vdimir added 2 commits April 1, 2026 10:43

revert changes in 03031_read_in_order_optimization_with_virtual_row

23c10dd

small fix for read_in_order_use_buffering

76cfa36

clickhouse-gh Bot reviewed Apr 1, 2026

disable prelimitary merge with read_in_order_use_virtual_row_per_block

368a6b6

clickhouse-gh Bot reviewed Apr 2, 2026

Comment thread src/Processors/QueryPlan/ReadFromMergeTree.cpp Outdated

vdimir requested a review from davenger April 2, 2026 14:39

Merge branch 'master' into vdimir/virtual_row_per_granule

8b16e42

clickhouse-gh Bot reviewed Apr 7, 2026

Comment thread src/Core/Settings.cpp

Disable features for virtual_row_per_block only when applied

55c710e

davenger approved these changes Apr 8, 2026

davenger self-assigned this Apr 8, 2026

vdimir added 2 commits April 11, 2026 09:14

Merge branch 'master' into vdimir/virtual_row_per_granule

f4b18d8

Add prewhrere case to 04049_virtual_row_overlap

f2f010a

vdimir force-pushed the vdimir/virtual_row_per_granule branch from 4733194 to f2f010a Compare April 11, 2026 09:22

clickhouse-gh Bot reviewed Apr 11, 2026

vdimir added this pull request to the merge queue Apr 13, 2026

Merged via the queue into master with commit e877d30 Apr 13, 2026
162 of 164 checks passed

vdimir deleted the vdimir/virtual_row_per_granule branch April 13, 2026 17:53

robot-ch-test-poll4 added the pr-synced-to-cloud (The PR is synced to the cloud repo) label Apr 13, 2026

vdimir merged 18 commits into master from vdimir/virtual_row_per_granule

4 participants
