Skip to content

Support read in reverse order with virtual row#99198

Merged
vdimir merged 2 commits intomasterfrom
vdimir/read_in_reverse_order_virtual_row
Mar 13, 2026
Merged

Support read in reverse order with virtual row#99198
vdimir merged 2 commits intomasterfrom
vdimir/read_in_reverse_order_virtual_row

Conversation

@vdimir
Copy link
Member

@vdimir vdimir commented Mar 10, 2026

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

  • Support the read_in_order_use_virtual_row optimization for reverse-order reads.

Note

Medium Risk
Touches MergeTree read-in-order execution paths and primary-key index access, which could affect query correctness/performance for reverse-order reads if edge cases around marks/index sizes are missed.

Overview
Enables the read_in_order_use_virtual_row optimization when reading MergeTree parts in reverse order (ReadType::InReverseOrder) by selecting the appropriate mark position from the part’s primary-key index and guarding against missing PK values (e.g., final mark / index bounds).

Updates the 03711_read_in_order_through_join stateless test to run the reverse-order join case with the same log_comment for both LEFT and INNER joins and adjusts expected OK rows accordingly.

Written by Cursor Bugbot for commit 7ecea2c. This will update automatically on new commits. Configure here.

@clickhouse-gh
Copy link
Contributor

clickhouse-gh bot commented Mar 10, 2026

Workflow [PR], commit [7ecea2c]

Summary:

job_name test_name status info comment
Stress test (amd_msan) failure
Logical error: There is a race condition between creation and removal of replicated table. It's a bug (STID: 2508-1e6e) FAIL cidb, issue ISSUE CREATED
Upgrade check (amd_release) failure
Error message in clickhouse-server.log (see upgrade_error_messages.txt) FAIL cidb IGNORED

AI Review

1) Summary

This PR extends read_in_order_use_virtual_row to reverse-order reads by taking the virtual-row key from the range end mark when final marks are available, and updates 03711_read_in_order_through_join expectations so reverse-order INNER join also keeps read-in-order optimization. High-level verdict: the change is coherent and I did not find high-confidence correctness/safety issues in the final diff.

2) Missing context (if any)

  • No Praktika CI report links or failed-job logs were provided in the task context.
  • No benchmark results were provided for reverse-order read-in-order behavior.

3) Findings (by severity)

  • Blockers: None.
  • Majors: None.

4) Tests & Evidence

  • Coverage assessment: the updated stateless test now validates both LEFT and INNER join cases for reverse-order read-in-order path and checks query-log behavior under a fixed log_comment, which is directly relevant to this change.
  • Negative/error-path tests: not added in this PR (e.g., mixed parts with and without final marks).
  • Additional tests to consider (follow-up, non-blocking): add a targeted case with parts lacking final marks to confirm safe fallback (no virtual row for those parts) and stable read-in-order correctness with joins.

5) ClickHouse Compliance Checklist (Yes/No + short note)

  • Data deletions logged? Yes — no deletion-path changes in this PR.
  • Serialization formats versioned? Yes — no serialization/protocol format changes.
  • Experimental setting gate present? Yes — no new feature flag required; this extends existing guarded virtual-row behavior.
  • Settings exposed for constants/thresholds? Yes — no new hardcoded tunables introduced.
  • Backward compatibility preserved? Yes — behavior remains conditional on existing settings and part metadata.
  • SettingsHistory.cpp updated for new/changed settings? Yes — no settings added/changed in final diff.
  • Existing tests untouched (only additions)? No — one existing test was adjusted (expected for behavior extension).
  • Docs/user-facing notes updated? N/A — internal optimizer behavior/test update only.
  • Core-area change got extra scrutiny? Yes — reviewed ReadFromMergeTree read-in-order virtual-row logic specifically.

6) Performance & Safety Notes

  • Hot-path impact appears low and localized: one additional branch for reverse-order virtual-row eligibility and mark-position selection.
  • Memory/concurrency risk in the final diff is minimal (no new shared state, allocations, or threading primitives introduced).
  • Benchmark data is missing; minimal reproducible benchmark: compare ORDER BY ... DESC LIMIT N with INNER JOIN on multi-part MergeTree datasets (with/without final marks), capturing read_rows and elapsed time.

7) User-Lens Review

  • User-visible behavior becomes more intuitive: reverse-order INNER join can now keep the same read-in-order optimization path as LEFT in covered scenarios.
  • Error/log surface is unchanged; no regressions in operator observability were found in the diff.

8) Final Verdict

  • Status: Approve

@clickhouse-gh clickhouse-gh bot added the pr-improvement Pull request with some product improvements label Mar 10, 2026
@vdimir vdimir force-pushed the vdimir/read_in_reverse_order_virtual_row branch from 38ed069 to 835da66 Compare March 10, 2026 15:17
@vdimir vdimir changed the title wip support read in reverse order with virtual row Support read in reverse order with virtual row Mar 11, 2026
@vdimir vdimir marked this pull request as ready for review March 11, 2026 09:22
@clickhouse-gh
Copy link
Contributor

clickhouse-gh bot commented Mar 11, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 83.20% 83.80% +0.60%
Functions 23.60% 23.80% +0.20%
Branches 75.30% 76.30% +1.00%

PR changed lines: PR changed-lines coverage: 100.00% (21/21)
Diff coverage report
Uncovered code

@Fgrtue Fgrtue self-assigned this Mar 12, 2026
@vdimir vdimir added pr-performance Pull request with some performance improvements and removed pr-improvement Pull request with some product improvements labels Mar 12, 2026
Copy link
Contributor

@Fgrtue Fgrtue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good in my opinion. The only thing that concerns me is the Stress test failure. @vdimir do you think it could be related?

@vdimir vdimir added this pull request to the merge queue Mar 13, 2026
@vdimir
Copy link
Member Author

vdimir commented Mar 13, 2026

The change looks good in my opinion. The only thing that concerns me is the Stress test failure. @vdimir do you think it could be related?

I doubt this change can cause it, I created new issue but probably it's duplicate of #85771, since same error happens in CI with different stack trace

Merged via the queue into master with commit 1082973 Mar 13, 2026
161 of 164 checks passed
@vdimir vdimir deleted the vdimir/read_in_reverse_order_virtual_row branch March 13, 2026 12:26
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Mar 13, 2026
zlareb1 added a commit to zlareb1/ClickHouse that referenced this pull request Mar 17, 2026
PR ClickHouse#99198 extends `read_in_order_use_virtual_row` to `InReverseOrder` reads
(ReadFromMergeTree.cpp:745-777). The PR's own test covers the reverse path
only via joins; this adds a dedicated non-join test with multiple parts.

`04045` exercises: DESC with preliminary merge, without preliminary merge,
with a WHERE filter, and with a multi-column sort key. All four queries use
two non-overlapping MergeTree parts to activate the reverse-order virtual
row merge path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
zlareb1 added a commit to zlareb1/ClickHouse that referenced this pull request Mar 17, 2026
PR ClickHouse#99198 extends `read_in_order_use_virtual_row` to `InReverseOrder` reads
(ReadFromMergeTree.cpp:745-777). The PR's own test covers the reverse path
only via joins; this adds a dedicated non-join test with multiple parts.

`04045` exercises: DESC with preliminary merge, without preliminary merge,
with a WHERE filter, and with a multi-column sort key. All four queries use
two non-overlapping MergeTree parts to activate the reverse-order virtual
row merge path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
github-merge-queue bot pushed a commit that referenced this pull request Mar 19, 2026
…al-row

Add test for DESC order virtual row optimization (PR #99198)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-performance Pull request with some performance improvements pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants