Skip to content

Fix std::out_of_range when reading from Buffer table with sample#99141

Merged
kssenii merged 2 commits intomasterfrom
fix-out-of-range
Mar 12, 2026
Merged

Fix std::out_of_range when reading from Buffer table with sample#99141
kssenii merged 2 commits intomasterfrom
fix-out-of-range

Conversation

@kssenii
Copy link
Member

@kssenii kssenii commented Mar 10, 2026

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix crash on usage of Buffer table with SAMPLE when destination does not support it.

Documentation entry for user-facing changes

Closes #99134

  • Documentation is written (mandatory for new features)

Note

Low Risk
Low risk bug fix that adds a guard and a regression test; behavior changes only when SAMPLE is used on tables without a sampling key, now returning a clear SAMPLING_NOT_SUPPORTED error instead of potentially crashing.

Overview
Prevents a crash when executing SAMPLE against a MergeTree-backed Buffer table whose destination MergeTree has no SAMPLE BY key.

MergeTreeDataSelectExecutor::getSampling() now explicitly checks metadata_snapshot->hasSamplingKey() before accessing the sampling key and throws ErrorCodes::SAMPLING_NOT_SUPPORTED with a clear message when absent.

Adds a stateless regression test covering the failing Buffer+MergeTree case (expects SAMPLING_NOT_SUPPORTED) and confirms SAMPLE still works for Buffer tables without a destination table (in-memory only).

Written by Cursor Bugbot for commit f5aec3e. This will update automatically on new commits. Configure here.

@kssenii kssenii requested a review from azat March 10, 2026 11:48
@clickhouse-gh
Copy link
Contributor

clickhouse-gh bot commented Mar 10, 2026

Workflow [PR], commit [f5aec3e]

Summary:

@clickhouse-gh clickhouse-gh bot added the pr-bugfix Pull request with bugfix, not backported by default label Mar 10, 2026
/// If there is no destination, only the buffer is read, so sampling is always supported.
if (auto destination = getDestinationTable())
return destination->supportsSampling();
return true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, is the following possible?

SELECT * FROM buffer:

  • getSampling() == return true -- because underlying table supports sampling
  • then another thread re-create table w/o sampling, in the middle of that SELECT
  • then StorageBuffer::read calls getDestinationTable again and got table that does not support it

I think so. Then we need a check in MergeTreeDataSelectExecutor::getSampling

@azat azat self-assigned this Mar 10, 2026
@clickhouse-gh
Copy link
Contributor

clickhouse-gh bot commented Mar 10, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 83.80% 83.80% +0.00%
Functions 23.90% 23.90% +0.00%
Branches 76.30% 76.30% +0.00%

PR changed lines: PR changed-lines coverage: 100.00% (7/7)
Diff coverage report
Uncovered code

@kssenii kssenii requested a review from azat March 11, 2026 12:17
Copy link
Member

@azat azat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW do we have the same problem for i.e. views?

DROP TABLE IF EXISTS t0;
DROP TABLE IF EXISTS t1;

CREATE TABLE t0 (c0 Int) ENGINE = MergeTree ORDER BY tuple();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it also make sense to test Buffer over MergeTree table with sampling, just in case

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see there is such test already

CREATE TABLE hits_dst AS test.hits
ENGINE = MergeTree
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate, intHash32(UserID))
SAMPLE BY intHash32(UserID)
SETTINGS storage_policy = 'default';
CREATE TABLE hits_buffer AS hits_dst ENGINE = Buffer(current_database(), hits_dst, 8, 600, 600, 1000000, 1000000, 100000000, 1000000000);

@kssenii
Copy link
Member Author

kssenii commented Mar 12, 2026

BTW do we have the same problem for i.e. views?

I'll check, but I'd merge this PR for now.

@kssenii kssenii added this pull request to the merge queue Mar 12, 2026
Merged via the queue into master with commit cee21f3 Mar 12, 2026
163 checks passed
@kssenii kssenii deleted the fix-out-of-range branch March 12, 2026 12:10
@robot-clickhouse robot-clickhouse added the pr-synced-to-cloud The PR is synced to the cloud repo label Mar 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Logical error: 'std::exception. Code: 1001, type: std::out_of_range, e.what() = vector (version 26.3.1.535 (official build)), Stack trace: (STID: None)

3 participants