
Fix exception when reading .size subcolumn of sparse Nullable(String) with PREWHERE#97264

Merged
alexey-milovidov merged 3 commits into master from fix-sparse-string-size-subcolumn-prewhere on Feb 19, 2026

Conversation

@alexey-milovidov
Member

Summary

  • Fix a LOGICAL_ERROR exception in SerializationSparse::deserializeBinaryBulkWithMultipleStreams (offsets_column->size() + 1 != values_column->size()) when reading .size subcolumn of a sparse Nullable(String) inside a Tuple together with the full Tuple via PREWHERE.
  • The root cause: SerializationStringSize::deserializeWithStringData cached the accumulated string_state.column (which grows across marks) instead of only the current range's data. On mark 1+, insertDataFromCachedColumn saw the size mismatch and replaced ColumnSparse's values column entirely, breaking the sparse invariant.
  • The fix caches only the current range via cut(prev_size, num_read_rows), preserving the sizes-only optimization for queries that need only string sizes (length(s), empty(s), etc.).

Found by AST fuzzer: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=97242&sha=60722e5f8a3edd99281b6bd858dee3f0f9bf84d7&name_0=PR&name_1=AST%20fuzzer%20%28amd_ubsan%29

Test plan

  • Verified fix with the minimal reproducer from the fuzzer
  • Verified sizes-only optimization still works (length(), empty(), standalone .size reads)
  • Added regression test 03924_sparse_string_size_subcolumn_prewhere

Changelog category (leave one):

  • Critical Bug Fix (crash, data loss, RBAC) or LOGICAL_ERROR

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix LOGICAL_ERROR exception when reading .size subcolumn of a sparse Nullable(String) in a Tuple with PREWHERE.

🤖 Generated with Claude Code

alexey-milovidov and others added 2 commits February 18, 2026 10:11
…ng)` with PREWHERE

When reading `.size` subcolumn of a sparse `Nullable(String)` inside a Tuple
together with the full Tuple via PREWHERE, a LOGICAL_ERROR was thrown in
`SerializationSparse::deserializeBinaryBulkWithMultipleStreams` because
`offsets_column->size() + 1 != values_column->size()`.

Two issues in `SerializationStringSize`:

1. `deserializeBinaryBulkStatePrefix` only set `need_string_data = true` when
   there was no state cache. When both `t.a.size` and `t` share a
   `SubstreamsCache` (keyed by column name), reading only sizes cached a
   `ColumnUInt64` under the `Substream::Regular` key, poisoning the cache for
   `SerializationString` which expects a `ColumnString`.

2. `deserializeWithStringData` cached the accumulated `string_state.column`
   directly. This column grows across marks (persistent state), so on mark 1+
   it contained elements from all previous marks. When `insertDataFromCachedColumn`
   saw `cached_column->size() != num_read_rows`, it replaced `ColumnSparse`'s
   values column entirely, breaking the sparse invariant.

The fix always sets `need_string_data = true` and caches only the current
range via `cut(prev_size, num_read_rows)` instead of the full accumulated
column.

Found by AST fuzzer: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=97242&sha=60722e5f8a3edd99281b6bd858dee3f0f9bf84d7&name_0=PR&name_1=AST%20fuzzer%20%28amd_ubsan%29

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Revert need_string_data back to conditional (only when !cache).

SerializationString::deserializeBinaryBulkStatePrefix already upgrades
the shared state to need_string_data=true when both the .size subcolumn
and the full column are read. Forcing it unconditionally would disable
the optimization for queries that only need string sizes (e.g.,
`length(s)`, `empty(s)`, `s != ''`).

The actual bug fix (caching only the current range via cut) is
sufficient on its own.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@clickhouse-gh
Contributor

clickhouse-gh bot commented Feb 18, 2026

Workflow [PR], commit [e646986]


@clickhouse-gh clickhouse-gh bot added the pr-critical-bugfix and pr-must-backport (Pull request should be backported intentionally. Use this label with great care!) labels Feb 18, 2026
-- a Tuple together with the full Tuple via PREWHERE used to cause a LOGICAL_ERROR
-- because the cached accumulated ColumnString broke ColumnSparse invariants.

DROP TABLE IF EXISTS t_sparse_string_size;
Member Author
Yes, it reproduces the issue.

/// If we cache the full accumulated column with num_read_rows < column->size(),
/// insertDataFromCachedColumn will see the size mismatch and replace the result
/// column entirely (e.g. ColumnSparse's values), breaking invariants.
auto column_for_cache = string_state.column->cut(prev_size, num_read_rows);
Member Author

Does it affect performance?

The test used `ORDER BY ()` in the table and no `ORDER BY` in the
SELECT query. With randomized insert settings, the INSERT may produce
multiple parts, and `OPTIMIZE TABLE FINAL` merges them with undefined
row order (empty sorting key). Adding `ORDER BY id` to the SELECT
makes the output deterministic regardless of storage order.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@alexey-milovidov
Member Author

Re: performance question on cut() at line 135 — the cut() only runs when need_string_data is true, which happens when both the .size subcolumn and the full string column are read in the same query (e.g., SELECT t.a.size ... WHERE toString(t)). In the common case — reading only length(s) / empty(s) / .size without the full string — the sizes-only path (deserializeWithoutStringData) is taken and there is no cut(). The cost is proportional to index_granularity rows per mark, not accumulated data.

@alexey-milovidov alexey-milovidov added the pr-performance (Pull request with some performance improvements) label Feb 19, 2026
@alexey-milovidov alexey-milovidov self-assigned this Feb 19, 2026
@alexey-milovidov alexey-milovidov merged commit 0d3b93d into master Feb 19, 2026
146 of 147 checks passed
@alexey-milovidov alexey-milovidov deleted the fix-sparse-string-size-subcolumn-prewhere branch February 19, 2026 23:24
@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-synced-to-cloud (The PR is synced to the cloud repo) and pr-must-backport-synced (The `*-must-backport` labels are synced into the cloud Sync PR) labels Feb 19, 2026
robot-ch-test-poll2 added a commit that referenced this pull request Feb 20, 2026
Cherry pick #97264 to 25.11: Fix exception when reading .size subcolumn of sparse Nullable(String) with PREWHERE
robot-clickhouse added a commit that referenced this pull request Feb 20, 2026
…of sparse Nullable(String) with PREWHERE
robot-ch-test-poll2 added a commit that referenced this pull request Feb 20, 2026
Cherry pick #97264 to 25.12: Fix exception when reading .size subcolumn of sparse Nullable(String) with PREWHERE
robot-clickhouse added a commit that referenced this pull request Feb 20, 2026
…of sparse Nullable(String) with PREWHERE
robot-ch-test-poll2 added a commit that referenced this pull request Feb 20, 2026
Cherry pick #97264 to 26.1: Fix exception when reading .size subcolumn of sparse Nullable(String) with PREWHERE
robot-clickhouse added a commit that referenced this pull request Feb 20, 2026
clickhouse-gh bot added a commit that referenced this pull request Feb 20, 2026
Backport #97264 to 26.1: Fix exception when reading .size subcolumn of sparse Nullable(String) with PREWHERE

-- The bug manifested when reading t.a.size and t in the same readRows call
-- across multiple granules with PREWHERE.
SELECT t.a.size, id FROM t_sparse_string_size PREWHERE id % 11 = 0 WHERE toString(t) != '' ORDER BY id LIMIT 3;
Member

It doesn't reproduce the issue; you need to remove the LIMIT 3 to reproduce it.

@Avogar
Member

Avogar commented Feb 20, 2026

It's a bad fix. The problem is not in SerializationStringSize; it is in SerializationSparse. Before deserializing sparse elements we need to set settings.insert_only_rows_in_current_range_from_substreams_cache to true. The issue happens because the values in ColumnSparse always have one additional row with the default value at the beginning, so we cannot use the column from the substreams cache as-is: it doesn't contain that row.

I will revert this PR, close all the backports, and create a proper fix. @alexey-milovidov, please wait for review before merging such fixes.

@Avogar
Member

Avogar commented Feb 20, 2026

Better fix: #97515

@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added the pr-backports-created (Backport PRs are successfully created, it won't be processed by CI script anymore) label Feb 20, 2026