Fix exception when reading .size subcolumn of sparse Nullable(String) with PREWHERE #97264
Conversation
…ng)` with PREWHERE

When reading the `.size` subcolumn of a sparse `Nullable(String)` inside a Tuple together with the full Tuple via PREWHERE, a LOGICAL_ERROR was thrown in `SerializationSparse::deserializeBinaryBulkWithMultipleStreams` because `offsets_column->size() + 1 != values_column->size()`.

Two issues in `SerializationStringSize`:

1. `deserializeBinaryBulkStatePrefix` only set `need_string_data = true` when there was no state cache. When both `t.a.size` and `t` share a `SubstreamsCache` (keyed by column name), reading only sizes cached a `ColumnUInt64` under the `Substream::Regular` key, poisoning the cache for `SerializationString`, which expects a `ColumnString`.

2. `deserializeWithStringData` cached the accumulated `string_state.column` directly. This column grows across marks (persistent state), so on mark 1+ it contained elements from all previous marks. When `insertDataFromCachedColumn` saw `cached_column->size() != num_read_rows`, it replaced `ColumnSparse`'s values column entirely, breaking the sparse invariant.

The fix always sets `need_string_data = true` and caches only the current range via `cut(prev_size, num_read_rows)` instead of the full accumulated column.

Found by AST fuzzer: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=97242&sha=60722e5f8a3edd99281b6bd858dee3f0f9bf84d7&name_0=PR&name_1=AST%20fuzzer%20%28amd_ubsan%29

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Revert `need_string_data` back to conditional (only when `!cache`).

`SerializationString::deserializeBinaryBulkStatePrefix` already upgrades the shared state to `need_string_data = true` when both the `.size` subcolumn and the full column are read. Forcing it unconditionally would disable the optimization for queries that only need string sizes (e.g., `length(s)`, `empty(s)`, `s != ''`). The actual bug fix (caching only the current range via `cut`) is sufficient on its own.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
```sql
-- a Tuple together with the full Tuple via PREWHERE used to cause a LOGICAL_ERROR
-- because the cached accumulated ColumnString broke ColumnSparse invariants.

DROP TABLE IF EXISTS t_sparse_string_size;
```
Yes, it reproduces the issue.
```cpp
/// If we cache the full accumulated column with num_read_rows < column->size(),
/// insertDataFromCachedColumn will see the size mismatch and replace the result
/// column entirely (e.g. ColumnSparse's values), breaking invariants.
auto column_for_cache = string_state.column->cut(prev_size, num_read_rows);
```
Does it affect performance?
The test used `ORDER BY ()` in the table and no `ORDER BY` in the SELECT query. With randomized insert settings, the INSERT may produce multiple parts, and `OPTIMIZE TABLE FINAL` merges them with an undefined row order (empty sorting key). Adding `ORDER BY id` to the SELECT makes the output deterministic regardless of storage order.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Re: performance question on …
Cherry pick #97264 to 25.11: Fix exception when reading .size subcolumn of sparse Nullable(String) with PREWHERE
Cherry pick #97264 to 25.12: Fix exception when reading .size subcolumn of sparse Nullable(String) with PREWHERE
Cherry pick #97264 to 26.1: Fix exception when reading .size subcolumn of sparse Nullable(String) with PREWHERE
Backport #97264 to 26.1: Fix exception when reading .size subcolumn of sparse Nullable(String) with PREWHERE
```sql
-- The bug manifested when reading t.a.size and t in the same readRows call
-- across multiple granules with PREWHERE.
SELECT t.a.size, id FROM t_sparse_string_size PREWHERE id % 11 = 0 WHERE toString(t) != '' ORDER BY id LIMIT 3;
```
It doesn't reproduce the issue; you need to remove the LIMIT 3 to reproduce it.
It's a bad fix; the problem is not in the code this PR changed. I will revert this PR, close all the backports, and create a proper fix. @alexey-milovidov, please wait for the review before merging such fixes.
Better fix: #97515
Summary
- Fixed a `LOGICAL_ERROR` exception in `SerializationSparse::deserializeBinaryBulkWithMultipleStreams` (`offsets_column->size() + 1 != values_column->size()`) when reading the `.size` subcolumn of a sparse `Nullable(String)` inside a Tuple together with the full Tuple via PREWHERE.
- Root cause: `SerializationStringSize::deserializeWithStringData` cached the accumulated `string_state.column` (which grows across marks) instead of only the current range's data. On mark 1+, `insertDataFromCachedColumn` saw the size mismatch and replaced `ColumnSparse`'s values column entirely, breaking the sparse invariant.
- Fix: cache only the current range via `cut(prev_size, num_read_rows)`, preserving the sizes-only optimization for queries that only need string sizes (`length(s)`, `empty(s)`, etc.).

Found by AST fuzzer: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=97242&sha=60722e5f8a3edd99281b6bd858dee3f0f9bf84d7&name_0=PR&name_1=AST%20fuzzer%20%28amd_ubsan%29
Test plan
- Sizes-only queries keep the optimization (`length()`, `empty()`, standalone `.size` reads)
- New test `03924_sparse_string_size_subcolumn_prewhere`

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix `LOGICAL_ERROR` exception when reading the `.size` subcolumn of a sparse `Nullable(String)` in a Tuple with PREWHERE.

🤖 Generated with Claude Code