Skip to content

Fix incorrect getSerializedValueSize for ColumnLowCardinality with nullable dictionary#97647

Merged
alexey-milovidov merged 2 commits intomasterfrom
fix-lc-nullable-serialized-value-size
Feb 23, 2026
Merged

Fix incorrect getSerializedValueSize for ColumnLowCardinality with nullable dictionary#97647
alexey-milovidov merged 2 commits intomasterfrom
fix-lc-nullable-serialized-value-size

Conversation

@alexey-milovidov
Copy link
Member

Summary

  • ColumnLowCardinality did not override getSerializedValueSize, defaulting to byteSizeAt which doesn't account for the null flag byte written by ColumnUnique::serializeValueIntoArena/Memory for nullable dictionaries
  • For LowCardinality(Nullable(Int64)) with a NULL value, byteSizeAt returned 8 but the actual serialized size is 1 (just the null flag), leaving 7 uninitialized bytes in the arena
  • This caused key deserialization corruption in aggregation operations with ROLLUP/CUBE when LowCardinality(Nullable(...)) appeared inside a Nullable(Tuple(...)) GROUP BY key
  • Fixed by overriding getSerializedValueSize in both ColumnUnique (to account for the null flag byte) and ColumnLowCardinality (to delegate to the dictionary)

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=b053840ef38b3b36bc7fb44fa6d5fb129571b2cd&name_0=MasterCI&name_1=AST+fuzzer+%28arm_asan%29

Changelog category (leave one):

  • Critical Bug Fix (crash, data loss, RBAC) or LOGICAL_ERROR

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix logical error exception during GROUP BY ... WITH ROLLUP/CUBE when keys include LowCardinality(Nullable(...)) inside Nullable(Tuple(...)).

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

🤖 Generated with Claude Code

…h nullable dictionary

`ColumnLowCardinality` did not override `getSerializedValueSize`, so it
used the default `byteSizeAt` which does not account for the null flag
byte that `ColumnUnique::serializeValueIntoArena/Memory` writes for
nullable dictionaries. For example, for `LowCardinality(Nullable(Int64))`
with a NULL value, `byteSizeAt` returned 8 but the actual serialized
size is 1 (just the null flag).

This caused `IColumnHelper::serializeValueIntoArenaWithNull` to allocate
the wrong amount of space (too much), leaving uninitialized bytes in the
serialized key. When a subsequent key column (e.g. a `ColumnString`) tried
to deserialize from the ReadBuffer, it read garbage as the string size,
triggering a logical error exception in ASan builds or
`CANNOT_ALLOCATE_MEMORY` in release builds.

The fix overrides `getSerializedValueSize` in both `ColumnUnique` (to
account for the null flag byte) and `ColumnLowCardinality` (to delegate
to the dictionary).

https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=b053840ef38b3b36bc7fb44fa6d5fb129571b2cd&name_0=MasterCI&name_1=AST+fuzzer+%28arm_asan%29

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@clickhouse-gh
Copy link
Contributor

clickhouse-gh bot commented Feb 22, 2026

Workflow [PR], commit [c6e491b]

Summary:

@clickhouse-gh clickhouse-gh bot added pr-critical-bugfix pr-must-backport Pull request should be backported intentionally. Use this label with great care! labels Feb 22, 2026
CREATE TABLE t_rollup_lc_nullable (value Nullable(Tuple(LowCardinality(Nullable(Int64))))) ENGINE = Memory;
INSERT INTO t_rollup_lc_nullable VALUES ((NULL));

SELECT 1 FROM t_rollup_lc_nullable GROUP BY value, 'foo' WITH ROLLUP;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it reproduces the issue.

@alexey-milovidov alexey-milovidov self-assigned this Feb 22, 2026
@alexey-milovidov
Copy link
Member Author

The change looks good to me.

Comment on lines +479 to +480
if (!nested_size)
return std::nullopt;
Copy link
Member

@nihalzp nihalzp Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the types that ColumnUnique supports cannot return std::nullopt. I guess this might be better because if in the future ColumnUnique expands supported types, this will not break.

@nihalzp nihalzp self-assigned this Feb 23, 2026
@alexey-milovidov alexey-milovidov added this pull request to the merge queue Feb 23, 2026
Merged via the queue into master with commit e7b468d Feb 23, 2026
148 checks passed
@alexey-milovidov alexey-milovidov deleted the fix-lc-nullable-serialized-value-size branch February 23, 2026 03:57
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Feb 23, 2026
@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR label Feb 23, 2026
robot-clickhouse added a commit that referenced this pull request Feb 23, 2026
Cherry pick #97647 to 25.12: Fix incorrect getSerializedValueSize for ColumnLowCardinality with nullable dictionary
robot-clickhouse added a commit that referenced this pull request Feb 23, 2026
…lumnLowCardinality with nullable dictionary
robot-clickhouse added a commit that referenced this pull request Feb 23, 2026
Cherry pick #97647 to 26.1: Fix incorrect getSerializedValueSize for ColumnLowCardinality with nullable dictionary
robot-clickhouse added a commit that referenced this pull request Feb 23, 2026
…umnLowCardinality with nullable dictionary
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Feb 23, 2026
nihalzp added a commit that referenced this pull request Feb 23, 2026
Backport #97647 to 26.1: Fix incorrect getSerializedValueSize for ColumnLowCardinality with nullable dictionary
Algunenano pushed a commit to Algunenano/ClickHouse that referenced this pull request Feb 24, 2026
…erialized-value-size

Fix incorrect getSerializedValueSize for ColumnLowCardinality with nullable dictionary
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-critical-bugfix pr-must-backport Pull request should be backported intentionally. Use this label with great care! pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants