Skip to content

Do size checks when deserializing data from aggregation states and other sources#90031

Merged
Algunenano merged 8 commits intoClickHouse:masterfrom
Algunenano:deserialization_checks
Nov 19, 2025
Merged

Do size checks when deserializing data from aggregation states and other sources#90031
Algunenano merged 8 commits intoClickHouse:masterfrom
Algunenano:deserialization_checks

Conversation

@Algunenano
Copy link
Copy Markdown
Member

Changelog category (leave one):

  • Critical Bug Fix (crash, data loss, RBAC) or LOGICAL_ERROR

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Do size checks when deserializing data from aggregation states and other sources

Closes #86882

I'll do a review before unmarking it as draft

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@Algunenano Algunenano requested a review from Copilot November 14, 2025 10:35
@Algunenano Algunenano self-assigned this Nov 14, 2025
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Nov 14, 2025

Workflow [PR], commit [7b2820f]

Summary:

job_name test_name status info comment
Stateless tests (arm_asan, targeted) failure
03358_block_structure_match FAIL cidb
Stress test (amd_debug) failure
Server died FAIL cidb
Hung check failed, possible deadlock found (see hung_check.log) FAIL cidb
Killed by signal (in clickhouse-server.log) FAIL cidb
Fatal message in clickhouse-server.log (see fatal_messages.txt) FAIL cidb
Killed by signal (output files) FAIL cidb
Found signal in gdb.log FAIL cidb
Upgrade check (amd_msan) failure
Killed by signal (in clickhouse-server.log) FAIL cidb
Fatal message in clickhouse-server.log (see fatal_messages.txt) FAIL cidb
Killed by signal (output files) FAIL cidb
Found signal in gdb.log FAIL cidb
BuzzHouse (amd_debug) failure
Logical error: 'Inconsistent AST formatting: the query: FAIL cidb
BuzzHouse (amd_ubsan) failure
/home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/src/IO/VarInt.h:32:5: runtime error: store to null pointer of type 'char' FAIL cidb
Performance Comparison (amd_release, master_head, 3/6) failure
Select historical data failure

@clickhouse-gh clickhouse-gh bot added pr-critical-bugfix pr-must-backport Pull request should be backported intentionally. Use this label with great care! labels Nov 14, 2025
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR addresses a critical security issue by implementing proper size checks when deserializing data from aggregation states and other untrusted sources. The changes prevent potential crashes and data corruption by validating buffer boundaries before reading serialized data.

Key Changes:

  • Modified deserialization methods across all column types to use ReadBuffer instead of raw pointers for safer boundary checking
  • Added size validation checks before reading serialized data to prevent buffer overruns
  • Updated test cases to verify proper error handling for malformed data

Reviewed Changes

Copilot reviewed 56 out of 56 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
src/Columns/*.{h,cpp} Refactored deserialization methods to use ReadBuffer API with proper bounds checking
src/Interpreters/AggregationMethod.cpp Updated key deserialization to use ReadBuffer for size validation
src/Dictionaries/*.{h,cpp} Modified dictionary deserialization to validate buffer sizes before reading
src/Functions/array/arrayIntersect.cpp Added buffer size checks when deserializing array elements
src/AggregateFunctions/*.{h,cpp} Updated aggregate function deserialization to use safer ReadBuffer-based approach
tests/queries/0_stateless/03716_topk_bad_data.* Added test cases to verify proper error handling for malformed aggregation state data

const char * IColumnDummy::skipSerializedInArena(const char * pos) const
void IColumnDummy::skipSerializedInArena(ReadBuffer &) const
{
return pos;
Copy link
Copy Markdown
Member Author

@Algunenano Algunenano Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks odd. It seems it was incorrect before (and consequently now too), since it should skip one byte.

@Algunenano Algunenano marked this pull request as ready for review November 14, 2025 15:43
@Algunenano Algunenano removed their assignment Nov 14, 2025
@Avogar Avogar self-assigned this Nov 14, 2025
@Algunenano
Copy link
Copy Markdown
Member Author

Algunenano commented Nov 17, 2025

Failures:

@Algunenano Algunenano requested a review from Avogar November 18, 2025 17:27
robot-clickhouse-ci-1 added a commit that referenced this pull request Nov 19, 2025
Cherry pick #90031 to 25.11: Do size checks when deserializing data from aggregation states and other sources
robot-clickhouse added a commit that referenced this pull request Nov 19, 2025
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Nov 19, 2025
@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR label Nov 19, 2025
clickhouse-gh bot added a commit that referenced this pull request Nov 19, 2025
Backport #90031 to 25.11: Do size checks when deserializing data from aggregation states and other sources
robot-ch-test-poll4 added a commit that referenced this pull request Nov 20, 2025
Cherry pick #90031 to 25.8: Do size checks when deserializing data from aggregation states and other sources
robot-clickhouse added a commit that referenced this pull request Nov 20, 2025
robot-ch-test-poll4 added a commit that referenced this pull request Nov 20, 2025
Cherry pick #90031 to 25.9: Do size checks when deserializing data from aggregation states and other sources
robot-clickhouse added a commit that referenced this pull request Nov 20, 2025
robot-ch-test-poll4 added a commit that referenced this pull request Nov 20, 2025
Cherry pick #90031 to 25.10: Do size checks when deserializing data from aggregation states and other sources
robot-clickhouse added a commit that referenced this pull request Nov 20, 2025
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Nov 20, 2025
Algunenano added a commit that referenced this pull request Nov 20, 2025
Backport #90031 to 25.9: Do size checks when deserializing data from aggregation states and other sources
clickhouse-gh bot added a commit that referenced this pull request Nov 20, 2025
Backport #90031 to 25.8: Do size checks when deserializing data from aggregation states and other sources
Algunenano added a commit that referenced this pull request Nov 21, 2025
Backport #90031 to 25.10: Do size checks when deserializing data from aggregation states and other sources
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-critical-bugfix pr-must-backport Pull request should be backported intentionally. Use this label with great care! pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fatal using approx_top_k and finalizeAggregation

6 participants