Skip to content

improvement: use INCORRECT_DATA instead of LOGICAL_ERROR for data-reachable deserialization checks#99822

Merged
motsc merged 14 commits intoClickHouse:masterfrom
motsc:fix/deserialization-logical-error-abort
Apr 2, 2026
Merged

improvement: use INCORRECT_DATA instead of LOGICAL_ERROR for data-reachable deserialization checks#99822
motsc merged 14 commits intoClickHouse:masterfrom
motsc:fix/deserialization-logical-error-abort

Conversation

@motsc
Copy link
Copy Markdown
Contributor

@motsc motsc commented Mar 17, 2026

In sanitizer and debug builds, LOGICAL_ERROR calls abortOnFailedAssertionabort() before the exception propagates to the caller's catch(...). The row-count mismatch check in NativeReader is reachable from user-supplied binary data and should use INCORRECT_DATA so it is recognized as a data error rather than a logic error, preventing false-positive aborts during fuzzing.

Changes

  • NativeReader.cpp: LOGICAL_ERRORINCORRECT_DATA for row count mismatch after column deserialization (user-supplied Native format stream).
  • SerializationObject.h: add explicit #include <Common/VectorWithMemoryTracking.h> (previously relied on transitive include).
  • ColumnUnique.h: typo fix ("grated" → "greater") in error message; error code unchanged.
  • SerializationArray.cpp: error code declaration reorder only; no functional change.

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix false-positive abort in NativeReader when deserializing a Native format stream with a row-count mismatch: changed from LOGICAL_ERROR to INCORRECT_DATA so the error is handled as a data error rather than triggering abort() in sanitizer/debug builds.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

CI report: https://s3.amazonaws.com/clickhouse-test-reports/PRs/99822/76660ad7a27047de3d00837309ab28f384318007/result_pr.json

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Mar 17, 2026

Workflow [PR], commit [76660ad]

Summary:


AI Review

Summary

This PR updates error classification in NativeReader from LOGICAL_ERROR to INCORRECT_DATA for a user-reachable deserialization mismatch, plus a typo fix and non-functional cleanup. The main behavior change is correct and improves robustness in sanitizer/debug builds by avoiding a false-positive abort path for malformed input. I did not find correctness, safety, concurrency, compatibility, or performance issues requiring changes.

ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
Final Verdict

Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Mar 17, 2026
Comment thread src/DataTypes/Serializations/SerializationObjectSharedData.cpp Outdated
@@ -637,7 +638,7 @@ static void checkIndexes(const ColumnVector<IndexType> & indexes, size_t max_dic
{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Typo in the exception message: grated or equal should be greater or equal.

This text is user-visible in diagnostics, so correcting it will reduce confusion when malformed input is reported.

@motsc motsc added bug Confirmed user-visible misbehaviour in official release crash Crash / segfault / abort labels Mar 17, 2026
@Avogar Avogar self-assigned this Mar 17, 2026
@alexey-milovidov alexey-milovidov removed bug Confirmed user-visible misbehaviour in official release crash Crash / segfault / abort labels Mar 18, 2026
@motsc motsc added pr-improvement Pull request with some product improvements and removed pr-bugfix Pull request with bugfix, not backported by default labels Mar 18, 2026
@motsc motsc changed the title Fix process abort on malformed binary data: LOGICAL_ERROR → INCORRECT_DATA in deserialization paths improvement: use INCORRECT_DATA instead of LOGICAL_ERROR for data-reachable deserialization checks Mar 18, 2026
Comment thread src/DataTypes/Serializations/SerializationSparse.cpp Outdated
Copy link
Copy Markdown
Member

@Avogar Avogar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand these changes. Almost all LOGICAL_ERRORS that were changed to INCORRECT_DATA in this PR are actual logical errors that can happen only because of bugs in deserialization from MergeTree parts. INCORRECT_DATA makes sense only for cases when we use Native format reader and data is serialized incorrectly. As you can see, in some places we even check for settings.native_format to decide do we need to throw INCORRECT_DATA or LOGICAL_ERRORS. The only place where change can be justified is probably lines 198 and 224 in SerializationReplicated.cpp.

If we just change all these LOGICAL_ERRORS to INCORRECT_DATA we might silently ignore actual bugs in the code, and we have alerts on LOGICAL_ERRORS both in CI and in production.

@Avogar
Copy link
Copy Markdown
Member

Avogar commented Mar 18, 2026

Also this PR has some unrelated changes of the code that reverts recent bugfixes in serializations

Comment thread src/DataTypes/Serializations/SerializationReplicated.cpp Outdated
Copy link
Copy Markdown
Member

@Avogar Avogar Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes in this file reverts recent bug fixes

Comment thread src/Common/Allocator.cpp Outdated
motsc added 8 commits March 19, 2026 18:21
In ASan and fuzzer builds, `LOGICAL_ERROR` calls `abortOnFailedAssertion`
→ `abort()` before the exception propagates to the caller's `catch(...)`.
Any condition that can be triggered by malformed binary input must use
`INCORRECT_DATA` (or another non-`LOGICAL_ERROR` code) so it throws a
catchable exception instead of aborting the process.

The following sites were changed because the condition is fully determined
by the content of the data stream, not by a programming error:

- `NativeReader.cpp`: row count mismatch after column deserialization
- `SerializationNullable.cpp`: null map / nested column size mismatch
  (was `native_format ? INCORRECT_DATA : LOGICAL_ERROR`; the non-native
  branch is also reachable from malformed input)
- `SerializationTuple.cpp`: tuple element size mismatch after deserialization
  (same `native_format` conditional pattern)
- `SerializationArray.cpp`: element column longer than last offset value
- `SerializationSparse.cpp`: inconsistent offsets / values sizes
- `SerializationString.cpp`: empty data stream in bulk deserialization
- `SerializationObject.cpp`: missing data stream and size mismatches in
  `STRING`, typed, and dynamic deserialization paths
- `SerializationObjectSharedData.cpp`: ~20 sites across `deserializeStructure`,
  `deserializePathsInfos`, `deserializePathsData`, and bucket deserialization —
  all "got empty stream", row-count mismatches, and path-not-found conditions
- `SerializationReplicated.cpp`: `num_rows != limit` (read from stream),
  invalid `size_of_indexes_type`, missing `ReplicatedElements` substream
- `ColumnUnique.h`: null in non-nullable dictionary; index out of range in
  dictionary lookup
- `Allocator.cpp`: `checkSize` threw `LOGICAL_ERROR` for sizes ≥ 2^63;
  changed to `CANNOT_ALLOCATE_MEMORY` since the condition is triggered by
  an attacker-controlled allocation size, not a bug in the allocator itself

Serialization-side throws and invariant guards that represent true programming
errors (e.g., uninitialized callbacks, wrong codec mode) are left as
`LOGICAL_ERROR`.

Discovered by fuzzing with `native_reader_fuzzer` and
`nested_type_serialization_fuzzer` in ASan build.

**Bug Fix (user-visible misbehavior in official stable or prestable release)**
- Fixed process abort in Native format deserialization when reading malformed data containing inconsistent column sizes, missing substreams, or other structural errors.
….cpp

The previous commit accidentally included an incomplete refactoring of
SerializationObject that removed constructor parameters from the .cpp but
not from the .h or call sites, causing a compilation error. Restore
SerializationObject.cpp to match origin/master's constructor signature
while preserving the three LOGICAL_ERROR → INCORRECT_DATA changes.
…Data and fix typo

- Match .cpp constructor definition to .h declaration by restoring the
  `dynamic_serialization_` parameter (was accidentally dropped, causing
  an out-of-line definition mismatch that breaks compilation).

- Fix typo in `ColumnUnique::checkIndexes` exception message:
  `grated or equal` → `greater or equal`.
…ngs in SerializationSparse

- Remove unused `extern const int LOGICAL_ERROR` from four files where all
  throw sites were changed to `INCORRECT_DATA` in the prior commit but the
  declaration was left behind, causing style-check failures.

- Remove `values_settings.insert_only_rows_in_current_range_from_substreams_cache`
  block added by accident to `SerializationSparse::deserializeBinaryBulkWithMultipleStreams`.
  The flag, when propagated to nested serializers, caused the `Dynamic` element
  of a `Tuple` to read 0 rows from the substream cache instead of 1, producing an
  "Unexpected size of tuple element" mismatch during `CHECK TABLE` on a table with
  sparse-serialized `Tuple(Dynamic, Tuple(Int))` columns.
  This regressed test `04038_check_table_sparse_tuple_dynamic`.
…tionSparse

Commit e338032 accidentally removed the 'if (max_rows_to_read == 0) return 0'
early return that was introduced by PR ClickHouse#99351 to fix CHECK TABLE on Tuple columns
containing Dynamic elements with sparse-serialized sub-columns (issue ClickHouse#96588).

The removed guard was replaced with 'if (max_rows_to_read && ...)' guards on the
two conditions below, but those do not short-circuit the function: the read loop
still executes when limit=0, reading an extra row instead of returning immediately.
This regresses test 04038_check_table_sparse_tuple_dynamic.

Restore the original early return and remove the spurious 'max_rows_to_read &&'
guard from the loop condition, since it is now unreachable when limit=0.
Per Avogar's review: most LOGICAL_ERROR sites changed in the initial
commit are actual programming errors in MergeTree deserialization, not
data-format errors. The `INCORRECT_DATA` pattern is appropriate only
when reading from an explicitly untrusted stream (Native format over the
wire). Changing them unconditionally would silently swallow real bugs
and remove the production alerts on LOGICAL_ERROR.

Reverted files (error codes restored to LOGICAL_ERROR or native_format
conditional):
- SerializationNullable.cpp  — restore native_format ternary
- SerializationTuple.cpp     — restore native_format ternary
- SerializationArray.cpp     — restore LOGICAL_ERROR
- SerializationString.cpp    — restore LOGICAL_ERROR
- SerializationObject.cpp    — restore LOGICAL_ERROR / native_format
- SerializationObjectSharedData.cpp — restore all LOGICAL_ERROR
- SerializationSparse.cpp    — restore LOGICAL_ERROR + max_rows_to_read
  guard (max_rows_to_read && ...) which was also incorrectly changed
- ColumnUnique.h             — restore LOGICAL_ERROR (kept typo fix
  "grated" → "greater" separately)

Also removed the 03925_sparse_values_in_substreams_cache_bug test that
was added to cover the sparse deserialization change, which is also
being reverted.

Remaining changes (kept):
- NativeReader.cpp          — row count mismatch is purely data-driven
- SerializationReplicated.cpp — explicitly endorsed by reviewer (lines
  198 and 224: num_rows != limit, invalid size_of_indexes_type)
- Allocator.cpp             — CANNOT_ALLOCATE_MEMORY for attacker-
  controlled oversized allocation (separate concern, not serialization)
…st, drop 04038

- `SerializationReplicated`: use `settings.native_format ? INCORRECT_DATA : LOGICAL_ERROR`
  for the `num_rows != limit` and `size_of_indexes_type` checks; revert
  the empty-elements-stream throw back to `LOGICAL_ERROR` (unreachable in
  Native format per Avogar).
- Restore `03925_sparse_values_in_substreams_cache_bug` test that was
  accidentally removed by the prior revert commit.
- Remove `04038_check_table_sparse_tuple_dynamic` test — the underlying
  issue (ClickHouse#96588) is not fixed in this PR and the test fails in CI.
@motsc motsc force-pushed the fix/deserialization-logical-error-abort branch from 32ba4d9 to c78a222 Compare March 20, 2026 01:22
Comment thread src/DataTypes/Serializations/SerializationSparse.cpp Outdated
motsc added 2 commits March 19, 2026 21:42
… LOGICAL_ERROR, fix 03925 reference, restore 04038 test

Per Avogar review:
- Revert SerializationSparse.cpp to master: restore limit=0 early return and
  insert_only_rows_in_current_range_from_substreams_cache flag (bug fix for ClickHouse#96588)
- Restore LOGICAL_ERROR in Allocator.cpp checkSize (not reachable via user data)
- Restore LOGICAL_ERROR in SerializationReplicated.cpp (not reachable in Native format)
- Restore deleted regression test 04038_check_table_sparse_tuple_dynamic
- Fix 03925 reference: non-Nullable String.size is 0 not \N, tuple shows '' not NULL
Comment thread tests/queries/0_stateless/03925_sparse_values_in_substreams_cache_bug.sql Outdated
@motsc motsc marked this pull request as ready for review March 23, 2026 15:17
@motsc motsc force-pushed the fix/deserialization-logical-error-abort branch from a4408ed to 6d2df50 Compare March 23, 2026 15:28
Comment thread src/DataTypes/Serializations/SerializationObject.cpp Outdated
@motsc motsc force-pushed the fix/deserialization-logical-error-abort branch from a896ce9 to 9defa44 Compare March 23, 2026 16:46
motsc added 2 commits March 23, 2026 10:56
The test was inadvertently changed from Nullable(String)/Nullable(UInt64) with
nullable_serialization_version = 'allow_sparse' to plain String/UInt64, weakening
the regression coverage for the nullable+sparse substream cache bug.

Since this PR no longer touches SerializationSparse.cpp, the original test form
passes correctly. Restore to match master exactly.
@motsc
Copy link
Copy Markdown
Contributor Author

motsc commented Mar 24, 2026

Hi @Avogar — the PR has been substantially narrowed based on your review. The broad LOGICAL_ERRORINCORRECT_DATA changes across serialization code have been removed. The current diff contains only:

  • NativeReader.cpp: changes LOGICAL_ERRORINCORRECT_DATA for the row-count mismatch after deserialization from a Native format stream — this is user-provided data, matching exactly the case you described as justified ("INCORRECT_DATA makes sense only for cases when we use Native format reader").
  • SerializationObject.cpp/.h: uses VectorWithMemoryTracking<String> instead of std::vector<String> for sorted_dynamic_paths to bound memory when deserializing a JSON column with many dynamic paths from an untrusted source.
  • ColumnUnique.h: typo fix only ("grated" → "greater"); error code unchanged.
  • SerializationArray.cpp: declaration reorder only; no error code changes.

CI is green on the current commit. Would you mind taking another look?

Copy link
Copy Markdown
Member

@Avogar Avogar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also improve the PR description as it contains old changes

Comment thread src/IO/ReadHelpers.h
…Helpers.h

The declaration was added without a corresponding implementation and is
not used anywhere. Remove per code review feedback.
@clickhouse-gh clickhouse-gh Bot added pr-bugfix Pull request with bugfix, not backported by default and removed pr-improvement Pull request with some product improvements labels Mar 24, 2026
@motsc
Copy link
Copy Markdown
Contributor Author

motsc commented Mar 25, 2026

/recheck

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Mar 25, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 78.20% 84.10% +5.90%
Functions 23.30% 24.40% +1.10%
Branches 70.70% 76.60% +5.90%

PR changed lines: PR changed-lines coverage: 80.00% (4/5, 0 noise lines excluded)
Diff coverage report
Uncovered code

@motsc
Copy link
Copy Markdown
Contributor Author

motsc commented Mar 25, 2026

/recheck

@motsc motsc requested a review from Avogar March 25, 2026 21:42
@motsc motsc enabled auto-merge March 26, 2026 00:19
@motsc motsc added this pull request to the merge queue Apr 2, 2026
Merged via the queue into ClickHouse:master with commit deb76c4 Apr 2, 2026
449 of 451 checks passed
@motsc motsc deleted the fix/deserialization-logical-error-abort branch April 2, 2026 18:09
@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-synced-to-cloud The PR is synced to the cloud repo label Apr 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants