improvement: use INCORRECT_DATA instead of LOGICAL_ERROR for data-reachable deserialization checks#99822
Conversation
|
Workflow [PR], commit [76660ad] Summary: ✅ AI ReviewSummaryThis PR updates error classification in ClickHouse Rules
Final VerdictStatus: ✅ Approve |
| @@ -637,7 +638,7 @@ static void checkIndexes(const ColumnVector<IndexType> & indexes, size_t max_dic | |||
| { | |||
There was a problem hiding this comment.
💡 Typo in the exception message: grated or equal should be greater or equal.
This text is user-visible in diagnostics, so correcting it will reduce confusion when malformed input is reported.
Avogar
left a comment
There was a problem hiding this comment.
I don't understand these changes. Almost all LOGICAL_ERRORS that were changed to INCORRECT_DATA in this PR are actual logical errors that can happen only because of bugs in deserialization from MergeTree parts. INCORRECT_DATA makes sense only for cases when we use Native format reader and data is serialized incorrectly. As you can see, in some places we even check for settings.native_format to decide do we need to throw INCORRECT_DATA or LOGICAL_ERRORS. The only place where change can be justified is probably lines 198 and 224 in SerializationReplicated.cpp.
If we just change all these LOGICAL_ERRORS to INCORRECT_DATA we might silently ignore actual bugs in the code, and we have alerts on LOGICAL_ERRORS both in CI and in production.
|
Also this PR has some unrelated changes of the code that reverts recent bugfixes in serializations |
There was a problem hiding this comment.
Changes in this file reverts recent bug fixes
In ASan and fuzzer builds, `LOGICAL_ERROR` calls `abortOnFailedAssertion` → `abort()` before the exception propagates to the caller's `catch(...)`. Any condition that can be triggered by malformed binary input must use `INCORRECT_DATA` (or another non-`LOGICAL_ERROR` code) so it throws a catchable exception instead of aborting the process. The following sites were changed because the condition is fully determined by the content of the data stream, not by a programming error: - `NativeReader.cpp`: row count mismatch after column deserialization - `SerializationNullable.cpp`: null map / nested column size mismatch (was `native_format ? INCORRECT_DATA : LOGICAL_ERROR`; the non-native branch is also reachable from malformed input) - `SerializationTuple.cpp`: tuple element size mismatch after deserialization (same `native_format` conditional pattern) - `SerializationArray.cpp`: element column longer than last offset value - `SerializationSparse.cpp`: inconsistent offsets / values sizes - `SerializationString.cpp`: empty data stream in bulk deserialization - `SerializationObject.cpp`: missing data stream and size mismatches in `STRING`, typed, and dynamic deserialization paths - `SerializationObjectSharedData.cpp`: ~20 sites across `deserializeStructure`, `deserializePathsInfos`, `deserializePathsData`, and bucket deserialization — all "got empty stream", row-count mismatches, and path-not-found conditions - `SerializationReplicated.cpp`: `num_rows != limit` (read from stream), invalid `size_of_indexes_type`, missing `ReplicatedElements` substream - `ColumnUnique.h`: null in non-nullable dictionary; index out of range in dictionary lookup - `Allocator.cpp`: `checkSize` threw `LOGICAL_ERROR` for sizes ≥ 2^63; changed to `CANNOT_ALLOCATE_MEMORY` since the condition is triggered by an attacker-controlled allocation size, not a bug in the allocator itself Serialization-side throws and invariant guards that represent true programming errors (e.g., uninitialized callbacks, wrong codec mode) are left as `LOGICAL_ERROR`. Discovered by fuzzing with `native_reader_fuzzer` and `nested_type_serialization_fuzzer` in ASan build. **Bug Fix (user-visible misbehavior in official stable or prestable release)** - Fixed process abort in Native format deserialization when reading malformed data containing inconsistent column sizes, missing substreams, or other structural errors.
….cpp The previous commit accidentally included an incomplete refactoring of SerializationObject that removed constructor parameters from the .cpp but not from the .h or call sites, causing a compilation error. Restore SerializationObject.cpp to match origin/master's constructor signature while preserving the three LOGICAL_ERROR → INCORRECT_DATA changes.
…Data and fix typo - Match .cpp constructor definition to .h declaration by restoring the `dynamic_serialization_` parameter (was accidentally dropped, causing an out-of-line definition mismatch that breaks compilation). - Fix typo in `ColumnUnique::checkIndexes` exception message: `grated or equal` → `greater or equal`.
…ngs in SerializationSparse - Remove unused `extern const int LOGICAL_ERROR` from four files where all throw sites were changed to `INCORRECT_DATA` in the prior commit but the declaration was left behind, causing style-check failures. - Remove `values_settings.insert_only_rows_in_current_range_from_substreams_cache` block added by accident to `SerializationSparse::deserializeBinaryBulkWithMultipleStreams`. The flag, when propagated to nested serializers, caused the `Dynamic` element of a `Tuple` to read 0 rows from the substream cache instead of 1, producing an "Unexpected size of tuple element" mismatch during `CHECK TABLE` on a table with sparse-serialized `Tuple(Dynamic, Tuple(Int))` columns. This regressed test `04038_check_table_sparse_tuple_dynamic`.
…tionSparse Commit e338032 accidentally removed the 'if (max_rows_to_read == 0) return 0' early return that was introduced by PR ClickHouse#99351 to fix CHECK TABLE on Tuple columns containing Dynamic elements with sparse-serialized sub-columns (issue ClickHouse#96588). The removed guard was replaced with 'if (max_rows_to_read && ...)' guards on the two conditions below, but those do not short-circuit the function: the read loop still executes when limit=0, reading an extra row instead of returning immediately. This regresses test 04038_check_table_sparse_tuple_dynamic. Restore the original early return and remove the spurious 'max_rows_to_read &&' guard from the loop condition, since it is now unreachable when limit=0.
Per Avogar's review: most LOGICAL_ERROR sites changed in the initial commit are actual programming errors in MergeTree deserialization, not data-format errors. The `INCORRECT_DATA` pattern is appropriate only when reading from an explicitly untrusted stream (Native format over the wire). Changing them unconditionally would silently swallow real bugs and remove the production alerts on LOGICAL_ERROR. Reverted files (error codes restored to LOGICAL_ERROR or native_format conditional): - SerializationNullable.cpp — restore native_format ternary - SerializationTuple.cpp — restore native_format ternary - SerializationArray.cpp — restore LOGICAL_ERROR - SerializationString.cpp — restore LOGICAL_ERROR - SerializationObject.cpp — restore LOGICAL_ERROR / native_format - SerializationObjectSharedData.cpp — restore all LOGICAL_ERROR - SerializationSparse.cpp — restore LOGICAL_ERROR + max_rows_to_read guard (max_rows_to_read && ...) which was also incorrectly changed - ColumnUnique.h — restore LOGICAL_ERROR (kept typo fix "grated" → "greater" separately) Also removed the 03925_sparse_values_in_substreams_cache_bug test that was added to cover the sparse deserialization change, which is also being reverted. Remaining changes (kept): - NativeReader.cpp — row count mismatch is purely data-driven - SerializationReplicated.cpp — explicitly endorsed by reviewer (lines 198 and 224: num_rows != limit, invalid size_of_indexes_type) - Allocator.cpp — CANNOT_ALLOCATE_MEMORY for attacker- controlled oversized allocation (separate concern, not serialization)
…st, drop 04038 - `SerializationReplicated`: use `settings.native_format ? INCORRECT_DATA : LOGICAL_ERROR` for the `num_rows != limit` and `size_of_indexes_type` checks; revert the empty-elements-stream throw back to `LOGICAL_ERROR` (unreachable in Native format per Avogar). - Restore `03925_sparse_values_in_substreams_cache_bug` test that was accidentally removed by the prior revert commit. - Remove `04038_check_table_sparse_tuple_dynamic` test — the underlying issue (ClickHouse#96588) is not fixed in this PR and the test fails in CI.
32ba4d9 to
c78a222
Compare
… LOGICAL_ERROR, fix 03925 reference, restore 04038 test Per Avogar review: - Revert SerializationSparse.cpp to master: restore limit=0 early return and insert_only_rows_in_current_range_from_substreams_cache flag (bug fix for ClickHouse#96588) - Restore LOGICAL_ERROR in Allocator.cpp checkSize (not reachable via user data) - Restore LOGICAL_ERROR in SerializationReplicated.cpp (not reachable in Native format) - Restore deleted regression test 04038_check_table_sparse_tuple_dynamic - Fix 03925 reference: non-Nullable String.size is 0 not \N, tuple shows '' not NULL
a4408ed to
6d2df50
Compare
a896ce9 to
9defa44
Compare
…s accidentally dropped during rebase
The test was inadvertently changed from Nullable(String)/Nullable(UInt64) with nullable_serialization_version = 'allow_sparse' to plain String/UInt64, weakening the regression coverage for the nullable+sparse substream cache bug. Since this PR no longer touches SerializationSparse.cpp, the original test form passes correctly. Restore to match master exactly.
|
Hi @Avogar — the PR has been substantially narrowed based on your review. The broad
CI is green on the current commit. Would you mind taking another look? |
Avogar
left a comment
There was a problem hiding this comment.
Let's also improve the PR description as it contains old changes
…Helpers.h The declaration was added without a corresponding implementation and is not used anywhere. Remove per code review feedback.
|
/recheck |
LLVM Coverage Report
PR changed lines: PR changed-lines coverage: 80.00% (4/5, 0 noise lines excluded) |
|
/recheck |
deb76c4
In sanitizer and debug builds,
LOGICAL_ERRORcallsabortOnFailedAssertion→abort()before the exception propagates to the caller'scatch(...). The row-count mismatch check inNativeReaderis reachable from user-supplied binary data and should useINCORRECT_DATAso it is recognized as a data error rather than a logic error, preventing false-positive aborts during fuzzing.Changes
NativeReader.cpp:LOGICAL_ERROR→INCORRECT_DATAfor row count mismatch after column deserialization (user-supplied Native format stream).SerializationObject.h: add explicit#include <Common/VectorWithMemoryTracking.h>(previously relied on transitive include).ColumnUnique.h: typo fix ("grated" → "greater") in error message; error code unchanged.SerializationArray.cpp: error code declaration reorder only; no functional change.Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix false-positive abort in
NativeReaderwhen deserializing a Native format stream with a row-count mismatch: changed fromLOGICAL_ERRORtoINCORRECT_DATAso the error is handled as a data error rather than triggeringabort()in sanitizer/debug builds.Documentation entry for user-facing changes
CI report: https://s3.amazonaws.com/clickhouse-test-reports/PRs/99822/76660ad7a27047de3d00837309ab28f384318007/result_pr.json