Extend cast_keep_nullable to work with Dynamic/JSON types #96504
seva-potapov merged 2 commits into ClickHouse:master
Conversation
Force-pushed: e066ed3 to a183204; eeb45b2 to 417dcb6; 417dcb6 to 52cbccb.
```diff
 if (keep_nullable
-    && (arguments.front().type->isNullable() || arguments.front().type->isLowCardinalityNullable())
+    && (arguments.front().type->isNullable() || arguments.front().type->isLowCardinalityNullable() || isDynamic(*arguments.front().type))
```
Variant also supports NULLs.
I remember we have some method, e.g., canContainNulls
Yes, my previous attempt switched to canContainNulls, but it became overcomplicated and testing casts from Variant to columns became impossible. See the difference: https://github.com/ClickHouse/ClickHouse/compare/417dcb633628a4d47c4d688e3d1e65acdec030d8..52cbccb777427f6d03d9e98a9bbf5357eeb1eb9d
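The condition in the diff above can be modeled as a small predicate. A hedged Python sketch (not ClickHouse code — the boolean flags stand in for ClickHouse's `isNullable()`, `isLowCardinalityNullable()`, and `isDynamic()` type queries):

```python
# Hypothetical model of the nullability check from the diff above.
# The flags are stand-ins for ClickHouse's DataType queries; this is
# an illustration, not the actual implementation.

def result_keeps_null(keep_nullable, is_nullable, is_lc_nullable, is_dynamic):
    """True when CAST should wrap the result type in Nullable."""
    return keep_nullable and (is_nullable or is_lc_nullable or is_dynamic)

# Before this PR the `is_dynamic` term was absent, so Dynamic/JSON
# sources never triggered null preservation even with cast_keep_nullable=1.
print(result_keeps_null(True, False, False, True))   # True  (new behavior)
print(result_keeps_null(False, False, False, True))  # False (setting off)
```

As the reviewer notes, a `canContainNulls`-style predicate would also cover Variant sources, at the cost of the complications discussed above.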
Force-pushed: 6f28f40 to c76a7f5.
LLVM Coverage Report
PR changed-lines coverage: 93.18% (41/44, 0 noise lines excluded)
Commit 8cd7229

The `timeSeries*ToGrid` aggregate functions (`timeSeriesResampleToGridWithStaleness`, `timeSeriesChangesToGrid`, `timeSeriesResetsToGrid`, `timeSeriesDeltaToGrid`, `timeSeriesRateToGrid`, etc.) compute `bucket_count = (end - start) / step + 1` and later `index = (timestamp - start + step - 1) / step` using signed Int64 arithmetic on the normalized parameters. When `start_timestamp` is very negative (e.g. a `DateTime64` near `INT64_MIN`, reachable from an adversarial fuzzer-generated query), `end - start` and `timestamp - start` overflow, producing negative Int64 values. Casting those to `size_t` yields a huge corrupted bucket count and an index that slightly exceeds it, tripping the `chassert(index < bucket_count)` assertion at line 125 and aborting the server in debug / sanitizer builds.

Found by AST fuzzer (serverfuzz): STID 2508-3c50 / 2508-3f3c / 2508-20f4 across PRs ClickHouse#96504, ClickHouse#103189 and master over 90 days.

Reproducer (triggers the assertion on unpatched master):

```sql
SET allow_experimental_ts_to_grid_aggregate_function = 1;
CREATE TABLE ts (timestamp DateTime64(0), value Float64) ENGINE = MergeTree ORDER BY tuple();
INSERT INTO ts VALUES ('2020-01-01 00:00:00', 1.0);
SELECT timeSeriesResampleToGridWithStaleness(-9223372036854775808, 256, 2147483646, 2147483648) (timestamp, value)
FROM ts FORMAT Null;
```

The fix computes both the bucket count and the per-timestamp index using unsigned 64-bit arithmetic. Since we already verify `end_timestamp >= start_timestamp` in `bucketCount` and `timestamp > start_timestamp` (via the existing early return) in `bucketIndexForTimestamp`, the unsigned subtraction produces the correct non-negative delta for all representable inputs.
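The wraparound described above can be reproduced outside ClickHouse. A Python sketch that masks Python's unbounded integers down to 64 bits (the function names are illustrative, not the ClickHouse implementation):

```python
# Illustrative model of the bucket-count arithmetic, not ClickHouse source.
INT64_MIN = -2**63
MASK64 = 2**64 - 1

def to_int64(x):
    """Reinterpret the low 64 bits of x as a signed two's-complement Int64."""
    x &= MASK64
    return x - 2**64 if x >= 2**63 else x

def div_trunc(a, b):
    """C-style integer division truncating toward zero (Python's // floors)."""
    q = abs(a) // abs(b)
    return -q if (a < 0) != (b < 0) else q

def bucket_count_signed(start, end, step):
    # Buggy path: `end - start` wraps negative when start is near INT64_MIN;
    # the later cast to size_t turns the result into a huge bucket count.
    return div_trunc(to_int64(end - start), step) + 1

def bucket_count_unsigned(start, end, step):
    # Fixed path: unsigned 64-bit subtraction, correct because end >= start.
    return ((end - start) & MASK64) // step + 1

start, end, step = INT64_MIN, 256, 2147483646
print(bucket_count_signed(start, end, step))    # -4294967298: negative garbage
print(bucket_count_unsigned(start, end, step))  # 4294967301: rejected by the 16M cap
```

With the fuzzer's parameters the signed path yields a negative count, while the unsigned path yields the 4294967301 mentioned below, which the bucket-count cap then rejects cleanly.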
A `MAX_BUCKET_COUNT = 16M` cap also prevents DoS from absurdly large grids (with `start = INT64_MIN, end = 256, step = 2147483646` the unsigned arithmetic returns `count = 4294967301`, which is rejected early instead of allocating ~34 GiB of result storage); the cap matches `MAX_ARRAY_SIZE` already used by `AggregateFunctionGroupArray`, `AggregateFunctionIntervalLengthSum`, and siblings. After the fix the query above returns a clean `BAD_ARGUMENTS` exception with a helpful message instead of aborting the server.

Regression test: `04106_timeseries_bucket_count_overflow`.

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=103189&sha=5c8108136604f29982272d80663c9a97bbd67e39&name_0=PR&name_1=AST%20fuzzer%20%28amd_debug%2C%20targeted%2C%20old_compatibility%29

### Changelog category (leave one):
- Bug Fix (user-visible misbehavior in an official stable release)

### Changelog entry (a [user-readable short description](https://github.com/ClickHouse/ClickHouse/blob/master/docs/changelog_entry_guidelines.md) of the changes that goes into CHANGELOG.md):

Fix `Logical error: 'index < bucket_count'` in the `timeSeries*ToGrid` aggregate function family (e.g. `timeSeriesResampleToGridWithStaleness`, `timeSeriesChangesToGrid`, `timeSeriesResetsToGrid`, `timeSeriesRateToGrid`) when called with extreme timestamp parameters that would overflow signed 64-bit arithmetic in the bucket count computation. Also cap the total number of grid buckets at 16 million to prevent accidental large-memory allocation from adversarial inputs.

### Documentation entry for user-facing changes

- [ ] Documentation is written (mandatory for new features)

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Extend `cast_keep_nullable` to work with Dynamic/JSON types. When set, casting NULL from types that can be Nullable will return NULL; otherwise NULL will throw a `CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN` error.

Details
A customer reported two inconsistencies when casting JSON null values:

1. The query `select v.a::Array(String) from (select '{"a":null}'::JSON as v) settings cast_keep_nullable=1` returns `[]` instead of signaling an error. The user set `cast_keep_nullable=1`, explicitly requesting null preservation, but the system silently swallowed the null into a default value.
2. A direct cast of Nullable(String) NULL to Array(String) throws `CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN`, but casting Nullable(String) NULL to Dynamic to Array(String) silently returns `[]`. The same null value produces different results depending on the cast path.

Root cause: `cast_keep_nullable` only checks `isNullable()` on the source type. Dynamic (used by JSON) stores nulls via an internal discriminator, not the Nullable wrapper, so the setting has no effect. And `ConvertImplFromDynamicToColumn` unconditionally calls `insertDefault()` for null values regardless of any settings.

Note
Medium Risk

Changes `CAST`/`::` behavior for `Dynamic`/`JSON` sources when `cast_keep_nullable=1`, potentially turning previously silent NULL-to-default conversions into `CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN` errors for non-nullable targets.

Overview

Extends `cast_keep_nullable` handling to `Dynamic`/`JSON` casts so that NULL preservation is enforced consistently.

When `cast_keep_nullable=1`, casting `Dynamic`-backed NULLs now returns NULL only for targets that can be nullable, and throws `CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN` when converting NULL into non-Nullable targets (instead of inserting defaults like `[]`). Adds a dedicated stateless test to cover the new Dynamic/JSON null-casting behavior and backward compatibility when the setting is off.

Written by Cursor Bugbot for commit c49951291f45693c333945271b91e6286ee36e1e.