
Extend cast_keep_nullable to work with Dynamic/JSON types#96504

Merged
seva-potapov merged 2 commits into ClickHouse:master from seva-potapov:fix-cast-keep-nullable on Mar 26, 2026

Conversation

Contributor

@seva-potapov seva-potapov commented Feb 10, 2026

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Extend cast_keep_nullable to work with Dynamic/JSON types. When the setting is enabled, casting a NULL to a target type that can be Nullable returns NULL; for targets that cannot hold NULL, the cast throws a CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN error.

Details

A customer reported two inconsistencies when casting JSON null values:

  1. The query select v.a::Array(String) from (select '{"a":null}'::JSON as v) settings cast_keep_nullable=1 returns [] instead of signaling an error. The user set cast_keep_nullable=1 to explicitly request NULL preservation, but the system silently swallowed the NULL into a default value.

  2. A direct cast of Nullable(String) NULL to Array(String) throws CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN, but casting Nullable(String) NULL to Dynamic and then to Array(String) silently returns []. The same NULL value produces different results depending on the cast path.

Root cause: cast_keep_nullable only checks isNullable() on the source type. Dynamic (used by JSON) stores NULLs via an internal discriminator rather than the Nullable wrapper, so the setting has no effect; moreover, ConvertImplFromDynamicToColumn unconditionally calls insertDefault() for NULL values regardless of any settings.
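Both inconsistencies can be reproduced directly. The queries below are taken from the report above; the commented results show the pre-fix behavior as described in this PR:

```sql
-- Case 1: cast_keep_nullable=1 is silently ignored for a JSON-backed NULL.
SELECT v.a::Array(String)
FROM (SELECT '{"a":null}'::JSON AS v)
SETTINGS cast_keep_nullable = 1;
-- Pre-fix result: []  (expected: an error, since NULL cannot go into Array(String))

-- Case 2: the same NULL behaves differently depending on the cast path.
SELECT CAST(NULL::Nullable(String) AS Array(String));
-- Throws CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN

SELECT CAST(NULL::Nullable(String)::Dynamic AS Array(String));
-- Pre-fix result: []  (the NULL is silently swallowed)
```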


Note

Medium Risk
Changes CAST/:: behavior for Dynamic/JSON sources when cast_keep_nullable=1, potentially turning previously silent NULL-to-default conversions into CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN errors for non-nullable targets.

Overview
Extends cast_keep_nullable handling to Dynamic/JSON casts so that NULL preservation is enforced consistently.

When cast_keep_nullable=1, casting Dynamic-backed NULLs now returns NULL only for targets that can be nullable, and throws CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN when converting NULL into non-Nullable targets (instead of inserting defaults like []). Adds a dedicated stateless test to cover the new Dynamic/JSON null-casting behavior and backward-compatibility when the setting is off.
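Under the new behavior, the intended semantics look roughly like this (a sketch based on the description above, not verbatim test output):

```sql
SET cast_keep_nullable = 1;

-- Target can be made Nullable: the NULL is preserved.
SELECT v.a::String FROM (SELECT '{"a":null}'::JSON AS v);
-- Result type becomes Nullable(String), value NULL

-- Target cannot hold NULL (Array): the cast now throws instead of returning [].
SELECT v.a::Array(String) FROM (SELECT '{"a":null}'::JSON AS v);
-- Error: CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN
```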

Written by Cursor Bugbot for commit c49951291f45693c333945271b91e6286ee36e1e.

@clickhouse-gh
Contributor

clickhouse-gh Bot commented Feb 10, 2026

Workflow [PR], commit [c76a7f5]

Summary:


AI Review

Summary

This PR extends cast_keep_nullable handling for Dynamic/JSON casts and adds a stateless regression test. High-level verdict: there is one blocker in NULL-capability detection for target types. The new throw_on_null predicate treats Variant/Dynamic targets as non-null-capable because it relies on canBeInsideNullable, which is not the right semantic check for these types.

Missing context
  • ⚠️ No CI logs/results were provided in this review context.
Findings
  • ❌ Blockers
    • [src/Functions/FunctionsConversion.cpp:1481, src/Functions/FunctionsConversion.h:3219] throw_on_null is enabled using !result_type->canBeInsideNullable(), which incorrectly classifies Variant and Dynamic as unable to represent NULL. With cast_keep_nullable=1, Dynamic(NULL) -> Variant(...) can incorrectly throw CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN instead of preserving NULL in the target type.
    • Suggested fix: use a semantic NULL-capability check (!canContainNull(*result_type)) instead of !canBeInsideNullable(), and keep the condition consistent in both call sites.
Tests
  • ⚠️ Add regression coverage for Dynamic(NULL) -> Variant(...) with cast_keep_nullable=1 (must not throw) and a non-nullable target that truly cannot contain NULL (must throw), to lock the intended boundary.
ClickHouse Rules
| Item | Status | Notes |
| --- | --- | --- |
| Deletion logging | | |
| Serialization versioning | | |
| Core-area scrutiny | | |
| No test removal | | |
| Experimental gate | | |
| No magic constants | | |
| Backward compatibility | ⚠️ | Potential behavior change for Dynamic(NULL) -> Variant(...) with cast_keep_nullable=1. |
| SettingsChangesHistory.cpp | | |
| PR metadata quality | | |
| Safe rollout | ⚠️ | Incorrect NULL-capability predicate risks surprising cast exceptions in rollout. |
| Compilation time | | |
Performance & Safety
  • The concern is semantic safety/correctness: NULL handling can become unexpectedly stricter for targets that are actually NULL-capable, causing user-visible exceptions.
Final Verdict
  • Status: ⚠️ Request changes
  • Minimum required actions:
    • Replace throw_on_null predicate with a NULL-capability check (e.g. !canContainNull(*result_type)) in both affected call sites.
    • Add regression tests for Dynamic(NULL) cast to Variant(...) under cast_keep_nullable=1.

@clickhouse-gh clickhouse-gh Bot added the pr-improvement Pull request with some product improvements label Feb 10, 2026
@seva-potapov seva-potapov force-pushed the fix-cast-keep-nullable branch 2 times, most recently from e066ed3 to a183204 on March 12, 2026 03:51

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


@seva-potapov seva-potapov force-pushed the fix-cast-keep-nullable branch from 417dcb6 to 52cbccb on March 19, 2026 02:03

if (keep_nullable
-    && (arguments.front().type->isNullable() || arguments.front().type->isLowCardinalityNullable())
+    && (arguments.front().type->isNullable() || arguments.front().type->isLowCardinalityNullable() || isDynamic(*arguments.front().type))
Member


Variant also supports NULLs.

Member


I remember we have some method, e.g., canContainNulls

Contributor Author


Yes, my previous attempt switched to canContainNulls, but then it became overcomplicated and testing the cast of Variant to Columns became impossible. See the difference: https://github.com/ClickHouse/ClickHouse/compare/417dcb633628a4d47c4d688e3d1e65acdec030d8..52cbccb777427f6d03d9e98a9bbf5357eeb1eb9d

@alexey-milovidov alexey-milovidov self-assigned this Mar 20, 2026
Member

@alexey-milovidov alexey-milovidov left a comment


Tests please.

@seva-potapov seva-potapov force-pushed the fix-cast-keep-nullable branch from 6f28f40 to c76a7f5 on March 25, 2026 05:23
@clickhouse-gh
Contributor

clickhouse-gh Bot commented Mar 25, 2026

LLVM Coverage Report

| Metric | Baseline | Current | Δ |
| --- | --- | --- | --- |
| Lines | 84.10% | 84.10% | +0.00% |
| Functions | 24.50% | 24.30% | -0.20% |
| Branches | 76.60% | 76.60% | +0.00% |

PR changed-lines coverage: 93.18% (41/44, 0 noise lines excluded)
Diff coverage report
Uncovered code

@seva-potapov seva-potapov added this pull request to the merge queue Mar 26, 2026
Merged via the queue into ClickHouse:master with commit 8cd7229 Mar 26, 2026
223 of 302 checks passed
@seva-potapov seva-potapov deleted the fix-cast-keep-nullable branch March 26, 2026 03:25
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Mar 26, 2026
groeneai added a commit to groeneai/ClickHouse that referenced this pull request Apr 24, 2026
The `timeSeries*ToGrid` aggregate functions (`timeSeriesResampleToGridWithStaleness`,
`timeSeriesChangesToGrid`, `timeSeriesResetsToGrid`, `timeSeriesDeltaToGrid`,
`timeSeriesRateToGrid`, etc.) compute `bucket_count = (end - start) / step + 1` and later
`index = (timestamp - start + step - 1) / step` using signed Int64 arithmetic on the
normalized parameters. When `start_timestamp` is very negative (e.g. a `DateTime64` near
`INT64_MIN`, reachable from an adversarial fuzzer-generated query), `end - start` and
`timestamp - start` overflow, producing negative Int64 values. Casting those to `size_t`
yields a huge corrupted bucket count and an index that slightly exceeds it, tripping the
`chassert(index < bucket_count)` assertion at line 125 and aborting the server in debug
/ sanitizer builds.

Found by AST fuzzer (serverfuzz) — STID 2508-3c50 / 2508-3f3c / 2508-20f4 across PRs ClickHouse#96504,
ClickHouse#103189 and master over 90 days. Reproducer (triggers the assertion on unpatched master):

```sql
SET allow_experimental_ts_to_grid_aggregate_function = 1;
CREATE TABLE ts (timestamp DateTime64(0), value Float64) ENGINE = MergeTree ORDER BY tuple();
INSERT INTO ts VALUES ('2020-01-01 00:00:00', 1.0);
SELECT timeSeriesResampleToGridWithStaleness(-9223372036854775808, 256, 2147483646, 2147483648)
    (timestamp, value) FROM ts FORMAT Null;
```

The fix computes both the bucket count and the per-timestamp index using unsigned 64-bit
arithmetic. Since we already verify `end_timestamp >= start_timestamp` in `bucketCount` and
`timestamp > start_timestamp` (via the existing early-return) in `bucketIndexForTimestamp`,
the unsigned subtraction produces the correct non-negative delta for all representable
inputs. A `MAX_BUCKET_COUNT = 16M` cap also prevents DoS from absurdly large grids (with
`start = INT64_MIN, end = 256, step = 2147483646` the unsigned arithmetic returns
`count = 4294967301`, which is rejected early instead of allocating `~34 GiB` of result
storage); the cap matches `MAX_ARRAY_SIZE` already used by `AggregateFunctionGroupArray`,
`AggregateFunctionIntervalLengthSum`, and siblings.

After the fix the query above returns a clean `BAD_ARGUMENTS` exception with a helpful
message instead of aborting the server.

Regression test: `04106_timeseries_bucket_count_overflow`.

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=103189&sha=5c8108136604f29982272d80663c9a97bbd67e39&name_0=PR&name_1=AST%20fuzzer%20%28amd_debug%2C%20targeted%2C%20old_compatibility%29

### Changelog category (leave one):
- Bug Fix (user-visible misbehavior in an official stable release)

### Changelog entry (a [user-readable short description](https://github.com/ClickHouse/ClickHouse/blob/master/docs/changelog_entry_guidelines.md) of the changes that goes into CHANGELOG.md):
Fix `Logical error: 'index < bucket_count'` in the `timeSeries*ToGrid` aggregate function family (e.g. `timeSeriesResampleToGridWithStaleness`, `timeSeriesChangesToGrid`, `timeSeriesResetsToGrid`, `timeSeriesRateToGrid`) when called with extreme timestamp parameters that would overflow signed 64-bit arithmetic in the bucket count computation. Also cap the total number of grid buckets at 16 million to prevent accidental large-memory allocation from adversarial inputs.

### Documentation entry for user-facing changes

- [ ] Documentation is written (mandatory for new features)
