Fix extractAll and similar functions on Nullable inputs by alexey-milovidov · Pull Request #104326 · ClickHouse/ClickHouse

alexey-milovidov · 2026-05-07T21:38:21Z

For functions that use the default Nulls implementation but return a type that cannot be Nullable (e.g. Array, Tuple, Map), the framework previously called makeNullable on the return type, which threw "Nested type is not allowed inside Nullable type". This made extractAll, splitByChar, splitByRegexp, extractAllGroups, and similar functions unusable on Nullable columns.

IFunctionOverloadResolver::getReturnTypeWithoutLowCardinality now uses makeNullableSafe, and IExecutableFunction::defaultImplementationForNulls skips the wrapInNullable step when the result type is not Nullable. Null input rows are evaluated over the default value of the nested column (e.g. an empty string), producing the natural default of the result type (e.g. [] for extractAll).
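To make the described execution semantics concrete, here is a minimal C++ sketch. It does not use ClickHouse's real column or function classes. `NullableStringColumn`, `extractWords`, and `executeNonNullableResult` are hypothetical names, and `extractWords` approximates `extractAll(s, '(\\w+)')` with a plain whitespace split. A null row is modeled as a nested value plus a null-map flag, and the result is returned without applying the null map, as described above for non-Nullable result types.

```cpp
// Hypothetical toy model, not ClickHouse's IColumn/IExecutableFunction API.
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// A Nullable(String) column: nested values plus a null map (1 = NULL).
struct NullableStringColumn {
    std::vector<std::string> nested;
    std::vector<std::uint8_t> null_map;
};

// Stands in for the non-Nullable result type Array(String).
using Words = std::vector<std::string>;

// Approximates extractAll(s, '(\\w+)') by splitting on whitespace.
Words extractWords(const std::string & s) {
    std::istringstream in(s);
    Words out;
    for (std::string w; in >> w;)
        out.push_back(w);
    return out;
}

// The result type cannot be Nullable, so the function runs over the nested
// column and the null map is not applied to the result. A NULL row whose
// nested value is the default "" therefore yields f("") == {}.
std::vector<Words> executeNonNullableResult(const NullableStringColumn & col) {
    std::vector<Words> res;
    res.reserve(col.nested.size());
    for (const auto & s : col.nested)
        res.push_back(extractWords(s));
    return res;
}
```

The NULL row produces `[]` here only because its nested value is the empty string; the sketch states that precondition rather than enforcing it.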

Closes #56977

Changelog category (leave one):

Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Functions that return a non-Nullable type (such as Array, Tuple, or Map) now accept Nullable arguments. Affected functions include extractAll, extractAllGroups, extractAllGroupsHorizontal, extractAllGroupsVertical, extractGroups, splitByChar, splitByString, splitByRegexp, splitByWhitespace, splitByNonAlpha, and alphaTokens. NULL input rows produce the default value of the result type (e.g. an empty array) instead of raising "Nested type is not allowed inside Nullable type".

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

For functions that use the default Nulls implementation but return a type that cannot be Nullable (e.g. `Array`, `Tuple`, `Map`), the framework previously called `makeNullable` on the return type, which threw "Nested type is not allowed inside Nullable type". This made `extractAll`, `splitByChar`, `splitByRegexp`, `extractAllGroups`, and similar functions unusable on Nullable columns.

Switch `IFunctionOverloadResolver::getReturnTypeWithoutLowCardinality` to use `makeNullableSafe`, and skip the `wrapInNullable` step in `IExecutableFunction::defaultImplementationForNulls` when the result type is not Nullable. For null input rows the function is evaluated over the default value of the nested column (e.g. an empty string), producing the natural default of the result type (e.g. an empty array) instead of raising a type-check error.

Closes #56977

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

clickhouse-gh · 2026-05-07T21:38:56Z

Workflow [PR], commit [40f3891]

Summary: ❌
Unit tests (tsan, function_prop_fuzzer): issue

job_name	test_name	status	info
AST fuzzer (amd_debug, targeted)		FAIL
	Assertion `right_argument.type->isNullable()' failed (STID: 2735-3ac4)	FAIL	cidb
AST fuzzer (amd_debug, targeted, old_compatibility)		FAIL
	Assertion `right_argument.type->isNullable()' failed (STID: 2735-356b)	FAIL	cidb
Stateless tests (arm_binary, parallel)		FAIL
	02903_rmt_retriable_merge_exception	FAIL	cidb
Unit tests (asan_ubsan, function_prop_fuzzer)		FAIL
Unit tests (tsan, function_prop_fuzzer)		FAIL	issue
Unit tests (msan, function_prop_fuzzer)		FAIL

AI Review

Summary

This PR fixes default null-handling for functions that use IExecutableFunction::defaultImplementationForNulls but return non-Nullable types (for example Array), by switching return-type wrapping to makeNullableSafe and by avoiding wrapInNullable on non-Nullable results. The new stateless coverage (including short-circuit-null evaluation paths) is aligned with the intended behavior, and I did not find blocker or major issues in the current patch.

ClickHouse Rules

Item	Status	Notes
Deletion logging	➖
Serialization versioning	➖
Core-area scrutiny	✅
No test removal	✅
Experimental gate	➖
No magic constants	✅
Backward compatibility	✅
`SettingsChangesHistory.cpp`	➖
PR metadata quality	✅
Safe rollout	✅
Compilation time	✅
No large/binary files	✅

Final Verdict

Status: ✅ Approve

george-larionov · 2026-05-08T09:46:57Z

The comment in IFunction.h above useDefaultImplementationForNulls() says that the function may be executed with garbage input in place of null values, yet this PR removes wrapInNullable() and returns res without applying the null_map. I think this can return output computed from those garbage values in some cases. Reproducer (the first row should just output null for extractAll, but it outputs ['hello','world']):

WITH if(number = 0, nullIf(materialize('hello world'), materialize('hello world')), CAST('foo bar', 'Nullable(String)')) AS s
SELECT
    number,
    s,
    isNull(s),
    extractAll(s, '(\\w+)')
FROM numbers(3)

Query id: dc6e2b35-04c7-483c-a6a0-7212450a2ad8

   ┌─number─┬─s───────┬─isNull(s)─┬─extractAll(s, '(\\w+)')─┐
1. │      0 │ ᴺᵁᴸᴸ    │         1 │ ['hello','world']       │
2. │      1 │ foo bar │         0 │ ['foo','bar']           │
3. │      2 │ foo bar │         0 │ ['foo','bar']           │
   └────────┴─────────┴───────────┴─────────────────────────┘
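The reproducer can be modeled with the same kind of toy column. In this C++ sketch (hypothetical names, not ClickHouse APIs), the NULL row keeps the original `'hello world'` bytes in its nested value; that is an assumption about how `nullIf` leaves the nested column, consistent with the output above. Evaluating the nested column directly leaks those bytes into the result, while substituting the input type's default (`''`) for NULL rows before evaluation yields an empty result for that row.

```cpp
// Hypothetical toy model, not ClickHouse's IColumn/IExecutableFunction API.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

struct NullableStringColumn {
    std::vector<std::string> nested;     // arbitrary bytes where null_map[i] == 1
    std::vector<std::uint8_t> null_map;  // 1 = NULL
};

using Words = std::vector<std::string>;  // stands in for Array(String)

// Approximates extractAll(s, '(\\w+)') by splitting on whitespace.
Words extractWords(const std::string & s) {
    std::istringstream in(s);
    Words out;
    for (std::string w; in >> w;)
        out.push_back(w);
    return out;
}

// Evaluates the nested column as-is: NULL rows leak whatever value is left there.
std::vector<Words> executeIgnoringNullMap(const NullableStringColumn & col) {
    std::vector<Words> res;
    res.reserve(col.nested.size());
    for (const auto & s : col.nested)
        res.push_back(extractWords(s));
    return res;
}

// Replaces the nested value of every NULL row with the input default ""
// before evaluating, so NULL rows always produce f("").
std::vector<Words> executeWithDefaultsForNullRows(const NullableStringColumn & col) {
    std::vector<Words> res;
    res.reserve(col.nested.size());
    for (std::size_t i = 0; i < col.nested.size(); ++i)
        res.push_back(extractWords(col.null_map[i] ? std::string{} : col.nested[i]));
    return res;
}
```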

…esult `IExecutableFunction::defaultImplementationForNulls` had three optimization paths that returned `default(result_type)` for null input rows: the early return when a const-null argument is detected, the all-null block early return, and the short-circuit `filter`/`expand` branch. For non-Nullable result types this differs from the `f(default(input))` semantics described in the PR. For example, a null row of `splitByChar(',', s)` should produce `['']`, the same as `splitByChar(',', toNullable(''))`, but the short-circuit branch (with `short_circuit_function_evaluation_for_nulls_threshold = 0.5`) would produce `[]`, because the default of `Array(String)` is the empty array.

Disable each of these optimizations when the result type is not Nullable, so that null rows fall through to the regular evaluation path, which runs the function on the nested column where null rows hold the default of the input type.

Spotted by clickhouse-gh review on #104326. Extend `04209_extractAll_nullable` with a short-circuit case and update the all-null reference outputs to reflect the corrected semantics (`splitByChar(',', NULL)` = `['']`, `splitByWhitespace(NULL)` = `[]`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
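The gap between `default(result_type)` and `f(default(input))` can be sketched in C++ with a toy `splitByChar`. `splitByCharSketch`, `shortCircuit`, and `fullEvaluation` are hypothetical helpers, not ClickHouse code: the second mimics the outcome of a filter/expand path that fills NULL rows with the empty array, and the third evaluates every row with NULL rows set to `''`.

```cpp
// Hypothetical toy model, not ClickHouse's splitByChar implementation.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Parts = std::vector<std::string>;  // stands in for Array(String)

// Toy splitByChar: an empty input yields one empty part, so f("") == {""}.
Parts splitByCharSketch(char sep, const std::string & s) {
    Parts parts(1);
    for (char c : s) {
        if (c == sep)
            parts.emplace_back();
        else
            parts.back().push_back(c);
    }
    return parts;
}

// Mimics the outcome of a filter/expand path: only non-NULL rows are
// evaluated; NULL rows keep default(Array(String)), i.e. the empty array.
std::vector<Parts> shortCircuit(char sep, const std::vector<std::string> & nested,
                                const std::vector<std::uint8_t> & null_map) {
    std::vector<Parts> res(nested.size());  // every row starts as {}
    for (std::size_t i = 0; i < nested.size(); ++i)
        if (!null_map[i])
            res[i] = splitByCharSketch(sep, nested[i]);
    return res;
}

// Regular path: every row is evaluated, with NULL rows holding the input
// default "", so NULL rows get f("") == {""}.
std::vector<Parts> fullEvaluation(char sep, const std::vector<std::string> & nested,
                                  const std::vector<std::uint8_t> & null_map) {
    std::vector<Parts> res;
    res.reserve(nested.size());
    for (std::size_t i = 0; i < nested.size(); ++i)
        res.push_back(splitByCharSketch(sep, null_map[i] ? std::string{} : nested[i]));
    return res;
}
```

Both paths agree on non-NULL rows; they differ only on NULL rows, where the short-circuit result is `[]` and the full evaluation is `['']`.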

Before this change, `arrayAutocorrelation([1, 2, 3], CAST(2, 'Nullable(UInt32)'))` failed with `ILLEGAL_TYPE_OF_ARGUMENT` only because `makeNullable(Array(Float64))` rejected the wrapping in the framework's `defaultImplementationForNulls`. With the framework fix in #104326, the function legitimately accepts a Nullable lag (the framework strips `Nullable` before invoking `getReturnTypeImpl`), so the negative-test assertion no longer holds. Replace the negative-test expectation with the actual result `[1,0]`. The Nullable element-type case (`arrayAutocorrelation(CAST([1,2,3], 'Array(Nullable(UInt32))'))`) is rejected by `getReturnTypeImpl` itself and continues to error.

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=104326&sha=f5d225718dcbb18021b0230351e79f836c31395c&name_0=PR&name_1=Fast%20test

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

george-larionov

I have some concerns about the handling of null inputs that contain garbage values; see comments.

george-larionov · 2026-05-08T12:20:24Z

+            /// return it as-is. For null input rows, the function will be evaluated
+            /// over the default values of the nested column and produce the corresponding default result
+            /// (e.g. an empty array), instead of failing the type check.
+            return makeNullableSafe(return_type);


makeNullableSafe() returns return_type untouched if it cannot be made Nullable. Since useDefaultImplementationForNulls() returns true by default in derived classes, functions that return non-Nullable types used to fail, which forced the author to think about whether they should be using useDefaultImplementationForNulls at all. With this change, they will silently run and may produce incorrect results. Maybe useDefaultImplementationForNulls should default to false instead?
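The difference between the two wrapping helpers can be sketched with a toy type model in C++. `Type`, `makeNullableSketch`, `makeNullableSafeSketch`, and `resolveReturnType` are hypothetical stand-ins that only mirror the behavior described in this thread; they are not ClickHouse's `IDataType`, `makeNullable`, or `makeNullableSafe` signatures.

```cpp
// Hypothetical toy type model; not ClickHouse's IDataType or the real
// makeNullable/makeNullableSafe signatures.
#include <cassert>
#include <stdexcept>
#include <string>

struct Type {
    std::string name;
    bool can_be_inside_nullable;  // false for e.g. Array, Tuple, Map
};

// Like makeNullable as described in the PR: wrapping a type that cannot be
// inside Nullable is an error.
Type makeNullableSketch(const Type & t) {
    if (!t.can_be_inside_nullable)
        throw std::logic_error(t.name + " cannot be inside Nullable");
    // The wrapped type is marked non-wrappable so it is not wrapped twice.
    return {"Nullable(" + t.name + ")", false};
}

// Like makeNullableSafe as described in the review: return the type
// unchanged when it cannot be made Nullable.
Type makeNullableSafeSketch(const Type & t) {
    return t.can_be_inside_nullable ? makeNullableSketch(t) : t;
}

// Resolver step with default Nulls handling: the nested return type is
// computed from non-Nullable arguments (not modeled here) and then wrapped
// only if some argument was Nullable and wrapping is possible.
Type resolveReturnType(const Type & nested_return_type, bool has_nullable_argument) {
    return has_nullable_argument ? makeNullableSafeSketch(nested_return_type) : nested_return_type;
}
```

In this model, the old behavior corresponds to calling `makeNullableSketch` inside `resolveReturnType`, which throws for `Array(String)`; that failure is what, per the comment above, used to force function authors to revisit `useDefaultImplementationForNulls`.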

george-larionov · 2026-05-08T12:24:05Z

            /// Each row should be evaluated if there are no nulls or short circuiting is disabled.
            auto res = executeWithoutLowCardinalityColumns(temporary_columns, temporary_result_type, input_rows_count, dry_run);
+            if (!result_is_nullable)
+                return res;


This res may contain outputs computed from garbage inputs (AFAIK there is no guarantee that null rows actually contain default values). Unlike the code at and after line 322, nothing here handles that. Maybe, when !result_is_nullable, it should always go through the logic on lines 322-339? See this comment.

george-larionov · 2026-05-08T12:24:28Z


            auto res = executeWithoutLowCardinalityColumns(temporary_columns, temporary_result_type, input_rows_count, dry_run);
+            if (!result_is_nullable)
+                return res;


This res may contain outputs computed from garbage inputs (AFAIK there is no guarantee that null rows actually contain default values). Unlike the code at and after line 322, nothing here handles that. Maybe, when !result_is_nullable, it should always go through the logic on lines 322-339? See this comment.

george-larionov · 2026-05-08T12:38:23Z

Another example with incorrect output; this should evaluate to []:

SELECT extractAll(nullIf(materialize('hello world'), materialize('hello world')), '(\\w+)')

Query id: b8d2c747-ccf4-4fbd-a72a-4057da808c96

   ┌─extractAll(n⋯, '(\\w+)')─┐
1. │ ['hello','world']        │
   └──────────────────────────┘

clickhouse-gh · 2026-05-08T13:00:54Z

LLVM Coverage Report

Metric	Baseline	Current	Δ
Lines	84.10%	84.10%	+0.00%
Functions	91.10%	91.10%	+0.00%
Branches	76.60%	76.60%	+0.00%

Changed lines: 97.37% (37/38)


clickhouse-gh Bot added the pr-bugfix label (Pull request with bugfix, not backported by default) on May 7, 2026

clickhouse-gh Bot reviewed May 7, 2026

Comment thread src/Functions/IFunction.cpp

george-larionov self-assigned this May 8, 2026

alexey-milovidov and others added 2 commits May 8, 2026 10:01

george-larionov requested changes May 8, 2026

alexey-milovidov wants to merge 3 commits into master from fix-nullable-array-extract

2 participants
