Skip to content

Fix FunctionVariantAdaptor wrapping non-type exceptions as LOGICAL_ERROR#102855

Merged
Avogar merged 1 commit intoClickHouse:masterfrom
groeneai:fix/variant-adaptor-propagate-exceptions
Apr 16, 2026
Merged

Fix FunctionVariantAdaptor wrapping non-type exceptions as LOGICAL_ERROR#102855
Avogar merged 1 commit intoClickHouse:masterfrom
groeneai:fix/variant-adaptor-propagate-exceptions

Conversation

@groeneai
Copy link
Copy Markdown
Contributor

ExecutableFunctionVariantAdaptor::executeImpl() has 7 catch (const Exception & e) blocks around castColumn() calls that catch ALL exceptions and re-throw them as LOGICAL_ERROR. Non-type-conversion exceptions like MEMORY_LIMIT_EXCEEDED are incorrectly wrapped, causing abortOnFailedAssertion() to crash the server instead of returning a user-facing error.

Root cause: When a function applied to a Variant column triggers OOM inside castColumn() (which converts the result to the expected Variant type), the OOM exception is caught and re-thrown as:

Logical error: 'Cannot convert nested result of function moduloOrZero with type Float64
to the expected result type Variant(Float64, Int64): Query memory limit exceeded...'

This LOGICAL_ERROR triggers the assertion handler and kills the server.

Fix: Added error code filtering to all 7 catch blocks. Only type-conversion errors (ILLEGAL_TYPE_OF_ARGUMENT, TYPE_MISMATCH, CANNOT_CONVERT_TYPE, NO_COMMON_TYPE) are wrapped as LOGICAL_ERROR. All other exceptions propagate as-is. This matches the existing pattern in the FunctionBaseVariantAdaptor constructor (which already filters these exact codes).

Reproduction:

CREATE TABLE t (v Variant(Float64, Int64)) ENGINE = Memory;
INSERT INTO t SELECT (number * 1.0001)::Float64::Variant(Float64, Int64) FROM numbers(5000000);
-- Before fix: server crashes with LOGICAL_ERROR
-- After fix: returns MEMORY_LIMIT_EXCEEDED (code 241), server stays alive
SELECT moduloOrZero(v, 65536) FROM t SETTINGS max_memory_usage = 10000000 FORMAT Null;

All 3 execution paths in executeImpl are covered:

  1. Single variant type, no NULLs (line 138 — the path triggered by CI)
  2. Single variant type + NULLs (line 236, 265, 288)
  3. Multiple variant types (lines 411, 571)

Fixes #93960

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix server crash (LOGICAL_ERROR assertion) when a function on a Variant column hits a memory limit or other non-type-conversion exception during result casting in FunctionVariantAdaptor. The exception is now propagated correctly instead of being misclassified as an internal error.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

ExecutableFunctionVariantAdaptor::executeImpl() has 7 catch blocks around
castColumn() calls that catch ALL exceptions and re-throw them as
LOGICAL_ERROR. This is wrong: non-type-conversion exceptions like
MEMORY_LIMIT_EXCEEDED should propagate as-is, not trigger
abortOnFailedAssertion() and crash the server.

The fix adds error code filtering to all 7 catch blocks: only type-related
exceptions (ILLEGAL_TYPE_OF_ARGUMENT, TYPE_MISMATCH, CANNOT_CONVERT_TYPE,
NO_COMMON_TYPE) are wrapped as LOGICAL_ERROR. All other exceptions are
re-thrown unchanged. This matches the pattern already used in the
FunctionBaseVariantAdaptor constructor.

Fixes: ClickHouse#93960
@groeneai
Copy link
Copy Markdown
Contributor Author

Pre-PR Validation Gate (session: cron:clickhouse-ci-task-worker:20260415-194500)

a) Deterministic repro?SELECT moduloOrZero(v, 65536) FROM test_variant_oom SETTINGS max_memory_usage = 10000000 FORMAT Null crashes 100% on unfixed binary (server dies with LOGICAL_ERROR assertion).

b) Root cause explained?ExecutableFunctionVariantAdaptor::executeImpl() has 7 catch (const Exception & e) blocks around castColumn() calls. They catch ALL exceptions and re-throw as LOGICAL_ERROR. When castColumn hits OOM (MEMORY_LIMIT_EXCEEDED), the OOM is wrapped as LOGICAL_ERROR → abortOnFailedAssertion() → server crash. The catch blocks should only wrap type-conversion errors.

c) Fix matches root cause? ✅ Added error code filtering to all 7 catch blocks: only ILLEGAL_TYPE_OF_ARGUMENT, TYPE_MISMATCH, CANNOT_CONVERT_TYPE, NO_COMMON_TYPE get wrapped as LOGICAL_ERROR. All other exceptions propagate as-is. Same pattern already used in FunctionBaseVariantAdaptor constructor (lines 639-648).

d) New tests added?04101_variant_adaptor_propagate_exceptions.sql covers all 3 code paths (single variant, single+NULLs, multi-variant) and verifies MEMORY_LIMIT_EXCEEDED propagates correctly. Also verifies normal operation.

e) Demonstrated in both directions? ✅ Before fix: server crashes. After fix: returns code 241 MEMORY_LIMIT_EXCEEDED, server stays alive. 10/10 test runs pass.

f) Fix is general? ✅ All 7 catch blocks fixed, covering every execution path in executeImpl. Uses the same error code set as the existing constructor pattern.

@groeneai
Copy link
Copy Markdown
Contributor Author

cc @Avogar @alexey-milovidov — could you review this? It fixes a server crash where FunctionVariantAdaptor catches non-type-conversion exceptions (like OOM) from castColumn() and wraps them as LOGICAL_ERROR, triggering abortOnFailedAssertion(). The fix adds error code filtering to all 7 catch blocks, matching the pattern already in the constructor.

@nikitamikhaylov nikitamikhaylov added the can be tested Allows running workflows for external contributors label Apr 15, 2026
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Apr 15, 2026

Workflow [PR], commit [4b7f9ee]

Summary:


AI Review

Summary

This PR fixes exception classification in ExecutableFunctionVariantAdaptor::executeImpl: only type-conversion exceptions are wrapped as LOGICAL_ERROR, while non-type exceptions (for example MEMORY_LIMIT_EXCEEDED) now propagate correctly. The change is targeted, applied consistently across all affected castColumn catch sites, and accompanied by a new stateless test covering all three execution paths. I did not find correctness, safety, or compatibility regressions in the diff.

ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
No large/binary files
Final Verdict
  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Apr 15, 2026
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Apr 15, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.00% 84.00% +0.00%
Functions 90.90% 90.90% +0.00%
Branches 76.50% 76.50% +0.00%

Changed lines: 23.81% (10/42) · Uncovered code

Full report · Diff report

@Avogar Avogar self-assigned this Apr 16, 2026
@Avogar
Copy link
Copy Markdown
Member

Avogar commented Apr 16, 2026

CH Inc sync - #102890

@Avogar Avogar added this pull request to the merge queue Apr 16, 2026
Merged via the queue into ClickHouse:master with commit 2afda72 Apr 16, 2026
161 checks passed
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-synced-to-cloud The PR is synced to the cloud repo label Apr 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Logical error: Cannot convert nested result of function A with type B to the expected result type C: D (STID: 4827-671c)

4 participants