Fix dictGetOrNull corrupting input columns with a Nullable key#104327
Fix dictGetOrNull corrupting input columns with a Nullable key#104327groeneai wants to merge 2 commits intoClickHouse:masterfrom
Conversation
`dictGetOrNull` with a `Nullable` key column silently overwrote other
columns in the SELECT projection with `NULL` whenever a key was missing
in the dictionary. Reproducer:
```sql
CREATE DICTIONARY d (id UInt64, name String)
PRIMARY KEY id SOURCE(CLICKHOUSE(QUERY $$SELECT c1 AS id, c2 AS name FROM
VALUES((1, 'one'), (2, 'two'))$$ )) LAYOUT(FLAT()) LIFETIME(0);
SELECT x, dictGetOrNull('d', 'name', x) FROM
(SELECT toNullable(arrayJoin([0, 1])) AS x);
```
Before the fix, the row for `x = 0` returned `(NULL, NULL)` instead of
`(0, NULL)` — the input column `x` was clobbered alongside the missing
dictionary value.
Root cause: when the third argument to `dictGetOrNull` is `Nullable`,
`FunctionDictGetNoType::executeImpl` calls `wrapInNullable`, which
produces a `ColumnNullable` whose null map shares storage with the input
key column's null map. `FunctionDictGetOrNull::executeImpl` then went
through `assumeMutable` and mutated that null map in place via
`addNullMap` (OR-ing in the keys-not-in-dictionary mask), corrupting
the input column.
The fix uses `IColumn::mutate(std::move(...))` instead of
`assumeMutable`. `IColumn::mutate` performs a deep clone of any shared
sub-columns before returning a mutable handle, so the result's null map
is unshared by the time we mutate it. This applies the proper
copy-on-write contract at the point of mutation.
Closes ClickHouse#73633
Pre-PR validation gatePer the worker process, here are explicit answers to the five mandatory pre-PR questions plus the bug-scope question. a) Deterministic repro? Yes — the reproducer below fails on master CREATE DICTIONARY d (id UInt64, name String) PRIMARY KEY id
SOURCE(CLICKHOUSE(QUERY $$SELECT c1 AS id, c2 AS name FROM VALUES((1, 'one'), (2, 'two'))$$ ))
LAYOUT(FLAT()) LIFETIME(0);
SELECT x, dictGetOrNull('d', 'name', x) AS dx FROM (SELECT toNullable(arrayJoin([0, 1])) AS x);Master returns b) Root cause explained? Yes. c) Fix matches root cause? Yes. We replace d) Test intent preserved? / New tests added? Yes. New regression test e) Demonstrated in both directions? Yes. Verified by reverting the fix, rebuilding, observing the bug; restoring the fix, rebuilding, observing the row preserved. Reproducer is deterministic. f) Fix is general, not a narrow patch? Yes. The deep mutation is applied at the outer column, so it covers both the single-attribute branch (line 1040) and the tuple-of-attributes branch (line 1011 through 1037). The pattern of "mutate a column we got from another function via Session: cron:clickhouse-ci-task-worker:20260507-224500 |
|
cc @vitlibar @vdimir @rschu1ze — could you take a look? This is a small targeted fix for cc @den-crane — fix per your suggestion in the issue. PTAL when you get a chance. |
|
Workflow [PR], commit [49fc4f8] Summary: ❌
AI ReviewSummaryThis PR fixes a real correctness bug in ClickHouse Rules
Final Verdict
|
The test creates a dictionary in the test's auto-generated database
and then calls `dictGetOrNull('d_73633', ...)` from the same query.
On `Stateless tests (amd_llvm_coverage, ParallelReplicas, s3 storage,
parallel)`, parallel replica worker nodes do not have the dictionary
defined, so the worker side of the query fails with
`Code: 36 BAD_ARGUMENTS: Dictionary ('d_73633') not found`.
Adding `no-parallel-replicas` matches the established pattern for
dictionary tests (e.g. `03671_dict_in_subquery_in_index_analysis_context_expired.sql`,
`03703_optimize_inverse_dictionary_lookup_dictget_family.sql`).
The test is verifying a column-aliasing bug fix in `dictGetOrNull`,
not parallel-replicas compatibility.
CI report (failing run on commit ad33b36):
https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=104327&sha=ad33b3641b67aaeaec2b6559e74ba5ad6b71d314&name_0=PR&name_1=Stateless%20tests%20%28amd_llvm_coverage%2C%20ParallelReplicas%2C%20s3%20storage%2C%20parallel%29
PR: ClickHouse#104327
|
Pushed CI failure: Stateless tests (amd_llvm_coverage, ParallelReplicas, s3 storage, parallel) — Root cause: the test creates dictionary Fix: added The fix itself (commit |
LLVM Coverage Report
Changed lines: 100.00% (8/8) | lost baseline coverage: 3 line(s) · Uncovered code |
dictGetOrNullwith aNullablekey column silently overwrote other columns in the SELECT projection withNULLwhenever a key was missing in the dictionary.Reproducer (from @den-crane in issue #73633):
Before the fix, the row for
x = 0returned(NULL, NULL)instead of(0, NULL)— the input columnxwas clobbered alongside the missing dictionary value. The bug reproduces onMemory,MergeTree, and subquery sources, with or without anifNullwrapper around the key.Root cause
When the third argument to
dictGetOrNullisNullable,FunctionDictGetNoType::executeImplcallswrapInNullable, which produces aColumnNullablewhose null map shares storage with the input key column's null map (no clone — seewrapInNullableinsrc/Functions/FunctionHelpers.cpp).FunctionDictGetOrNull::executeImplthen went throughassumeMutableand mutated that null map in place viaaddNullMap(OR-ing in the keys-not-in-dictionary mask). Because the null map was aliased witharguments[2]'s null map, the input column was mutated too — every row whose key was missing in the dictionary appeared asNULLin the originalxcolumn of the projection.assumeMutableonly casts awayconstwithout checking sharing, so it is unsafe when the column has shared sub-columns.Fix
Use
IColumn::mutate(std::move(...))instead ofassumeMutable.IColumn::mutatedeep-clones any shared sub-columns (forEachMutableSubcolumnwalksnested_columnandnull_mapforColumnNullable), so by the time we mutate the result's null map it is unshared. This applies the proper copy-on-write contract at the point of mutation. The fix covers both the single-attribute and the tuple-of-attributes branches because the deep clone is applied at the outer column.Verification
arrayJoin,Memory,Memory + ifNull,MergeTree,MergeTree + ifNull) now return the correct rows.Nullable(String)JSON column passed viaJSONExtractInt) preservesjson_string.sum(x)over a query that also callsdictGetOrNull('d', 'name', x)) report the correct sum.01780_dict_get_or_null,02014_dict_get_nullable_key,02125_dict_get_type_nullable_fix,02176_dict_get_has_implicit_key_cast,01941_dict_get_has_complex_single_key,01129_dict_get_join_lose_constness,02384_analyzer_dict_get_join_get,03009_range_dict_get_or_default,03741_dict_get_in_cte_with_no_arguments_old_analyzerall still pass.04210_dict_get_or_null_nullable_key_no_corruption.sqlexercises every variant.Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix
dictGetOrNullsilently overwriting other columns in theSELECTprojection withNULLwhen called with aNullablekey column whose values are missing in the dictionary. The function was mutating an input-aliased null map in place; it now deep-clones the result column before mutation. Closes #73633.Documentation entry for user-facing changes