
Fix dictGetOrNull corrupting input columns with a Nullable key#104327

Open
groeneai wants to merge 2 commits into ClickHouse:master from groeneai:fix-73633-dict-get-or-null-nullable-key-corruption

Conversation

@groeneai
Contributor

@groeneai groeneai commented May 7, 2026

dictGetOrNull with a Nullable key column silently overwrote other columns in the SELECT projection with NULL whenever a key was missing in the dictionary.

Reproducer (from @den-crane in issue #73633):

CREATE DICTIONARY d (id UInt64, name String)
PRIMARY KEY id SOURCE(CLICKHOUSE(QUERY $$SELECT c1 AS id, c2 AS name FROM
VALUES((1, 'one'), (2, 'two'))$$ )) LAYOUT(FLAT()) LIFETIME(0);

SELECT x, dictGetOrNull('d', 'name', x) AS dx
FROM (SELECT toNullable(arrayJoin([0, 1])) AS x);

Before the fix, the row for x = 0 returned (NULL, NULL) instead of (0, NULL) — the input column x was clobbered alongside the missing dictionary value. The bug reproduces on Memory, MergeTree, and subquery sources, with or without an ifNull wrapper around the key.

Root cause

When the third argument to dictGetOrNull is Nullable, FunctionDictGetNoType::executeImpl calls wrapInNullable, which produces a ColumnNullable whose null map shares storage with the input key column's null map (no clone — see wrapInNullable in src/Functions/FunctionHelpers.cpp). FunctionDictGetOrNull::executeImpl then went through assumeMutable and mutated that null map in place via addNullMap (OR-ing in the keys-not-in-dictionary mask). Because the null map was aliased with arguments[2]'s null map, the input column was mutated too — every row whose key was missing in the dictionary appeared as NULL in the original x column of the projection.

assumeMutable only casts away const without checking sharing, so it is unsafe when the column has shared sub-columns.
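The aliasing described above can be reproduced in a toy model. The sketch below is an illustration only: `ToyNullable`, `wrapSharingNullMap`, and `unsafeAddNullMap` are hypothetical stand-ins, not ClickHouse classes or functions, and the null map is modelled as a plain `std::shared_ptr` to a byte vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only; these are NOT ClickHouse's real classes.
using NullMap = std::vector<std::uint8_t>;
using NullMapPtr = std::shared_ptr<const NullMap>;

struct ToyNullable
{
    std::vector<std::uint64_t> values;
    NullMapPtr null_map; // may be shared with other columns
};

// Mimics wrapping a result in Nullable by reusing the key's null map (no clone).
ToyNullable wrapSharingNullMap(std::vector<std::uint64_t> result_values, const ToyNullable & key)
{
    return ToyNullable{std::move(result_values), key.null_map};
}

// Mimics the unsafe pattern: cast away const and OR a mask into the null map in place,
// without checking whether another column shares it.
void unsafeAddNullMap(ToyNullable & column, const NullMap & missing_mask)
{
    auto & map = const_cast<NullMap &>(*column.null_map);
    assert(map.size() == missing_mask.size());
    for (std::size_t i = 0; i < map.size(); ++i)
        map[i] |= missing_mask[i];
}

// Returns the KEY column's null map after the RESULT column was mutated.
NullMap keyNullMapAfterUnsafeMutation()
{
    // The pointed-to vector is created non-const, so the const_cast above is well defined.
    ToyNullable key{{0, 1}, std::make_shared<NullMap>(NullMap{0, 0})};
    ToyNullable result = wrapSharingNullMap({0, 0}, key);
    unsafeAddNullMap(result, NullMap{1, 0}); // pretend key 0 is missing in the dictionary
    return *key.null_map;                    // the key column observes the change too
}
```

In this model, `keyNullMapAfterUnsafeMutation()` yields a null map in which row 0 is marked NULL, even though only the result column was meant to change. That matches the `(NULL, NULL)` row in the reproducer.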

Fix

Use IColumn::mutate(std::move(...)) instead of assumeMutable. IColumn::mutate deep-clones any shared sub-columns (forEachMutableSubcolumn walks nested_column and null_map for ColumnNullable), so by the time we mutate the result's null map it is unshared. This applies the proper copy-on-write contract at the point of mutation. The fix covers both the single-attribute and the tuple-of-attributes branches because the deep clone is applied at the outer column.
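The copy-on-write idea behind the fix can be sketched in a few lines. This is a simplified model under stated assumptions, not the actual `IColumn::mutate` implementation: `detachIfShared`, `orMask`, and `Outcome` are hypothetical names, and sharing is detected through `std::shared_ptr::use_count` in single-threaded code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

using NullMap = std::vector<std::uint8_t>;

// Hypothetical helper: return a null map that is safe to modify.
// If another owner still holds a reference, clone first (copy-on-write).
std::shared_ptr<NullMap> detachIfShared(std::shared_ptr<NullMap> map)
{
    if (map.use_count() > 1)
        return std::make_shared<NullMap>(*map);
    return map;
}

// OR a "key not found" mask into a null map that the caller owns exclusively.
void orMask(NullMap & map, const NullMap & mask)
{
    assert(map.size() == mask.size());
    for (std::size_t i = 0; i < map.size(); ++i)
        map[i] |= mask[i];
}

struct Outcome
{
    NullMap key_map;
    NullMap result_map;
};

Outcome mutateResultWithCopyOnWrite()
{
    auto key_map = std::make_shared<NullMap>(NullMap{0, 0});
    std::shared_ptr<NullMap> result_map = key_map;      // aliased, as in the bug
    result_map = detachIfShared(std::move(result_map)); // clone because key_map still owns it
    orMask(*result_map, NullMap{1, 0});                 // pretend key 0 is missing
    return {*key_map, *result_map};
}
```

Here the key's null map stays `{0, 0}` while the result's becomes `{1, 0}`, which corresponds to the expected `(0, NULL)` row.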

Verification

  • All five variants from den-crane's reproducer (subquery + arrayJoin, Memory, Memory + ifNull, MergeTree, MergeTree + ifNull) now return the correct rows.
  • The original reporter's shape (Nullable(String) JSON column passed via JSONExtractInt) preserves json_string.
  • Aggregations over the input column (e.g. sum(x) over a query that also calls dictGetOrNull('d', 'name', x)) report the correct sum.
  • Existing tests 01780_dict_get_or_null, 02014_dict_get_nullable_key, 02125_dict_get_type_nullable_fix, 02176_dict_get_has_implicit_key_cast, 01941_dict_get_has_complex_single_key, 01129_dict_get_join_lose_constness, 02384_analyzer_dict_get_join_get, 03009_range_dict_get_or_default, 03741_dict_get_in_cte_with_no_arguments_old_analyzer all still pass.
  • New regression test 04210_dict_get_or_null_nullable_key_no_corruption.sql exercises every variant.

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix dictGetOrNull silently overwriting other columns in the SELECT projection with NULL when called with a Nullable key column whose values are missing in the dictionary. The function was mutating an input-aliased null map in place; it now deep-clones the result column before mutation. Closes #73633.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

`dictGetOrNull` with a `Nullable` key column silently overwrote other
columns in the SELECT projection with `NULL` whenever a key was missing
in the dictionary. Reproducer:

```sql
CREATE DICTIONARY d (id UInt64, name String)
PRIMARY KEY id SOURCE(CLICKHOUSE(QUERY $$SELECT c1 AS id, c2 AS name FROM
VALUES((1, 'one'), (2, 'two'))$$ )) LAYOUT(FLAT()) LIFETIME(0);

SELECT x, dictGetOrNull('d', 'name', x) FROM
    (SELECT toNullable(arrayJoin([0, 1])) AS x);
```

Before the fix, the row for `x = 0` returned `(NULL, NULL)` instead of
`(0, NULL)` — the input column `x` was clobbered alongside the missing
dictionary value.

Root cause: when the third argument to `dictGetOrNull` is `Nullable`,
`FunctionDictGetNoType::executeImpl` calls `wrapInNullable`, which
produces a `ColumnNullable` whose null map shares storage with the input
key column's null map. `FunctionDictGetOrNull::executeImpl` then went
through `assumeMutable` and mutated that null map in place via
`addNullMap` (OR-ing in the keys-not-in-dictionary mask), corrupting
the input column.

The fix uses `IColumn::mutate(std::move(...))` instead of
`assumeMutable`. `IColumn::mutate` performs a deep clone of any shared
sub-columns before returning a mutable handle, so the result's null map
is unshared by the time we mutate it. This applies the proper
copy-on-write contract at the point of mutation.

Closes ClickHouse#73633
@groeneai
Contributor Author

groeneai commented May 7, 2026

Pre-PR validation gate

Per the worker process, here are explicit answers to the five mandatory pre-PR questions plus the bug-scope question.

a) Deterministic repro? Yes — the reproducer below fails on master c93903f9fd6 and passes after this change:

CREATE DICTIONARY d (id UInt64, name String) PRIMARY KEY id
SOURCE(CLICKHOUSE(QUERY $$SELECT c1 AS id, c2 AS name FROM VALUES((1, 'one'), (2, 'two'))$$ ))
LAYOUT(FLAT()) LIFETIME(0);

SELECT x, dictGetOrNull('d', 'name', x) AS dx FROM (SELECT toNullable(arrayJoin([0, 1])) AS x);

Master returns (NULL, NULL) for the first row (x corrupted). With the fix it returns (0, NULL).

b) Root cause explained? Yes. FunctionDictGetNoType::executeImpl calls wrapInNullable when the key is Nullable. wrapInNullable reuses the input column's null-map column pointer (no clone) when there is a single nullable arg — it just shares storage. FunctionDictGetOrNull::executeImpl then went through assumeMutable (a const_cast that does not check use_count) and mutated that shared null map in place via addNullMap. The aliasing meant every row whose key was missing in the dictionary appeared as NULL in the original x column of the projection.

c) Fix matches root cause? Yes. We replace assumeMutable with IColumn::mutate(std::move(...)). The latter walks forEachMutableSubcolumn and recursively detaches any sub-column whose use_count exceeds 1 — for ColumnNullable that means cloning the null map when it is shared. By the time we mutate the result's null map it is unshared and the input column is preserved.
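The recursive detach described in (c) can be illustrated with a toy column tree. This is a sketch only: `ToyColumn` and `deepMutate` are hypothetical names and do not reproduce the real `IColumn::mutate` or `forEachMutableSubcolumn` code. It assumes single-threaded use, so that `use_count` is a reliable sharing check.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

struct ToyColumn;
using ToyColumnPtr = std::shared_ptr<ToyColumn>;

// Toy column: some payload plus nested sub-columns (e.g. a null map).
struct ToyColumn
{
    std::vector<std::uint8_t> data;
    std::vector<ToyColumnPtr> children;
};

// Returns a column tree in which no node is shared with any other owner.
ToyColumnPtr deepMutate(ToyColumnPtr column)
{
    if (column.use_count() > 1)
        column = std::make_shared<ToyColumn>(*column); // shallow clone: children are still shared
    for (auto & child : column->children)
        child = deepMutate(std::move(child));          // detach each shared sub-column in turn
    return column;
}

// Two outer columns share one sub-column; mutate the result and check the input is intact.
bool subColumnIsDetached()
{
    auto shared_null_map = std::make_shared<ToyColumn>(ToyColumn{{0, 0}, {}});
    auto input = std::make_shared<ToyColumn>(ToyColumn{{}, {shared_null_map}});
    auto result = std::make_shared<ToyColumn>(ToyColumn{{}, {shared_null_map}});

    result = deepMutate(std::move(result));
    result->children[0]->data[0] = 1; // mark row 0 as NULL in the result only

    return input->children[0]->data[0] == 0 && result->children[0]->data[0] == 1;
}
```

Because the walk starts at the outer column, it reaches a shared null map however deeply it is nested. The same reasoning is why applying the clone at the outer column is expected to cover both the single-attribute and the tuple-of-attributes branches.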

d) Test intent preserved? / New tests added? Yes. New regression test 04210_dict_get_or_null_nullable_key_no_corruption.sql covers every shape from den-crane's reproducer (subquery + arrayJoin, Memory, Memory + ifNull, MergeTree, MergeTree + ifNull), an aggregate over x after dictGetOrNull, and the original reporter's JSONExtractInt(Nullable(String)) shape. No existing tests were weakened; ran 01780, 02014, 02125, 02176, 01941, 01129, 03009, 03741, 02384 locally — all pass.

e) Demonstrated in both directions? Yes. Verified by reverting the fix, rebuilding, observing the bug; restoring the fix, rebuilding, observing the row preserved. Reproducer is deterministic.

f) Fix is general, not a narrow patch? Yes. The deep mutation is applied at the outer column, so it covers both the single-attribute branch (line 1040) and the tuple-of-attributes branch (line 1011 through 1037). The pattern of "mutate a column we got from another function via addNullMap/getNullMapData" is the actual COW violation; the fix corrects that contract instead of guarding the symptom.

Session: cron:clickhouse-ci-task-worker:20260507-224500

@groeneai
Contributor Author

groeneai commented May 7, 2026

cc @vitlibar @vdimir @rschu1ze — could you take a look? This is a small targeted fix for dictGetOrNull corrupting input columns when the key is Nullable (#73633). The change is in FunctionDictGetOrNull::executeImpl: replaces assumeMutable with IColumn::mutate(std::move(...)) so the result's null map is detached before we mutate it.

cc @den-crane — fix per your suggestion in the issue. PTAL when you get a chance.

@nikitamikhaylov added the "can be tested" label (Allows running workflows for external contributors) May 8, 2026
@clickhouse-gh
Contributor

clickhouse-gh Bot commented May 8, 2026

Workflow [PR], commit [49fc4f8]

Summary:

| job_name | test_name | status | info | comment |
|---|---|---|---|---|
| Stateless tests (amd_tsan, parallel, 2/2) | | FAIL | | |
| | 04103_user_network_bandwidth_throttler | FAIL | cidb | |

AI Review

Summary

This PR fixes a real correctness bug in dictGetOrNull for Nullable keys: the function could mutate a shared null map and silently nullify unrelated projected columns. The switch to IColumn::mutate before null-map modification correctly enforces copy-on-write semantics, and the new stateless regression test covers the reported reproducer variants plus an aggregate check.

ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
No large/binary files
Final Verdict
  • Status: ✅ Approve

@clickhouse-gh Bot added the "pr-bugfix" label (Pull request with bugfix, not backported by default) May 8, 2026
The test creates a dictionary in the test's auto-generated database
and then calls `dictGetOrNull('d_73633', ...)` from the same query.
On `Stateless tests (amd_llvm_coverage, ParallelReplicas, s3 storage,
parallel)`, parallel replica worker nodes do not have the dictionary
defined, so the worker side of the query fails with
`Code: 36 BAD_ARGUMENTS: Dictionary ('d_73633') not found`.

Adding `no-parallel-replicas` matches the established pattern for
dictionary tests (e.g. `03671_dict_in_subquery_in_index_analysis_context_expired.sql`,
`03703_optimize_inverse_dictionary_lookup_dictget_family.sql`).
The test is verifying a column-aliasing bug fix in `dictGetOrNull`,
not parallel-replicas compatibility.

CI report (failing run on commit ad33b36):
https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=104327&sha=ad33b3641b67aaeaec2b6559e74ba5ad6b71d314&name_0=PR&name_1=Stateless%20tests%20%28amd_llvm_coverage%2C%20ParallelReplicas%2C%20s3%20storage%2C%20parallel%29

PR: ClickHouse#104327
@groeneai
Contributor Author

groeneai commented May 8, 2026

Pushed 49fc4f8 to fix the only CI failure on this PR.

CI failure: Stateless tests (amd_llvm_coverage, ParallelReplicas, s3 storage, parallel): 04210_dict_get_or_null_nullable_key_no_corruption failed 3/3 reruns.

Root cause: the test creates dictionary d_73633 in the test's auto-generated database, then calls dictGetOrNull('d_73633', ...). On parallel replica configurations, worker nodes do not have the dictionary defined, so the remote-side execution fails with Code: 36 BAD_ARGUMENTS: Dictionary ('d_73633') not found.

Fix: added -- Tags: no-parallel-replicas, matching the established pattern for sibling dictionary tests (03671_dict_in_subquery_in_index_analysis_context_expired.sql, 03703_optimize_inverse_dictionary_lookup_dictget_family.sql, 04032_query_tree_pass_order.sql). The test verifies a column-aliasing bug fix in dictGetOrNull, not parallel-replicas compatibility.

The fix itself (commit ad33b36) is unchanged — only the test header gets the tag plus an explanatory comment.

@clickhouse-gh
Contributor

clickhouse-gh Bot commented May 8, 2026

LLVM Coverage Report

| Metric | Baseline | Current | Δ |
|---|---|---|---|
| Lines | 84.10% | 84.10% | +0.00% |
| Functions | 91.10% | 91.10% | +0.00% |
| Branches | 76.60% | 76.70% | +0.10% |

Changed lines: 100.00% (8/8) | lost baseline coverage: 3 line(s) · Uncovered code

Full report · Diff report


Labels

  • can be tested: Allows running workflows for external contributors
  • pr-bugfix: Pull request with bugfix, not backported by default

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Reading from dictionary with dictGetOrNull() changes values in other columns.

2 participants