Skip to content

Fix segfault in dictionary internal query#100839

Merged
mstetsyuk merged 6 commits intomasterfrom
fix-segfault-in-dictionary-reload
Mar 30, 2026
Merged

Fix segfault in dictionary internal query#100839
mstetsyuk merged 6 commits intomasterfrom
fix-segfault-in-dictionary-reload

Conversation

@mstetsyuk
Copy link
Copy Markdown
Member

@mstetsyuk mstetsyuk commented Mar 26, 2026

Linked issues

Changelog category (leave one):

  • Critical Bug Fix (crash, data loss, RBAC)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix null pointer dereference segfault when loading dictionaries during server shutdown. Context::getUserDefinedSQLObjectsStorage (dereferences user_defined_sql_objects_storage) is called by dictionary threads concurrently with the main thread calling Context::shutdown (sets user_defined_sql_objects_storage to null). We need to make sure we disable future updates in the dictionaries loader, kill the currently running dictionary queries and join the dictionary loading threads - all before running Context::shutdown. Similar to what we do with normal queries.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@mstetsyuk mstetsyuk requested a review from azat March 26, 2026 19:11
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Mar 26, 2026

Workflow [PR], commit [a765f3b]

Summary:

job_name test_name status info comment
Integration tests (amd_llvm_coverage, 5/5) failure
test_server_reload/test.py::test_change_listen_host FAIL cidb
Finish Workflow failure
python3 ./ci/jobs/scripts/workflow_hooks/new_tests_check.py failure

AI Review

Summary

This PR fixes a shutdown-time null dereference by making Context::shutdown stop dictionary periodic updates, cancel in-flight dictionary-related queries via ProcessList::killAllQueries, and wait for dictionary loading threads via ExternalLoader::joinLoadingThreads before tearing down context-owned objects. The change is focused, consistent with the stated incident, and I did not find additional correctness, safety, or compatibility issues in the final diff.

ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
Final Verdict
  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added pr-critical-bugfix pr-must-backport Pull request should be backported intentionally. Use this label with great care! labels Mar 26, 2026
Comment thread programs/server/Server.cpp Outdated
@mstetsyuk
Copy link
Copy Markdown
Member Author

Confirmed against core dump:

(lldb) p ((DB::Context*)0x00007d0226169400)->shared->shutdown_called
warning: `this' is not accessible (substituting 0). Couldn't load 'this' because its value couldn't be evaluated
(std::atomic<bool>) {
  Value = true
}
(lldb) p ((DB::Context*)0x00007d0226169400)->shared->user_defined_sql_objects_storage
warning: `this' is not accessible (substituting 0). Couldn't load 'this' because its value couldn't be evaluated
(std::unique_ptr<DB::IUserDefinedSQLObjectsStorage>) nullptr {
  pointer = nullptr
}
(lldb)

Comment thread programs/server/Server.cpp
Comment thread programs/server/Server.cpp Outdated
@mstetsyuk mstetsyuk requested a review from azat March 26, 2026 20:39
Comment thread src/Interpreters/Context.cpp Outdated
Comment on lines +890 to +891
if (!server_settings[ServerSetting::shutdown_wait_unfinished_queries])
process_list.killAllQueries();
Copy link
Copy Markdown
Member

@azat azat Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this, we have safeExit in case there are alive connections

And it looks like a hack

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this, we have safeExit in case there are alive connections

This is needed for dictionary queries.

We need to (1) disable future updates, (2) kill all the currently running queries, and (3) join the threads.

If we skip (2), (3) can take a long time to finish.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, then let's add a comment and also I guess we need to do this unconditionally, since between previous KillAllQueries and this one dictionary can start reloading?

By the way, is it hard to make proper query termination from joinLoadingThreads?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, is it hard to make proper query termination from joinLoadingThreads?

It is quite hard. There are arguably too many layers of abstraction between ExternalDictionariesLoader and the code that runs the selects.

I guess we need to do this unconditionally, since between previous KillAllQueries and this one dictionary can start reloading?

Yes, there can be new queries. My initial idea was to still diligently respect the shutdown_wait_unfinished_queries setting, but it probably doesn't matter, so I removed the check.

Comment thread src/Interpreters/Context.cpp
Comment thread src/Interpreters/Context.cpp Outdated
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Mar 27, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.20% 84.10% -0.10%
Functions 24.50% 24.30% -0.20%
Branches 76.70% 76.60% -0.10%

Changed lines: 92.59% (25/27) · Uncovered code

Full report · Diff report

@mstetsyuk mstetsyuk added this pull request to the merge queue Mar 30, 2026
Merged via the queue into master with commit 2ed7d5f Mar 30, 2026
150 of 153 checks passed
@mstetsyuk mstetsyuk deleted the fix-segfault-in-dictionary-reload branch March 30, 2026 09:10
@robot-clickhouse robot-clickhouse added the pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR label Mar 30, 2026
@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Mar 30, 2026
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Mar 30, 2026
clickhouse-gh Bot added a commit that referenced this pull request Mar 30, 2026
Backport #100839 to 26.2: Fix segfault in dictionary internal query
mstetsyuk added a commit that referenced this pull request Mar 31, 2026
Backport #100839 to 26.1: Fix segfault in dictionary internal query
mstetsyuk added a commit that referenced this pull request Mar 31, 2026
Backport #100839 to 26.3: Fix segfault in dictionary internal query
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-critical-bugfix pr-must-backport Pull request should be backported intentionally. Use this label with great care! pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants