(not present in released builds) Segfault in ClusterDiscovery::getNodeNames (null ZooKeeper dereference) in PR #1414/#1390 #1483

@alsugiliazova

ClusterDiscovery segfault (causes cascade failures)

After the node failure tests restart ClickHouse, clickhouse1 crashes on startup with a segfault in ClusterDiscovery::getNodeNames (ClusterDiscovery.cpp:302) — a null shared_ptr<ZooKeeper> dereference.

The code that crashes is from PR #1414, not from #1390 — PR #1390 does not modify ClusterDiscovery.cpp. This bug wasn’t caught during #1414 verification because the node failure tests couldn’t run back then (the object_storage_cluster setting that #1390 adds was missing).

Trace:

2026.03.05 14:12:40.736043 [ 21401 ] {} <Fatal> BaseDaemon: ########## Short fault info ############
2026.03.05 14:12:40.736056 [ 21401 ] {} <Fatal> BaseDaemon: (version 26.1.3.20001.altinityantalya, build id: 4AF3CE21FCDA3C93566C5D78D4F05C303ED0F81E, git hash: 72cad568a405344cbc3ad21e9de26bf5aec976e3, architecture: x86_64) (from thread 22078) Received signal 11
2026.03.05 14:12:40.736059 [ 21401 ] {} <Fatal> BaseDaemon: Signal description: Segmentation fault
2026.03.05 14:12:40.736061 [ 21401 ] {} <Fatal> BaseDaemon: Address: 0xffffffffffffffe8. Access: read. Address not mapped to object.
2026.03.05 14:12:40.736065 [ 21401 ] {} <Fatal> BaseDaemon: Stack trace: 0x000058027572ef79 0x0000580275736b16 0x0000580275733a8e 0x000058027573dc85 0x0000580275740658 0x000058026fc060f5 0x000058026fc0b97b 0x00007d11c0133ac3 0x00007d11c01c5850
2026.03.05 14:12:40.736067 [ 21401 ] {} <Fatal> BaseDaemon: ########################################
2026.03.05 14:12:40.736070 [ 21401 ] {} <Fatal> BaseDaemon: (version 26.1.3.20001.altinityantalya, build id: 4AF3CE21FCDA3C93566C5D78D4F05C303ED0F81E, git hash: 72cad568a405344cbc3ad21e9de26bf5aec976e3) (from thread 22078) (no query) Received signal Segmentation fault (11)
2026.03.05 14:12:40.736072 [ 21401 ] {} <Fatal> BaseDaemon: Address: 0xffffffffffffffe8. Access: read. Address not mapped to object.
2026.03.05 14:12:40.736073 [ 21401 ] {} <Fatal> BaseDaemon: Stack trace: 0x000058027572ef79 0x0000580275736b16 0x0000580275733a8e 0x000058027573dc85 0x0000580275740658 0x000058026fc060f5 0x000058026fc0b97b 0x00007d11c0133ac3 0x00007d11c01c5850
2026.03.05 14:12:40.775878 [ 21401 ] {} <Fatal> BaseDaemon: 3.0. inlined from ./contrib/llvm-project/libcxx/include/__memory/shared_ptr.h:476: shared_ptr
2026.03.05 14:12:40.775900 [ 21401 ] {} <Fatal> BaseDaemon: 3. ./ci/tmp/build/./src/Interpreters/ClusterDiscovery.cpp:295: DB::ClusterDiscovery::getNodeNames(std::shared_ptr<zkutil::ZooKeeper>&, String const&, String const&, int*, bool, unsigned long) @ 0x00000000191e2f79
2026.03.05 14:12:40.810625 [ 21401 ] {} <Fatal> BaseDaemon: 4. ./ci/tmp/build/./src/Interpreters/ClusterDiscovery.cpp:432: DB::ClusterDiscovery::upsertCluster(DB::ClusterDiscovery::ClusterInfo&)::$_0::operator()() const @ 0x00000000191eab16
2026.03.05 14:12:40.844554 [ 21401 ] {} <Fatal> BaseDaemon: 5. ./ci/tmp/build/./src/Interpreters/ClusterDiscovery.cpp:465: DB::ClusterDiscovery::upsertCluster(DB::ClusterDiscovery::ClusterInfo&) @ 0x00000000191e7a8e
2026.03.05 14:12:40.883035 [ 21401 ] {} <Fatal> BaseDaemon: 6. ./ci/tmp/build/./src/Interpreters/ClusterDiscovery.cpp:740: DB::ClusterDiscovery::runMainThread(std::function<void ()>) @ 0x00000000191f1c85
2026.03.05 14:12:40.925740 [ 21401 ] {} <Fatal> BaseDaemon: 7.0. inlined from ./ci/tmp/build/./src/Interpreters/ClusterDiscovery.cpp:660: operator()
2026.03.05 14:12:40.925769 [ 21401 ] {} <Fatal> BaseDaemon: 7.1. inlined from ./contrib/llvm-project/libcxx/include/__type_traits/invoke.h:87: std::__invoke_result_impl<void, DB::ClusterDiscovery::start()::$_0&>::type std::__invoke[abi:ne210105]<DB::ClusterDiscovery::start()::$_0&>(DB::ClusterDiscovery::start()::$_0&)
2026.03.05 14:12:40.925777 [ 21401 ] {} <Fatal> BaseDaemon: 7.2. inlined from ./contrib/llvm-project/libcxx/include/tuple:1380: decltype(auto) std::__apply_tuple_impl[abi:ne210105]<DB::ClusterDiscovery::start()::$_0&, std::tuple<>&>(DB::ClusterDiscovery::start()::$_0&, std::tuple<>&, std::__tuple_indices<...>)
2026.03.05 14:12:40.925781 [ 21401 ] {} <Fatal> BaseDaemon: 7.3. inlined from ./contrib/llvm-project/libcxx/include/tuple:1384: decltype(auto) std::apply[abi:ne210105]<DB::ClusterDiscovery::start()::$_0&, std::tuple<>&>(DB::ClusterDiscovery::start()::$_0&, std::tuple<>&)
2026.03.05 14:12:40.925783 [ 21401 ] {} <Fatal> BaseDaemon: 7.4. inlined from ./src/Common/ThreadPool.h:312: operator()
2026.03.05 14:12:40.925790 [ 21401 ] {} <Fatal> BaseDaemon: 7.5. inlined from ./contrib/llvm-project/libcxx/include/__type_traits/invoke.h:87: std::__invoke_result_impl<void, ThreadFromGlobalPoolImpl<true, true>::ThreadFromGlobalPoolImpl<DB::ClusterDiscovery::start()::$_0>(DB::ClusterDiscovery::start()::$_0&&)::'lambda'()&>::type std::__invoke[abi:ne210105]<ThreadFromGlobalPoolImpl<true, true>::ThreadFromGlobalPoolImpl<DB::ClusterDiscovery::start()::$_0>(DB::ClusterDiscovery::start()::$_0&&)::'lambda'()&>(ThreadFromGlobalPoolImpl<true, true>::ThreadFromGlobalPoolImpl<DB::ClusterDiscovery::start()::$_0>(DB::ClusterDiscovery::start()::$_0&&)::'lambda'()&)
2026.03.05 14:12:40.925796 [ 21401 ] {} <Fatal> BaseDaemon: 7.6. inlined from ./contrib/llvm-project/libcxx/include/__type_traits/invoke.h:342: void std::__invoke_void_return_wrapper<void, true>::__call[abi:ne210105]<ThreadFromGlobalPoolImpl<true, true>::ThreadFromGlobalPoolImpl<DB::ClusterDiscovery::start()::$_0>(DB::ClusterDiscovery::start()::$_0&&)::'lambda'()&>(ThreadFromGlobalPoolImpl<true, true>::ThreadFromGlobalPoolImpl<DB::ClusterDiscovery::start()::$_0>(DB::ClusterDiscovery::start()::$_0&&)::'lambda'()&)
2026.03.05 14:12:40.925800 [ 21401 ] {} <Fatal> BaseDaemon: 7.7. inlined from ./contrib/llvm-project/libcxx/include/__type_traits/invoke.h:348: DB::ClusterDiscovery::start()::$_0 std::__invoke_r[abi:ne210105]<void, ThreadFromGlobalPoolImpl<true, true>::ThreadFromGlobalPoolImpl<DB::ClusterDiscovery::start()::$_0>(DB::ClusterDiscovery::start()::$_0&&)::'lambda'()&>()
2026.03.05 14:12:40.925802 [ 21401 ] {} <Fatal> BaseDaemon: 7. ./contrib/llvm-project/libcxx/include/__functional/function.h:450: ? @ 0x00000000191f4658
2026.03.05 14:12:40.934183 [ 21401 ] {} <Fatal> BaseDaemon: 8.0. inlined from ./contrib/llvm-project/libcxx/include/__functional/function.h:508: ?
2026.03.05 14:12:40.934202 [ 21401 ] {} <Fatal> BaseDaemon: 8.1. inlined from ./contrib/llvm-project/libcxx/include/__functional/function.h:772: ?
2026.03.05 14:12:40.934205 [ 21401 ] {} <Fatal> BaseDaemon: 8. ./ci/tmp/build/./src/Common/ThreadPool.cpp:811: ThreadPoolImpl<std::thread>::ThreadFromThreadPool::worker() @ 0x00000000136ba0f5
2026.03.05 14:12:40.947716 [ 21401 ] {} <Fatal> BaseDaemon: 9.0. inlined from ./contrib/llvm-project/libcxx/include/__type_traits/invoke.h:0: std::__invoke_result_impl<void, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>::type std::__invoke[abi:ne210105]<void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>(void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*&&)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*&&)
2026.03.05 14:12:40.947739 [ 21401 ] {} <Fatal> BaseDaemon: 9.1. inlined from ./contrib/llvm-project/libcxx/include/__thread/thread.h:159: void std::__thread_execute[abi:ne210105]<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*, 2ul>(std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>&, std::__tuple_indices<2ul>)
2026.03.05 14:12:40.947742 [ 21401 ] {} <Fatal> BaseDaemon: 9. ./contrib/llvm-project/libcxx/include/__thread/thread.h:168: void* std::__thread_proxy[abi:ne210105]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>>(void*) @ 0x00000000136bf97b
2026.03.05 14:12:40.947771 [ 21401 ] {} <Fatal> BaseDaemon: 10. ? @ 0x0000000000094ac3
2026.03.05 14:12:40.947777 [ 21401 ] {} <Fatal> BaseDaemon: 11. ? @ 0x0000000000126850
2026.03.05 14:12:40.947781 [ 21401 ] {} <Fatal> BaseDaemon: Integrity check of the executable skipped because the reference checksum could not be read.
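
For illustration only, the kind of guard that would avoid dereferencing a null ZooKeeper handle could look like the sketch below. It is not the ClickHouse code and not a proposed patch: FakeZooKeeper, FakeZooKeeperPtr, and getNodeNamesGuarded are hypothetical stand-ins, and the real getNodeNames takes more parameters (see frame 3 in the trace). The assumption is that the caller can treat a missing session as "retry later" rather than as a fatal error.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for zkutil::ZooKeeper; NOT the real ClickHouse class.
struct FakeZooKeeper
{
    std::vector<std::string> children;

    // Returns the child node names stored in this fake; the path is ignored.
    std::vector<std::string> getChildren(const std::string & /*path*/) const
    {
        return children;
    }
};

using FakeZooKeeperPtr = std::shared_ptr<FakeZooKeeper>;

// Hypothetical guarded variant: returns std::nullopt when the handle is null
// instead of dereferencing it, so the caller can retry once a session exists.
std::optional<std::vector<std::string>> getNodeNamesGuarded(
    const FakeZooKeeperPtr & zk, const std::string & path)
{
    if (!zk)
        return std::nullopt;
    return zk->getChildren(path);
}
```

With a null handle, getNodeNamesGuarded returns an empty optional and the caller decides whether to retry or log; with a valid handle it returns the child names. Whether a guard like this belongs in getNodeNames itself or in the code that obtains the session in upsertCluster is a design question for the actual fix.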
