Fixes: http://tracker.ceph.com/issues/23966 Signed-off-by: Kefu Chai <kchai@redhat.com>
librados: block MgrClient::start_command until mgrmap Reviewed-by: Josh Durgin <jdurgin@redhat.com>
crush, osd: handle multiple parents properly when applying pg upmaps Reviewed-by: huangjun <hjwsm1989@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
Fixes: http://tracker.ceph.com/issues/23921 Signed-off-by: xiexingguo <xie.xingguo@gmail.com> (cherry picked from commit 482ec97)
pybind/ceph_daemon: expand the order of magnitude of daemonperf statistics to ZB Reviewed-by: Kefu Chai <kchai@redhat.com>
tools/rados: allow listing objects in a specific pg in a pool Reviewed-by: Sage Weil <sage@redhat.com> Reviewed-by: Kefu Chai <kchai@redhat.com>
misc: mark constructors as explicit Reviewed-by: Jason Dillaman <dillaman@redhat.com> Reviewed-by: Kefu Chai <kchai@redhat.com>
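Here is a minimal C++17 sketch of what marking a constructor `explicit` changes. `Timeout` and `wait_seconds` are hypothetical names, not code from this PR.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical type for illustration; not a class from the Ceph tree.
struct Timeout {
  // `explicit` forbids implicit int -> Timeout conversions, such as
  // passing a bare 5 where a Timeout is expected.
  explicit Timeout(int s) : seconds(s) {}
  int seconds;
};

int wait_seconds(Timeout t) { return t.seconds; }

// Compile-time checks of what `explicit` does and does not forbid.
static_assert(!std::is_convertible_v<int, Timeout>,
              "no implicit conversion from int");
static_assert(std::is_constructible_v<Timeout, int>,
              "direct construction still works");
```

With this, `wait_seconds(Timeout{5})` compiles and `wait_seconds(5)` does not.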
Fixes: http://tracker.ceph.com/issues/24008 Signed-off-by: Jason Dillaman <dillaman@redhat.com>
* After upgrading to gcc-8, DPDK seems to require more settings to compile; among them, CONFIG_RTE_MAX_VFIO_GROUPS=64 is copied from config/common_base.
* librte_bus_pci.a depends on librte_pci.a, so reorder the libraries in DPDK_LIBRARIES.
Signed-off-by: Kefu Chai <kchai@redhat.com>
Polymorphic exceptions should be caught by reference. This silences warnings
like:
error: catching polymorphic type ‘class std::RetryException’ by value
[-Werror=catch-value=]
} catch (std::RetryException) {
Signed-off-by: Kefu Chai <kchai@redhat.com>
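Below is a minimal sketch of the pattern the warning asks for. It uses a small hypothetical exception hierarchy (`RetryError`, `RetryLater`) instead of the actual `std::RetryException` type. Catching by `const&` avoids a copy and keeps the dynamic type intact.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical polymorphic exception types, for illustration only.
struct RetryError : std::runtime_error {
  using std::runtime_error::runtime_error;
};
struct RetryLater : RetryError {
  RetryLater() : RetryError("retry later") {}
  const char* what() const noexcept override { return "retry later (derived)"; }
};

std::string handle() {
  try {
    throw RetryLater{};
  } catch (const RetryError& e) {
    // Caught by reference, so there is no copy and no slicing. The virtual
    // what() dispatches to RetryLater::what(). Catching `RetryError` by
    // value would copy only the base subobject, which is the pattern that
    // -Wcatch-value flags.
    return e.what();
  }
  return "no exception";
}
```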
This silences warnings like:
warning: ‘void* memset(void*, int, size_t)’ writing to an object of non-trivial type ‘struct uuid_d’; use assignment instead [-Wclass-memaccess]
uuid_d only contains a boost::uuids::uuid, which is "nil"-initialized in uuid_d's constructor, so there is no need to memset() it with 0. The same applies to entity_inst_t.
Signed-off-by: Kefu Chai <kchai@redhat.com>
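The following sketch shows the assignment-based alternative that the compiler suggests. It uses a hypothetical `Uuid` type that, like `uuid_d` as described above, is nil-initialized by its own constructor. This is an illustration, not the actual Ceph change.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a type like uuid_d. Its user-provided
// constructor makes it non-trivial, so memset() on it would trigger
// -Wclass-memaccess.
struct Uuid {
  Uuid() : bytes{} {}  // nil (all-zero) initialized
  std::array<std::uint8_t, 16> bytes;
  bool is_zero() const {
    for (auto b : bytes)
      if (b != 0) return false;
    return true;
  }
};

struct Header {
  Uuid fsid;
  int epoch = 0;
};

void reset(Header& h) {
  // Instead of memset(&h.fsid, 0, sizeof(h.fsid)), assign a fresh value.
  // The constructor already guarantees the nil state.
  h.fsid = Uuid{};
  h.epoch = 0;
}
```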
This change silences the following error:
stl_tree.h:452:21: error: static assertion failed: comparison object
must be invocable with two arguments of key type
static_assert(__is_invocable<_Compare&, const _Key&, const _Key&>{},
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Kefu Chai <kchai@redhat.com>
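This assertion fires when the comparator cannot be called with two `const` key references. One common cause is an `operator()` that takes non-const reference parameters. Here is a minimal sketch of a conforming comparator, using a hypothetical `ByLength` ordering (not necessarily the comparator this commit touched):

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical comparator. A version declared as
//   bool operator()(std::string& a, std::string& b)
// cannot be called with `const std::string&` arguments, which is what the
// libstdc++ static_assert quoted above checks.
struct ByLength {
  // The parameters are const references and the call operator is const, so
  // std::map can invoke it on const keys through a const comparator.
  bool operator()(const std::string& a, const std::string& b) const {
    if (a.size() != b.size()) return a.size() < b.size();
    return a < b;
  }
};

std::map<std::string, int, ByLength> make_map() {
  return {{"ccc", 3}, {"a", 1}, {"bb", 2}};
}
```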
Signed-off-by: Kefu Chai <kchai@redhat.com>
qa/tests: Added mimic runs, removed large suites (rados, rbd, etc) ru… Reviewed-by: Sage Weil <sage@redhat.com> Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Signed-off-by: Yuri Weinstein <yweinste@redhat.com>
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
cmake,common,filestore: silence gcc-8 warnings/errors Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
prometheus: Expose number of degraded/misplaced/unfound objects Reviewed-by: John Spray <john.spray@redhat.com> Reviewed-by: Jan Fajerski <jfajerski@suse.com>
prometheus: Fix prometheus shutdown/restart Reviewed-by: John Spray <john.spray@redhat.com> Reviewed-by: Jan Fajerski <jfajerski@suse.com>
Gist of this commit is to eliminate use of PaxosFSMap::get_working_fsmap which allowed looking at the pending FSMap depending if it's the leader. Instead, pass the FSMap being worked on to helper functions. Also, ensure that we are only looking at the committed (i.e. current) FSMap in the preprocess_* service methods. Fixes: http://tracker.ceph.com/issues/23768 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
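Here is a simplified sketch of the idea. The hypothetical `FSMap` and `Monitor` types are much smaller than Ceph's real classes, and the `prepare_*` name is also an assumption. Helpers receive the map explicitly, read-only paths use the committed map, and only the update path touches the pending one.

```cpp
#include <cassert>

// Hypothetical, heavily simplified types; not Ceph's MDSMonitor/FSMap.
struct FSMap {
  int epoch = 0;
  int max_mds = 1;
};

class Monitor {
 public:
  // Helpers get the map to work on as an argument, rather than asking for
  // a "working" map whose identity depends on whether we are the leader.
  static int max_mds_of(const FSMap& fsmap) { return fsmap.max_mds; }

  // Read-only (preprocess) path: always consults the committed map.
  int preprocess_query() const { return max_mds_of(committed_); }

  // Update path: modifies and inspects the pending map only.
  int prepare_set_max_mds(int n) {
    pending_.max_mds = n;
    return max_mds_of(pending_);
  }

  // Commit: the pending map becomes the new committed map.
  void commit() {
    committed_ = pending_;
    ++committed_.epoch;
    pending_ = committed_;
  }

 private:
  FSMap committed_;
  FSMap pending_;
};
```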
this change silences warnings like:
warning: ‘void* memset(void*, int, size_t)’ writing to an object of
non-trivial type ‘struct uuid_d’; use assignment instead [-Wclass-memaccess]
memset(&z.cluster_fsid, 1, sizeof(z.cluster_fsid));
^
Signed-off-by: Kefu Chai <kchai@redhat.com>
This change silences the following warning:
warning: unnecessary parentheses in declaration of ‘os’ [-Wparentheses]
std::ostringstream(os);
^
Signed-off-by: Kefu Chai <kchai@redhat.com>
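The statement `std::ostringstream(os);` is parsed as a declaration of a variable named `os`, so the parentheses are redundant. The minimal sketch below uses a hypothetical `format_answer` helper to show the plain spelling.

```cpp
#include <cassert>
#include <sstream>
#include <string>

std::string format_answer(int value) {
  // `std::ostringstream(os);` would declare the same variable as the line
  // below. The extra parentheses change nothing, which is what
  // -Wparentheses reports, so the plain form is clearer.
  std::ostringstream os;
  os << "answer=" << value;
  return os.str();
}
```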
This change silences the following warnings:
: warning: ‘%s’ directive output may be truncated writing up to 127
bytes into a region of size between 109 and 119 [-Wformat-truncation=]
snprintf(m_id_str, SysTestRunnable::ID_STR_SZ, "process_%d%s",
m_id, extra);
^~~~~~~~~~~~~~
~~~~~
Signed-off-by: Kefu Chai <kchai@redhat.com>
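One way to address `-Wformat-truncation` is to check the return value of `snprintf`, which is the length the untruncated output would have had. This sketch uses a hypothetical `format_id` helper and buffer size `kIdStrSz`, not the actual fix. As we understand the GCC documentation, the default level of this warning only diagnoses calls whose return value is unused.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical buffer size (the original uses SysTestRunnable::ID_STR_SZ).
constexpr std::size_t kIdStrSz = 16;

// Returns true if the whole id fit into buf.
bool format_id(char (&buf)[kIdStrSz], int id, const char* extra) {
  int n = std::snprintf(buf, sizeof(buf), "process_%d%s", id, extra);
  // A result >= the buffer size means the output was truncated. The
  // buffer is still NUL-terminated in that case.
  return n >= 0 && static_cast<std::size_t>(n) < sizeof(buf);
}
```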
Signed-off-by: Li Wang <laurence.liwang@gmail.com>
Signed-off-by: Li Wang <laurence.liwang@gmail.com>
- After SSL support was added to the Ceph Dashboard via PR #21627, the developer HTTP proxy sample config must be adapted.
- Adapt HACKING.rst to use HTTPS.
Signed-off-by: Volker Theile <vtheile@suse.com>
test/strtol: add test case for parsing hex numbers Reviewed-by: Kefu Chai <kchai@redhat.com>
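For reference, `std::strtol` with base 16 accepts an optional `0x`/`0X` prefix, and base 0 detects hex, octal, or decimal from the prefix. The hypothetical `parse_long` wrapper below is not Ceph's `strict_strtol`. It sketches the kinds of cases such a test might cover.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical strict wrapper: succeeds only if the whole string parses.
std::optional<long> parse_long(const std::string& s, int base) {
  if (s.empty()) return std::nullopt;
  errno = 0;
  char* end = nullptr;
  long v = std::strtol(s.c_str(), &end, base);
  // Reject overflow and trailing garbage such as the "g" in "0x1g".
  if (errno == ERANGE || end != s.c_str() + s.size()) return std::nullopt;
  return v;
}
```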
mgr/dashboard: Refactor perf counters Reviewed-by: Ricardo Marques <rimarques@suse.com> Reviewed-by: Tiago Melo <tmelo@suse.com>
- Do some polishing in the docs.
- Refactor the RgwClient::is_service_online() method. The system is considered online if the response structure is valid; the response content itself is not validated in this case.
- Relocate NoCredentialsException and derive it from RequestException.
Signed-off-by: Volker Theile <vtheile@suse.com>
qa/test: Added rados, rbd and fs to run only twice a week
Credit to Yan Zheng for identifying the race condition [1]. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1572555#c7 Test-for: http://tracker.ceph.com/issues/24072 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Thanks to Yan Zheng for identifying and reviewing the fix. Fixes: http://tracker.ceph.com/issues/24087 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
doc: mimic rc1 release notes
* refs/pull/21966/head:
mds: handle imported session race
mds: check for session import race
Reviewed-by: Greg Farnum <gfarnum@redhat.com>
Reviewed-by: Zheng Yan <zyan@redhat.com>
qa/tests: added --filter-out="ubuntu_14.04"
tests/qa: trying $ distro mix Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Mgr/dashboard: Remove husky package Reviewed-by: Stephan Müller <smueller@suse.com> Reviewed-by: Volker Theile <vtheile@suse.com>
mgr/dashboard: Navbar dropdown button does not respond for mobile browsers Reviewed-by: Ricardo Marques <rimarques@suse.com>
mgr/dashboard: Fix unit tests cli warnings Reviewed-by: Ricardo Marques <rimarques@suse.com> Reviewed-by: Volker Theile <vtheile@suse.com>
mon,osd: dump "compression_algorithms" in "mon metadata" Reviewed-by: John Spray <john.spray@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
doc/releases/mimic: upgrade steps Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Revert "common: add an async SharedMutex" Reviewed-by: Adam C. Emerson <aemerson@redhat.com> Reviewed-by: Willem Jan Withagen <wjw@digiware.nl>
* refs/pull/21842/head:
MDSMonitor: clean up use of pending fsmap in uncommitted ops
MDSMonitor: refactor last_beacons to use mono_clock
mds: refactor MDSMap init
mds: refactor FSMap init
mds: refactor Filesystem init
mds: move compat set methods to MDSMap
Reviewed-by: João Eduardo Luís <joao@suse.de>
Fixed typo and slightly reworked the description of the mgr dashboard in the "Major changes" section. Updated the description in the "Notable changes" section as well by adding a note about creating an SSL certificate. Signed-off-by: Lenz Grimmer <lgrimmer@suse.com>
doc/releases/mimic: Updated dashboard description
* refs/pull/21924/head:
align lspools output in file-layouts and manual-freebsd-deployment
Fixed configuration setting standby_replay
Fixed the order of configuration
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
emmericp pushed a commit that referenced this pull request on Jul 2, 2019
I am seeing this trace, which matches except for the
'fun:_ZN15AsyncConnection7processEv' frame.
<error>
<unique>0x2399</unique>
<tid>11</tid>
<threadname>msgr-worker-1</threadname>
<kind>UninitCondition</kind>
<what>Conditional jump or move depends on uninitialised value(s)</what>
<stack>
<frame>
<ip>0x5366B18</ip>
<obj>/usr/lib64/ceph/libceph-common.so.0</obj>
<fn>ceph::crypto::onwire::AES128GCM_OnWireRxHandler::authenticated_decrypt_update_final(ceph::buffer::v14_2_0::list&&, unsigned int)</fn>
<dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
<file>crypto_onwire.cc</file>
<line>274</line>
</frame>
<frame>
<ip>0x5355E60</ip>
<obj>/usr/lib64/ceph/libceph-common.so.0</obj>
<fn>ProtocolV2::handle_read_frame_epilogue_main(std::unique_ptr<ceph::buffer::v14_2_0::ptr_node, ceph::buffer::v14_2_0::ptr_node::disposer>&&, int)</fn>
<dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
<file>ProtocolV2.cc</file>
<line>1311</line>
</frame>
<frame>
<ip>0x533E2A3</ip>
<obj>/usr/lib64/ceph/libceph-common.so.0</obj>
<fn>ProtocolV2::run_continuation(Ct<ProtocolV2>&)</fn>
<dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
<file>ProtocolV2.cc</file>
<line>45</line>
</frame>
<frame>
<ip>0x534FB1C</ip>
<obj>/usr/lib64/ceph/libceph-common.so.0</obj>
<fn>ProtocolV2::reuse_connection(boost::intrusive_ptr<AsyncConnection> const&, ProtocolV2*)::{lambda(ConnectedSocket&)#3}::operator()(ConnectedSocket&)::{lambda()#2}::operator()()</fn>
<dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
<file>ProtocolV2.cc</file>
<line>2739</line>
</frame>
<frame>
<ip>0x534FF57</ip>
<obj>/usr/lib64/ceph/libceph-common.so.0</obj>
<fn>ProtocolV2::reuse_connection(boost::intrusive_ptr<AsyncConnection> const&, ProtocolV2*)::{lambda(ConnectedSocket&)#3}::operator()(ConnectedSocket&)</fn>
<dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
<file>ProtocolV2.cc</file>
<line>2745</line>
</frame>
<frame>
<ip>0x535001E</ip>
<obj>/usr/lib64/ceph/libceph-common.so.0</obj>
<fn>__invoke_impl<void, ProtocolV2::reuse_connection(const AsyncConnectionRef&, ProtocolV2*)::<lambda(ConnectedSocket&)>&, ConnectedSocket&></fn>
<dir>/opt/rh/devtoolset-8/root/usr/include/c++/8/bits</dir>
<file>invoke.h</file>
<line>60</line>
</frame>
<frame>
<ip>0x535001E</ip>
<obj>/usr/lib64/ceph/libceph-common.so.0</obj>
<fn>__invoke<ProtocolV2::reuse_connection(const AsyncConnectionRef&, ProtocolV2*)::<lambda(ConnectedSocket&)>&, ConnectedSocket&></fn>
<dir>/opt/rh/devtoolset-8/root/usr/include/c++/8/bits</dir>
<file>invoke.h</file>
<line>95</line>
</frame>
<frame>
<ip>0x535001E</ip>
<obj>/usr/lib64/ceph/libceph-common.so.0</obj>
<fn>__call<void, 0></fn>
<dir>/opt/rh/devtoolset-8/root/usr/include/c++/8</dir>
<file>functional</file>
<line>400</line>
</frame>
<frame>
<ip>0x535001E</ip>
<obj>/usr/lib64/ceph/libceph-common.so.0</obj>
<fn>operator()<></fn>
<dir>/opt/rh/devtoolset-8/root/usr/include/c++/8</dir>
<file>functional</file>
<line>484</line>
</frame>
<frame>
<ip>0x535001E</ip>
<obj>/usr/lib64/ceph/libceph-common.so.0</obj>
<fn>EventCenter::C_submit_event<std::_Bind<ProtocolV2::reuse_connection(boost::intrusive_ptr<AsyncConnection> const&, ProtocolV2*)::{lambda(ConnectedSocket&)#3} (ConnectedSocket)> >::do_request(unsigned long)</fn>
<dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
<file>Event.h</file>
<line>227</line>
</frame>
<frame>
<ip>0x535FCD6</ip>
<obj>/usr/lib64/ceph/libceph-common.so.0</obj>
<fn>EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)</fn>
<dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
<file>Event.cc</file>
<line>441</line>
</frame>
<frame>
<ip>0x5365086</ip>
<obj>/usr/lib64/ceph/libceph-common.so.0</obj>
<fn>operator()</fn>
<dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
<file>Stack.cc</file>
<line>53</line>
</frame>
<frame>
<ip>0x5365086</ip>
<obj>/usr/lib64/ceph/libceph-common.so.0</obj>
<fn>std::_Function_handler<void (), NetworkStack::add_thread(unsigned int)::{lambda()#1}>::_M_invoke(std::_Any_data const&)</fn>
<dir>/opt/rh/devtoolset-8/root/usr/include/c++/8/bits</dir>
<file>std_function.h</file>
<line>297</line>
</frame>
<frame>
<ip>0x55F519E</ip>
<obj>/usr/lib64/ceph/libceph-common.so.0</obj>
<fn>execute_native_thread_routine</fn>
</frame>
<frame>
<ip>0x1076BDD4</ip>
<obj>/usr/lib64/libpthread-2.17.so</obj>
<fn>start_thread</fn>
</frame>
<frame>
<ip>0x118E0EAC</ip>
<obj>/usr/lib64/libc-2.17.so</obj>
<fn>clone</fn>
</frame>
</stack>
</error>
Signed-off-by: Sage Weil <sage@redhat.com>
emmericp pushed a commit that referenced this pull request on Jul 2, 2019
We just took the curmap ref above; do not call get_osdmap() again. I think it may explain a weird segv I saw here in ~shared_ptr, although I'm not quite certain. Regardless, this change is correct and better.
(gdb) bt
#0 raise (sig=sig@entry=11) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00005596e5a98261 in reraise_fatal (signum=11) at ./src/global/signal_handler.cc:326
#2 handle_fatal_signal(int) () at ./src/global/signal_handler.cc:326
#3 <signal handler called>
#4 0x00005596f4fe80e0 in ?? ()
#5 0x00005596e5464068 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5596f4b7cf60) at /usr/include/c++/9/bits/shared_ptr_base.h:148
#6 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5596f4b7cf60) at /usr/include/c++/9/bits/shared_ptr_base.h:148
#7 0x00005596e543377f in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7f2b25044e28, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:1169
#8 std::__shared_ptr<OSDMap const, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7f2b25044e20, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:1169
#9 std::shared_ptr<OSDMap const>::~shared_ptr (this=0x7f2b25044e20, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr.h:103
#10 OSD::handle_osd_ping(MOSDPing*) () at ./src/osd/OSD.cc:4662
Signed-off-by: Sage Weil <sage@redhat.com>
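Here is a minimal sketch of the pattern this fix follows. The hypothetical `Map`, `MapRef`, and `Service` types stand in for `OSDMap` and `OSD`. The handler takes one `shared_ptr` reference at the top and uses it throughout, rather than calling the getter a second time.

```cpp
#include <cassert>
#include <memory>

// Hypothetical simplified types; not Ceph's OSD/OSDMap.
struct Map { int epoch; };
using MapRef = std::shared_ptr<const Map>;

class Service {
 public:
  MapRef get_map() const { return map_; }  // hands out a new reference
  void publish(int epoch) { map_ = std::make_shared<Map>(Map{epoch}); }

  int handle_ping() {
    // Take one reference up front and use it for the whole handler. A
    // second get_map() call could observe a different, newer map than the
    // one this handler started with.
    MapRef curmap = get_map();
    publish(curmap->epoch + 1);  // the service moves on to a newer map...
    return curmap->epoch;        // ...while this handler stays consistent
  }

 private:
  MapRef map_ = std::make_shared<Map>(Map{1});
};
```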
emmericp pushed a commit that referenced this pull request on Aug 26, 2019
(cherry picked from commit f019fc0)
ifed01 pushed a commit that referenced this pull request on Dec 6, 2025
…ives
Add suppression rules for three categories of warnings
encountered during ASan-enabled testing:
1. PyModule_ExecDef memory leaks: ASan incorrectly interprets Python's
module loading behavior as memory leaks when the interpreter loads
extension modules.
2. __cxa_throw interception failures: ASan's interceptor cannot properly
intercept exception handling when libstdc++.so is loaded after the
ASan shared library, causing CHECK failures.
3. ErasureCodePluginRegistry::load:
`ceph::ErasureCodePluginRegistry::load()` is known to leak, as we
don't free the memory allocated by the EC plugins, which are
registered in the `ErasureCodePluginRegistry` singleton. This is a
known issue, but since the `ErasureCodePluginRegistry` instance is a
singleton, we can live with it. In this change, we add a rule to
suppress the leak report from LeakSanitizer. The same rule also exists in
qa/valgrind.supp.
The first two are confirmed false positives, and the third is a known,
accepted leak; suppressing all of them reduces noise in test output.
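LeakSanitizer suppressions use the form `leak:<pattern>`. They are read from a file named by `LSAN_OPTIONS=suppressions=<path>`, or from the optional `__lsan_default_suppressions()` hook if the program defines it. The sketch below embeds the two leak patterns mentioned above through that hook. It is an illustration, not the rules file this commit adds. The `__cxa_throw` CHECK failure is not a leak report, so a `leak:` rule would not cover it.

```cpp
#include <cassert>
#include <cstring>

// Optional LSan hook: when present, the sanitizer runtime reads these
// suppressions (same syntax as a suppressions file) at startup. In a build
// without sanitizers it is just an ordinary function.
extern "C" const char* __lsan_default_suppressions() {
  return "leak:PyModule_ExecDef\n"
         "leak:ceph::ErasureCodePluginRegistry::load\n";
}
```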
Example warnings:
```
Direct leak of 3264 byte(s) in 1 object(s) allocated from:
#0 0x7f6027d20cb5 in malloc /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:67
#1 0x7f60277557ad (/usr/lib/libpython3.13.so.1.0+0x1557ad) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
#2 0x7f6027756067 (/usr/lib/libpython3.13.so.1.0+0x156067) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
#3 0x7f60278471a0 (/usr/lib/libpython3.13.so.1.0+0x2471a0) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
#4 0x7f602774d031 (/usr/lib/libpython3.13.so.1.0+0x14d031) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
#5 0x7b60234093bb in __Pyx_modinit_type_init_code.constprop.0 /home/kefu/dev/ceph/build/src/pybind/rados/rados.c:82066
#6 0x7b602340a826 in __pyx_pymod_exec_rados /home/kefu/dev/ceph/build/src/pybind/rados/rados.c:82755
#7 0x7f6027856777 in PyModule_ExecDef (/usr/lib/libpython3.13.so.1.0+0x256777) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
#8 0x7f602785baa3 (/usr/lib/libpython3.13.so.1.0+0x25baa3) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
#9 0x7f6027793df2 (/usr/lib/libpython3.13.so.1.0+0x193df2) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
#10 0x7f6027777cbe in _PyEval_EvalFrameDefault (/usr/lib/libpython3.13.so.1.0+0x177cbe) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
#11 0x7f60277957de (/usr/lib/libpython3.13.so.1.0+0x1957de) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
#12 0x7f60277d11b9 in PyObject_CallMethodObjArgs (/usr/lib/libpython3.13.so.1.0+0x1d11b9) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
#13 0x7f60277d0ee4 in PyImport_ImportModuleLevelObject (/usr/lib/libpython3.13.so.1.0+0x1d0ee4) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
#14 0x7f6027779c0c in _PyEval_EvalFrameDefault (/usr/lib/libpython3.13.so.1.0+0x179c0c) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
#15 0x7f602784e2c8 in PyEval_EvalCode (/usr/lib/libpython3.13.so.1.0+0x24e2c8) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
#16 0x7f602788c88b (/usr/lib/libpython3.13.so.1.0+0x28c88b) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
#17 0x7f602788985c (/usr/lib/libpython3.13.so.1.0+0x28985c) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
#18 0x7f6027886f57 (/usr/lib/libpython3.13.so.1.0+0x286f57) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
#19 0x7f6027886211 (/usr/lib/libpython3.13.so.1.0+0x286211) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
#20 0x7f6027885b82 (/usr/lib/libpython3.13.so.1.0+0x285b82) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
#21 0x7f6027883e50 in Py_RunMain (/usr/lib/libpython3.13.so.1.0+0x283e50) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
#22 0x7f602783bbea in Py_BytesMain (/usr/lib/libpython3.13.so.1.0+0x23bbea) (BuildId: bea05fc2c8bd66145b159f10dcd810ebe813af39)
#23 0x7f6027227674 (/usr/lib/libc.so.6+0x27674) (BuildId: 4fe011c94a88e8aeb6f2201b9eb369f42b4a1e9e)
#24 0x7f6027227728 in __libc_start_main (/usr/lib/libc.so.6+0x27728) (BuildId: 4fe011c94a88e8aeb6f2201b9eb369f42b4a1e9e)
#25 0x55dae17e6044 in _start (/usr/bin/python3.13+0x1044) (BuildId: 8c0dc848f5b978c56ebeb07255bb332b4b37ae4e)
```
```
AddressSanitizer: CHECK failed: asan_interceptors.cpp:335 "((__interception::real___cxa_throw)) != (0)" (0x0, 0x0) (tid=3246455)
#0 0x7f345ea81979 in CheckUnwind ../../../../src/libsanitizer/asan/asan_rtl.cpp:69
#1 0x7f345eaa790d in __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) ../../../../src/libsanitizer/sanitizer_common/sanitizer_termination.cpp:86
#2 0x7f345e9e1d54 in __interceptor___cxa_throw ../../../../src/libsanitizer/asan/asan_interceptors.cpp:335
#3 0x7f345e9e1d54 in __interceptor___cxa_throw ../../../../src/libsanitizer/asan/asan_interceptors.cpp:334
#4 0x7f3458623def in void boost::throw_exception<boost::bad_lexical_cast>(boost::bad_lexical_cast const&) /opt/ceph/include/boost/throw_exception.hpp:165
#5 0x7f345997ad3b in void boost::conversion::detail::throw_bad_cast<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long>() /opt/ceph/include/boost/lexical_cast/bad_lexical_cast.hpp:93
#6 0x7f3459979d35 in unsigned long boost::lexical_cast<unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /opt/ceph/include/boost/lexical_cast.hpp:43
```
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
ifed01 pushed a commit that referenced this pull request on Dec 6, 2025
The static std::map max_prio_map was defined in the osd_types.h header
file, causing every translation unit that included this header to get
its own copy of the variable. This led to One Definition Rule (ODR)
violations where multiple instances of the same variable existed at
runtime.
During program cleanup, destructors for these multiple instances would
attempt to free the same memory regions, resulting in segmentation
faults in tcmalloc/memory allocator as seen with ceph-dencoder.
This issue surfaced after a not-yet-merged change that converts erasure_code
and json_spirit to OBJECT libraries. Before that change, these were
STATIC libraries that were linked via target_link_libraries. The
incorrect linkage meant their object files (and thus their copies of
max_prio_map) were kept separate and didn't conflict at runtime.
After converting to OBJECT libraries and properly incorporating them
into libceph-common.so (commit 8b0e3fb2c23), the multiple copies of
max_prio_map from different translation units all ended up in the same
shared library, exposing the ODR violation. During program exit, the
dynamic linker attempted to run destructors for all instances, leading
to double-free crashes.
Fix by moving the map into a static helper function in PeeringState.cc
(the only file that uses it). The map is now a function-local static
const variable, ensuring a single instance that is properly initialized
and destructed.
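Here is a sketch of the before/after shape of this fix, with hypothetical map contents. The function-local static is constructed on first use (thread-safe since C++11) and destroyed exactly once at exit.

```cpp
#include <cassert>
#include <map>

// Before (in a header): each translation unit that includes it gets its own
// copy, each with its own destructor registered to run at exit:
//   static std::map<int, int> max_prio_map = {...};
//
// After: a single instance owned by one function in one .cc file.
// The keys and values below are hypothetical.
static const std::map<int, int>& get_max_prio_map() {
  static const std::map<int, int> max_prio_map = {
      {1, 10},
      {2, 20},
  };
  return max_prio_map;
}
```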
Backtrace before fix:
```
#0 0x00007ffff7dbb1a0 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned int, int) () from /lib/x86_64-linux-gnu/libtcmalloc.so.4
#1 0x00007ffff7dbb57f in tcmalloc::ThreadCache::Scavenge() () from /lib/x86_64-linux-gnu/libtcmalloc.so.4
#2 0x00007ffff6bc8aa2 in std::__new_allocator<std::_Rb_tree_node<std::pair<int const, int> > >::deallocate (this=0x7ffff7d48f78 <max_prio_map>, __p=0x555555f43890, __n=1)
#3 0x00007ffff6bc89f9 in std::allocator<std::_Rb_tree_node<std::pair<int const, int> > >::deallocate (this=0x7ffff7d48f78 <max_prio_map>, __p=0x555555f43890, __n=1)
#4 std::allocator_traits<std::allocator<std::_Rb_tree_node<std::pair<int const, int> > > >::deallocate (__a=..., __p=0x555555f43890, __n=1)
#5 std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_M_put_node (this=0x7ffff7d48f78 <max_prio_map>, __p=0x555555f43890)
#6 0x00007ffff6bc892e in std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_M_drop_node (this=0x7ffff7d48f78 <max_prio_map>, __p=0x555555f43890)
#7 0x00007ffff6bc886e in std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_M_erase (this=0x7ffff7d48f78 <max_prio_map>, __x=0x555555f43890)
#8 0x00007ffff6bc8854 in std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_M_erase (this=0x7ffff7d48f78 <max_prio_map>, __x=0x555555f43cb0)
#9 0x00007ffff6bc8854 in std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_M_erase (this=0x7ffff7d48f78 <max_prio_map>, __x=0x555555f43ad0)
#10 0x00007ffff6bc8805 in std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::~_Rb_tree (this=0x7ffff7d48f78 <max_prio_map>)
#11 0x00007ffff6bc7345 in std::map<int, int, std::less<int>, std::allocator<std::pair<int const, int> > >::~map (this=0x7ffff7d48f78 <max_prio_map>)
#12 0x00007ffff484bd51 in __cxa_finalize (d=0x7ffff7d3f440) at ./stdlib/cxa_finalize.c:97
#13 0x00007ffff6af9487 in __do_global_dtors_aux () from /home/kefu/dev/ceph/build/lib/libceph-common.so.2
#14 0x00007ffff7fbfd20 in ?? ()
#15 0x00007ffff7fc8fc2 in _dl_call_fini (closure_map=0x7fffffffd0f0, closure_map@entry=0x7ffff7fbfd20) at ./elf/dl-call_fini.c:43
#16 0x00007ffff7fcbe72 in _dl_fini () at ./elf/dl-fini.c:120
#17 0x00007ffff484c291 in __run_exit_handlers (status=0, listp=0x7ffff49f1680 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at ./stdlib/exit.c:118
#18 0x00007ffff484c35a in __GI_exit (status=<optimized out>) at ./stdlib/exit.c:148
#19 0x00007ffff4833caf in __libc_start_call_main (main=main@entry=0x55555556cd90 <main(int, char const**)>, argc=argc@entry=2, argv=argv@entry=0x7fffffffd488) at ../sysdeps/nptl/libc_start_call_main.h:74
#20 0x00007ffff4833d65 in __libc_start_main_impl (main=0x55555556cd90 <main(int, char const**)>, argc=2, argv=0x7fffffffd488, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd478) at ../csu/libc-start.c:360
#21 0x00005555555695e1 in _start ()
```
Signed-off-by: Kefu Chai <k.chai@proxmox.com>
sajibreadd-croit pushed a commit that referenced this pull request on Dec 18, 2025
See comment:
```
//TODO: should be changed to return future<> once all calls
// to refresh are through co_await. We return LBAMapping
// for now to avoid mandating the callers to make sure
// the life of the lba mapping survives the refresh.
```
For now introduce co_refresh and mark the existing refresh as
deprecated. Following work will audit all the existing users of
refresh and move them to the new method. This change is not trivial
so I prefer to follow up on this as a separate PR.
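To illustrate the lifetime trade-off that the TODO comment describes, here is a sketch that uses `std::future` with `std::launch::deferred` as a stand-in for seastar futures. `Mapping`, `refresh_by_value`, and `refresh_in_place` are hypothetical and do not reflect crimson's API.

```cpp
#include <cassert>
#include <future>

// Hypothetical mapping type; not crimson's LBAMapping.
struct Mapping { int version = 0; };

// Style 1: return the refreshed mapping by value. The deferred task owns
// its own copy, so the caller does not have to keep anything alive.
std::future<Mapping> refresh_by_value(Mapping m) {
  return std::async(std::launch::deferred, [m]() mutable {
    ++m.version;
    return m;
  });
}

// Style 2: refresh in place and signal only completion (like future<>).
// The task holds a reference, so the caller must keep `m` alive until the
// future completes. If `m` lives in a stack frame that has already been
// left, the task touches dead stack memory, which is the
// stack-use-after-return class of bug that ASan reports.
std::future<void> refresh_in_place(Mapping& m) {
  return std::async(std::launch::deferred, [&m] { ++m.version; });
}
```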
This should help avoid stack-use-after-return (UAR) errors at suspension points:
```
==103588==ERROR: AddressSanitizer: stack-use-after-return on address 0xffff80197e90 at pc 0xaaaacb941b24 bp 0xffff7e48dd80 sp 0xffff7e48dd78
READ of size 8 at 0xffff80197e90 thread T1
#0 0xaaaacb941b20 in boost::intrusive_ptr<crimson::os::seastore::LBACursor>::swap(boost::intrusive_ptr<crimson::os::seastore::LBACursor>&) /opt/ceph/include/boost/smart_ptr/intrusive_ptr.hpp:172:18
#1 0xaaaacb941998 in boost::intrusive_ptr<crimson::os::seastore::LBACursor>::operator=(boost::intrusive_ptr<crimson::os::seastore::LBACursor>&&) /opt/ceph/include/boost/smart_ptr/intrusive_ptr.hpp:93:61
#2 0xaaaacb933758 in crimson::os::seastore::LBAMapping::operator=(crimson::os::seastore::LBAMapping&&) /ceph/src/crimson/os/seastore/lba_mapping.h:46:48
#3 0xaaaacde2fa54 in ... crimson::os::seastore::LBAMapping&&, std::array<crimson::os::seastore::LBAManager::remap_entry_t, 1ul>) (.resume) /ceph/src/crimson/os/seastore/transaction_manager.h:1282:11
```
The deprecation marker is commented out, since make check would otherwise fail.
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
sajibreadd-croit pushed a commit that referenced this pull request on Feb 11, 2026
Fix memory leaks detected by AddressSanitizer in unittest_dbstore_tests.
The test was failing with ASan enabled due to SQLObjectOp objects not
being properly cleaned up.
ASan reported the following leaks:
Direct leak of 200 byte(s) in 1 object(s) allocated from:
#0 operator new(unsigned long)
#1 SQLGetBucket::Execute(DoutPrefixProvider const*, rgw::store::DBOpParams*)
/src/rgw/driver/dbstore/sqlite/sqliteDB.cc:1689
#2 rgw::store::DB::ProcessOp(DoutPrefixProvider const*, ...)
/src/rgw/driver/dbstore/common/dbstore.cc:258
Direct leak of 200 byte(s) in 1 object(s) allocated from:
#0 operator new(unsigned long)
#1 SQLInsertBucket::Execute(DoutPrefixProvider const*, rgw::store::DBOpParams*)
/src/rgw/driver/dbstore/sqlite/sqliteDB.cc:1433
#2 rgw::store::DB::ProcessOp(DoutPrefixProvider const*, ...)
/src/rgw/driver/dbstore/common/dbstore.cc:258
SUMMARY: AddressSanitizer: 460550 byte(s) leaked in 1823 allocation(s).
Root cause: The DB::Destroy() method had an early return when the db
pointer was NULL, preventing cleanup of the objectmap which stores
SQLObjectOp pointers. These objects were allocated during test execution
but never freed.
Changes:
- Modified DB::Destroy() to always clean up objectmap even when db is NULL
- Added explicit delete in objectmapDelete() for consistency
- Added lsan suppression for SQLite internal allocations (indirect leaks)
After the fix, all direct leaks are eliminated. Only indirect leaks from
SQLite's internal memory management remain, which are now suppressed.
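The ordering problem in DB::Destroy() can be illustrated with a minimal sketch. It assumes a heavily simplified, hypothetical DB struct whose objectmap owns raw ObjectOp pointers, standing in for the real rgw::store::DB and SQLObjectOp classes, which differ in detail. The point is only that the map is freed before the NULL-handle check.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for SQLObjectOp; counts live instances so a leak is visible.
struct ObjectOp {
  inline static int live = 0;
  ObjectOp() { ++live; }
  ~ObjectOp() { --live; }
};

// Hypothetical, simplified DB: not the real rgw::store::DB.
struct DB {
  void* db = nullptr;                          // underlying database handle
  std::map<std::string, ObjectOp*> objectmap;  // owns the ObjectOp pointers

  void objectmapInsert(const std::string& bucket) {
    ObjectOp*& slot = objectmap[bucket];  // new slots start as nullptr
    delete slot;                          // free any op being replaced
    slot = new ObjectOp;
  }

  // Remove one entry and free the op it owns.
  void objectmapDelete(const std::string& bucket) {
    auto it = objectmap.find(bucket);
    if (it == objectmap.end()) return;
    delete it->second;
    objectmap.erase(it);
  }

  // Free objectmap unconditionally; only the handle teardown depends on db.
  int Destroy() {
    for (auto& entry : objectmap) delete entry.second;
    objectmap.clear();
    if (!db) return 0;  // the leaking version returned here before the loop
    // ... close the database handle here ...
    db = nullptr;
    return 0;
  }
};
```

With the early return placed after the cleanup loop, a DB whose handle was never opened still releases every cached op.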
Test results:
- Before: 460,550 bytes leaked (including 2 direct leaks of 200 bytes each)
- After: 0 direct leaks, unittest_dbstore_tests passes with ASan
Signed-off-by: Kefu Chai <k.chai@proxmox.com>
sajibreadd-croit pushed a commit that referenced this pull request on Feb 11, 2026
The ExtentMap.reshard_failure test was leaking memory by not properly
cleaning up the OnodeCacheShard and BufferCacheShard objects it created.
ASan reported:
Direct leak of 9928 byte(s) in 1 object(s) allocated from:
#1 BlueStore::OnodeCacheShard::create() BlueStore.cc:1221
#2 ExtentMap_reshard_failure_Test::TestBody() test_bluestore_types.cc:1244
Direct leak of 224 byte(s) in 1 object(s) allocated from:
#1 BlueStore::BufferCacheShard::create() BlueStore.cc:1680
#2 ExtentMap_reshard_failure_Test::TestBody() test_bluestore_types.cc:1246
SUMMARY: AddressSanitizer: 10288 byte(s) leaked in 8 allocation(s).
Fix by:
1. Wrapping coll and onode in an additional scope block to ensure they
are destroyed before the cache shards (releasing all blob references)
2. Adding proper cleanup with delete bc and delete oc at test end
This matches the cleanup pattern used in BlueStoreFixture::TearDown().
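The two-step fix (inner scope, then explicit deletes) can be sketched with hypothetical stand-ins for the cache shards and for coll/onode. It assumes only that objects register a reference with a shard while alive, which is a simplification of how BlueStore blobs and onodes reference their cache shards.

```cpp
#include <cassert>

// Hypothetical cache shard that tracks how many objects still reference it.
struct CacheShard {
  int live_refs = 0;
  ~CacheShard() { assert(live_refs == 0 && "shard deleted while still referenced"); }
};

// Hypothetical stand-in for coll/onode: holds a reference while alive.
struct Onode {
  CacheShard* shard;
  explicit Onode(CacheShard* s) : shard(s) { ++shard->live_refs; }
  ~Onode() { --shard->live_refs; }
  Onode(const Onode&) = delete;
  Onode& operator=(const Onode&) = delete;
};

// Returns the number of references still held when the shards are deleted.
int run_test() {
  CacheShard* oc = new CacheShard;  // plays the role of OnodeCacheShard::create()
  CacheShard* bc = new CacheShard;  // plays the role of BufferCacheShard::create()
  {
    // Extra scope: both objects are destroyed at the closing brace,
    // dropping their references before the shards go away.
    Onode coll(oc);
    Onode onode(bc);
    // ... test body ...
  }
  int remaining = oc->live_refs + bc->live_refs;
  delete bc;  // the cleanup the leaking test was missing
  delete oc;
  return remaining;
}
```

Without the inner scope, coll and onode would still be alive when the deletes run, which is why the scope must close first.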
Signed-off-by: Kefu Chai <k.chai@proxmox.com>
sajibreadd-croit pushed a commit that referenced this pull request on Feb 11, 2026
…yed static"
```
Jan 20 09:27:16 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: AddressSanitizer:DEADLYSIGNAL
Jan 20 09:27:16 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: =================================================================
Jan 20 09:27:16 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: ==3==ERROR: AddressSanitizer: stack-overflow on address 0x7b512f6c8dd8 (pc 0x0000046e7a72 bp 0x7b512de7c900 sp 0x7b512f6c8dd8 T0)
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: #0 0x0000046e7a72 in get_global_options() (/usr/bin/ceph-osd-crimson+0x46e7a72) (BuildId: 2a86043f51c9be9cb19801e276fb3ee36239556a)
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: #1 0x0000046e540e in build_options() (/usr/bin/ceph-osd-crimson+0x46e540e) (BuildId: 2a86043f51c9be9cb19801e276fb3ee36239556a)
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: #2 0x0000033b7949 in get_ceph_options() (/usr/bin/ceph-osd-crimson+0x33b7949) (BuildId: 2a86043f51c9be9cb19801e276fb3ee36239556a)
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: #3 0x000003440540 in md_config_t::md_config_t(ConfigValues&, ConfigTracker const&, bool) (/usr/bin/ceph-osd-crimson+0x3440540) (BuildId: 2a860>
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: #4 0x0000046856a8 in crimson::common::ConfigProxy::ConfigProxy(EntityName const&, std::basic_string_view<char, std::char_traits<char> >) (/usr>
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: #5 0x000000eb6cb5 in seastar::shared_ptr_count_for<crimson::common::ConfigProxy>::shared_ptr_count_for<EntityName&, std::__cxx11::basic_string>
..
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: #40 0x000000ed6434 in seastar::future<int> seastar::futurize<int>::apply<crimson::osd::_get_early_config(int, char const**)::{lambda()#1}::ope>
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: #41 0x000000ed672b in seastar::async<crimson::osd::_get_early_config(int, char const**)::{lambda()#1}::operator()() const::{lambda()#1}>(seast>
```
This reverts commit 1ab0a8c.
Fixes: https://tracker.ceph.com/issues/74481
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
sajibreadd-croit pushed a commit that referenced this pull request on Feb 11, 2026
gcm_cipher_internal() and ossl_gcm_stream_final() make it to the stack
trace only on CentOS Stream 9. On Ubuntu 22.04 and Rocky 10, it looks
as follows:
Thread 4 msgr-worker-1:
Conditional jump or move depends on uninitialised value(s)
at 0x70A36D4: ??? (in /usr/lib64/libcrypto.so.3.2.2)
by 0x70A39A1: ??? (in /usr/lib64/libcrypto.so.3.2.2)
by 0x6F8A09C: EVP_DecryptFinal_ex (in /usr/lib64/libcrypto.so.3.2.2)
by 0xB498C1F: ceph::crypto::onwire::AES128GCM_OnWireRxHandler::authenticated_decrypt_update_final(ceph::buffer::v15_2_0::list&) (crypto_onwire.cc:271)
by 0xB4992D7: ceph::msgr::v2::FrameAssembler::disassemble_preamble(ceph::buffer::v15_2_0::list&) (frames_v2.cc:281)
by 0xB482D98: ProtocolV2::handle_read_frame_preamble_main(std::unique_ptr<ceph::buffer::v15_2_0::ptr_node, ceph::buffer::v15_2_0::ptr_node::disposer>&&, int) (ProtocolV2.cc:1149)
by 0xB475318: ProtocolV2::run_continuation(Ct<ProtocolV2>&) (ProtocolV2.cc:54)
by 0xB457012: AsyncConnection::process() (AsyncConnection.cc:495)
by 0xB49E61A: EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*) (Event.cc:492)
by 0xB49EA9D: UnknownInlinedFun (Stack.cc:50)
by 0xB49EA9D: UnknownInlinedFun (invoke.h:61)
by 0xB49EA9D: UnknownInlinedFun (invoke.h:111)
by 0xB49EA9D: std::_Function_handler<void (), NetworkStack::add_thread(Worker*)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (std_function.h:290)
by 0xBB11063: ??? (in /usr/lib64/libstdc++.so.6.0.33)
by 0x4F17119: start_thread (in /usr/lib64/libc.so.6)
The proposal to amend the existing suppression so that it's tied to the
specific callsite rather than libcrypto internals [1] received a thumbs
up from Radoslaw.
[1] ceph#61689 (comment)
Fixes: https://tracker.ceph.com/issues/74672
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
sajibreadd-croit pushed a commit that referenced this pull request on Feb 11, 2026
Fix memory leak detected by AddressSanitizer in unittest_http_manager.
The test was failing with ASan enabled due to rgw_http_req_data objects
not being properly cleaned up when the HTTP manager thread exits.
ASan reported the following leaks:
Direct leak of 17152 byte(s) in 32 object(s) allocated from:
#0 operator new(unsigned long)
#1 RGWHTTPManager::add_request(RGWHTTPClient*)
/ceph/src/rgw/rgw_http_client.cc:946:33
#2 HTTPManager_SignalThread_Test::TestBody()
/ceph/src/test/rgw/test_http_manager.cc:132:10
Indirect leak of 768 byte(s) in 32 object(s) allocated from:
#0 operator new(unsigned long)
#1 rgw_http_req_data::rgw_http_req_data()
/ceph/src/rgw/rgw_http_client.cc:52:22
#2 RGWHTTPManager::add_request(RGWHTTPClient*)
/ceph/src/rgw/rgw_http_client.cc:946:37
SUMMARY: AddressSanitizer: 17920 byte(s) leaked in 64 allocation(s).
Root cause: The rgw_http_req_data class uses reference counting
(inherits from RefCountedObject). When a request is unregistered,
unregister_request() calls get() to increment the refcount, expecting
a corresponding put() to be called later.
In manage_pending_requests(), unregistered requests are properly
handled with both _unlink_request() and put(). However, in the thread
cleanup code (reqs_thread_entry exit path), only _unlink_request() was
called without the matching put(), causing a reference count leak.
The fix adds the missing put() call in the thread cleanup code to match
the reference counting pattern used in manage_pending_requests().
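The get()/put() pairing can be modelled with a minimal sketch. ReqData and Manager below are hypothetical simplifications loosely shaped after rgw_http_req_data and RGWHTTPManager; the real classes, locking, and container types differ. The sketch only shows that the exit path must drop the extra reference taken in unregister_request().

```cpp
#include <atomic>
#include <cassert>
#include <list>

// Hypothetical refcounted request: starts with one reference, frees itself at zero.
struct ReqData {
  inline static int live = 0;
  std::atomic<int> nref{1};
  ReqData() { ++live; }
  ~ReqData() { --live; }
  void get() { ++nref; }
  void put() { if (--nref == 0) delete this; }
};

// Hypothetical manager: the registry owns one reference per request.
struct Manager {
  std::list<ReqData*> reqs;          // registered requests
  std::list<ReqData*> unregistered;  // requests waiting to be unlinked

  ReqData* add_request() {
    auto* r = new ReqData;  // reference owned by the registry
    reqs.push_back(r);
    return r;
  }
  void unregister_request(ReqData* r) {
    r->get();  // extra reference, must be matched by a later put()
    unregistered.push_back(r);
  }
  void unlink_request(ReqData* r) {
    reqs.remove(r);
    r->put();  // drop the registry's reference
  }
  // Thread exit path: mirror manage_pending_requests() by pairing
  // every unlink of an unregistered request with its own put().
  void cleanup_on_exit() {
    for (auto* r : unregistered) {
      unlink_request(r);
      r->put();  // the put() that was missing before the fix
    }
    unregistered.clear();
    for (auto* r : reqs) r->put();  // release still-registered requests
    reqs.clear();
  }
};
```

Dropping the `r->put()` in cleanup_on_exit() leaves the unregistered request at a refcount of one, which is the shape of the leak ASan reported.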
Test results:
- Before: 17,920 bytes leaked in 64 allocations
- After: 0 leaks, unittest_http_manager passes with ASan
Fixes: https://tracker.ceph.com/issues/74762
Signed-off-by: Kefu Chai <k.chai@proxmox.com>
sajibreadd-croit pushed a commit that referenced this pull request on Feb 16, 2026
This commit fixes a critical cache key collision bug in the ISA erasure
code plugin that could lead to heap-buffer-overflow and silent data
corruption.
Problem:
--------
The decoding table cache was indexed only by matrix type and erasure
signature (available/missing chunk pattern), but did NOT include the
(k,m) erasure code configuration parameters. This caused different EC
configurations with similar erasure patterns to collide in the cache,
leading to incorrectly-sized cached buffers being reused.
AddressSanitizer Report:
------------------------
==4904==ERROR: AddressSanitizer: heap-buffer-overflow on address
0x5160001397b8 at pc 0x5de8e415296b bp 0x7ffc82260310 sp 0x7ffc8225fad0
READ of size 576 at 0x5160001397b8 thread T0
#0 __asan_memcpy
#1 ErasureCodeIsaTableCache::getDecodingTableFromCache()
.../ErasureCodeIsaTableCache.cc:260:5
#2 ErasureCodeIsaDefault::isa_decode()
.../ErasureCodeIsa.cc:490:15
0x5160001397b8 is located 0 bytes after 568-byte region
[0x516000139580,0x5160001397b8) allocated by:
#0 posix_memalign
#1 ceph::buffer::raw_combined::alloc_data_n_controlblock()
#2 ErasureCodeIsaTableCache::putDecodingTableToCache()
.../ErasureCodeIsaTableCache.cc:319:18
Root Cause:
-----------
Scenario illustrating the bug:
1. First decode operation: k=2, m=1, erasure pattern "+0+2-1"
- Creates cache entry with key "+0+2-1"
- Allocates buffer: 2*(1+2)*32 = 192 bytes
2. Second decode operation: k=3, m=3, same erasure pattern "+0+2-1"
- Looks up cache with key "+0+2-1" → COLLISION
- Retrieves 192-byte buffer but needs 3*(3+3)*32 = 576 bytes
- Result: Heap-buffer-overflow (reads 384 bytes beyond allocation)
Worse scenario (silent corruption):
1. First decode: k=3, m=3 → caches 576-byte table
2. Second decode: k=2, m=1 → retrieves wrong table
- Uses incorrect decoding matrix
- Result: Silent data corruption during recovery
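The buffer arithmetic in both scenarios can be checked with a small sketch. It assumes the table size formula k*(k+m)*32 used in the scenarios and models the old cache as a hypothetical std::map keyed only by the erasure pattern; the real ErasureCodeIsaTableCache is structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Decoding table size as used in the scenarios: k*(k+m)*32 bytes.
std::size_t table_size(int k, int m) {
  return static_cast<std::size_t>(k) * (k + m) * 32;
}

// Hypothetical old-style cache keyed only by the erasure pattern.
// Returns how many bytes a decode with (k2,m2) would read past a table cached
// for (k1,m1), or 0 if the cached table is big enough (yet still the wrong one).
std::size_t overrun_with_old_key(int k1, int m1, int k2, int m2,
                                 const std::string& pattern) {
  std::map<std::string, std::vector<unsigned char>> cache;
  cache[pattern].resize(table_size(k1, m1));  // first decode fills the cache
  const auto& cached = cache.at(pattern);     // second decode hits the same key
  std::size_t needed = table_size(k2, m2);
  return needed > cached.size() ? needed - cached.size() : 0;
}
```

For the first scenario, `overrun_with_old_key(2, 1, 3, 3, "+0+2-1")` yields 384, matching the overflow above; swapping the order yields 0, which is the silent-corruption case.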
Solution:
---------
Include k and m parameters in cache signature
- Old format: "+0+2+3-1-4"
- New format: "k3m2a+0+2+3e-1-4"
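The new key format can be reproduced with a short helper. make_signature is a hypothetical name, and the argument types (vectors of chunk indices) are an assumption; the plugin's actual code builds the string from its own data structures. Only the output format is taken from the description above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper producing "k<k>m<m>a<+available...>e<-erased...>".
std::string make_signature(int k, int m,
                           const std::vector<int>& available,
                           const std::vector<int>& erased) {
  std::string sig = "k" + std::to_string(k) + "m" + std::to_string(m) + "a";
  for (int c : available) sig += "+" + std::to_string(c);
  sig += "e";
  for (int c : erased) sig += "-" + std::to_string(c);
  return sig;
}
```

Because k and m lead the key, two pools that share the pattern "+0+2-1" but differ in (k,m) no longer map to the same cache entry.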
Test Fix:
---------
Also fixes a buffer overflow in TestErasureCodePlugins.cc where
hashes_bl offset was calculated using chunk_size instead of sizeof(uint32_t),
causing reads beyond the CRC buffer.
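The offset fix can be illustrated with a small reader. It assumes, as a simplification, that hashes_bl holds one uint32_t CRC per chunk packed back to back; crc_for_chunk and the std::vector<char> buffer are hypothetical stand-ins for the test's bufferlist handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical reader: chunk i's CRC starts at i * sizeof(uint32_t).
// Striding by chunk_size instead would run far past the end of the buffer.
uint32_t crc_for_chunk(const std::vector<char>& hashes, std::size_t i) {
  std::size_t off = i * sizeof(uint32_t);
  if (off + sizeof(uint32_t) > hashes.size())
    throw std::out_of_range("chunk index past end of CRC buffer");
  uint32_t crc;
  std::memcpy(&crc, hashes.data() + off, sizeof(crc));  // avoids unaligned access
  return crc;
}
```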
Production Impact:
------------------
Backward Compatibility: FULLY COMPATIBLE
- Cache is ephemeral (in-memory only, not persisted)
- Cache cleared on process restart
- Rolling upgrades safe - each OSD restart gets fixed code
- Old cache entries automatically invalidated on upgrade
- No wire protocol or on-disk format changes
- No configuration changes required
- No breaking changes
Data Integrity:
- Eliminates silent data corruption risk
- Eliminates heap-buffer-overflow crashes
- Cache now correctly isolated by (k,m) configuration
- Correct decoding tables always used for recovery
- No risk of corrupting user data from the fix itself
Why Users Haven't Complained:
------------------------------
Several factors likely prevented widespread reports:
1. Low probability conditions required:
- Need multiple EC pools with DIFFERENT (k,m) configurations
- Need similar erasure patterns across pools
- Need cache collision to occur during actual recovery operations
- Recovery operations are relatively rare in healthy clusters
2. Crash vs silent corruption detection:
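- Buffer overflows (easier to detect) occur when the second (k,m) needs a larger decoding table than the cached one
- Silent corruption (harder to detect) occurs when the cached table is large enough, so the wrong matrix is read without overrunning the buffer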
- Crashes might be attributed to other causes
- Data corruption only detected during scrub or data verification
3. Common deployment patterns:
- Many deployments use single EC configuration cluster-wide
- Default EC configurations (k=2,m=1 or k=4,m=2) reduce collision space
- Erasure pattern variety may be insufficient for collisions
4. ISA plugin usage:
- Not universally deployed (requires Intel ISA-L library)
- Some sites use jerasure plugin instead
- Plugin selection depends on hardware and configuration
5. Detection difficulty:
- ASan not enabled in production builds
- Silent corruption only appears during:
* Degraded reads with recovery
* Scrub operations
* Deep-scrub verification
- Corrupted data might not be immediately accessed
Fixes: https://tracker.ceph.com/issues/74382
Signed-off-by: Kefu Chai <k.chai@proxmox.com>