#89 - Make Manager::shutdown() idempotent to prevent post-finalize log crash by RamyaGuru · Pull Request #90 · NVIDIA/daqiri

RamyaGuru · 2026-05-20T20:22:38Z

Fixes #89. `Manager::shutdown()` was being invoked twice in the bench lifecycle, and the second call's top `DAQIRI_LOG_INFO` crashed because spdlog's default logger had already been destroyed by `__cxa_finalize`.

Adds a 1-line idempotency guard (`if (!initialized_) return;`) at the top of `RdmaMgr::shutdown()`, `DpdkMgr::shutdown()`, and `SocketMgr::shutdown()`. Same log-first body-second pattern in all three managers, same fix.
`DpdkMgr`'s existing `num_init` reference-counted body is preserved — the guard only activates after the body has cleared `initialized_` on the final shutdown.

Test plan

A/B verified on a clean fix-branch base (off `origin/main`):
- With fix → `daqiri_bench_rdma --mode both` exits 0.
- Without fix (reset to `origin/main`) → bench segfaults during `__cxa_finalize` (exit 139). Confirms the fix is necessary on plain main, not just under the perf-report branch's bench changes.
DPDK / Socket benches still shut down cleanly (no regression — `num_init` ref counting in DpdkMgr preserved).

Discovered while tackling PR #72.

…g crash Every Manager backend's shutdown() begins with a DAQIRI_LOG_INFO call, and is invoked twice during the typical bench lifecycle: 1. Explicitly from main() via daqiri::shutdown(). 2. Again from the manager's destructor (or, for SocketMgr in RoCE mode, a destructor cascade into RdmaMgr::shutdown()) during C++ __cxa_finalize. By the time the destructor cascade fires, spdlog's default logger -- a function-local static created lazily on the first DAQIRI_LOG_INFO -- has already been destroyed. The DAQIRI_LOG_INFO at the top of the second shutdown() call then crashes inside spdlog::sink_it_. Repro on DGX Spark: daqiri_bench_rdma --mode both against examples/daqiri_bench_rdma_tx_rx_spark.yaml segfaults immediately after the legitimate shutdown completes. Backtrace shows __cxa_finalize -> ~SocketMgr -> SocketMgr::shutdown -> RdmaMgr::shutdown -> daqiri::log_formatted_message -> spdlog::logger::log_ -> spdlog::logger::sink_it_ -> SIGSEGV. Fix: short-circuit shutdown() on subsequent calls by returning early when initialized_ is false. Applied symmetrically to RdmaMgr, DpdkMgr, and SocketMgr -- the log-first body-second pattern is identical in all three. DpdkMgr's existing num_init reference-counted body is preserved; the guard only activates after the body has cleared initialized_ in the final shutdown. Verified by repeated daqiri_bench_rdma --mode both runs from a bash-parent shell. Pre-fix: SIGSEGV 100% reproducible. Post-fix: clean exit 0, all destructor markers run in order. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: rgurunathan <rgurunathan@nvidia.com>

greptile-apps · 2026-05-20T20:57:38Z

Greptile Summary

This PR adds a one-line idempotency guard to DpdkMgr::shutdown(), RdmaMgr::shutdown(), and SocketMgr::shutdown() to prevent a segfault that occurred when __cxa_finalize invoked the destructor chain a second time after spdlog's default logger had already been torn down.

DpdkMgr / RdmaMgr receive a simple if (!initialized_) { return; } guard, consistent with each manager's existing practice of clearing initialized_ at the end of the shutdown body.
SocketMgr uses a dual-flag guard (if (!initialized_ && !running_.load()) { return; }) to preserve the init-failure cleanup path: initialize() sets running_=true before setup and initialized_=true only on success, so a throw mid-setup calls shutdown() with initialized_=false, running_=true — a state that must fall through to join any partially-spawned threads, avoiding std::terminate in the destructor cascade.

Confidence Score: 5/5

Safe to merge — three small, well-isolated guards that each fire only after the corresponding manager has fully torn itself down.

The change is minimal: one guard per manager, each guarding on the flag that the shutdown body already clears. The SocketMgr variant correctly uses a dual-flag check to preserve the init-failure cleanup path, as confirmed by reading initialize(). The ref-counted DpdkMgr body is untouched. The previous outside-diff concern about a dead secondary guard has been resolved by this commit's tightened condition and removal of the old guard position.

No files require special attention.

Important Files Changed

Filename	Overview
src/managers/dpdk/daqiri_dpdk_mgr.cpp	Adds `if (!initialized_) { return; }` guard before the first log call; `initialized_` is only cleared when the ref-count `num_init` reaches zero, so the guard does not interfere with normal ref-counted shutdown sequences.
src/managers/rdma/daqiri_rdma_mgr.cpp	Adds `if (!initialized_) { return; }` guard at the top; `initialized_` is cleared at the end of the shutdown body, so a post-finalize re-entry exits immediately without touching the destroyed spdlog logger.
src/managers/socket/daqiri_socket_mgr.cpp	Guard moved from below the RoCE branch to the top of the function, and strengthened to `if (!initialized_ && !running_.load()) { return; }` so that the init-failure cleanup path (running_=true, initialized_=false) still falls through and joins any partially-spawned threads.

Sequence Diagram

sequenceDiagram
    participant App
    participant Manager
    participant spdlog

    App->>Manager: explicit shutdown()
    Note over Manager: guard: initialized_=true → pass
    Manager->>spdlog: DAQIRI_LOG_INFO(...)
    Manager->>Manager: cleanup body
    Manager->>Manager: "initialized_ = false"

    Note over spdlog: __cxa_finalize destroys default logger

    Manager->>Manager: ~Manager() → shutdown()
    Note over Manager: guard: initialized_=false → return immediately
    Note over spdlog: (no log call — crash prevented)

_{Reviews (2): Last reviewed commit: "#89 - Tighten SocketMgr::shutdown() guar..." | Re-trigger Greptile}

…eanup Greptile review of the original idempotency commit flagged the secondary `if (!initialized_ && !running_.load()) { return; }` in SocketMgr::shutdown() as having a dead `!initialized_` clause, since the new top-of-function guard `if (!initialized_) { return; }` already covers that state. Investigation surfaced the deeper concern: the top guard was too aggressive. SocketMgr::initialize() sets initialized_=false and running_=true before running setup, then sets initialized_=true on success. If setup_tcp_endpoint or setup_udp_endpoint throws after spawning an accept_thread or io_thread, the catch-block shutdown() call entered with initialized_=false and running_=true. Under the original top guard the cleanup body was skipped, leaving the worker threads joinable on the EndpointState — the destructor cascade would then std::terminate on an unjoined std::thread. Tighten the top guard to require both flags cleared. The post-shutdown re-entry from __cxa_finalize still fires (both flags cleared at the end of the body) while the init-failure cleanup path (running_=true) falls through and joins its threads. The pre-existing secondary check is now fully redundant and removed. DpdkMgr and RdmaMgr keep the simpler `if (!initialized_) { return; }` — neither has an init-failure shutdown() caller, so the asymmetry is intentional and isolated to the manager whose initialize() partially spawns threads before setting initialized_=true. Verified manually with both the existing DPDK / socket-udp / socket-tcp normal-shutdown smokes and a new 2-endpoint UDP init-failure repro (malformed remote IP on endpoint 2 → parse_ipv4_addr throws after endpoint 1's io_thread is spawned): rc=1, no SIGSEGV / SIGABRT / "terminate called" in stderr. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: rgurunathan <rgurunathan@nvidia.com>

RamyaGuru marked this pull request as ready for review May 20, 2026 20:54

RamyaGuru merged commit 261d62f into main May 20, 2026
1 check passed

RamyaGuru deleted the fix-manager-shutdown-idempotency branch May 20, 2026 21:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

#89 - Make Manager::shutdown() idempotent to prevent post-finalize log crash#90

#89 - Make Manager::shutdown() idempotent to prevent post-finalize log crash#90
RamyaGuru merged 2 commits into
mainfrom
fix-manager-shutdown-idempotency

RamyaGuru commented May 20, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented May 20, 2026 •

edited

Loading

Greptile Summary

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

RamyaGuru commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test plan

Uh oh!

greptile-apps Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

RamyaGuru commented May 20, 2026 •

edited

Loading

greptile-apps Bot commented May 20, 2026 •

edited

Loading