-
Notifications
You must be signed in to change notification settings - Fork 796
Fix Groot2Publisher destructor infinite loop #1100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The destructor could hang indefinitely when the ZMQ server thread was waiting on recv() while active_server_ remained true. Changes: - Set active_server_=false before joining threads - Call zmq_context.shutdown() to interrupt blocking recv() - Add try-catch around ZMQ operations to handle context termination - Reorder destructor to remove hooks after threads are joined Includes tests for: - Destructor completion after exception - Destructor with multiple tree nodes attached - Rapid create/destroy cycles Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
📝 WalkthroughWalkthroughThe PR fixes an infinite loop bug in Groot2Publisher's destructor by reordering the shutdown sequence: signaling thread stop, explicitly shutting down the ZMQ context to unblock recv(), joining threads, then removing hooks. Exception handling wraps recv() and send() calls in the server loop. Test coverage is expanded with scenarios for exception handling and rapid creation/destruction cycles. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Coverage summary from CodacySee diff coverage on Codacy
Coverage variation details
Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: Diff coverage details
Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: See your quality gate settings Change summary preferences |
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1100 +/- ##
==============================
==============================
☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/loggers/groot2_publisher.cpp`:
- Around line 272-283: The sendErrorReply lambda currently calls
reply_msg.send() without catching zmq::error_t, so during ZMQ context shutdown
an exception (ETERM) can escape and kill the thread; update sendErrorReply to
wrap the send call in a try/catch for zmq::error_t and silently handle/return on
ETERM (or log and return) to mirror the recv()/reply_msg.send() handling in the
main loop; apply the same pattern to the similar send path around the code
referenced at lines 510–518 so all zmq::multipart_t::send() calls are guarded
against ZMQ shutdown exceptions.
| try | ||
| { | ||
| continue; | ||
| if(!requestMsg.recv(socket) || requestMsg.size() == 0) | ||
| { | ||
| continue; | ||
| } | ||
| } | ||
| catch(const zmq::error_t&) | ||
| { | ||
| // Context was terminated or socket error - exit the loop | ||
| break; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🌐 Web query:
ZMQ zmq::multipart_t::send throw exception ETERM shutdown pattern
💡 Result:
zmq::multipart_t::send() ultimately calls zmq_send() on the underlying socket. If your process is shutting down and the context is shut down / terminated while another thread is sending, libzmq makes (almost) all further socket operations fail with ETERM, which cppzmq surfaces as zmq::error_t. This is expected behavior: context shutdown/termination forces any blocking (and many non-blocking) socket ops to return immediately with ETERM. [1], [2]
Recommended shutdown pattern (avoid “spurious” exceptions, exit cleanly)
- Signal your worker threads to stop producing/sending (atomic flag, cancellation token, etc.).
- Interrupt any threads blocked in
recv()/send()by callingctx.shutdown()(maps tozmq_ctx_shutdown()), which causes blocked operations to return withETERM. [1] - In every send/recv loop, catch
zmq::error_tand treatETERMas a normal exit condition. - Close sockets (often set
linger = 0first so close/term doesn’t hang waiting for pending sends). [3] - Finally terminate the context (
ctx.close()/zmq_ctx_term()), after sockets are closed. [2]
Minimal C++ example (cppzmq)
std::atomic_bool stopping{false};
void worker(zmq::context_t& ctx, zmq::socket_t& s) {
s.set(zmq::sockopt::linger, 0); // optional but common for fast shutdown [3]
while (!stopping.load(std::memory_order_relaxed)) {
try {
zmq::multipart_t mp;
// ... fill mp ...
mp.send(s); // may throw zmq::error_t(ETERM) during shutdown
} catch (const zmq::error_t& e) {
if (e.num() == ETERM) break; // context shutdown/term in progress -> exit loop [1][2]
throw;
}
}
s.close();
}
// elsewhere (shutdown path):
stopping = true;
ctx.shutdown(); // interrupts blocked ops with ETERM [1]
// join worker threads
ctx.close(); // terminates context; requires sockets closed; may block per linger [2][3]Key point: if you call ctx.shutdown() / ctx.close() before (or concurrently with) stopping your senders, you should expect multipart_t::send() to throw ETERM—handle it as a normal shutdown condition rather than an error. [1], [4]
Sources:
[1] zmq_ctx_shutdown(3) behavior and ETERM semantics https://libzmq.readthedocs.io/en/latest/zmq_ctx_shutdown.html
[2] zmq_ctx_term(3) termination steps and blocking/linger considerations https://libzmq.readthedocs.io/en/latest/zmq_ctx_term.html
[3] ZMQ_LINGER option and effect on close/termination https://api.zeromq.org/4-2:zmq-setsockopt
[4] Catching ETERM during shutdown (discussion/example) https://stackoverflow.com/questions/8022391/zeromq-socket-recv-throws-context-was-terminated-exception-why-and-how-to
🏁 Script executed:
wc -l src/loggers/groot2_publisher.cppRepository: BehaviorTree/BehaviorTree.CPP
Length of output: 109
🏁 Script executed:
sed -n '260,295p' src/loggers/groot2_publisher.cppRepository: BehaviorTree/BehaviorTree.CPP
Length of output: 965
🏁 Script executed:
sed -n '500,530p' src/loggers/groot2_publisher.cppRepository: BehaviorTree/BehaviorTree.CPP
Length of output: 792
🏁 Script executed:
sed -n '250,270p' src/loggers/groot2_publisher.cppRepository: BehaviorTree/BehaviorTree.CPP
Length of output: 566
🏁 Script executed:
sed -n '240,265p' src/loggers/groot2_publisher.cppRepository: BehaviorTree/BehaviorTree.CPP
Length of output: 588
🏁 Script executed:
sed -n '255,265p' src/loggers/groot2_publisher.cppRepository: BehaviorTree/BehaviorTree.CPP
Length of output: 353
Guard sendErrorReply against ZMQ shutdown exceptions.
The recv() and reply_msg.send() calls in the main loop (lines 272–283 and 510–518) are wrapped in try/catch for zmq::error_t, but the sendErrorReply lambda (lines 256–262) sends without exception handling. During context shutdown, zmq::multipart_t::send() throws zmq::error_t (with ETERM), which will terminate the thread if uncaught. Wrap the error reply send similarly to maintain consistent shutdown behavior:
Suggested fix
auto sendErrorReply = [&socket](const std::string& msg) {
zmq::multipart_t error_msg;
error_msg.addstr("error");
error_msg.addstr(msg);
+ try
+ {
error_msg.send(socket);
+ }
+ catch(const zmq::error_t&)
+ {
+ // Ignore errors during shutdown (e.g., ETERM)
+ }
};Also applies to: 510–518
🤖 Prompt for AI Agents
In `@src/loggers/groot2_publisher.cpp` around lines 272 - 283, The sendErrorReply
lambda currently calls reply_msg.send() without catching zmq::error_t, so during
ZMQ context shutdown an exception (ETERM) can escape and kill the thread; update
sendErrorReply to wrap the send call in a try/catch for zmq::error_t and
silently handle/return on ETERM (or log and return) to mirror the
recv()/reply_msg.send() handling in the main loop; apply the same pattern to the
similar send path around the code referenced at lines 510–518 so all
zmq::multipart_t::send() calls are guarded against ZMQ shutdown exceptions.



Summary
Groot2Publisherdestructor could hang indefinitelyrecv()whileactive_server_remained trueChanges
active_server_=falsebefore joining threadszmq_context.shutdown()to interrupt blockingrecv()Test plan
DestructorCompletesAfterException- verifies destructor completes even when tree throwsDestructorCompletesWithMultipleNodes- verifies cleanup with complex treesRapidCreateDestroy- verifies no hangs during rapid lifecycle🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Tests