@facontidavide
Collaborator

@facontidavide facontidavide commented Feb 3, 2026

Summary

Changes

  • Set active_server_=false before joining threads
  • Call zmq_context.shutdown() to interrupt blocking recv()
  • Add try-catch around ZMQ operations to handle context termination gracefully
  • Reorder destructor to remove hooks after threads are joined
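The shutdown order in the list above can be sketched without ZeroMQ. What follows is a minimal, standard-library-only analog under stated assumptions, not the actual Groot2Publisher code: `FakeContext` is a hypothetical stand-in for the ZMQ context whose `recv()` blocks until `shutdown()` is called and then throws, roughly as a cppzmq `recv()` is expected to fail once the context is shut down. The names `Publisher`, `FakeContext`, and `loop_exited` are invented for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical stand-in for a ZMQ context/socket pair (not a real ZMQ API).
class FakeContext
{
public:
  // Blocks until shutdown() is called, then throws, mimicking a recv()
  // that is interrupted by context termination.
  void recv()
  {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return terminated_; });
    throw std::runtime_error("context terminated");
  }

  void shutdown()
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      terminated_ = true;
    }
    cv_.notify_all();
  }

private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool terminated_ = false;
};

class Publisher
{
public:
  explicit Publisher(std::atomic<bool>& loop_exited) : loop_exited_(loop_exited)
  {
    server_thread_ = std::thread([this] { serverLoop(); });
  }

  ~Publisher()
  {
    active_server_ = false;  // 1. signal the loop to stop
    context_.shutdown();     // 2. interrupt the blocking recv()
    if(server_thread_.joinable())
    {
      server_thread_.join();  // 3. wait for the thread to exit
    }
    // 4. only now release state the thread might still have used
    //    (the PR removes the tree hooks at this point)
  }

private:
  void serverLoop()
  {
    while(active_server_)
    {
      try
      {
        context_.recv();
      }
      catch(const std::runtime_error&)
      {
        break;  // termination is treated as a normal exit condition
      }
    }
    loop_exited_ = true;
  }

  std::atomic<bool>& loop_exited_;
  std::atomic<bool> active_server_{ true };
  FakeContext context_;
  std::thread server_thread_;
};
```

Constructing a `Publisher` in a nested scope and letting it go out of scope should return promptly and leave `loop_exited` set to `true`. If step 2 were omitted in this analog, the join in step 3 could block forever, which is the kind of hang the PR describes.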

Test plan

  • Added test DestructorCompletesAfterException - verifies destructor completes even when tree throws
  • Added test DestructorCompletesWithMultipleNodes - verifies cleanup with complex trees
  • Added test RapidCreateDestroy - verifies no hangs during rapid lifecycle
  • All 337 tests pass
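One way to check that a destructor completes rather than hangs is to run the create/destroy cycle on a helper thread and wait for it with a timeout. The sketch below is not the test code added in this PR: `Worker`, `completesWithin`, and `rapidCreateDestroy` are hypothetical names, and a real test would construct a Groot2Publisher instead of `Worker`.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <thread>
#include <utility>

// Hypothetical object whose destructor must join a background thread.
class Worker
{
public:
  Worker() : thread_([this] { run(); })
  {}

  ~Worker()
  {
    stop_ = true;  // signal first, then join
    thread_.join();
  }

private:
  void run()
  {
    while(!stop_)
    {
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
  }

  std::atomic<bool> stop_{ false };  // declared before thread_ so it is initialized first
  std::thread thread_;
};

// Runs `body` on another thread and reports whether it finished in time.
template <typename F>
bool completesWithin(std::chrono::milliseconds timeout, F body)
{
  // packaged_task + detached thread: unlike a future from std::async, this
  // future does not block in its destructor if the body is still running.
  std::packaged_task<void()> task(std::move(body));
  std::future<void> done = task.get_future();
  std::thread(std::move(task)).detach();
  return done.wait_for(timeout) == std::future_status::ready;
}

// Rapid create/destroy cycles, in the spirit of the RapidCreateDestroy test.
inline bool rapidCreateDestroy(int cycles)
{
  return completesWithin(std::chrono::seconds(5), [cycles] {
    for(int i = 0; i < cycles; ++i)
    {
      Worker w;  // constructed and destroyed on every iteration
    }
  });
}
```

The timeout turns a hang into a test failure instead of a stuck test run. The 5-second budget is an arbitrary choice for this sketch.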

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced shutdown robustness with improved thread synchronization and error handling during cleanup to prevent potential race conditions.
  • Tests

    • Added comprehensive test coverage for destructor behavior and rapid create/destroy scenarios under error conditions.

The destructor could hang indefinitely when the ZMQ server thread
was waiting on recv() while active_server_ remained true.

Changes:
- Set active_server_=false before joining threads
- Call zmq_context.shutdown() to interrupt blocking recv()
- Add try-catch around ZMQ operations to handle context termination
- Reorder destructor to remove hooks after threads are joined

Includes tests for:
- Destructor completion after exception
- Destructor with multiple tree nodes attached
- Rapid create/destroy cycles

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@coderabbitai

coderabbitai bot commented Feb 3, 2026

Walkthrough

The PR fixes an infinite loop bug in Groot2Publisher's destructor by reordering the shutdown sequence: signaling thread stop, explicitly shutting down the ZMQ context to unblock recv(), joining threads, then removing hooks. Exception handling wraps recv() and send() calls in the server loop. Test coverage is expanded with scenarios for exception handling and rapid creation/destruction cycles.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Groot2Publisher Shutdown Fix: src/loggers/groot2_publisher.cpp | Reordered destructor shutdown sequence to signal threads to stop, shut down the ZMQ context, join threads, then remove hooks. Added try/catch blocks for zmq::error_t in serverLoop for recv() and send() calls to handle context/socket termination gracefully. |
| Test Coverage for Exception Scenarios: tests/gtest_groot2_publisher.cpp | Added three new test cases (DestructorCompletesAfterException, DestructorCompletesWithMultipleNodes, RapidCreateDestroy) and an XML fixture to verify destructor robustness, exception propagation, and synchronization under repeated creation/destruction and exception conditions. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A loop that spun round and round,
No shutdown in sight to be found—
Till ZMQ's context went sleep,
And threads took their exit so deep,
Now destruction completes safe and sound!

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically summarizes the main fix: resolving a destructor infinite loop issue in Groot2Publisher. |
| Description check | ✅ Passed | The description provides a clear summary with specific changes, explicitly references issue #1057, includes a detailed test plan with three tests, and confirms all 337 tests pass. |
| Linked Issues check | ✅ Passed | The PR directly addresses #1057 by implementing the necessary fixes: setting active_server_=false, calling zmq_context.shutdown(), adding try-catch blocks, and reordering destructor cleanup. All objectives from the issue are met. |
| Out of Scope Changes check | ✅ Passed | All changes are focused on fixing the infinite loop issue in Groot2Publisher destructor and adding tests to verify the fix. No unrelated modifications detected. |


@codacy-production

Coverage summary from Codacy

| Coverage variation | Diff coverage |
|---|---|
| +0.00% (target: -1.00%) | 50.00% |

Coverage variation details

| | Coverable lines | Covered lines | Coverage |
|---|---|---|---|
| Common ancestor commit (2c71b41) | 5395 | 3755 | 69.60% |
| Head commit (ac925b0) | 5402 (+7) | 3760 (+5) | 69.60% (+0.00%) |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
| | Coverable lines | Covered lines | Diff coverage |
|---|---|---|---|
| Pull request (#1100) | 10 | 5 | 50.00% |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%
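As a worked check of the two formulas, the figures reported above can be plugged in directly. The helper names `coveragePercent` and `coverageVariation` are hypothetical and only illustrate the arithmetic; they are not part of Codacy or of this repository.

```cpp
#include <cassert>
#include <cmath>

// Coverage as a percentage of coverable lines.
inline double coveragePercent(int covered, int coverable)
{
  return 100.0 * covered / coverable;
}

// Coverage variation: head coverage minus common-ancestor coverage,
// in percentage points.
inline double coverageVariation(int head_covered, int head_coverable,
                                int base_covered, int base_coverable)
{
  return coveragePercent(head_covered, head_coverable) -
         coveragePercent(base_covered, base_coverable);
}
```

With the reported numbers, `coveragePercent(5, 10)` gives the 50.00% diff coverage, and `coverageVariation(3760, 5402, 3755, 5395)` is a small positive value (a few thousandths of a percentage point) that rounds to the reported +0.00%.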


@facontidavide facontidavide merged commit fcb95c8 into master Feb 3, 2026
17 of 18 checks passed
@facontidavide facontidavide deleted the fix/1057-groot2-publisher-infinite-loop branch February 3, 2026 08:17
@sonarqubecloud

sonarqubecloud bot commented Feb 3, 2026

@codecov

codecov bot commented Feb 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 0.00%. Comparing base (2c71b41) to head (ac925b0).
⚠️ Report is 1 commit behind head on master.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/loggers/groot2_publisher.cpp`:
- Around lines 272-283: The sendErrorReply lambda currently calls
error_msg.send(socket) without catching zmq::error_t, so during ZMQ context
shutdown an exception (ETERM) can escape and kill the thread. Update
sendErrorReply to wrap the send call in a try/catch for zmq::error_t and
silently return on ETERM (or log and return), mirroring the
recv()/reply_msg.send() handling in the main loop. Apply the same pattern to the
similar send path around lines 510–518 so that all zmq::multipart_t::send()
calls are guarded against ZMQ shutdown exceptions.

Comment on lines +272 to 283
    try
    {
      if(!requestMsg.recv(socket) || requestMsg.size() == 0)
      {
        continue;
      }
    }
    catch(const zmq::error_t&)
    {
      // Context was terminated or socket error - exit the loop
      break;
    }

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

ZMQ zmq::multipart_t::send throw exception ETERM shutdown pattern

💡 Result:

zmq::multipart_t::send() ultimately calls zmq_send() on the underlying socket. If your process is shutting down and the context is shut down / terminated while another thread is sending, libzmq makes (almost) all further socket operations fail with ETERM, which cppzmq surfaces as zmq::error_t. This is expected behavior: context shutdown/termination forces any blocking (and many non-blocking) socket ops to return immediately with ETERM. [1], [2]

Recommended shutdown pattern (avoid “spurious” exceptions, exit cleanly)

  1. Signal your worker threads to stop producing/sending (atomic flag, cancellation token, etc.).
  2. Interrupt any threads blocked in recv() / send() by calling ctx.shutdown() (maps to zmq_ctx_shutdown()), which causes blocked operations to return with ETERM. [1]
  3. In every send/recv loop, catch zmq::error_t and treat ETERM as a normal exit condition.
  4. Close sockets (often set linger = 0 first so close/term doesn’t hang waiting for pending sends). [3]
  5. Finally terminate the context (ctx.close() / zmq_ctx_term()), after sockets are closed. [2]

Minimal C++ example (cppzmq)

#include <atomic>
#include <thread>
#include <zmq.hpp>
#include <zmq_addon.hpp>  // zmq::multipart_t

std::atomic_bool stopping{false};

void worker(zmq::context_t& ctx, zmq::socket_t& s) {
  (void)ctx;                       // unused here; kept from the original signature
  s.set(zmq::sockopt::linger, 0);  // optional but common for fast shutdown [3]

  while (!stopping.load(std::memory_order_relaxed)) {
    try {
      zmq::multipart_t mp;
      // ... fill mp ...
      mp.send(s);                  // may throw zmq::error_t(ETERM) during shutdown
    } catch (const zmq::error_t& e) {
      if (e.num() == ETERM) break; // context shutdown/term in progress -> exit loop [1][2]
      throw;                       // any other ZMQ error is still reported
    }
  }

  s.close();
}

// elsewhere (shutdown path), run by the thread that owns the context:
void stopWorker(zmq::context_t& ctx, std::thread& worker_thread) {
  stopping = true;
  ctx.shutdown();        // interrupts blocked ops with ETERM [1]
  worker_thread.join();  // join worker threads
  ctx.close();           // terminates context; requires sockets closed; may block per linger [2][3]
}

Key point: if you call ctx.shutdown() / ctx.close() before (or concurrently with) stopping your senders, you should expect multipart_t::send() to throw ETERM—handle it as a normal shutdown condition rather than an error. [1], [4]

Sources:
[1] zmq_ctx_shutdown(3) behavior and ETERM semantics https://libzmq.readthedocs.io/en/latest/zmq_ctx_shutdown.html
[2] zmq_ctx_term(3) termination steps and blocking/linger considerations https://libzmq.readthedocs.io/en/latest/zmq_ctx_term.html
[3] ZMQ_LINGER option and effect on close/termination https://api.zeromq.org/4-2:zmq-setsockopt
[4] Catching ETERM during shutdown (discussion/example) https://stackoverflow.com/questions/8022391/zeromq-socket-recv-throws-context-was-terminated-exception-why-and-how-to


🏁 Script executed:

wc -l src/loggers/groot2_publisher.cpp

Repository: BehaviorTree/BehaviorTree.CPP

Length of output: 109


🏁 Script executed:

sed -n '260,295p' src/loggers/groot2_publisher.cpp

Repository: BehaviorTree/BehaviorTree.CPP

Length of output: 965


🏁 Script executed:

sed -n '500,530p' src/loggers/groot2_publisher.cpp

Repository: BehaviorTree/BehaviorTree.CPP

Length of output: 792


🏁 Script executed:

sed -n '250,270p' src/loggers/groot2_publisher.cpp

Repository: BehaviorTree/BehaviorTree.CPP

Length of output: 566


🏁 Script executed:

sed -n '240,265p' src/loggers/groot2_publisher.cpp

Repository: BehaviorTree/BehaviorTree.CPP

Length of output: 588


🏁 Script executed:

sed -n '255,265p' src/loggers/groot2_publisher.cpp

Repository: BehaviorTree/BehaviorTree.CPP

Length of output: 353


Guard sendErrorReply against ZMQ shutdown exceptions.

The recv() and reply_msg.send() calls in the main loop (lines 272–283 and 510–518) are wrapped in try/catch for zmq::error_t, but the sendErrorReply lambda (lines 256–262) sends without exception handling. During context shutdown, zmq::multipart_t::send() throws zmq::error_t (with ETERM), which will terminate the thread if uncaught. Wrap the error reply send similarly to maintain consistent shutdown behavior:

Suggested fix
  auto sendErrorReply = [&socket](const std::string& msg) {
    zmq::multipart_t error_msg;
    error_msg.addstr("error");
    error_msg.addstr(msg);
+   try
+   {
      error_msg.send(socket);
+   }
+   catch(const zmq::error_t&)
+   {
+     // Ignore errors during shutdown (e.g., ETERM)
+   }
  };

Also applies to: 510–518


Development

Successfully merging this pull request may close these issues.

Groot2Publisher enters infinite loop when exception is thrown

2 participants