fix(tests): stop LCM thread leak in cross-wall planning tests #2068
`StatsMonitor` (enabled by `global_config(dtop=True)`) created an `LCMResourceLogger` whose `pLCMTransport` spun up an LCM handler thread that was never stopped, so `monitor_threads` reported a leaked `_lcm_loop` thread after each cross-wall test.

- Add `stop()` to the `ResourceLogger` protocol; `LCMResourceLogger` shuts its transport down, and `StructlogResourceLogger` no-ops.
- `StatsMonitor.stop()` now also calls `self._logger.stop()`.
- Rewrite `run_cross_wall_test` to drive LCM via `loop.add_reader(lcm.fileno(), ...)` and `asyncio.sleep` instead of a background thread, removing the test-local thread plumbing.
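To make the failure mode concrete, here is a minimal sketch of how a handler thread that is never stopped shows up in a before/after thread snapshot. `snapshot_threads`, `leaked_threads`, and `FakeTransport` are hypothetical; the project's real `monitor_threads` check and `pLCMTransport` may work differently.

```python
import threading
import time


def snapshot_threads() -> set[threading.Thread]:
    """Record every thread alive right now (e.g. before a test starts)."""
    return set(threading.enumerate())


def leaked_threads(before: set[threading.Thread], grace: float = 1.0) -> list[str]:
    """Names of threads started after `before` that are still alive.

    Polls for up to `grace` seconds so threads that are already shutting
    down get a chance to exit before being reported.
    """
    deadline = time.monotonic() + grace
    while True:
        leaked = [t.name for t in threading.enumerate() if t not in before]
        if not leaked or time.monotonic() >= deadline:
            return leaked
        time.sleep(0.01)


class FakeTransport:
    """Hypothetical stand-in for a transport that owns a handler thread."""

    def __init__(self) -> None:
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, name="_lcm_loop", daemon=True)
        self._thread.start()

    def _loop(self) -> None:
        # A real transport would wait on incoming LCM messages here.
        while not self._stop.wait(0.01):
            pass

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```

If `stop()` is never called, `leaked_threads` keeps reporting `_lcm_loop`, which is the symptom described above; once the owner stops the transport, the snapshot comes back clean.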
`StatsMonitor.stop()` now calls `self._logger.stop()`. The test's inline `CapturingLogger` didn't implement it, breaking `test_collect_stats` with `AttributeError`. Add a no-op `stop()`.
Greptile Summary

This PR fixes LCM thread leaks in cross-wall planning tests by replacing a daemon thread polling loop with …
Confidence Score: 5/5

Safe to merge. The asyncio reader pattern is correct for single-process sequential tests, the transport thread leak is properly closed, and removing parallel process execution eliminates the root cause of the LCM multicast collisions. All shared mutable state in `conftest.py` is now accessed exclusively on the asyncio event loop thread, so dropping the lock is correct. The `stop()` wiring through the `ResourceLogger` protocol is complete across all implementations.

`conftest.py`: worth verifying that no future pytest-asyncio configuration introduces a running event loop that would conflict with `asyncio.run()` in `run_cross_wall_test`.
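For context on that caveat: `asyncio.run()` raises `RuntimeError` when called from a thread that already has a running event loop, which is what an async test harness would introduce. A minimal, hypothetical guard (`run_sync` is not part of the project) that fails fast with an explicit message might look like this:

```python
import asyncio
from typing import Any, Callable, Coroutine, TypeVar

T = TypeVar("T")


def run_sync(make_coro: Callable[[], Coroutine[Any, Any, T]]) -> T:
    """Run a coroutine to completion with asyncio.run(), refusing to nest.

    Takes a factory rather than a coroutine object so that, when refusing,
    no coroutine is created and left un-awaited.
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return asyncio.run(make_coro())  # no loop running in this thread: safe
    raise RuntimeError(
        "an event loop is already running in this thread; asyncio.run() would fail"
    )
```

`asyncio.run()` would raise on its own in the nested case; the guard mainly turns that into a clearer message and avoids a "coroutine was never awaited" warning.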
Sequence Diagram

```mermaid
sequenceDiagram
    participant Test as run_cross_wall_test
    participant AR as asyncio.run
    participant EL as event loop
    participant LCM as lcm fd reader
    participant Coord as ModuleCoordinator
    Test->>AR: call asyncio.run(_run_cross_wall_test)
    AR->>EL: create and run event loop
    EL->>Coord: ModuleCoordinator.build(blueprint)
    EL->>EL: add_reader(lcm_fd, _on_lcm_readable)
    EL->>EL: await asyncio.sleep (yields control)
    EL->>LCM: fd readable - calls _on_lcm_readable
    LCM->>LCM: lcm.handle - dispatches _odom_handler
    LCM-->>EL: odom_count / robot_x / robot_y updated
    EL->>EL: assert odom received and goals reached
    EL->>EL: finally: remove_reader, lcm.unsubscribe
    EL->>Coord: coordinator.stop
    AR-->>Test: return or propagate AssertionError
```
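The reader-driven flow in the diagram can be sketched as follows. `FakeLCM` is a hypothetical single-channel stand-in for `lcm.LCM` (it only mimics `fileno()`, `subscribe`/`unsubscribe`, and a `handle()` that dispatches one message), backed by a Unix datagram `socketpair` so the example runs without LCM installed; `run_until` is likewise illustrative and is not the project's `run_cross_wall_test`. It assumes a selector-based event loop (the default on Linux and macOS), since `add_reader` is not available on Windows' default proactor loop.

```python
import asyncio
import socket
from typing import Callable

Handler = Callable[[bytes], None]


class FakeLCM:
    """Hypothetical single-channel stand-in for lcm.LCM, backed by a datagram socketpair."""

    def __init__(self) -> None:
        self._rx, self.tx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
        self._handlers: list[Handler] = []

    def fileno(self) -> int:
        return self._rx.fileno()

    def subscribe(self, handler: Handler) -> Handler:
        self._handlers.append(handler)
        return handler  # the "subscription" token is just the handler here

    def unsubscribe(self, subscription: Handler) -> None:
        self._handlers.remove(subscription)

    def handle(self) -> None:
        # Only called when the fd is readable, so recv() returns immediately.
        data = self._rx.recv(65536)
        for handler in list(self._handlers):
            handler(data)

    def close(self) -> None:
        self._rx.close()
        self.tx.close()


async def run_until(lcm: FakeLCM, want: int, timeout: float = 2.0) -> int:
    """Pump messages on the event loop until `want` arrive or `timeout` elapses."""
    loop = asyncio.get_running_loop()
    count = 0

    def on_odom(data: bytes) -> None:
        nonlocal count
        count += 1

    sub = lcm.subscribe(on_odom)
    loop.add_reader(lcm.fileno(), lcm.handle)  # dispatch on the loop, no background thread
    try:
        deadline = loop.time() + timeout
        while count < want and loop.time() < deadline:
            await asyncio.sleep(0.01)  # yield so the reader callback can run
    finally:
        loop.remove_reader(lcm.fileno())
        lcm.unsubscribe(sub)
    return count
```

Because `handle()` is only invoked when the fd is readable, it never blocks the loop, and the `finally` block guarantees the reader and subscription are gone before `asyncio.run()` returns, so nothing is left running after the test.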
When `cyclonedds` is installed (via the `dds` or `unitree-dds` extra) on a box that has ROS 2 Jazzy, importing the C extension fails with `libddsc.so.0: cannot open shared object file` because the wheel's loader resolves the SONAME against `/opt/ros/jazzy/lib/libddsc.so` but the actual `.so.0` lives in the multiarch dir. Document the LD_LIBRARY_PATH / setup.bash workarounds.
Re-applies the cross-wall thread-leak fix that was reverted. Without `StatsMonitor.stop()` calling `self._logger.stop()`, the PickleLCM thread inside `LCMResourceLogger`'s `pLCMTransport` outlives test teardown, and `monitor_threads` flags it as a leaked `_lcm_loop` thread.

- `ResourceLogger` protocol regains `stop()`.
- `StructlogResourceLogger.stop()` is a no-op; `LCMResourceLogger.stop()` stops the transport.
- `StatsMonitor.stop()` calls `self._logger.stop()` after joining its loop thread.
- The test's `CapturingLogger` gets a no-op `stop()` to satisfy the protocol.
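A minimal sketch of the stop ordering described in these bullets. The `log()` method name, the stats dict, and the class bodies are assumptions for illustration; only the names `ResourceLogger`, `StatsMonitor`, `CapturingLogger`, and `stop()` come from the commit message. The point is the order inside `StatsMonitor.stop()`: signal and join the sampling thread first, so nothing can call `log()` anymore, and only then let the logger release its own resources.

```python
import threading
from typing import Protocol


class ResourceLogger(Protocol):
    def log(self, stats: dict[str, float]) -> None: ...  # hypothetical method name
    def stop(self) -> None: ...  # release any threads/transports the logger owns


class CapturingLogger:
    """Test double: records stats in memory; stop() is a no-op."""

    def __init__(self) -> None:
        self.records: list[dict[str, float]] = []

    def log(self, stats: dict[str, float]) -> None:
        self.records.append(stats)

    def stop(self) -> None:
        pass  # nothing to release, but required by the protocol


class StatsMonitor:
    """Simplified sampler: periodically logs the live thread count."""

    def __init__(self, logger: ResourceLogger, interval: float = 0.01) -> None:
        self._logger = logger
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, name="stats_monitor", daemon=True)

    def start(self) -> None:
        self._thread.start()

    def _loop(self) -> None:
        while not self._stop.wait(self._interval):
            self._logger.log({"threads": float(threading.active_count())})

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()  # after this, no more log() calls can happen
        self._logger.stop()  # then let the logger shut down its own resources
```

For an `LCMResourceLogger`, that final call is where the transport and its handler thread would be shut down; a test double such as `CapturingLogger` only needs the no-op to avoid the `AttributeError`.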
Merging `origin/main` pulled in PR #2068's fix for the leaked `_lcm_loop` thread on teardown. The simple/far variants of cross-wall planning also gained `pytest.mark.self_hosted` in that commit. Add the same marker to the rtab variant for consistency. Verified: 1 passed in 68.45s, no teardown error.
Problem
`pytest-slow` was failing: `monitor_threads` reported a leaked `_lcm_loop` thread after the cross-wall planning tests.
Solution
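Drive LCM from the asyncio event loop (`loop.add_reader` plus `asyncio.sleep`) instead of a background thread, and stop the `LCMResourceLogger` transport from `StatsMonitor.stop()`.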