Motivation

- Prevent long-running or stuck `ros2` subprocesses from blocking overall graph collection by applying a per-command timeout.
- Surface and propagate failures from the topic and service list commands instead of silently continuing with partial data.
- Make the integration test script more robust, debuggable, and tolerant of transient timing issues, and remove the CI integration job from the sanity workflow.
Description
- Added a `graphCommandTimeout` variable and a `runGraphCommand` helper that runs each `ros2` invocation with a per-command timeout via `WithTimeout`.
- Updated `CollectSystemGraph` to call `runGraphCommand` for the node, topic, and service commands and to return errors when `topic list` or `service list` fails.
- Added unit tests `graph_errors_test.go` and `graph_timeout_test.go` that use the `mapGraphRunner` and `delayedGraphRunner` test runners to validate that `CollectSystemGraph` propagates errors and applies per-command timeouts.
- Hardened `tests/integration.sh` with dynamic container naming, stricter cleanup, debug logging on failure, explicit waits for nodes and topics, retries and JSON validation for `gim graph`, and retries for `gim analyze` output checks.
- Removed the `integration` job from `.github/workflows/sanity.yml` so CI only runs the `sanity` job.
Testing
- Ran the new package unit tests with `go test ./gimble-ros/internal/ros -v`, which executed `graph_errors_test.go` and `graph_timeout_test.go`; both passed.
- The `tests/integration.sh` improvements were exercised manually during development to validate behavior under transient container startup timing.
Removing the `integration` job here leaves `./tests/integration.sh` unreferenced by CI; I checked the current workflow set (`.github/workflows/sanity.yml` and `.github/workflows/publish-apt.yml`) and neither runs it now. That drops automated coverage of the Docker/ROS path (`gim ... --container`), so container-specific regressions can merge undetected even though this commit hardens that script.
Reintroduce an overall timeout for graph collection
This per-command timeout wrapper creates a fresh deadline for every ROS call, so `CollectSystemGraph` no longer has a single end-to-end cap and can accumulate many sequential 10s waits (for example, one hung `ros2 node info` per node). With callers such as the CLI relying only on a broad parent timeout, graph/analyze operations can turn into long stalls instead of failing fast, as they did before the shared timeout was removed.