
fix: stop router before NATS in integration test cleanup #557

Closed
rkoster wants to merge 1 commit into develop from fix/integration-test-cleanup-order


@rkoster rkoster commented Apr 20, 2026

Summary

Fixes port binding conflicts in parallel integration test runs by correcting the cleanup order in StopAndCleanup().

Problem

When running integration tests in parallel (--nodes=7), tests were intermittently failing with port binding errors:

listen tcp :40303: bind: address already in use
listen tcp 127.0.0.1:42529: bind: address already in use
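
For readers unfamiliar with this error, the following minimal sketch reproduces its general shape using only the Go standard library. It is an illustration, not code from this repository: `bindTwice` is a hypothetical helper, and the `syscall.EADDRINUSE` check assumes a Unix-like platform.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"syscall"
)

// bindTwice (hypothetical helper) listens on an OS-assigned loopback port,
// then tries to bind the same address again while the first listener is
// still open. The second attempt is expected to fail.
func bindTwice() error {
	first, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return fmt.Errorf("first listen: %w", err)
	}
	defer first.Close()

	second, err := net.Listen("tcp", first.Addr().String())
	if err == nil {
		// Unexpected: the port was free after all.
		second.Close()
		return nil
	}
	return err
}

func main() {
	err := bindTwice()
	fmt.Println("second bind:", err)
	// Assumption: Unix-like platform, where the errno is EADDRINUSE.
	fmt.Println("address in use:", errors.Is(err, syscall.EADDRINUSE))
}
```

The same failure occurs across processes: any listener that still holds a port blocks a later test from binding it.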

The root cause was the cleanup order in common_integration_test.go:

  1. NATS server stopped first
  2. GoRouter session terminated second

When NATS stops before the router, the subscriber's ClosedCB callback fires log.Fatal, which calls os.Exit(1) and kills the test process before its ports are fully released. In parallel runs, subsequent tests then fail when they try to bind the same ports.
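
The ordering dependency described above can be modeled with a small simulation. This is a simplified sketch, not the real NATS client or gorouter code: `fakeBroker`, `fakeSubscriber`, `terminate`, and `fatalFired` are hypothetical names, and the callback sets a flag where the real ClosedCB would call log.Fatal.

```go
package main

import "fmt"

// fakeSubscriber stands in for the router's NATS subscriber (hypothetical).
type fakeSubscriber struct {
	connected bool
	onClosed  func() // plays the role of the ClosedCB callback
}

// fakeBroker stands in for the NATS server; it is NOT the real server API.
type fakeBroker struct {
	clients []*fakeSubscriber
}

func (b *fakeBroker) connect(onClosed func()) *fakeSubscriber {
	s := &fakeSubscriber{connected: true, onClosed: onClosed}
	b.clients = append(b.clients, s)
	return s
}

// terminate models the router session ending: once the subscriber is gone,
// there is nothing left for the broker to notify.
func (s *fakeSubscriber) terminate() { s.connected = false }

// stop shuts the broker down and notifies every client still connected.
func (b *fakeBroker) stop() {
	for _, s := range b.clients {
		if s.connected {
			s.connected = false
			s.onClosed()
		}
	}
}

// fatalFired reports whether the "fatal" callback ran for a given order.
func fatalFired(brokerFirst bool) bool {
	fired := false
	b := &fakeBroker{}
	s := b.connect(func() { fired = true }) // real code would call log.Fatal
	if brokerFirst {
		b.stop()
		s.terminate()
	} else {
		s.terminate()
		b.stop()
	}
	return fired
}

func main() {
	fmt.Println("broker stopped first, fatal fired:", fatalFired(true))  // true
	fmt.Println("router stopped first, fatal fired:", fatalFired(false)) // false
}
```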

Solution

Reversed the cleanup order:

  1. Terminate GoRouter session first
  2. Stop NATS server second

This allows the router and its subscriber to disconnect gracefully before NATS shuts down.
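
As a rough sketch of the reversed order, assuming a test state that holds one handle for the router and one for NATS: the `stopper` interface, the `recorder` type, and the field names below are hypothetical and do not reflect the actual contents of common_integration_test.go.

```go
package main

import "fmt"

// stopper is a hypothetical interface for anything the test state shuts down.
type stopper interface{ Stop() }

// recorder is a test double that records the order in which Stop is called.
type recorder struct {
	name string
	log  *[]string
}

func (r recorder) Stop() { *r.log = append(*r.log, r.name) }

// testState is a hypothetical stand-in for the integration suite's state.
type testState struct {
	router stopper
	nats   stopper
}

// StopAndCleanup stops the router (the NATS client) before NATS itself,
// so the subscriber is gone by the time the server shuts down.
func (s *testState) StopAndCleanup() {
	if s.router != nil {
		s.router.Stop()
	}
	if s.nats != nil {
		s.nats.Stop()
	}
}

// cleanupOrder runs StopAndCleanup against recorders and returns the order.
func cleanupOrder() []string {
	var log []string
	s := &testState{
		router: recorder{name: "gorouter", log: &log},
		nats:   recorder{name: "nats", log: &log},
	}
	s.StopAndCleanup()
	return log
}

func main() {
	fmt.Println(cleanupOrder()) // prints [gorouter nats]
}
```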

Related

This follows the same pattern as the fix in commit b2bf830 (from PR #555), which resolved the same cleanup-ordering issue in router/router_test.go but was not applied to the integration test suite.

Testing

This change affects all integration tests using testState.StopAndCleanup(). The fix should reduce or eliminate flaky port conflict failures in CI when tests run in parallel.

This prevents the subscriber's ClosedCB from firing log.Fatal when
NATS is stopped first, which was causing the test process to exit
prematurely and leading to port binding conflicts in parallel test
runs.

The cleanup order is now:
1. Terminate gorouter session
2. Stop NATS server
3. Clean up test files

This matches the fix from upstream PR #555 (commit b2bf830) which
resolved similar issues in router/router_test.go.
@github-project-automation github-project-automation Bot moved this from Inbox to Pending Merge | Prioritized in Application Runtime Platform Working Group Apr 20, 2026
hoffmaen added a commit to sap-contributions/routing-release that referenced this pull request Apr 20, 2026
Fixes cleanup order in StopAndCleanup(), main_test.go AfterEach, and
nats_test.go AfterEach. Stopping NATS first causes the subscriber's
ClosedCB to fire log.Fatal → os.Exit(1), killing the test proc.
Adopts the same fix as PR cloudfoundry#557 and extends it to all integration
test files.
@hoffmaen
Contributor

The fix for the stopping order is also part of #558. Can we close this one in favor of #558?

@rkoster rkoster closed this Apr 22, 2026
@github-project-automation github-project-automation Bot moved this from Pending Merge | Prioritized to Done in Application Runtime Platform Working Group Apr 22, 2026