@ammar-agent ammar-agent commented Nov 24, 2025

Fixes

  1. Fixed SSH Container Leaks: ssh-fixture.ts now properly cleans up containers on initialization failure.
  2. Removed Global Retries: Removed retries from 9 tests; kept them only for the flaky openai-web-search and ollama tests.

Context

Investigation found 100+ leaked containers on the CI box causing timeouts. Blanket retries were masking the resource exhaustion issue.

- Remove retries from 9 integration tests that don't need them
- Keep retries only for truly flaky tests (openai-web-search, ollama)
- Fix SSH fixture to clean up containers on failure
- Root cause: 100+ leaked containers on CI box causing resource exhaustion
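
To illustrate the "retries only where needed" policy from the list above, here is a minimal sketch of opt-in retries. It is not the code from this PR: `withRetries`, `attemptsFor`, the `RETRY_ATTEMPTS` table, and the attempt count of 3 are all hypothetical names and values chosen for illustration.

```typescript
// Hypothetical helper: run an async test body up to `attempts` times and
// rethrow the last error if every attempt fails.
async function withRetries<T>(attempts: number, body: () => Promise<T>): Promise<T> {
  let lastError: unknown = new Error("withRetries called with attempts < 1");
  for (let i = 0; i < attempts; i++) {
    try {
      return await body();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Assumption: only suites that depend on an external service opt in to retries.
// The count of 3 is illustrative; the PR does not state the real value.
const RETRY_ATTEMPTS = new Map<string, number>([
  ["openai-web-search", 3],
  ["ollama", 3],
]);

// Every other suite gets a single attempt, so a real failure surfaces at once.
function attemptsFor(suite: string): number {
  return RETRY_ATTEMPTS.get(suite) ?? 1;
}
```

Defaulting to one attempt means a resource problem such as leaked containers fails the run immediately instead of being hidden behind a passing retry.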

The integration test timeout issue was caused by accumulated container
leaks on the CI box. Each failed test setup left a container running,
eventually exhausting system resources.

Changes:
1. SSH fixture now tracks containerId and stops it on any error
2. Removed blanket retries from most tests - retries mask real issues
3. Kept retries only for external-service-dependent tests
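
The fixture change in item 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of `ssh-fixture.ts`: the `ContainerRuntime` interface, `startSshFixture`, and the `waitForSsh` callback are hypothetical, and the real fixture presumably talks to the container engine directly.

```typescript
// Hypothetical abstraction over the container engine, so the sketch is
// runnable without Docker. `start` resolves to the new container's id.
interface ContainerRuntime {
  start(image: string): Promise<string>;
  stop(containerId: string): Promise<void>;
}

interface SshFixture {
  containerId: string;
  cleanup(): Promise<void>;
}

// Start a container and wait for SSH. If any step after the container
// exists fails, stop the container before rethrowing so it does not leak.
async function startSshFixture(
  runtime: ContainerRuntime,
  image: string,
  waitForSsh: (containerId: string) => Promise<void>,
): Promise<SshFixture> {
  let containerId: string | undefined;
  try {
    containerId = await runtime.start(image);
    await waitForSsh(containerId);
    const id = containerId;
    return { containerId: id, cleanup: () => runtime.stop(id) };
  } catch (err) {
    if (containerId !== undefined) {
      // Best-effort stop: swallow cleanup errors so the original failure
      // is the one the test run reports.
      await runtime.stop(containerId).catch(() => {});
    }
    throw err;
  }
}
```

Declaring `containerId` outside the `try` is what lets the `catch` branch see a container that started successfully but failed a later initialization step, which is the leak scenario the description attributes to failed test setups.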

_Generated with `mux`_
@ammario ammario merged commit d307930 into main Nov 24, 2025
14 checks passed
@ammario ammario deleted the debug-integration-test-timeout branch November 24, 2025 01:43
ammar-agent added a commit that referenced this pull request Nov 24, 2025