
ammar-agent commented Dec 1, 2025

Summary

Fixes flaky integration tests in runtimeFileEditing.test.ts that were timing out waiting for Anthropic API responses.

Root Cause

The tests run slower on CI than on local dev machines because of:

  • Shared VMs with less CPU/memory
  • Higher network latency to Anthropic API
  • No prompt cache benefit (Anthropic's prompt cache requires prompts of 2048+ tokens; our test prompts are ~200-500 tokens)
  • 4 concurrent tests × 2 runtime types = 8 parallel API calls
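The last two points reduce to simple arithmetic. This is a minimal sketch using only the figures quoted above; the names CACHE_MIN_TOKENS, isCacheable, and parallelApiCalls are hypothetical and do not come from the test file or any Anthropic SDK.

```typescript
// Hypothetical illustration; figures are from the PR description, names are made up.
const CACHE_MIN_TOKENS = 2048; // minimum cacheable prompt length cited above
const TEST_PROMPT_TOKENS = { min: 200, max: 500 }; // approximate test prompt sizes

const CONCURRENT_TESTS = 4;
const RUNTIME_TYPES = ["local", "ssh"] as const;

// A prompt only benefits from caching if it reaches the minimum length.
function isCacheable(promptTokens: number, minTokens: number = CACHE_MIN_TOKENS): boolean {
  return promptTokens >= minTokens;
}

// Every concurrent test runs once per runtime type, each making its own API call.
function parallelApiCalls(tests: number, runtimes: readonly string[]): number {
  return tests * runtimes.length;
}

// If the largest test prompt is below the minimum, none of them are cacheable.
console.log(isCacheable(TEST_PROMPT_TOKENS.max)); // false
console.log(parallelApiCalls(CONCURRENT_TESTS, RUNTIME_TYPES)); // 8
```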

Changes

  • Increased stream timeout: 15s → 30s (local), 25s → 45s (SSH)
  • Increased test timeout: 25s → 45s (local), 60s → 90s (SSH)
  • Added configureTestRetries(3) to handle occasional API hiccups
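A minimal sketch of how per-runtime timeouts and a retry wrapper can fit together, assuming a promise-based stream call. RuntimeType, TIMEOUTS, withTimeout, and withRetries are hypothetical names; withRetries only approximates what a test-level retry such as configureTestRetries(3) provides, and the real helper's implementation is not shown in this PR.

```typescript
// Hypothetical sketch: these names are illustrative and are not taken from
// runtimeFileEditing.test.ts or its helpers.
type RuntimeType = "local" | "ssh";

// Timeout values from this PR, in milliseconds.
const TIMEOUTS: Record<RuntimeType, { streamMs: number; testMs: number }> = {
  local: { streamMs: 30_000, testMs: 45_000 },
  ssh: { streamMs: 45_000, testMs: 90_000 },
};

// Rejects if `promise` has not settled within `ms` milliseconds.
// The underlying operation is not cancelled; we only stop waiting for it.
async function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    // Clear the timer either way so it cannot keep the process alive.
    clearTimeout(timer);
  }
}

// Calls `fn` once, then retries up to `retries` more times on failure,
// rethrowing the last error if every attempt fails.
async function withRetries<T>(fn: () => Promise<T>, retries: number): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

Clearing the timer in `finally` stops a pending 45s timer from holding the test process open after the stream settles. With this shape, `withRetries(fn, 3)` makes up to four attempts: one initial try plus three retries.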

Why not switch models or use 1h cache TTL?

  • Tried codex-mini, but it struggles with file editing tool calls
  • Anthropic's 1h cache TTL won't help: the cache has a 2048-token minimum and our prompts are too short

Generated with mux

The file editing integration tests were flaky due to timeouts while waiting
for Anthropic Claude Haiku API responses (15s timeout). Switch to OpenAI's
gpt-5.1-codex-mini model, which is faster and more reliable for these
simple file editing tasks.

Also renamed the helper constant GPT_5_MINI_MODEL to CODEX_MINI_MODEL and
set it to the correct model identifier (gpt-5.1-codex-mini).

ammar-agent force-pushed the fix-flaky-file-edit-test branch from 820a5f2 to 82ba697 on December 1, 2025 18:42
ammar-agent changed the title "🤖 fix: de-flake file editing tests by switching to codex-mini" to "🤖 fix: de-flake file editing tests with increased timeouts" on Dec 1, 2025
ammario merged commit e87340f into main on Dec 1, 2025
13 checks passed
ammario deleted the fix-flaky-file-edit-test branch on December 1, 2025 19:03
