Commit 4c70f5b
authored
🤖 Fix test flake by simplifying prompt and clarifying unlimited steps (#406)
## Problem
The `openai-web-search.test.ts` integration test was flaking in CI with
timeouts after 120+ seconds:
- Stream emitted 100+ events but never completed with `stream-end`
- Pattern: repeated reasoning-delta → reasoning-end → tool-call-start →
tool-call-end cycles
- 15 tool calls observed before timeout
- Test failed on all 3 retry attempts
**CI Run**:
https://github.com/coder/cmux/actions/runs/18766377932/job/53542148133
## Root Cause
The test prompt was too complex for a reasoning model:
```
Find gold price → compute price² → compute Collatz sequence steps to reach 1
```
With `thinkingLevel: 'high'` + `web_search`, this caused the model to
enter excessive tool call loops:
- Searching for gold prices repeatedly (volatile data)
- Extensive reasoning about the huge number (price² is millions)
- Never reaching a satisfactory conclusion within 120 seconds
**This is NOT a bug in the unlimited steps configuration** - models MUST
be able to run for hours or even days with unlimited tool calls for
autonomous workflows.
## Solution
1. **Clarified unlimited steps intent**: Added comment explaining that
the 100k step limit is intentionally high to support long-running
autonomous workflows
2. **Simplified test prompt**: Changed to simple weather query + picnic
decision
- Still tests reasoning + web_search combination
- Much less likely to cause excessive loops
- Still validates the original bug fix (itemId errors)
3. **Reduced thinking level**: Changed from `high` to `medium` to avoid
excessive deliberation
4. **Adjusted timeouts**: Reduced to 120s/90s for simpler task
## Testing
Type checking passes. The test still validates the same bug fix with a
more stable prompt.
---
_Generated with `cmux`_1 parent 07b5d7b commit 4c70f5b
File tree
2 files changed
+13
-10
lines changed- src/services
- tests/ipcMain
2 files changed
+13
-10
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
476 | 476 | | |
477 | 477 | | |
478 | 478 | | |
479 | | - | |
| 479 | + | |
| 480 | + | |
| 481 | + | |
480 | 482 | | |
481 | 483 | | |
482 | 484 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
27 | 27 | | |
28 | 28 | | |
29 | 29 | | |
30 | | - | |
31 | | - | |
32 | | - | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
33 | 33 | | |
| 34 | + | |
34 | 35 | | |
35 | 36 | | |
36 | 37 | | |
37 | | - | |
38 | | - | |
| 38 | + | |
| 39 | + | |
39 | 40 | | |
40 | 41 | | |
41 | 42 | | |
42 | | - | |
| 43 | + | |
43 | 44 | | |
44 | 45 | | |
45 | 46 | | |
| |||
49 | 50 | | |
50 | 51 | | |
51 | 52 | | |
52 | | - | |
53 | | - | |
| 53 | + | |
| 54 | + | |
54 | 55 | | |
55 | 56 | | |
56 | 57 | | |
| |||
85 | 86 | | |
86 | 87 | | |
87 | 88 | | |
88 | | - | |
| 89 | + | |
89 | 90 | | |
90 | 91 | | |
0 commit comments