🤖 fix: include dist/ in terminal-bench archive to fix worker crash #507
Merged
The terminal-bench agent was crashing immediately on startup because:

1. Archive only packaged src/ but not dist/
2. Setup script never ran build
3. Worker threads need dist/utils/main/tokenizer.worker.js
4. Missing worker caused all 17+ tasks to timeout after 30min

Fix: Add 'dist' to _INCLUDE_PATHS so pre-built worker files are included. The workflow already runs 'make build' during CI setup, so dist/ exists and just needs to be packaged. This adds no per-task overhead.
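The packaging change described above can be illustrated with a minimal sketch. It is not the code from `cmux_agent.py`: the names `INCLUDE_PATHS`, `build_archive`, and `archive_members` are hypothetical, and the only detail taken from this PR is that `dist` is listed next to `src` so the compiled worker ends up in the archive.

```python
import io
import tarfile
from pathlib import Path

# Hypothetical include list. The PR only confirms that "dist" was added
# so that pre-built worker files are packaged alongside "src".
INCLUDE_PATHS = ("src", "dist")


def build_archive(repo_root: Path, include_paths=INCLUDE_PATHS) -> bytes:
    """Pack the listed top-level paths into an in-memory gzipped tarball."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for relative in include_paths:
            # Directories are added recursively; arcname keeps paths
            # relative to the repository root inside the archive.
            archive.add(repo_root / relative, arcname=relative)
    return buffer.getvalue()


def archive_members(data: bytes) -> list[str]:
    """Return the member names stored in a tarball produced above."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
        return archive.getnames()
```

With only `("src",)` in the include list, `dist/utils/main/tokenizer.worker.js` never reaches the archive, which matches the failure mode in the list above.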
Simplifications:

- Removed redundant validation (env vars already have defaults)
- Used walrus operator for cleaner conditionals
- Inlined single-use methods (_build_archive)
- Removed unnecessary checks (Path truthy, buffer.seek(0))
- Simplified bash conditionals and error paths
- Eliminated ensure_bun() wrapper function
- Condensed git repo initialization logic

Net: -76 lines (38 insertions, 114 deletions) across 4 files
ammar-agent added a commit that referenced this pull request Nov 6, 2025
PR #507 added dist/ to the terminal-bench archive include paths, but the workflow wasn't building dist/ before running the benchmark. This caused all tasks to fail immediately with "Required file .../dist missing". Now runs `make build` before `make benchmark-terminal` to ensure dist/ exists and contains the compiled worker files.
ammar-agent added a commit that referenced this pull request Nov 6, 2025
PR #507 added `dist/` to the terminal-bench archive include paths to fix worker crashes. However, the workflow wasn't building `dist/` before running the benchmark, causing all tasks to fail immediately with:

```
Error running agent for task <name>: Required file /home/runner/work/cmux/cmux/dist missing
```

Now runs `make build` before `make benchmark-terminal` to ensure dist/ exists and contains the compiled worker files.

Verified with workflow run #19140594821 which successfully completed the modernize-fortran-build task.
ammario pushed a commit that referenced this pull request Nov 6, 2025
## Problem

PR #507 added `dist/` to the terminal-bench archive include paths to fix worker crashes. However, the workflow wasn't building `dist/` before running the benchmark, causing all tasks to fail immediately with:

```
Error running agent for task <name>: Required file /home/runner/work/cmux/cmux/dist missing
```

## Solution

Add `make build` step before `make benchmark-terminal` in the workflow. This ensures:

- `dist/` directory exists
- Compiled JavaScript including worker files are present
- Archive creation succeeds

## Testing

Verified with workflow run #19140594821 which successfully completed the modernize-fortran-build task:

- Task resolved: ✅ true
- Agent ran successfully (not just immediate exit)
- No worker crashes

_Generated with `cmux`_
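The "Required file ... missing" failure above comes from a pre-flight check on the include paths. A minimal sketch of such a check follows. The helper names `find_missing` and `ensure_required` and the default tuple are hypothetical, and only the error wording is taken from the log quoted in this thread.

```python
from pathlib import Path


def find_missing(repo_root: Path, required: tuple[str, ...]) -> list[Path]:
    """Return every required path that does not exist under repo_root."""
    return [repo_root / name for name in required if not (repo_root / name).exists()]


def ensure_required(repo_root: Path, required: tuple[str, ...] = ("src", "dist")) -> None:
    """Raise before packaging if any required path is absent."""
    # Report the first missing path, mirroring the message seen in the logs.
    if missing := find_missing(repo_root, required):
        raise FileNotFoundError(f"Required file {missing[0]} missing")
```

Running a check like this before the benchmark starts surfaces a missing `dist/` as one clear error instead of a failure in every task.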
This was referenced Nov 7, 2025
ammario pushed a commit that referenced this pull request Nov 7, 2025
## Problem

Terminal-bench was failing with all tasks showing 0 input/output tokens because the agent was exiting immediately after receiving the user message, without making any API calls.

**Symptoms:**

- Latest nightly runs (Nov 5-7): All 80 tasks failed with `agent_timeout`
- Agent ran for only ~45 seconds then exited
- `total_input_tokens: 0`, `total_output_tokens: 0`
- Stream started with `caught-up` and `user` message, but no `stream-delta` or `stream-end` events

**Root cause:**

The `agentSessionCli.ts` reads the user message from stdin via a pipe:

```bash
printf '%s' "$instruction" | bun src/debug/agentSessionCli.ts ...
```

Once stdin reaches EOF and is consumed, Bun detects no other active handles keeping the event loop alive and exits the process, **even though async work (API streaming) is still pending**.

## Solution

Add an explicit keepalive interval that ensures the process stays alive until `main()` completes. The interval runs far into the future (1000 seconds) but gets cleared in the finally block once the agent session finishes.

## Testing

**Before fix:**

- Run #19173435224: 1 task, 0 tokens, ~2 min total (agent ran 45s)
- Agent exited immediately after user message

**After fix:**

- Run #19173548174: 1 task, **resolved: true**, ~7 min total (agent ran 3m17s)
- 22 tool calls made
- Stream-delta events present
- Agent completed successfully

## Related

- Fixes nightly terminal-bench failures from Nov 5-7
- Related to PR #507 (dist/ in archive) and PR #513 (build step in workflow)

_Generated with `cmux`_
ibetitsmike pushed a commit that referenced this pull request Nov 7, 2025
Problem
The nightly terminal-bench runs have been timing out (3 hours) with all tasks failing due to agent timeouts. Investigation revealed the agent crashes immediately on startup.
Root Cause
After downloading artifacts from the failed run and examining logs, found:
The agent packaging only included source files (`src/`) but not built files (`dist/`). Worker threads need the compiled `tokenizer.worker.js`, which doesn't exist.
Solution
Add `"dist"` to `_INCLUDE_PATHS` in `cmux_agent.py`.
The CI workflow already runs `make build` during setup, so `dist/` exists and just needs to be packaged. This adds zero per-task overhead: no additional build step is required.
Bonus: Code Simplification
While investigating, simplified the terminal-bench code by -11.5% LoC (-76 lines):
Impact
_Generated with `cmux`_