🤖 fix: build dist/ before running terminal-bench #513

ammar-agent · 2025-11-06T15:28:57Z

Problem

PR #507 added dist/ to the terminal-bench archive include paths to fix worker crashes. However, the workflow wasn't building dist/ before running the benchmark, causing all tasks to fail immediately with:

Error running agent for task <name>: Required file /home/runner/work/cmux/cmux/dist missing

Solution

Add a make build step before make benchmark-terminal in the workflow. This ensures:

dist/ directory exists
Compiled JavaScript, including worker files, is present
Archive creation succeeds

Testing

Verified with workflow run #19140594821, which successfully completed the modernize-fortran-build task:

Task resolved: ✅ true
Agent ran successfully (not just an immediate exit)
No worker crashes

Generated with cmux

PR #507 added `dist/` to the terminal-bench archive include paths to fix worker crashes. However, the workflow wasn't building `dist/` before running the benchmark, causing all tasks to fail immediately with:

```
Error running agent for task <name>: Required file /home/runner/work/cmux/cmux/dist missing
```

Now runs `make build` before `make benchmark-terminal` to ensure `dist/` exists and contains the compiled worker files.

Verified with workflow run #19140594821, which successfully completed the modernize-fortran-build task.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

.github/workflows/terminal-bench.yml

The 'should not hang on commands that read stdin' test was flaky in CI:

- Local: took 5073ms when expecting <5000ms (73ms over)
- SSH: took 8645ms when expecting <8000ms (645ms over)

Increased timeouts to provide headroom for CI runner variability:

- Local: 5000ms → 6000ms (+20%)
- SSH: 8000ms → 10000ms (+25%)

These timeouts verify the command completes quickly (not hanging until the bash tool's 180s timeout), while accounting for CI slowness.

CI continues to show high variability:

- runtimeExecuteBash local: 5073ms → 7079ms (trending up)
- runtimeExecuteBash SSH: 8645ms (within new limits)
- initWorkspace SSH: 12127ms when expecting <10000ms

Increased timeouts to be more generous:

- Local runtime: 6000ms → 10000ms (+67%)
- SSH runtime: 10000ms → 15000ms (+50%)
- Init queue check: 10000ms → 15000ms (+50%)

These tests verify operations complete quickly (not hanging until the bash tool's 180s timeout). The large headroom accounts for CI slowness while still catching actual hangs.
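The kind of check these timeouts guard can be sketched roughly as follows. This is only an illustration: `LIMITS_MS`, `elapsedMs`, and `completesWithin` are hypothetical names and not the actual test helpers in the repository. The limit values are the ones quoted in the commit message above.

```typescript
// Hypothetical limits table; values taken from the commit message above.
const LIMITS_MS = {
  localRuntime: 10_000,
  sshRuntime: 15_000,
  initQueueCheck: 15_000,
} as const;

// Measure how long an async operation takes, in milliseconds.
async function elapsedMs(op: () => Promise<unknown>): Promise<number> {
  const start = Date.now();
  await op();
  return Date.now() - start;
}

// True when the operation settles in under `limitMs`. A real hang would run
// until the much larger 180s tool timeout and fail this check.
async function completesWithin(
  op: () => Promise<unknown>,
  limitMs: number,
): Promise<boolean> {
  return (await elapsedMs(op)) < limitMs;
}
```

The limits sit well below 180s, so a genuine hang still fails the check, while ordinary CI jitter of a few seconds does not.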

## Problem

Terminal-bench was failing with all tasks showing 0 input/output tokens because the agent was exiting immediately after receiving the user message, without making any API calls.

**Symptoms:**

- Latest nightly runs (Nov 5-7): All 80 tasks failed with `agent_timeout`
- Agent ran for only ~45 seconds then exited
- `total_input_tokens: 0`, `total_output_tokens: 0`
- Stream started with `caught-up` and `user` message, but no `stream-delta` or `stream-end` events

**Root cause:**

The `agentSessionCli.ts` reads the user message from stdin via a pipe:

```bash
printf '%s' "$instruction" | bun src/debug/agentSessionCli.ts ...
```

Once stdin reaches EOF and is consumed, Bun detects no other active handles keeping the event loop alive and exits the process, **even though async work (API streaming) is still pending**.

## Solution

Add an explicit keepalive interval that ensures the process stays alive until `main()` completes. The interval runs far into the future (1000 seconds) but gets cleared in the finally block once the agent session finishes.

## Testing

**Before fix:**

- Run #19173435224: 1 task, 0 tokens, ~2 min total (agent ran 45s)
- Agent exited immediately after user message

**After fix:**

- Run #19173548174: 1 task, **resolved: true**, ~7 min total (agent ran 3m17s)
- 22 tool calls made
- Stream-delta events present
- Agent completed successfully

## Related

- Fixes nightly terminal-bench failures from Nov 5-7
- Related to PR #507 (dist/ in archive) and PR #513 (build step in workflow)

_Generated with `cmux`_
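The keepalive approach described in the Solution above might look roughly like the sketch below. It is a minimal illustration, not the code from `agentSessionCli.ts`: the wrapper name `runWithKeepalive` and its parameters are assumptions. Only the 1000-second interval and the clear-in-`finally` behavior come from the description.

```typescript
// Hypothetical helper (not the actual cmux code): keeps the event loop busy
// with a long-lived timer while `work` runs, then releases it.
async function runWithKeepalive<T>(
  work: () => Promise<T>,
  keepaliveMs: number = 1_000_000, // 1000 seconds, per the description above
): Promise<T> {
  // An active interval is a handle the runtime has to wait on, so the
  // process does not exit merely because stdin has reached EOF.
  const keepalive = setInterval(() => {}, keepaliveMs);
  try {
    return await work();
  } finally {
    // Clear the timer whether `work` resolved or rejected, so the process
    // can exit normally once the session is done.
    clearInterval(keepalive);
  }
}
```

Under these assumptions the CLI entry point would call something like `runWithKeepalive(main)`. Clearing the interval in `finally` matters: if it were left running, the process would stay alive after the session ended.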

chatgpt-codex-connector bot reviewed Nov 6, 2025

.github/workflows/terminal-bench.yml

🤖 fix: build only main+preload (skip icons)

aedcbc7

Icons aren't needed for terminal-bench, and building them requires ImageMagick. Build only the essential JavaScript bundles needed for the benchmark.

ammario approved these changes Nov 6, 2025

ammario enabled auto-merge November 6, 2025 16:03

ammar-agent added 2 commits November 6, 2025 16:24

ammario disabled auto-merge November 6, 2025 16:49

ammario merged commit cff4cd3 into main Nov 6, 2025
13 of 14 checks passed

ammario deleted the tb-timeout-v2 branch November 6, 2025 16:49

ammar-agent mentioned this pull request Nov 6, 2025

🤖 fix: increase stream timeouts in CI to handle rate limits #514

Closed

ammar-agent restored the tb-timeout-v2 branch November 7, 2025 15:45

This was referenced Nov 7, 2025

🤖 fix: prevent Bun from exiting when stdin closes in agentSessionCli #528

Closed

🤖 fix: keep process alive when stdin closes in agentSessionCli #529

Merged
