feat(mcp): pause_session tool + MCP-aware pause() yield mode #5544
Merged
In-test pause() calls hung subprocess runs invoked through the MCP server because
readline blocked on stdin that an agent can't supply. pause() now detects MCP
context (CODECEPTJS_MCP=1, non-TTY stdin) and adapts:
- Skip mode (CODECEPTJS_MCP=1 only): pause() prints a notice and resolves
immediately so leftover pause() calls don't deadlock CI runs.
- Yield mode (CODECEPTJS_MCP_PAUSE=1): pause() reads JSON-line commands on
stdin and emits {__mcpPause:true,...} responses on stdout (paused, result,
resumed, exited, error). Each run/snapshot response includes the artifact
bundle from captureSnapshot.
The new MCP server pause_session tool spawns a test subprocess in yield mode
and multiplexes start/run/snapshot/step/resume/exit/status sub-actions over
the JSON-line protocol. TTY behavior at a terminal is unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
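For illustration, the mode selection and the JSON-line envelope described in this commit could be sketched roughly as below. Only the two environment variables and the __mcpPause marker come from the commit message; the function names, the "type" field, and the precedence of the yield flag over the skip flag are assumptions. The message does not spell out how the non-TTY stdin check combines with the flags, so this sketch keys off the environment variables only.

```javascript
// Hypothetical helpers, not the code in lib/pause.js.
function detectPauseMode(env) {
  // Assumed precedence: the yield flag wins over the plain MCP flag.
  if (env.CODECEPTJS_MCP_PAUSE === '1') return 'yield';
  if (env.CODECEPTJS_MCP === '1') return 'skip';
  // No MCP flag set: keep the interactive readline REPL.
  return 'tty';
}

// One protocol message per line on stdout. The `type` field name is assumed.
function encodeProtocolMessage(type, payload = {}) {
  return JSON.stringify({ __mcpPause: true, type, ...payload }) + '\n';
}

// Returns the parsed message, or null for ordinary (non-protocol) output.
function decodeProtocolLine(line) {
  try {
    const msg = JSON.parse(line);
    return msg && msg.__mcpPause === true ? msg : null;
  } catch {
    return null;
  }
}
```

A reader on the server side would run decodeProtocolLine over each stdout line and treat null as regular test output.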
Drops the id-keyed message multiplexer and 7-action enum
(start/run/snapshot/step/resume/exit/status). The yield-mode subprocess now
reads plain text lines from stdin (same shape as the TTY readline REPL) and
emits one JSON line per input on stdout.
The MCP server pause_session tool exposes only "start" and "run". A run takes
a code string with the same conventions as the TTY pause REPL: "" steps,
"resume" continues, "exit" aborts, and anything else is treated as I.<expr>
or =>raw_js. Each run returns the next protocol message.
Net: 237 lines removed, 159 added.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
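The code-string conventions above can be pictured as a small classifier. This is a sketch only: the function name, the trimming, and the returned shape are assumptions, and the real REPL may parse input differently.

```javascript
// Hypothetical classifier for one line of pause input.
function classifyPauseInput(line) {
  const code = line.trim();
  if (code === '') return { action: 'step' };       // empty line: run one step
  if (code === 'resume') return { action: 'resume' }; // continue the test
  if (code === 'exit') return { action: 'exit' };     // abort the test
  // "=>" prefix: evaluate the rest as raw JavaScript.
  if (code.startsWith('=>')) return { action: 'eval', source: code.slice(2).trim() };
  // Anything else is treated as a call on the actor object I.
  return { action: 'eval', source: `I.${code}` };
}
```

For example, classifyPauseInput('click("Save")') yields an eval action with the source I.click("Save").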
run_test now spawns its subprocess in pause yield mode and returns early
with {status:"paused"} when the test hits pause(). The agent then drives
the REPL through the new "pause" tool, which only takes a code string.
Drops the standalone pause_session.start action — pause only makes sense
when a test is already running. Resume / step / exit are just code values
(matching the TTY pause REPL conventions).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…container
Previously pause yield mode spawned a test subprocess and shuttled JSON-line
messages through stdin/stdout. That was a lot of plumbing for something the
existing run_step_by_step tool already does cleanly: run codecept in-process
in the MCP server itself.
Now lib/pause.js exposes setPauseHandler/setNextStep. The MCP server
installs a handler at startup that turns pause() into a Promise the agent
controls. run_test races bootstrap+run() vs that paused promise; on pause
it returns {status:"paused"} with the test promise stashed at module level.
The pause tool drives the REPL by running code through the same I that the
test is using, no IPC. resume/exit await the test promise and return the
final reporter result.
Drops: pauseChild, pauseProtocolWaiters, pauseProcessChunk, mcpYieldSession,
emitMcpProtocol, ensureMcpReadline, the CODECEPTJS_MCP* env detection in
lib/pause.js. The TTY readline path is unchanged.
Net: 270 added, 526 removed across pause/mcp files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
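A minimal sketch of the handler-plus-race idea from this commit, assuming a simplified pause() and a hypothetical createPauseGate helper. Only the name setPauseHandler comes from the commit message; its signature here and everything else are illustrative stand-ins, not the code in lib/pause.js or bin/mcp-server.js.

```javascript
let pauseHandler = null; // stand-in for module state in lib/pause.js

function setPauseHandler(fn) {
  pauseHandler = fn;
}

// Simplified pause(): delegate to the installed handler, if any.
function pause() {
  return pauseHandler ? pauseHandler() : Promise.resolve();
}

// Hypothetical gate: `paused` settles when the test hits pause(),
// and resume() releases the suspended test.
function createPauseGate() {
  let notifyPaused;
  let release = null;
  const paused = new Promise(resolve => { notifyPaused = resolve; });
  return {
    paused,
    handler() {
      notifyPaused({ status: 'paused' });
      return new Promise(resolve => { release = resolve; });
    },
    resume() {
      if (release) release();
    },
  };
}

// Race the running test against the paused signal; keep the run promise
// so a later call can await the final result.
async function runTest(testFn, gate) {
  const run = testFn().then(() => ({ status: 'completed' }));
  const outcome = await Promise.race([run, gate.paused]);
  return { outcome, run };
}
```

With setPauseHandler(() => gate.handler()) installed, runTest returns an outcome of {status: 'paused'} for a test that calls pause(), and awaiting the stashed run after gate.resume() gives the completed result.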
The pause tool was duplicating the TTY pause REPL (empty/resume/exit magic
strings, => prefix, default I.<expr>) when MCP already has run_code for
running code against the live container. Both tools share the same I, so
during a paused test, run_code is the right surface for code execution.
Replace pause with a simple "continue" tool that just releases the paused
test and returns the final reporter result. Drop setNextStep: there is no
step-by-step mode for MCP (use run_step_by_step if needed).
Net: 55 added, 152 removed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous patch hijacked process.stdout.write at the start of run_test and
only restored it inside collectRunCompletion (i.e., on continue). That muted
the MCP SDK's own protocol writes during the pause window, so any run_code or
continue response would be lost.
Reuse the existing withSilencedIO helper instead. Wrap run_test's race and
continue's await-pending-run inside it, so stdout is muted while codecept is
producing step output and restored before the tool returns its MCP response.
The MCP SDK writes responses on a clean stdout.
While paused, the test is suspended (handler promise unresolved), so no test
output is being produced and there is no need to mute. run_code calls during
pause go through the existing run_code handler, which has its own isolation
pattern.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
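The scoping fix can be illustrated with a mute-and-restore wrapper. The real withSilencedIO helper is not shown in this PR text, so the function below (silenceStdoutDuring, a hypothetical name) is only a sketch of the pattern: stdout is muted for the duration of one async call and restored in a finally block, so nothing stays hijacked across tool calls.

```javascript
// Hypothetical sketch of the mute-then-restore pattern.
async function silenceStdoutDuring(fn) {
  const originalWrite = process.stdout.write;
  // Swallow writes, but still honor the optional callback so callers don't hang.
  process.stdout.write = (chunk, encoding, callback) => {
    const done = typeof encoding === 'function' ? encoding : callback;
    if (typeof done === 'function') done();
    return true;
  };
  try {
    return await fn();
  } finally {
    // Always restore, even if fn throws, so later protocol writes go through.
    process.stdout.write = originalWrite;
  }
}
```

Because the restore happens before the wrapper's promise settles, a response written after awaiting it reaches the real stdout.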
run_test now accepts an optional pauseAt (1-based step index). The MCP
server tracks step.after events; when stepIndex matches pauseAt, it
schedules pauseNow() through the recorder so the test pauses between
steps. Useful as a programmatic breakpoint without editing the test —
the agent gets step indices via the list CLI or run_step_by_step.
The paused response now includes:
- pausedAfter: { index, name, status } of the last completed step
- page: { url, title, contentSize } via the live helper
- suggestions: which tool to call next (snapshot / run_code / continue)
lib/pause.js gains pauseNow() which schedules a one-shot pauseSession via
recorder.add — the same mechanism as the in-test pause() but without
re-attaching the global event listeners.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
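As a rough sketch of the pauseAt bookkeeping described above: count completed steps, trigger a pause when the 1-based index matches, and build the paused payload from the last completed step. The helper name, the pauseNow callback parameter, and the array shape of suggestions are assumptions; the real server wires this to step.after events and the recorder.

```javascript
// Hypothetical tracker; call onStepAfter from a step.after listener.
function createPauseAtTracker(pauseAt, pauseNow) {
  let stepIndex = 0;
  let lastStep = null;
  return {
    onStepAfter(step) {
      stepIndex += 1; // 1-based index of the step that just finished
      lastStep = { index: stepIndex, name: step.name, status: step.status };
      if (stepIndex === pauseAt) pauseNow();
    },
    // `page` is expected to look like { url, title, contentSize }.
    pausedPayload(page) {
      return {
        status: 'paused',
        pausedAfter: lastStep,
        page,
        suggestions: ['snapshot', 'run_code', 'continue'], // assumed shape
      };
    },
  };
}
```

With pauseAt set to 2, the callback fires once, right after the second step completes, and pausedPayload reports that step under pausedAfter.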
Previously run_step_by_step ran the whole test to completion in one call and
returned a fat blob of per-step artifacts. That's the aiTrace plugin's job,
not an interactive tool's.
Now it pauses after every step using the same pauseNow + handler machinery as
run_test's pauseAt: the agent calls run_step_by_step, gets back a paused
payload after step 1, calls continue to advance to step 2, and so on. At any
pause it can call run_code / snapshot to inspect state.
continue is unified: it races "test paused again" vs "test completed", so the
same call works for run_step_by_step (re-pauses each time), pauseAt (runs to
end), and explicit pause() in the test (runs to end). Module-level
pendingTestFile / pendingStepInfo carry the paused-payload data through
repeated continue cycles.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
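The unified continue can be sketched as a session that re-arms a "next pause" promise on every cycle and races it against the run promise. All names here (createPauseSession, start, continueRun) are hypothetical, and the test side is simulated by calling the handler directly after each step.

```javascript
// Hypothetical session object; not the module-level state in bin/mcp-server.js.
function createPauseSession() {
  let onPause = null; // resolver for whoever waits for the next pause
  let release = null; // resolver that lets the suspended test proceed

  function nextPause() {
    return new Promise(resolve => { onPause = resolve; });
  }

  // Called from the test side every time it pauses.
  function handler(info) {
    return new Promise(resolve => {
      release = resolve;
      if (onPause) {
        onPause({ status: 'paused', pausedAfter: info });
        onPause = null;
      }
    });
  }

  function start(testFn) {
    const paused = nextPause(); // arm before the test starts running
    const run = testFn().then(() => ({ status: 'completed' }));
    return { run, first: Promise.race([paused, run]) };
  }

  // continue: release the test, then race "paused again" vs "completed".
  function continueRun(run) {
    const paused = nextPause();
    if (release) {
      const resume = release;
      release = null;
      resume();
    }
    return Promise.race([paused, run]);
  }

  return { handler, start, continueRun };
}
```

The same continueRun call returns another paused payload when the test pauses again and the completed result when it runs to the end.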
Summary
- pause() now detects MCP context (CODECEPTJS_MCP=1, non-TTY stdin) and adapts: a skip mode that resolves immediately so leftover pause() calls don't deadlock CI runs invoked through MCP, and a yield mode (CODECEPTJS_MCP_PAUSE=1) that reads JSON-line commands on stdin and emits {__mcpPause:true,...} responses on stdout (paused / result / resumed / exited / error). Each run/snapshot response carries the same artifact bundle as run_code / snapshot (URL, ARIA, HTML, screenshot, console, storage).
- New MCP server tool pause_session with sub-actions start/run/snapshot/step/resume/exit/status. It spawns a test subprocess in yield mode, multiplexes commands by id, and queues consumers waiting for the next paused event.
- TTY behavior (npx codeceptjs run --debug at a terminal) is unchanged.
Why
Before this change, an agent driving CodeceptJS through MCP couldn't tolerate pause() in a test: readline blocked on stdin the agent couldn't supply, the subprocess hung, and MCP eventually timed out. There was also no way for the agent to drive the REPL itself. This PR makes both work without affecting the human TTY workflow.
Files
- lib/pause.js: context detection, yield-mode session, persistent readline across pauseSession entries.
- bin/mcp-server.js: pause_session tool, JSON-line subprocess multiplexer, line-buffered stdout/stderr classifier.
- docs/mcp.md, docs/debugging.md: documented pause_session and pause()'s three modes.
- test/unit/pause_test.js (new): 10 cases covering env detection, JSON envelope shape, and protocol round-trip (paused/resumed/snapshot/invalid-JSON/unknown-type/exit-rejects).
- test/unit/mcpServer_test.js: 6 new cases for the line classifier.
Test plan
- npm run test:unit: 685 passed, 0 failed
- npx mocha test/unit/pause_test.js test/unit/mcpServer_test.js: 48 passed
- lib/pause.js and bin/mcp-server.js
- npx codeceptjs run --debug with an in-test pause() still drops to the human REPL exactly as today
- CODECEPTJS_MCP=1 npx codeceptjs run with a pause() test prints the skip notice and continues
- pause_session.start on a test with pause() returns a paused event; run calls return artifact bundles; resume lets the test finish
🤖 Generated with Claude Code