feat(mcp): per-test plugin overrides + shell session lifecycle #5547
Merged
- run_test / run_step_by_step accept a `plugins` object that mirrors
the CLI `-p` flag (e.g. `{ screencast: { saveScreenshots: true },
aiTrace: { on: 'fail' }, pause: true }`). Container is re-initialized
when the plugin set changes between calls.
- start_browser / stop_browser now drive a full shell session like
`codeceptjs shell`: bootstrap, recorder.start, suite.before /
test.before on start; matching after events plus codecept.teardown
on stop.
- run_code / snapshot now require an active session (shell or paused
test) and return a clear error pointing the agent at start_browser
or run_test otherwise. Plugins and listeners that depend on
suite.before / test.before now fire correctly during MCP usage.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
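The start/stop ordering and the session guard described in this commit can be sketched roughly as below. This is a minimal illustration, not the code in mcp-server.js: `ShellSession`, its `log` array, and `requireActive` are hypothetical names, the `bootstrap` / `recorder.start` / `teardown` entries only stand in for the real calls, and the `test.after` / `suite.after` event names are an assumption about what "matching after events" means.

```javascript
const { EventEmitter } = require('node:events')

// Hypothetical stand-in for the MCP shell session; not the actual implementation.
class ShellSession {
  constructor(events = new EventEmitter()) {
    this.events = events
    this.active = false
    this.log = [] // records the non-event lifecycle calls, for illustration only
  }

  // start_browser: bootstrap, start the recorder, then emit the "before" events.
  start() {
    if (this.active) return
    this.log.push('bootstrap') // stands in for codecept.bootstrap()
    this.log.push('recorder.start') // stands in for recorder.start()
    this.events.emit('suite.before')
    this.events.emit('test.before')
    this.active = true
  }

  // stop_browser: emit the "after" events in reverse order, then tear down.
  stop() {
    if (!this.active) return
    this.events.emit('test.after') // assumed event name
    this.events.emit('suite.after') // assumed event name
    this.log.push('teardown') // stands in for codecept.teardown()
    this.active = false
  }

  // Guard used by tools such as run_code / snapshot.
  requireActive(toolName) {
    if (!this.active) {
      throw new Error(`${toolName} requires an active session. Call start_browser or run_test first.`)
    }
  }
}
```

Keeping the guard in one place means every tool fails the same way, with the same hint, when no session exists.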
Move artifact-on-disk reading from mcp-server.js into a TraceReader class
in lib/utils/trace.js. Python-style indexing via first / last / nth, kept
generic across kinds (aria / html / screenshot / console / storage). Sort
by filename — aiTrace's zero-padded step prefix means a lexical sort is
chronological.
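A reader with this shape might look roughly like the sketch below. It is an assumption-laden illustration, not the class in lib/utils/trace.js: the name `TraceReaderSketch` and the file naming scheme (`<zero-padded step>_<kind>.<ext>`, e.g. `0003_aria.txt`) are hypothetical and chosen only to show why a lexical sort yields chronological order.

```javascript
const fs = require('node:fs')
const path = require('node:path')

// Hypothetical sketch; the real TraceReader may differ in naming and behavior.
class TraceReaderSketch {
  constructor(dir) {
    this.dir = dir
  }

  // List artifact files of one kind, sorted lexically.
  // Assumed naming: "<zero-padded step>_<kind>.<ext>".
  files(kind) {
    if (!this.dir || !fs.existsSync(this.dir)) return []
    return fs
      .readdirSync(this.dir)
      .filter(name => name.includes(`_${kind}.`))
      .sort()
  }

  // Python-style indexing: negative indexes count from the end.
  nth(kind, index) {
    const names = this.files(kind)
    const i = index < 0 ? names.length + index : index
    const name = names[i]
    return name === undefined ? null : fs.readFileSync(path.join(this.dir, name), 'utf8')
  }

  first(kind) {
    return this.nth(kind, 0)
  }

  last(kind) {
    return this.nth(kind, -1)
  }
}
```

Without the zero padding, a file for step 10 would sort before the one for step 2, so the padding is what makes `last` return the most recent capture.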
run_code uses it to diff ARIA between the last aiTrace capture and the
new one produced by the steps inside this call:
    const reader = new TraceReader(currentAiTraceDir)
    const before = reader.last('aria')
    // run code, aiTrace captures per step
    const after = reader.last('aria')
    if (before !== after) result.ariaDiff = ariaDiff(before, after)
initCodecept now force-enables aiTrace whenever the MCP server initializes
the container — it's the canonical per-step capture, no point in MCP doing
its own grabAriaSnapshot.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- docs/agents.md: new top-level page covering the MCP loop (open the page → read → run a CodeceptJS command → check → commit), how the agent reads page artifacts, and where MCP fits relative to pause().
- lib/aria.js: trim INTERACTIVE_ROLES to roles that actually take user input (drop container roles like grid/tablist/menubar); remove IGNORED_ROLES unwrap, icon-button auto-naming, and bool/null coercion in attribute values. Names are always emitted; attribute values are passed through as plain strings.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- `run_test` and `run_step_by_step` now accept a `plugins` object mirroring the CLI `-p` flag, e.g. `{ "screencast": { "saveScreenshots": true }, "aiTrace": { "on": "fail" }, "pause": true }`. Keys are plugin names from `lib/plugin/`, values are options (or `true` / `{}` for defaults). Plugins are merged into `config.plugins[name]` with `enabled: true`, and the container is torn down and re-initialized whenever the plugin set changes between calls.
- `start_browser` now does what `codeceptjs shell` does: container init → `codecept.bootstrap()` → `recorder.start()` → emit `suite.before` and `test.before`. `stop_browser` emits the matching after events and runs `codecept.teardown()`. Plugins and listeners that hook into per-suite / per-test setup now actually fire during MCP usage.
- `run_code` / `snapshot` require a session. Calling either without an active shell session (or a paused test) now returns a clear error pointing the agent at `start_browser` or `run_test`. This avoids the silent broken state where these tools used to "work" but with no plugin/listener setup behind them.

Test plan
- `run_code` errors with the session-required hint.
- `start_browser` → `run_code` works; plugins enabled in config (e.g. `screenshotOnFail`) get their `suite.before` / `test.before` setup.
- `run_test` with `plugins: { screencast: { saveScreenshots: true } }` produces a video and per-step screenshots in `output/`.
- `run_test` with `plugins: { aiTrace: { on: "fail" } }` against a failing test produces an aiTrace `trace.md` with HTML/ARIA/screenshot.
- `run_test` calls with the same `plugins` payload do NOT re-init; calls with a different set do re-init (browser restarts).
- `start_browser` followed by `run_test` (mocha-driven) does not double-emit `suite.before` / `test.before`; bootstrap hook runs once.
- `run_test` with `pause()` in the test → `run_code` works (paused test path counts as active session).
- `run_test` (no `plugins`) behaves as before.

🤖 Generated with Claude Code
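The merge and re-init rules from the summary and test plan could be implemented along these lines. This is a sketch under stated assumptions, not the code from this PR: `applyPluginOverrides`, `pluginKey`, and `createReinitTracker` are hypothetical helper names, and comparing a key-sorted JSON string is just one plausible way to decide whether the plugin set changed.

```javascript
// Hypothetical helpers; names and comparison strategy are assumptions.

// Merge per-test overrides into config.plugins[name], forcing enabled: true.
// A value of `true` is treated like `{}` (plugin defaults).
function applyPluginOverrides(config, plugins = {}) {
  const merged = { ...config, plugins: { ...(config.plugins || {}) } }
  for (const [name, value] of Object.entries(plugins)) {
    const options = value === true ? {} : value
    merged.plugins[name] = { ...(merged.plugins[name] || {}), ...options, enabled: true }
  }
  return merged
}

// Build a stable string for a plugins payload so key order does not matter.
function pluginKey(plugins = {}) {
  const sortKeys = value =>
    value && typeof value === 'object' && !Array.isArray(value)
      ? Object.fromEntries(Object.keys(value).sort().map(k => [k, sortKeys(value[k])]))
      : value
  return JSON.stringify(sortKeys(plugins))
}

// Track the last payload; report true when the container should be re-initialized.
function createReinitTracker() {
  let lastKey = null
  return function needsReinit(plugins) {
    const key = pluginKey(plugins)
    const changed = key !== lastKey
    lastKey = key
    return changed
  }
}
```

In this sketch the first call always reports a change, which matches the container needing to be initialized at least once; two consecutive calls with an equivalent payload do not.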