Skip to content

feat(evaluator): collect_artifacts glob to download workspace files on any outcome#85

Merged
zpzjzj merged 4 commits into
mainfrom
feat/collect-artifacts-glob
Jun 3, 2026
Merged

feat(evaluator): collect_artifacts glob to download workspace files on any outcome#85
zpzjzj merged 4 commits into
mainfrom
feat/collect-artifacts-glob

Conversation

@zpzjzj
Copy link
Copy Markdown
Collaborator

@zpzjzj zpzjzj commented Jun 3, 2026

What & why

Adds collect_artifacts: declare doublestar glob patterns (*.md, a/**/b, **/*.json) that select files from the case workspace to download as run artifacts — regardless of whether the agent succeeded, failed, or timed out.

Previously, files only landed in outputs/ when the agent itself reported them via SessionResult.Artifacts.GeneratedFiles (basename-flattened), and on timeout/error that download path was skipped entirely. This gives evals an explicit, declarative way to capture workspace outputs even on failure.

Behaviour

  • Matches are downloaded to <output-dir>/<case>/<config>/outputs/workspace/<relative-path>, preserving the workspace-relative path (report/run-1/x.jsonoutputs/workspace/report/run-1/x.json).
  • Two config layers, merged as a de-duplicated union:
    • cases.defaults.collect_artifacts — applies to every case.
    • per-case collect_artifacts — appended on top.
  • Glob: * matches within a path segment, ** matches across directories.

Implementation notes for reviewers

  • Runs on timeout/failure: collectGlobArtifacts is invoked in executeCaseOnce after the agent run and before both the timeout/error early-return and the deferred rt.Close(). The case context may already be cancelled / DeadlineExceeded, so collection runs on a detached context (context.WithoutCancel + its own 60s deadline).
  • Runtime-agnostic enumeration: lists workspace files via find . -type f through the runtime, then matches in Go with doublestar — works uniformly across none / docker / opensandbox.
  • Read-only & orthogonal: only find + DownloadFile, never mutates git state. Distinct from report.artifacts (artifact types) and the agent_judge workspace diff (a string fed to the judge, not downloaded files). Placed after the diff finalize so it can't pollute the diff.
  • Validation: empty / invalid globs are rejected at skill-up validate via doublestar.ValidatePattern.

Tests

  • internal/evaluator: relative-path preservation, download even when the context is cancelled (timeout simulation), no-patterns no-op, union/dedup resolution.
  • internal/config: valid globs accepted; empty + invalid globs rejected.

Docs

docs/{guide,zh/guide}/writing-evals.md (full-reference + dedicated section), both cli-reference.md output trees (outputs/workspace/), skills/skill-upper/references/eval-yaml.md, and CHANGELOG.md.

🤖 Generated with Claude Code

…es on any outcome

Declare doublestar glob patterns (`*.md`, `a/**/b`, `**/*.json`) selecting
workspace files to download as run artifacts. Matches are copied to
<output-dir>/<case>/<config>/outputs/workspace/ preserving their
workspace-relative path, regardless of whether the agent succeeded, failed,
or timed out.

- schema: cases.defaults.collect_artifacts (all cases) + per-case
  collect_artifacts, merged as a de-duplicated union.
- evaluator: collectGlobArtifacts runs after the agent run and before the
  timeout/error early-return and the deferred rt.Close(); it detaches from the
  (possibly cancelled/deadline-exceeded) case context so collection still runs
  on timeout. It is read-only — only `find` + DownloadFile, never mutates git —
  so it stays orthogonal to the agent_judge workspace diff.
- validator: reject empty/invalid globs via doublestar.ValidatePattern.
- docs (en/zh) + skill reference + CHANGELOG updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@zpzjzj zpzjzj requested a review from hittyt as a code owner June 3, 2026 01:57
…s-glob

# Conflicts:
#	docs/guide/writing-evals.md
#	docs/zh/guide/writing-evals.md
#	internal/config/validator.go
@zpzjzj zpzjzj requested review from jwx0925, lbfsc and lijunfeng722 June 3, 2026 02:17
@zpzjzj
Copy link
Copy Markdown
Collaborator Author

zpzjzj commented Jun 3, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a963a78efc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/evaluator/artifacts_collect.go
zpzjzj and others added 2 commits June 3, 2026 10:34
…llection

Mirror prepareOutputDir behaviour: remove outputs/workspace before
downloading the current matches so a reused --output-dir or a retry
attempt cannot leave stale artifacts from an earlier run that the
current workspace no longer produces or matches. The clear happens only
after listing succeeds, so a transient list failure preserves prior
output. Adds a regression test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
doublestar's **/* match dotfile paths, and agent_judge runs commit a
baseline .git into the workspace (prepareWorkspaceDiffState), so a broad
glob like "**" would sweep the entire VCS object store into the
artifacts. Exclude .git in the find listing so collection captures agent
output, not framework noise. Adds a regression test and a doc note.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@ninan-nn ninan-nn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zpzjzj zpzjzj requested a review from ninan-nn June 3, 2026 03:29
@zpzjzj zpzjzj self-assigned this Jun 3, 2026
@zpzjzj zpzjzj added the enhancement New feature or request label Jun 3, 2026
Copy link
Copy Markdown
Collaborator

@lijunfeng722 lijunfeng722 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zpzjzj zpzjzj merged commit 5fa437a into main Jun 3, 2026
12 checks passed
zpzjzj added a commit that referenced this pull request Jun 3, 2026
…ests

CI on `windows-latest` surfaced four failures after the latest rebase
onto main's 0.2.4 (which added new timeout/process-group tests under
PR #85 and PR #24).

`classifyExecError`: when ctx terminated the process, force exit code
-1 *before* checking *exec.ExitError. On POSIX a killed `sleep` already
surfaces as -1 (signal), but on Windows bash's `sleep 1` killed via
ctx-cancel returns OS exit code 1, which the surrounding Exec switch
would then misclassify as "real failure" instead of "killed by
context" — so the log line "command exited with code 1: sleep 1"
showed up instead of "command killed by context". Forcing -1 in the
ctx-cancel branch fixes `TestNoneRuntime_ExecReturnsContextErrorOnTimeout`
and the two `LogsDeadlineDelta` / `OmitsDeadlineDelta` tests on Windows
without changing POSIX behaviour (POSIX still arrives via the same
branch with -1 from the signal kill).

`TestNoneRuntime_ExecKillsDescendantsOnTimeout` and `TestScriptJudge_Timeout`:
both rely on POSIX process-group semantics — killing the parent shell
takes the whole descendant tree with it. Windows has no equivalent;
`configureProcessGroup` is a no-op there, so a backgrounded ping/sleep
survives ctx-cancel and (in the script-judge case) holds t.TempDir as
its cwd, breaking RemoveAll cleanup at test end. Skip both on Windows
with a comment pointing at the cancellation-itself test as the
remaining Windows coverage.

`TestCustomAgent_CollectArtifacts_MixedSafeAndEscaping`: the safe-suffix
check used `strings.HasSuffix(p, "/safe.txt")`, which can never match a
Windows path like `C:\...\safe.txt`. Switch to `filepath.Base(p) ==
"safe.txt"` so the assertion is OS-neutral.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
zpzjzj added a commit that referenced this pull request Jun 4, 2026
…ests

CI on `windows-latest` surfaced four failures after the latest rebase
onto main's 0.2.4 (which added new timeout/process-group tests under
PR #85 and PR #24).

`classifyExecError`: when ctx terminated the process, force exit code
-1 *before* checking *exec.ExitError. On POSIX a killed `sleep` already
surfaces as -1 (signal), but on Windows bash's `sleep 1` killed via
ctx-cancel returns OS exit code 1, which the surrounding Exec switch
would then misclassify as "real failure" instead of "killed by
context" — so the log line "command exited with code 1: sleep 1"
showed up instead of "command killed by context". Forcing -1 in the
ctx-cancel branch fixes `TestNoneRuntime_ExecReturnsContextErrorOnTimeout`
and the two `LogsDeadlineDelta` / `OmitsDeadlineDelta` tests on Windows
without changing POSIX behaviour (POSIX still arrives via the same
branch with -1 from the signal kill).

`TestNoneRuntime_ExecKillsDescendantsOnTimeout` and `TestScriptJudge_Timeout`:
both rely on POSIX process-group semantics — killing the parent shell
takes the whole descendant tree with it. Windows has no equivalent;
`configureProcessGroup` is a no-op there, so a backgrounded ping/sleep
survives ctx-cancel and (in the script-judge case) holds t.TempDir as
its cwd, breaking RemoveAll cleanup at test end. Skip both on Windows
with a comment pointing at the cancellation-itself test as the
remaining Windows coverage.

`TestCustomAgent_CollectArtifacts_MixedSafeAndEscaping`: the safe-suffix
check used `strings.HasSuffix(p, "/safe.txt")`, which can never match a
Windows path like `C:\...\safe.txt`. Switch to `filepath.Base(p) ==
"safe.txt"` so the assertion is OS-neutral.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
zpzjzj added a commit that referenced this pull request Jun 4, 2026
* ci: run build & test on a windows-latest runner

Promote Windows toward a first-class platform (issue #31): the build job
now runs on an ubuntu/windows OS matrix. The Windows leg is
continue-on-error for now so its failures surface as a regression
baseline without gating merges; codecov upload stays Linux-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: make the script judge cross-platform on Windows

The script judge hardcoded a POSIX execution model: scripts were
uploaded to /tmp without an extension and run via `chmod 700 && ./script`
through a `bash -c` shell. On native Windows none of that works, so
every script-judge test was skipped.

Introduce internal/platform to centralize OS-conditional shell and
binary discovery, and dispatch script execution by interpreter:

- runtime.NewShellCmd selects the host shell — bash/sh on POSIX, cmd.exe
  on Windows (wrapping the command so cmd's outer-quote stripping leaves
  embedded quoted paths intact).
- Runtime gains an optional TargetOSer interface; NoneRuntime reports the
  host OS, OpenSandboxRuntime always reports linux.
- The script judge plans execution from the file extension (or shebang):
  POSIX targets keep the original chmod+shebang behavior; Windows targets
  dispatch .ps1 to PowerShell, .cmd/.bat to cmd, and .sh to a discovered
  bash (Git Bash / WSL / SKILL_UP_BASH), with a clear error when absent.
- shellquote gains QuoteWindows (CommandLineToArgvW semantics) and QuoteFor.

Every t.Skip("windows") in the script judge tests is replaced with
platform-aware table-driven cases, plus new interpreter and quoting unit
tests that run on all platforms.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent): route quoting through shellquote, fix Check on Windows

The agent package carried its own POSIX single-quote implementation,
duplicating internal/shellquote. Collapse shellQuote into a thin
delegator so the project keeps one quoting implementation; agent
commands always run under bash, so POSIX quoting stays correct.

CLIAgent.Check ran `command -v <bin>`, which cmd.exe cannot execute.
checkCommandForOS now translates it to `where <bin>` for Windows
targets, so agent availability checks work on a Windows host.

Drop the unused platform.LookAgentBinary wrapper: agent binaries are
resolved inside the runtime shell (which may be a remote sandbox), so a
host-side exec.LookPath wrapper has no caller.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: add PowerShell tooling and example script for Windows

GNU make and POSIX sh are not available out of the box on Windows.
Add PowerShell counterparts under scripts/windows/ for the hooks,
lint-tools, and verify Makefile targets, and a .ps1 version of the
example judge-debug-eval script so Windows users can author script
judges without a bash dependency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: pin line endings via .gitattributes

Without explicit attributes, a Windows clone with core.autocrlf=true
rewrites checked-in scripts with CRLF. A CRLF shebang breaks shell
script execution and CRLF in .go files trips gofmt. Pin .go/.sh/.ps1
to LF and .cmd/.bat to CRLF.

Path construction in internal/runner, internal/report, and internal/skill
was audited: all of it already uses filepath.Join, so no code changes
were needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: add Windows support guide

Add a Windows support page (English + Chinese) covering supported
features, the script judge interpreter dispatch, SKILL_UP_BASH, the
scripts/windows/ tooling, and known limitations. Register it in the
VitePress sidebar and point AGENTS.md / CONTRIBUTING.md at it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: clarify OpenSandbox runtime behavior on Windows

Research outcome for the "Windows + opensandbox" question: the
opensandbox runtime never spawns a host shell and already crosses the
host->sandbox path boundary via filepath.ToSlash, so driving a remote
(Linux) sandbox from a native Windows host works today. OpenSandbox has
no Windows-container support — its SDK and execution API are Linux-only
— and WSL2 is the recommended path for the full agent workflow without
a remote sandbox. Document all three points.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: correct OpenSandbox Windows-sandbox availability

Earlier I stated OpenSandbox has no Windows-container support. That was
wrong: OpenSandbox does offer a Windows guest profile (dockur/windows in
KVM/QEMU) selected via `platform.os=windows` on create. The actual gap
is in the Go SDK, which has not yet added the Platform field to its
CreateSandboxRequest. Note that as the upstream dependency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows): use POSIX quoting and forward-slash paths where targets are POSIX

Two Windows CI failures with the same root cause:

* internal/evaluator/fixtures.go and internal/runtime/opensandbox.go
  build commands that run inside a POSIX shell (the runtime sandbox or
  bash session). They were using shellquote.Quote, which is build-tagged
  to host quoting and produced Windows double quotes when skill-up runs
  on a Windows host. Switch to the explicit QuotePOSIX so the
  quoting matches the target shell regardless of host OS. Fixes
  TestGitInitUploader_QuotesSpecialCharsInRemote, TestApplyDiffUploader,
  TestGitCheckoutUploader, and the two OpenSandbox upload tests.

* internal/cli/import.go writes case file paths into the generated
  eval.yaml. eval.yaml is a portable config consumed by the loader on
  any OS; use path.Join with filepath.ToSlash so the value stays
  "cases/case-N.yaml" instead of becoming "cases\case-N.yaml" on
  Windows. Fixes TestGenerateEvalConfig, TestImportCmd_OutputIsSkillRoot,
  TestGenerateEvalConfig_UsesImportedCaseIDs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows): force -1 on ctx-cancel and make tests OS-aware

Round two of fixing the Windows CI failures.

* classifyExecError now returns -1 unconditionally when ctx.Err() is
  set, so a killed cmd.exe (which surfaces exit code 1) does not look
  like a normal failure. TestNoneRuntime_ExecReturnsContextErrorOnTimeout
  also now uses `ping -n 3 127.0.0.1 > nul` on Windows to give the
  context time to fire before the process exits on its own.

* TestNoneRuntime_ExecExpandsPathFromRuntimeEnv asserts POSIX
  `printf "$PATH"` expansion; Windows cmd.exe has neither, so skip.

* TestIterationWorkspace_Paths now builds expected paths with
  filepath.Join so the assertions match the host separator.

* TestContextFilesUploader_RejectsUnsafePaths's "absolute" case used
  `filepath.Join(string(filepath.Separator), "tmp", "secret.txt")`,
  which is `\tmp\secret.txt` on Windows and treated as relative. A
  small absoluteSecretPath() helper returns a true Windows-absolute path
  (`C:\tmp\secret.txt`) when running there.

* TestCLIAgent_Run / RunExitError / InstallSkillWithCmd /
  InstallMCPUsesResolvedEndpointConfigRefAndEnv all bake POSIX-shell
  templates (`exit %d`, `mkdir -p ... && echo > ...`, `$VAR`) into
  their RunCmd/InstallSkillCmd/InstallMCPCmd. Native Windows agent
  execution is intentionally out of scope (users go through WSL2), so
  a shared skipIfNoPOSIXShell helper skips these on Windows.

* TestFindQoderSessionFile_SelectsNewestByModTime hand-builds a
  workspace-key directory that embeds the workspace path; on Windows
  that path starts with `C:` and is not a legal directory component.
  Skipped along with the rest of native-Windows Qoder support.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows): use POSIX quoting in git checkout helper and run tests under bash

Two remaining Windows CI failures after rebase:

* gitCheckoutUploader was added on main with shellquote.Quote(branch),
  the host-aware variant, which produced double-quotes when running on
  a Windows host. The script always runs in a POSIX shell inside the
  runtime (the `none` runtime on a POSIX host or the Linux
  OpenSandbox), so use QuotePOSIX explicitly. Fixes
  TestGitCheckoutUploader_ErrorPathDoesNotEvaluateBranch.

* `go test -race -coverprofile=coverage.out ./...` produced
  `FAIL .out [setup failed]` on Windows: pwsh's legacy native-argument
  passing splits `-coverprofile=coverage.out` and feeds `.out` to go
  as a package import path. Force `shell: bash` on the test step so
  args reach go.exe verbatim; Git Bash is preinstalled on
  windows-latest.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows): address PR #33 review nits

Round of fixes for the P2 nits left on the Windows-support PR:

* Prefer Git Bash on Windows for NoneRuntime shell invocations and fall
  back to cmd /d /c when no bash is discoverable. This keeps the many
  internal POSIX command strings working (agent CLI prompts via
  shellQuote, `set -eu` git fixtures, workspace-diff `if ...; then`
  pipelines) and gives `%VAR%`-bearing inputs bash's literal
  double-quote semantics so cmd's mid-arg expansion stops mangling them.
  `/d` disables HKLM/HKCU AutoRun so the host's cmd AutoRun cannot
  prepend commands to every Exec.

* DiscoverBash no longer probes `C:\Windows\System32\bash.exe`; WSL
  bash expects `/mnt/c/...` paths, so picking it up would let `.sh`
  script judges fail with file-not-found. Users with WSL pipelines can
  still point SKILL_UP_BASH at it after arranging path translation.

* shellquote.QuoteWindows now treats `(`, `)`, and `%` as quoting
  triggers, so paths like `C:\tmp\(draft)\script.cmd` get wrapped
  instead of being passed as cmd syntax. The `%` case is documented as
  best-effort (cmd expands `%VAR%` even inside quotes; preferring bash
  via NewShellCmd is the real fix).

* shebangExtension parses the interpreter basename instead of doing a
  substring match, so `#!/usr/bin/env fish`, `python3`, `swish`, etc.
  no longer get misclassified as `.sh`. `env -S <name>` is handled.

* Agent's PATH override (`$HOME/.local/bin:$HOME/.nvm/current/bin:$PATH`)
  is build-tagged out on Windows so the inherited Windows `Path` reaches
  `where codex`/`where claude` checks intact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows): always quote QuoteWindows output and disable cmd AutoRun for judge

Two follow-up nits from the latest Codex review pass:

* P1: QuoteWindows previously returned the raw string for inputs without
  whitespace/metacharacters, so common script-judge paths such as
  `C:\tmp\skill-up-judge-N\script.cmd` were emitted unquoted. With
  NewShellCmd now routing through bash on Windows when Git Bash is
  available, unquoted backslashes are stripped by bash and the
  downstream `cmd /c`/`powershell -File` receives `C:tmpscript.cmd` —
  which made every script-judge test fail on the windows-latest runner.
  Drop the fast path and always wrap; double-quoted backslashes are
  literal under both bash and cmd, and CommandLineToArgvW unwraps them
  the same way.

* P2: the script-judge `.cmd`/`.bat` invocation and the workspace
  cleanup both used plain `cmd /c`, so HKLM/HKCU `Command Processor\AutoRun`
  could prepend commands before the judge script and make results
  non-deterministic. Switch them to `cmd /d /c` to match the runtime
  shell fallback hardening.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows): defang MSYS argv conversion and skip stdio-framing test

The remaining Windows CI failures after the QuoteWindows always-quote
change came from two unrelated places:

* Git Bash / MSYS rewrites `/x`-shaped argv entries as POSIX paths
  before invoking native Windows binaries, so `bash -c "cmd /d /c X"`
  reached cmd.exe as `cmd "C:/Program Files/Git/d" ...` — cmd never
  saw its `/d /c` switches, dropped into the interactive prompt, and
  every script-judge test got the cmd banner as the script's stdout.
  Setting `MSYS_NO_PATHCONV=1` and `MSYS2_ARG_CONV_EXCL=*` in the env
  NoneRuntime hands the bash child is the standard escape hatch. The
  vars are no-ops when bash isn't the launched shell.

* TestMockedGenericServer_FramedNonASCIIRoundTrip drives the mocked MCP
  server's Content-Length framing through a Node child whose stdout, on
  default Windows, is CRLF-translated and codepage-rewritten — that
  corrupts a non-ASCII byte stream. Verifying the framing on POSIX is
  enough; skip with a doc comment pointing at the underlying Node
  stdio behavior. Not related to the Windows-support work itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: make Windows job required now that all tests pass

The windows-latest leg of the Build & Test matrix has been green for
several CI runs in a row (script judge, agent, evaluator, runtime,
mcp, all packages pass). Drop the continue-on-error gate so future
Windows regressions block the PR instead of being silently absorbed.

This is the final commit of the issue #31 Windows-support rollout
sketched in the original plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows): skip WSL bash even when PATH-discovered

Even with C:\Windows\System32\bash.exe removed from knownWindowsBashPaths,
DiscoverBash on Windows still returned whatever exec.LookPath finds
first — and System32 is on PATH on every Windows host, so the WSL bash
shim wins. The shim expects Linux-format `/mnt/c/...` paths, so the
host-shell command strings and script-judge `.sh` invocations built from
`C:/...` host paths fail with file-not-found despite "bash discovered"
succeeding. Detect the System32-located shim by directory and fall
through to the Git Bash well-known paths so PATH-based hosts behave the
same as ones without bash on PATH.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(mcp): skip RejectsSymlinkEscape on Windows for the same Node-stdio reason

windows-latest creates symlinks fine (the runner runs with the
required privilege), so this test no longer falls into the existing
"symlinks not supported" skip path; it instead reaches the same Node
stdio framing hang that TestMockedGenericServer_FramedNonASCIIRoundTrip
hits. Skip on Windows with the same justification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(mcp): centralize Windows skip in startMockServer

Every mocked MCP server test spawns a child Node process and exchanges
Content-Length-framed JSON-RPC over its stdout — the same setup that
default-config Node stdio corrupts on Windows (codepage + CRLF). Moving
the Windows skip from per-test guards into the shared startMockServer
helper covers all current and future mock-server tests in one place,
and removes the two earlier per-test skips that this helper now
subsumes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows): reject WSL bash via SKILL_UP_BASH override too

The override branch in DiscoverBash accepted any regular file at the
configured path, including C:\Windows\System32\bash.exe (the WSL
shim). The shim expects /mnt/<drive>/... paths, so script-judge `.sh`
invocations built from C:/... host paths still fail even when the
override is set. Apply the same isWSLBash check on the override path
that PATH-discovery already uses; advanced WSL users must point the
override at a non-WSL bash or arrange path translation upstream.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: add a Windows e2e job

The Windows leg of Build & Test exercises the unit tests, but the
Windows-specific code paths added by issue #31 (NewShellCmd shell
selection, MSYS argv handling, QuoteWindows, WaitDelay, script-judge
interpreter dispatch) only really light up when skill-up runs as a
subprocess through its own CLI — which is the e2e package. Add an
E2E (none runtime, Windows) job that runs `go test -tags e2e ./e2e`
without SKILL_UP_FULL_E2E, so the LLM-dependent tests self-skip and
the mock-engine / script-judge contract tests land on Windows.

This costs ~one extra CI job slot and no API keys; real-LLM coverage
keeps living in the Linux e2e job.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(e2e): emit skill-up.exe on Windows so TestMain build is executable

The e2e harness's TestMain builds the skill-up CLI into a temp dir and
then exec's it for every test. `go build -o <path>` writes the binary
to exactly the path given, so on Windows the result has no extension
and Windows refuses to execute it ("executable file not found in
%PATH%"). Append .exe on Windows so every test in the suite can
actually launch the binary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows): normalize transcript path and honor shebang options for .sh

Two follow-up nits from the latest Codex pass:

* P1: EVAL_TRANSCRIPT_PATH was built with joinForGOOS, so on Windows
  it landed as `C:\...\transcript.json`. The `.sh` script-judge path
  runs under Git Bash where common idioms like `cat
  "$EVAL_TRANSCRIPT_PATH"` go through POSIX coreutils that only accept
  slash-style paths. Add an envPath translator to scriptPlan: identity
  for POSIX targets and `.ps1`/`.cmd`/`.bat` (native Windows paths);
  filepath.ToSlash for the `.sh` Windows plan. script.go applies it
  before injecting the env var.

* P2: planWindowsScript invoked bash explicitly as `bash <script>`,
  silently dropping any shebang-encoded options (e.g. `#!/bin/bash
  -eu` or `#!/usr/bin/env -S bash -eu`). On POSIX the kernel would
  honor those flags via shebang dispatch, so the same script behaved
  differently across platforms — a script relying on `-e` could
  continue past failures on Windows and report PASS. Refactor
  parseShebang to return (interpreter, opts) and forward opts in the
  bash command for the Windows `.sh` plan. Tests cover the env -S,
  direct, and only-flags shapes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows): escape bash actives in script paths; align WSL docs

Two follow-up nits from the latest Codex pass:

* P2: NewShellCmd prefers Git Bash on Windows, so the Windows
  script-judge commands ultimately run through bash -c. Paths that
  contain `$` or backtick (e.g. via an unusual %TMP%) would be
  expanded/command-substituted by bash inside the QuoteWindows-emitted
  double quotes, so the downstream powershell -File / cmd /d /c saw a
  rewritten path. Add quoteWindowsThroughBash that wraps QuoteWindows
  and additionally escapes those two characters; the extra leading
  backslash is collapsed harmlessly by Windows path normalization when
  the same command falls through to cmd. Used by every Windows
  script-judge command builder and removeDirCommand.

* P2: docs/guide/windows.md (en + zh) advertised WSL bash as one of
  the discoverable options, but DiscoverBash now actively rejects any
  bash under %SystemRoot%\System32 (the WSL shim) at every step —
  SKILL_UP_BASH, PATH, and well-known locations — because it expects
  /mnt/c/... paths and silently fails on the Windows paths skill-up
  generates. Drop the WSL mention from the discovery table and add an
  explicit note documenting the rejection plus the recommended escape
  hatch (run skill-up inside WSL).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows): make cmd fallback reliable, shell-aware quoter, env -S parsing

Codex flagged four P2s on the previous round; addressed:

* P2: cmd /d /c could trigger cmd's "preserve quotes" branch for certain
  command shapes (single-token executable, exactly two quotes). Switch
  the cmd fallback in NewShellCmd to `cmd /d /s /c "<command>"` so the
  outer-wrap strip is deterministic regardless of the inner command's
  shape; matching `cmd /d /s /c` is now also used by the script-judge
  .cmd plan and the cleanup command.

* P2: quoteWindowsThroughBash blindly escaped `$` and backtick into
  `\$` / `` \` `` even on hosts where NoneRuntime.Exec falls back to cmd
  (no bash). On the cmd path the extra backslash is preserved literally
  in the path, so e.g. `C:\tmp\$foo\script.cmd` became `C:\tmp\\$foo\…`
  -- a different file. Pick the quoter at plan time based on whether
  DiscoverBash succeeds (matching what NewShellCmd will choose), so the
  bash-active escapes only apply when the command actually goes through
  `bash -c`.

* P2: parseShebang accepted `env -S bash -eu` (split form) but rejected
  the compact GNU forms `env -Sbash -eu` and `env --split-string=bash`.
  Add explicit handling for both, rejoining the trailing tokens with a
  space so the result matches what env's -S would receive on a real
  shebang. Tests cover the split / compact / long forms.

* P2: scriptPlan now carries its own cleanupCommand (the global
  removeDirCommand is gone) so the cleanup goes through the same
  shell-aware quoter and the same `cmd /d /s /c` fallback as the run
  command.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows): consume env value-flags and use shell-style env -S tokenizer

Codex flagged two more parser-fidelity P2s plus a third about cmd
percent-expansion (which has no quoting-layer fix; replied with the
limitation):

* P2: parseEnvShebang treated every `-prefixed` env token as a single
  flag, so shebangs like `#!/usr/bin/env -u VAR bash -eu` or
  `#!/usr/bin/env -C /tmp bash` consumed the flag's value as the
  interpreter token (and reported "cannot determine interpreter" or
  routed to the wrong binary). Add a small set of value-taking short
  flags (-u, -C) the parser advances past; the long forms use `=`
  syntax so they remain self-contained.

* P2: splitStringInterpreter used strings.Fields, which breaks tokens
  inside quotes. A real shebang like `#!/usr/bin/env -S bash -c "echo
  ok"` would forward `["bash", "-c", "\"echo", "ok\""]` to bash on
  Windows instead of the three-token argv POSIX produces. Replace with
  a small shell-style tokenizer that respects single quotes, double
  quotes with `\"`/`\\` escapes, and outside-quote backslash escapes;
  unterminated quotes produce a clean "cannot determine interpreter"
  error rather than mis-split argv. Tests cover all three shapes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows): bash-safe quoter, restrict shebang routing, /dev/null in where

Codex flagged three issues this round; all addressed:

* P1: windowsQuoter built on top of QuoteWindows then post-replaced
  every `$` with `\$`. For an input like `C:\tmp\$work\script.cmd`
  QuoteWindows preserved the literal `\$`, and the post-replace turned
  it into `\\$work` -- which under `bash -c` decoded to `\` + `$work`
  expansion, mangling the path. Replace the post-process scheme with a
  proper double-quote-with-bash-escapes quoter (every \ / " / $ /
  backtick doubled). Bash decodes them back to the original byte and cmd
  later collapses the resulting `\\` runs via Windows path
  normalization, so paths with `\$`-style segments survive both shells.

* P2: shebangPOSIXShells routed dash/ksh/zsh/ash to `.sh`, but the
  Windows .sh plan always invokes Git Bash -- silently running a zsh
  script through bash changes semantics. Restrict the Windows shebang
  classifier to `sh` and `bash`; other POSIX shells now return the
  planner's "cannot determine interpreter" error. POSIX targets are
  unaffected because planScript ignores extension there.

* P2: checkCommandForOS rewrote `command -v` to `where` but left POSIX
  `/dev/null` redirects untouched. cmd opens that as a literal path and
  the probe fails. Rewrite `/dev/null` to cmd's `nul` as part of the
  translation so `command -v codex >/dev/null 2>&1` becomes
  `where codex >/dev/null 2>&1`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows): handle env long-flag values and `\_` separator in env -S

Two more parser-fidelity P2s from the Codex pass:

* P2: parseEnvShebang consumed `-u`/`-C` short flags' value tokens but
  not the long-form split shape (`--unset FOO bash`, `--chdir /tmp
  bash`). Add envValueTakingLongFlags and reuse the same +2 advance.
  The `=`-suffixed shapes (`--unset=FOO`) are still handled by the
  generic single-token skip.

* P2: tokenizeShebangSplitString treated every unquoted `\X` as
  "escape next byte literally", so the documented env -S separator
  `\_` (along with `\t`, `\n`, ...) silently became part of the same
  token. Decode the whitespace escapes (`\_`, `\t`, `\n`, `\r`, `\v`,
  `\f`) as token separators in unquoted context, so a shebang like
  `#!/usr/bin/env -S bash\_-eu` produces `(bash, [-eu])` instead of one
  glued `bash_-eu` token. Other backslash escapes keep their existing
  literal behavior.

Tests cover --unset/--chdir split forms and \_/\t separators.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(windows): unify shell host, drop dead quote API, require TargetGOOS

Self-review cleanup for the Windows-support PR. Five reviewer items were
addressable in code; four are documented limitations or out-of-scope.

Centralize shell decisions in platform.Host(): both NoneRuntime.Exec and
the .sh script-judge planner now consume a single HostShell value that
binds the shell command, quoter, and MSYS opt-out env together. This
removes the duplicate DiscoverBash calls in interpreter.go and the
goruntime.GOOS branch with inline MSYS env in NoneRuntime — the platform
package once again owns all OS dispatch.

Drop shellquote.Quote, shellquote.QuoteFor, and the windows/posix
build-tag files. No call site referenced them; QuotePOSIX and
QuoteWindows are explicit at every caller, so the dead symbols only
risked future misuse.

Promote TargetGOOS to a required Runtime interface method. The previous
TargetOSer optional interface defaulted unknown runtimes to "linux",
which would silently mis-route a future Windows-native runtime. Adding
the method to all six mocks surfaces the requirement at compile time.

Replace checkCommandForOS's strict prefix match with a regex so quiet
probes like `command -v codex >/dev/null 2>&1` translate correctly to
`where codex >/dev/null 2>&1` on Windows; the prior check only handled an
exact `command -v <name>` form.

Document two known limitations in the Windows guide: native agent CLI
execution still requires bash (so the PR's "first-class" scope covers
the runtime/judge layers, not agent bootstrap), and cmd.exe's %VAR%
expansion inside double-quoted arguments has no command-line escape —
install Git Bash to avoid the cmd fallback entirely.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows): cache platform.Host() and require bash for agent CLIs

Self-review follow-up addressing M1 and M3 from PR #33.

M3 (perf): platform.Host() now caches its HostShell via sync.OnceValue
on both POSIX and Windows. Previously every NoneRuntime.Exec call (and
every script-judge plan on Windows) re-ran exec.LookPath("bash") plus
the well-known Git Bash stats. The result is process-stable by design
-- SKILL_UP_BASH is documented as read-once at startup -- so this is a
straight cache. Removes the hottest filesystem-IO cost on the Exec
path.

M1 (security): all CLIAgent / ClaudeCodeAgent / CodexAgent /
QoderCLIAgent entry points now reject runs when rt.TargetGOOS() is
Windows and platform.Host().IsBash is false. Without this guard, the
cmd.exe fallback would still accept agent commands whose instruction
string is POSIX-quoted; cmd treats `'` as literal, so metacharacters
(& | " %VAR%) from arbitrary case messages could reach the host
shell. Native agent execution on Windows already required bash for the
nvm/Node bootstrap (documented in docs/guide/windows.md), so the new
ErrAgentRequiresBash makes that requirement explicit at the API
surface with a clear hint pointing at Git for Windows / SKILL_UP_BASH.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(judge): clean .sh temp dirs via bash rm on Windows

Self-review follow-up addressing M8 from PR #33.

The .sh script-judge plan already runs through bash -c, but cleanup
went through `cmd /d /s /c rd /s /q "<dir>"`. That meant the bash
shell's double-quote escaping (\\, ", $, `) had to round-trip safely
through cmd's strip-first-and-last-quote rule, which worked only
because judgeTempDir paths happen to be free of spaces and metachars
that cmd would mis-parse.

Stay inside bash for cleanup too: emit `rm -rf <forward-slash dir>`.
Git Bash's rm understands the forward-slash form of Windows temp
paths natively, so the bash -> cmd hop is avoided entirely. .ps1 /
.cmd / .bat continue to use the cmd-side cleanup since their command
itself is dispatched via cmd anyway.

Add TestCleanupCommand_Windows_ShellScriptUsesBashRm to lock in the
new shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(windows): post-review cleanups (CI versions, docs, lint dead code)

Self-review follow-up addressing M4/M5/M6 and several Info nits from
PR #33.

CI (M4): normalize actions/upload-artifact to @v5 across all jobs so
the upload/download halves stay on one major (previously a mix of
@v5 and @v7), and bring release-dryrun's actions/setup-go from @v5
up to @v6 to match every other job.

Docs (M5, M6): add a CHANGELOG [Unreleased] entry covering Windows
support, SKILL_UP_BASH, PowerShell tooling, and the agent bash
requirement. Point both READMEs at docs/guide/windows.md so the
"first-class Windows support" promise is discoverable without
relying on the vitepress sidebar.

Lint / code hygiene (I-series):
- Hoist string literals: introduce platform.GOOSWindows / Linux /
  Darwin so OS-dispatch sites stop comparing against bare strings;
  silences goconst on the new checkCommandForOS table tests.
- Drop the `_ = baseURL` noise in BaseAgent.logCredentialStatus; the
  baseURL value was never used after the binding.
- Drop the `plan.envPath != nil` guard in script judge; every
  scriptPlan returned by planScript sets envPath, so the nil branch
  was dead.
- Name the 10s WaitDelay magic number as execContextGracePeriod with
  a sourced comment.
- CONTRIBUTING: document the Windows equivalent of `make e2e`
  (`go test -tags e2e -v ./e2e`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(judge): handle env -a, optional-arg signal flags, NAME=VALUE, and \c

Codex auto-review on PR #33 flagged four GNU env corner cases the
shebang parser mishandled. All four would have produced a misleading
"cannot determine interpreter" error on Windows for shebangs that the
kernel's real env(1) would have executed cleanly. Each gets a targeted
fix plus a TestParseShebang table case.

- `-a ARG` / `--argv0=ARG`: env's documented argv0-override short
  option DOES consume a separate value token. Previously `-a myargv0
  bash` parsed as interpreter `myargv0`. Added `-a` to
  envValueTakingShortFlags.

- `--default-signal` / `--ignore-signal` / `--block-signal`: these
  take an OPTIONAL `[=sig]` argument, not a separate value token.
  Previously `--ignore-signal bash` was parsed as flag with value
  `bash`, leaving no interpreter. Removed them from
  envValueTakingLongFlags; the `--name=value` form is still handled by
  the generic "starts with `-`" skip arm.

- Leading `NAME=VALUE` assignments: env(1) accepts
  `env PYTHONPATH=/opt python3` and the equivalent
  `env -S PYTHONPATH=/opt python3`. Added isEnvAssignment +
  interpreterFromArgs so both parseEnvShebang's default branch and
  splitStringInterpreter walk past assignment tokens to the real
  interpreter.

- `\c` terminator: GNU env -S documents unquoted `\c` as "ignore the
  rest of the split-string body". tokenizerState gains a `done` flag
  the unquoted handler sets on `\c`; the top-level loop honors it,
  flushing the current token and discarding the tail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(judge): honor pwsh shebang and forward PowerShell shebang flags

Codex auto-review on PR #33 flagged two `.ps1` planning bugs that
diverged from how a kernel shebang would invoke the script.

`#!/usr/bin/env pwsh`: previously the extension dispatch always ran
`powershell.exe` (Windows PowerShell 5.x), so an extensionless script
asking for PowerShell Core 7+ silently got the legacy interpreter --
breaking pwsh-only syntax/modules. The planner now picks `pwsh` when
the shebang names it explicitly and keeps `powershell` as the default.

`#!/usr/bin/env -S pwsh -NoLogo`: previously the `.ps1` plan hard-coded
`powershell -NoProfile -ExecutionPolicy Bypass -File ...` and dropped
any shebang options, so a request like `-NoLogo` was silently lost.
The planner now forwards shebang options when the shebang names a
PowerShell interpreter, then appends the safety defaults so unsigned
scripts still execute in CI.

A `.ps1` file with no shebang (or a non-PowerShell shebang) keeps the
old `powershell.exe` + `-NoProfile -ExecutionPolicy Bypass -File`
behavior. Tests cover all three shapes plus the `pwsh` routing and
option forwarding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): tag this branch's CHANGELOG entry as 0.3.0

Maintainer asked for a concrete version on the CHANGELOG section
introduced by this PR. After rebasing onto main (now at 0.2.1), the
next release is 0.3.0:

- Adds first-class Windows support (substantial new feature surface).
- Promotes Runtime.TargetGOOS() from optional to required, which is a
  breaking change for any out-of-tree Runtime implementer. Under
  semver this argues for a minor bump while the project is in 0.x.

Also wires DockerRuntime (landed on main while this branch was open)
into the required TargetGOOS interface: docker provisions a Linux
guest, so TargetGOOS returns platform.GOOSLinux. Without this the
rebased branch would no longer build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent): use platform.GOOSWindows constant in requireBashOnWindowsHost

Every other TargetGOOS comparison in this package and judge/runtime/cli
already routes through platform.GOOSWindows; only requireBashOnWindowsHost
was still using the bare "windows" string literal.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(judge): defensive copy in .ps1 plan.command to match .sh branch

The .ps1 closure was returning strings.Join(append(psArgs, ...), " "),
which mutates psArgs's backing array if cap exceeds len. Currently safe
because plan.command is called exactly once per script run, but a latent
footgun if anyone refactors to multi-invocation. The .sh branch right
below already does the copy-then-append dance; mirror it for consistency.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(runtime): correct DockerRuntime.TargetGOOS doc comment

The PR commit that introduced TargetGOOS() on DockerRuntime accidentally
duplicated the leading "// Start starts the container" line above the
new method, which revive flags as a malformed exported-method comment.
The pre-existing main lint job (cd41d6d) now fails the build on this
class of violation; drop the orphan line so the doc comment starts with
the method name as required.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(release): bump 0.3.0 date to 2026-05-27 to follow main's 0.2.3

* fix(windows): reject rooted POSIX-style paths in workspace/sandbox guards

`IterationWorkspace.WriteFile` and `safeLocalTarget` both used
`filepath.IsAbs` alone to reject user-supplied absolute paths. On
Windows `filepath.IsAbs` requires a volume name (`C:\...`), so a
POSIX-style `/abs.txt` or `/absolute` slipped through and the
underlying `os.WriteFile` / `filepath.Join` happily honored it —
exactly the path-traversal escape the guard was supposed to block.

Both call sites now additionally reject any path whose first byte is a
separator (`/` or `\`). The unit tests `TestIterationWorkspace_WriteFile`
and `TestOpenSandboxLocalHelpersRejectUnsafePathsAndPreserveRemoteScope`,
which assert `/abs.txt` / `/absolute` are rejected, now pass on the
Windows CI runner this PR introduces.

Surfaced by adding `windows-latest` to the build matrix; main does not
yet run these tests on Windows. Tests themselves are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(changelog): drop empty [0.3.0] header after main absorbed Windows into 0.2.4

The earlier "bump 0.3.0 date" commit on this branch staged a 0.3.0
section to hold the Windows entries. Main has since cut a 0.2.4
release that incorporated those entries directly, so the standalone
[0.3.0] header on this branch is now empty and would confuse readers.
Remove it; the Windows content is preserved under the just-released
[0.2.4] section above [0.2.3].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows): force -1 exit on ctx-cancel; skip POSIX-process-group tests

CI on `windows-latest` surfaced four failures after the latest rebase
onto main's 0.2.4 (which added new timeout/process-group tests under
PR #85 and PR #24).

`classifyExecError`: when ctx terminated the process, force exit code
-1 *before* checking *exec.ExitError. On POSIX a killed `sleep` already
surfaces as -1 (signal), but on Windows bash's `sleep 1` killed via
ctx-cancel returns OS exit code 1, which the surrounding Exec switch
would then misclassify as "real failure" instead of "killed by
context" — so the log line "command exited with code 1: sleep 1"
showed up instead of "command killed by context". Forcing -1 in the
ctx-cancel branch fixes `TestNoneRuntime_ExecReturnsContextErrorOnTimeout`
and the two `LogsDeadlineDelta` / `OmitsDeadlineDelta` tests on Windows
without changing POSIX behaviour (POSIX still arrives via the same
branch with -1 from the signal kill).

`TestNoneRuntime_ExecKillsDescendantsOnTimeout` and `TestScriptJudge_Timeout`:
both rely on POSIX process-group semantics — killing the parent shell
takes the whole descendant tree with it. Windows has no equivalent;
`configureProcessGroup` is a no-op there, so a backgrounded ping/sleep
survives ctx-cancel and (in the script-judge case) holds t.TempDir as
its cwd, breaking RemoveAll cleanup at test end. Skip both on Windows
with a comment pointing at the cancellation-itself test as the
remaining Windows coverage.

`TestCustomAgent_CollectArtifacts_MixedSafeAndEscaping`: the safe-suffix
check used `strings.HasSuffix(p, "/safe.txt")`, which can never match a
Windows path like `C:\...\safe.txt`. Switch to `filepath.Base(p) ==
"safe.txt"` so the assertion is OS-neutral.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: align upload-artifact on @v7 to match main

An earlier commit on this branch normalized all upload-artifact refs to
@v5 for internal consistency. Main has since standardized on @v7
(dependabot bump #...), so four of this branch's refs (the e2e-none,
opensandbox, and badge-on-main uploads) were effectively downgrading
@v7@v5 after each rebase. Bring all six refs to @v7 so the branch
matches main's convention and introduces no version drift.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants