QA Evidence

Last updated: 2026-06-20

#936 RESOLVED (2026-06-11). The genty live-stack is fixed at the root across 5 layers: (1) real tool-calling wired into genty-core (58cc4cc27); (2) the "Effect failed" JSON-parse spin on the in-process orchestration path (aa5637b12 + a39848b6b); (3) agent non-convergence plus the decisive toolless-delegated-worker bug, found via a local Azure repro (33a59a10e); (4)/(5) the run now anchors on `<workspace>/.a5c/runs`, so the completion proof is discoverable. The genty CLI had been eagerly defaulting the runs dir to the global location, which defeated the workspace anchor (270de90c3 + fe1aa68f8).

genty · vanilla NI · gpt-5.5 is GREEN on all 3 OSes (Ubuntu / macOS / Windows; all 7 checks pass). The other 5 NI models (gpt-5.4-mini, sonnet, gemini-3.5-flash, gemini-3.1-pro, DeepSeek) were dispatched across all 3 OSes, and they all FAIL on the `file-creation` check ONLY. Orchestration is fine (run-completion, completion-proof, and tool-calling all pass): the model emits the content as ~12k characters of text but never writes the target file. That is a model-adherence gap, NOT #936, and is tracked in #956. genty BI/BP/interactive cells have not been dispatched yet.

Relabel (2026-06-13): all 114 stale `SKIPPED [#936]` genty cells were re-marked `— (#936 fixed; redispatch)`. #936 is closed, so they are no longer blocked by an open bug; they simply have not been redispatched. Expectation on redispatch: gpt-5.5 cells pass; weak-model NI cells are capped by #956 (the file-creation/journal-events ceiling); genty BI/BP-mode cells use a different path (`genty call --process`) and are unverified. Cells are flipped to PASS only on a real passing run, never inferred.

Relabel (2026-06-16): the 41 hermes-Windows `SKIPPED [#468]` cells were re-pointed to #856. #468 (the cross-platform `-z` headless 0-output bug) is closed and fixed (hermes passes on Ubuntu/macOS), but hermes-Windows is genuinely blocked by the OPEN #856: hermes accepts prompts only via stdin, and Windows prompt_toolkit needs a real console, so either the run goes through ConPTY (>60 min overhead) or stdin never arrives. These cells therefore stay SKIPPED, now against the correct open root cause. The fix needs an upstream hermes `--prompt`/headless mode (or accepting the skip, as with cursor/copilot).

Verified 2026-06-10 via run 27259214362 (complete): 12 PASS / 11 FAIL. One regression: Vanilla NI · DeepSeek-V4-Pro · pi · Ubuntu flipped PASS→FAIL. It is reproducible (failed 2×, ~3 min; SIGTERM/timeout with tool calls returning "(no output)"; NI-mode-specific, BI passes) and is tracked in #954. All other failures were known skips: genty (#936) and antigravity (#945). The late-finishing PASS cells (hermes/gpt-5.5 and pi/DeepSeek-V4-Pro, bp/create interactive) are untracked combos (the matrix tracks hermes+DeepSeek for BP/Create, and the hermes Windows SKIPs are #856 ConPTY), so no cell changes. Credit cells await Anthropic billing (#485).
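Layers (4)/(5) of the #936 fix come down to the order in which the runs directory is resolved. The following is a minimal TypeScript sketch of the intended precedence, assuming an explicit override, then the workspace, then a global fallback; `resolveRunsDir`, `RunsDirOptions`, and the global path are hypothetical names for illustration, not genty's actual code.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical options; the real genty CLI's flag names are not documented on this page.
interface RunsDirOptions {
  explicitRunsDir?: string; // user-supplied override, if any
  workspace?: string; // workspace the run belongs to, if known
}

// Assumed global fallback location, for illustration only.
const GLOBAL_RUNS_DIR = path.join(os.homedir(), ".a5c", "runs");

// Resolve the runs dir lazily, from most to least specific. Filling in the global
// default before this check (the eager-default bug) would make the workspace
// branch unreachable, so the completion proof would land outside <workspace>.
function resolveRunsDir(opts: RunsDirOptions): string {
  if (opts.explicitRunsDir) return path.resolve(opts.explicitRunsDir);
  if (opts.workspace) return path.join(path.resolve(opts.workspace), ".a5c", "runs");
  return GLOBAL_RUNS_DIR;
}
```

For example, `resolveRunsDir({ workspace: "/repo" })` yields `/repo/.a5c/runs` on POSIX; the global directory is only used when neither an override nor a workspace is known.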
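Legend: PASS = link to job; FAIL = failing run; SKIPPED #N = skipped while blocked by issue #N; BLOCKED (credits) #485 = waiting on Anthropic billing; `— (#936 fixed; redispatch)` = previously blocked by #936, awaiting redispatch; `—` / `---` = not yet tested.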
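The legend and the relabel rules from the 2026-06-13 and 2026-06-16 notes can be captured as a small data model. This is a hypothetical TypeScript sketch (`CellStatus`, `relabel`, and `render` are illustrative names, not part of the test harness); it encodes the one invariant the notes insist on: relabeling never produces PASS.

```typescript
// Hypothetical model of one matrix cell; the kinds mirror the legend labels.
type CellStatus =
  | { kind: "PASS" }
  | { kind: "FAIL"; issue?: number }
  | { kind: "SKIPPED"; issue: number }
  | { kind: "BLOCKED_CREDITS"; issue: number }
  | { kind: "REDISPATCH"; fixedIssue: number }
  | { kind: "UNTESTED" };

// Relabel a cell after its blocking issue changes state. A SKIPPED cell whose issue
// closed is re-pointed to a still-open blocker if one is given (the #468 -> #856 case),
// otherwise marked for redispatch (the #936 case). Nothing here ever yields PASS:
// PASS only comes from a real passing job.
function relabel(cell: CellStatus, isOpen: (issue: number) => boolean, openBlocker?: number): CellStatus {
  if (cell.kind !== "SKIPPED" || isOpen(cell.issue)) return cell;
  if (openBlocker !== undefined && isOpen(openBlocker)) {
    return { kind: "SKIPPED", issue: openBlocker };
  }
  return { kind: "REDISPATCH", fixedIssue: cell.issue };
}

// Render a cell using the labels that appear in the tables.
function render(cell: CellStatus): string {
  switch (cell.kind) {
    case "PASS":
      return "PASS";
    case "FAIL":
      return cell.issue === undefined ? "FAIL" : `FAIL #${cell.issue}`;
    case "SKIPPED":
      return `SKIPPED #${cell.issue}`;
    case "BLOCKED_CREDITS":
      return `BLOCKED (credits) #${cell.issue}`;
    case "REDISPATCH":
      return `— (#${cell.fixedIssue} fixed; redispatch)`;
    case "UNTESTED":
      return "---";
  }
}
```

Keeping PASS out of `relabel` entirely means the only way to produce it is to construct it from a real job result.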
Published live-stack GREEN (2026-06-20). The matrix below tracks the source-build workflow (`live-stack.yml`). Separately, the published workflow (`live-stack-published.yml`), which installs the real npm-published packages plus the documented marketplace plugin, is now fully green for the gpt-5.5 vanilla-NI lanes on Ubuntu (run 27870032695, all jobs succeeded): claude-code · codex · pi Published vanilla-NI PASS, plus Documented Install (claude-code + codex) PASS (the #960 documented `plugin marketplace add … @staging` install + hook-load). Root-cause fixes landed this round:
- #960 native plugin install: removed a fabricated `BeforePromptBuild` claude hook key that broke hook load (066218fc5), plus the claude `@<channel>` ref and the codex `url`-source schema. `documented_install` verifies it on every run.
- claude harness-CLI install flaky (~3/4 red): the `@anthropic-ai/claude-code` postinstall hit `spawn sh ENOENT`, leaving a half-installed CLI, so launch failed. Fix: `NPM_CONFIG_SCRIPT_SHELL=/bin/bash` plus a guaranteed system PATH for the install subprocess (d4df91fc7), plus install-resilience, non-fatal-install, and install-check-trusts-model-reached.
- pi harness-CLI install: catalog node `pi:0-78-1` was mis-set to `install:github-release` (empty target); corrected to `install:npm` + `@earendil-works/pi-coding-agent` (260347880).
- CI signal: empty interactive/BP matrices now skip instead of failing the run (b362bbfbd).

Broader published lanes (2026-06-20) surfaced, and then FIXED, two separate pre-existing blockers (not the harness-CLI install):

(a) macOS: every job died at vitest startup with `Cannot find module '@rolldown/binding-darwin-arm64'` (the vitest-4 rolldown native binding is skipped by npm optional-deps bug #4828). The fix installs the per-runner rolldown binding explicitly in both deps steps (9398fe082), and macOS is now fully GREEN (run 27881427274: documented_install claude+codex and vanilla claude/codex/pi all pass).

(b) BP/predefined claude-code interactive (Ubuntu): `babysitter harness:install-plugin` died with `spawn babysitter ENOENT` because the BP setup re-installed babysitter-sdk from LOCAL unbuilt source, clobbering the global `babysitter` bin with a dangling `dist/` symlink. The fix gates local-source installs behind `LIVE_STACK_PUBLISHED_PACKAGES` so the published globals are used (e9f6595c4). The BP claude lane is now GREEN (run 27881158028), which makes it a genuine published-package runtime test of the plugin rather than the earlier local-source path that gave false confidence.

Windows (2026-06-20): the published CLIs (claude/codex/adapters) are `.cmd` shims that the child-process spawner could not resolve without a shell, so every Windows job died with `spawn <cmd> ENOENT`. The fix routes them through a shell on Windows (279628ee5). Result (run 27882765161): Windows documented_install (claude+codex) is now GREEN, so the #960 documented install is validated on all 3 OSes. The Windows vanilla lanes initially failed `file-creation` only (single turn, proxy=1), which was root-caused (#996) to a prompt-delivery regression: `shell:true` passes args UNQUOTED to cmd.exe, splitting `--prompt "Write about Homer's Odyssey …"` at the first space so the agent saw only "Write". The fix routes `.cmd` shims via `cmd.exe /d /s /c` with `shell:false` (335a7cc7d). Result (run 27893680969): Windows is fully GREEN (documented_install claude+codex plus vanilla claude/codex/pi), and #996 is closed. The published live-stack is now GREEN on all three OSes (Ubuntu/macOS/Windows).
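The #996 failure mode is easiest to see in code. Below is a minimal TypeScript sketch of launching a `.cmd` shim through `cmd.exe /d /s /c` with `shell: false`, quoting each argument so a prompt containing spaces stays a single argument. The quoting scheme (MSVCRT-style quoting, then caret-escaping of cmd.exe metacharacters) is modeled on the approach of the `cross-spawn` npm package; it is an assumption, not a copy of commit 335a7cc7d, and `escapeArg`, `buildCmdShimInvocation`, and `spawnCmdShim` are hypothetical names.

```typescript
import { spawn, type ChildProcess, type SpawnOptions } from "node:child_process";

// cmd.exe metacharacters to caret-escape (the set cross-spawn uses; an assumption here).
const CMD_META = /([()\][%!^"`<>&|;, *?])/g;

// Quote one argument for the Node/MSVCRT argv parser, then caret-escape it for cmd.exe.
// npm-generated .cmd shims expand %* into a second command line that cmd.exe parses
// again, so metacharacters are escaped twice (cross-spawn does this for .bin shims).
function escapeArg(arg: string, doubleEscape: boolean): string {
  let out = arg.replace(/(\\*)"/g, '$1$1\\"'); // double backslashes before a quote, escape the quote
  out = out.replace(/(\\*)$/, "$1$1"); // trailing backslashes must not escape our closing quote
  out = `"${out}"`;
  out = out.replace(CMD_META, "^$1");
  return doubleEscape ? out.replace(CMD_META, "^$1") : out;
}

interface CmdInvocation {
  command: string;
  args: string[];
  options: SpawnOptions;
}

// Build the cmd.exe invocation for a .cmd shim. /d skips AutoRun commands, /s strips
// the outer quotes added around the whole line, /c runs the line and exits.
function buildCmdShimInvocation(shim: string, args: string[], doubleEscape = true): CmdInvocation {
  const line = [shim.replace(CMD_META, "^$1"), ...args.map((a) => escapeArg(a, doubleEscape))].join(" ");
  return {
    command: process.env.ComSpec ?? "cmd.exe",
    args: ["/d", "/s", "/c", `"${line}"`],
    // shell stays false; windowsVerbatimArguments stops Node from re-quoting the line.
    options: { shell: false, windowsVerbatimArguments: true },
  };
}

// Windows-only helper: launch the shim with the prebuilt, already-quoted command line.
function spawnCmdShim(shim: string, args: string[]): ChildProcess {
  const inv = buildCmdShimInvocation(shim, args);
  return spawn(inv.command, inv.args, { ...inv.options, stdio: "inherit" });
}
```

The design choice is to do all quoting up front and hand cmd.exe one verbatim line, instead of relying on `shell: true`, which (as #996 showed) joins arguments without quoting them. A caller would presumably use this path only when `process.platform === "win32"` and the target is a `.cmd` shim.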
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | SKIPPED #856 — FAIL |
| genty | PASS | PASS | PASS |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | FAIL |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | SKIPPED #856 |
| genty | FAIL #956 | FAIL #956 | FAIL #956 |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | FAIL | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | BLOCKED (credits) #485 | --- | SKIPPED #856 |
| genty | FAIL #956 | FAIL #956 | FAIL #956 |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | PASS | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | SKIPPED #856 |
| genty | FAIL #956 | FAIL #956 | FAIL #956 |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | FAIL |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | SKIPPED #856 |
| genty | FAIL #956 | FAIL #956 | FAIL #956 |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | --- | --- | --- |
| copilot-cli | --- | --- | --- |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | MODEL-FLAKY #954 — DeepSeek empty-turn after tool result; transport/codec proven sound (gpt-5.5 100% same path); passes macOS/Windows | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | SKIPPED #856 — FAIL |
| genty | FAIL #956 | FAIL #956 | FAIL #956 |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | PASS | PASS | PASS |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | FAIL |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | --- |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | BLOCKED (credits) #485 | PASS |
| codex | PASS | BLOCKED (credits) #485 | --- |
| pi | PASS | BLOCKED (credits) #485 | PASS |
| gemini-cli | BLOCKED (credits) #485 | --- | --- |
| hermes | BLOCKED (credits) #485 | --- | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | BLOCKED (credits) #485 | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | --- | --- | --- |
| hermes | --- | --- | --- |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | --- | --- | --- |
| copilot-cli | --- | --- | --- |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | --- |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | SKIPPED #856 — FAIL |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | PASS | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | FAIL | --- | --- |
| hermes | PASS | PASS | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | FAIL | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | FAIL | --- | --- |
| codex | FAIL | PASS | --- |
| pi | FAIL | --- | --- |
| gemini-cli | FAIL | --- | --- |
| hermes | FAIL | --- | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | FAIL | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | --- | --- |
| codex | PASS | --- | --- |
| pi | BLOCKED (credits) #485 | --- | --- |
| gemini-cli | --- | --- | --- |
| hermes | BLOCKED (credits) #485 | --- | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | --- |
| codex | PASS | PASS | PASS |
| pi | FAIL | --- | --- |
| gemini-cli | --- | PASS | --- |
| hermes | --- | --- | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | --- | --- |
| codex | PASS | --- | --- |
| gemini-cli | --- | --- | --- |
| hermes | --- | --- | --- |
| cursor-cli | --- | --- | --- |
| copilot-cli | --- | --- | --- |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | --- | PASS |
| codex | PASS | --- | --- |
| pi | PASS | PASS | --- |
| gemini-cli | --- | --- | --- |
| hermes | PASS | PASS | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | FAIL | --- | --- |
| hermes | --- | --- | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | FAIL | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | FAIL | --- | --- |
| codex | FAIL | PASS | --- |
| pi | FAIL | --- | --- |
| gemini-cli | FAIL | --- | --- |
| hermes | FAIL | --- | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | BLOCKED (credits) #485 | --- | --- |
| codex | BLOCKED (credits) #485 | --- | --- |
| pi | BLOCKED (credits) #485 | --- | --- |
| gemini-cli | --- | --- | --- |
| hermes | BLOCKED (credits) #485 | --- | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | --- |
| pi | --- | --- | --- |
| gemini-cli | --- | PASS | --- |
| hermes | --- | --- | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | --- | --- | --- |
| codex | PASS | PASS | --- |
| pi | --- | --- | --- |
| gemini-cli | --- | --- | --- |
| hermes | --- | --- | --- |
| cursor-cli | --- | --- | --- |
| copilot-cli | --- | --- | --- |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | --- | --- | --- |
| codex | PASS | PASS | --- |
| pi | --- | PASS | --- |
| gemini-cli | --- | --- | --- |
| hermes | --- | --- | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | --- |
| gemini-cli | --- | --- | --- |
| hermes | PASS | PASS | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | FAIL | --- | --- |
| codex | PASS | --- | --- |
| pi | PASS | --- | --- |
| gemini-cli | --- | --- | --- |
| hermes | FAIL | --- | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | BLOCKED (credits) #485 | --- | --- |
| codex | BLOCKED (credits) #485 | --- | --- |
| pi | BLOCKED (credits) #485 | --- | --- |
| gemini-cli | --- | --- | --- |
| hermes | BLOCKED (credits) #485 | --- | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | --- | --- |
| codex | PASS | --- | --- |
| pi | FAIL | --- | --- |
| gemini-cli | --- | --- | --- |
| hermes | FAIL | --- | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | --- |
| codex | PASS | PASS | --- |
| pi | --- | --- | --- |
| gemini-cli | PASS | PASS | --- |
| hermes | --- | --- | --- |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | --- | --- | --- |
| copilot-cli | --- | --- | --- |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | --- | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | --- | --- |
| gemini-cli | --- | --- | --- |
| hermes | PASS | --- | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | --- |
| codex | PASS | PASS | PASS |
| pi | PASS | --- | --- |
| gemini-cli | --- | --- | --- |
| hermes | --- | --- | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | FAIL | --- | --- |
| codex | FAIL | --- | --- |
| pi | PASS | --- | --- |
| gemini-cli | --- | --- | --- |
| hermes | --- | --- | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | BLOCKED (credits) #485 | --- | --- |
| codex | BLOCKED (credits) #485 | --- | --- |
| pi | BLOCKED (credits) #485 | --- | --- |
| gemini-cli | --- | --- | --- |
| hermes | --- | --- | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | --- | --- | --- |
| pi | --- | --- | --- |
| gemini-cli | --- | --- | --- |
| hermes | --- | --- | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | --- | --- | --- |
| codex | --- | --- | --- |
| pi | --- | --- | --- |
| gemini-cli | PASS | --- | --- |
| hermes | --- | --- | --- |
| cursor-cli | --- | --- | --- |
| copilot-cli | --- | --- | --- |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | --- | PASS | --- |
| codex | --- | --- | --- |
| pi | PASS | PASS | --- |
| gemini-cli | --- | --- | --- |
| hermes | --- | --- | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | --- | PASS |
| gemini-cli | --- | --- | --- |
| hermes | PASS | --- | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | --- | --- |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | --- | --- | --- |
| hermes | --- | --- | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | --- | --- |
| codex | PASS | --- | --- |
| pi | PASS | --- | --- |
| gemini-cli | --- | --- | --- |
| hermes | --- | --- | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | --- | --- |
| codex | FAIL | --- | --- |
| pi | PASS | --- | --- |
| gemini-cli | --- | --- | --- |
| hermes | --- | --- | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | --- | --- |
| codex | --- | --- | --- |
| pi | PASS | --- | --- |
| gemini-cli | --- | --- | --- |
| hermes | --- | --- | --- |
| cursor-cli | --- | --- | --- |
| copilot-cli | --- | --- | --- |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | --- | --- |
| gemini-cli | --- | --- | --- |
| hermes | --- | --- | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | --- |
| pi | PASS | PASS | PASS |
| gemini-cli | --- | --- | --- |
| hermes | --- | --- | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | --- | --- |
| codex | PASS | --- | --- |
| pi | PASS | PASS | --- |
| gemini-cli | --- | --- | --- |
| hermes | --- | --- | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | --- | --- |
| codex | PASS | --- | --- |
| pi | PASS | --- | --- |
| gemini-cli | --- | --- | --- |
| hermes | --- | --- | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | --- | --- |
| codex | --- | --- | --- |
| pi | --- | --- | --- |
| gemini-cli | --- | --- | --- |
| hermes | --- | --- | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | --- | --- |
| codex | --- | --- | --- |
| pi | PASS | --- | --- |
| gemini-cli | --- | --- | --- |
| hermes | --- | --- | --- |
| cursor-cli | --- | --- | --- |
| copilot-cli | --- | --- | --- |
| opencode | --- | --- | --- |

| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | --- | PASS | PASS |
| codex | PASS | --- | PASS |
| pi | PASS | --- | --- |
| gemini-cli | --- | --- | --- |
| hermes | --- | --- | SKIPPED #856 |
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| antigravity | SKIPPED #945 | SKIPPED #945 | SKIPPED #945 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | --- | --- | --- |
Tula uses its internal agent-core → agent-runtime → agent-platform stack to call models directly. It is launched via `amux launch tula <provider>`. Status: the proxy chain works (agent-core → transport-mux → Azure foundry), but tula's `yolo` command creates a babysitter process from the prompt instead of executing a simple file-write task, so the live-stack test prompt needs to be adapted to tula's process-oriented workflow.
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| genty | PASS | PASS | --- |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| genty | PASS | PASS | --- |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| genty | --- | --- | SKIPPED |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| genty | --- | --- | SKIPPED |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| genty | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) | — (#936 fixed; redispatch) |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| genty | PASS | PASS | --- |
| Issue | Summary | Status |
|---|---|---|
| #258 | gemini-cli file write (superseded by #341) | Closed |
| #308 | macOS BI PTY fallback | FIXED |
| #311 | Windows BP fixture setup | FIXED |
| #312 | BP/Resume hooks check | FIXED |
| #313 | BP claude hooks-mux | FIXED |
| #339 | claude-code BI intermittent | FIXED (PR #427) |
| #340 | BP bridged-hooks logs missing | FIXED (a1f2d66) |
| #341 | gemini-cli NI `--yolo` missing | FIXED (9ecb285) |
| #368 | BP/Create mode fails for claude+pi | FIXED (PR #428) |
| #436 | Cross-provider proxy fails pi+gemini with sonnet | Closed |
| #856 | hermes stdin + proxy routing | PARTIAL — gemini-flash PASS, foundry needs hermes provider config fix |
| #482 | gemini streaming tool schemas | Merged PR #510 |
| #483 | gemini-cli NI proxy auth | FIXED fbea902; pinned to 0.43.0, verified on all 3 OSes |
| #484 | BP/Create process generation | Merged PR #506 |
| #485 | Sonnet — Anthropic credit exhaustion | BLOCKED — needs Anthropic billing top-up |
| #486 | gemini-cli BI PTY fallback | Open |
| #487 | mini BP model routing | FIXED PR #493 |
| #488 | proxy response loop (Responses tool calls) | Merged PR #492/#525 |
| #489 | DeepSeek BP timeout | Merged PR #511 |
| #490 | hooks-mux shim resolution macOS/Windows | FIXED PR #494 (verified on all 3 OSes) |
| #491 | BI Windows mini/DeepSeek too slow for 600s timeout | WONTFIX — performance characteristic, not a bug |
| Commit | Fix |
|---|---|
| c72fb2b | Test harness: `shell: false` on Windows (root cause of all Windows failures) |
| 3a96afe | Test harness: `node -e mkdirSync` for cross-platform dir creation |
| 3f9dd43 | Launch: restore direct `.exe` spawn for Bun binaries on Windows |
| 2bafe47 | Transport-mux: preserve `tool_calls` in OpenAI chat codec normalization |
| 2a158d9 | Transport-mux: add tool-call support to `openAiChatStreamResponse` |
| 2dc3cb4 | CI: remove agent skill dirs from workspace before live-stack tests |
| 3a7a61c | Launch: fix `.cmd`-to-`.js` resolution with `%dp0%` substitution |
| 17463c1 | Launch: resolve wrapper scripts on macOS/Linux for node-pty spawn |
| 3ed3a18 | CI: add gpt-5.4-mini model key to live-stack matrix |
| aeb77e1 | Launch: bridge-interactive child_process fallback (output parsing + prompt injection) |
| 98adc38 | Test harness: cross-platform BP fixture setup (bash→node), `shell: true` for Windows |
| 09a5cc8 | Test harness: hooks-mux optional in interactive mode |
| 5cf62d0 | Launch: BI fallback prompt-in-args + SDK `shell: true` on Windows |
| cebff73 | Bridge-hooks: invoke hooks-mux instead of babysitter directly |
| a1f2d66 | CI: hooks-mux link pointed to `dist/index.js` (no-op); fixed to `dist/cli/main.js` |
| 9ecb285 | Atlas: gemini-cli `--yolo` launch config for auto-approval (root cause of #341) |
| ca98429 | Atlas: hermes `--yolo` launch config (was `--auto-approve`, wrong flag) |
| 25ef6dd | Atlas+catalog: genty agent as adapters-launchable harness with live-stack support |
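Commit 3a7a61c resolves `.cmd` wrappers to their `.js` targets by substituting `%dp0%`. The launcher's actual resolver is not reproduced here. The following is a minimal TypeScript sketch of the idea, assuming npm-style cmd shims that reference the script as a quoted `"%dp0%\<relative path>"`, where `%dp0%` holds the shim's own directory. The function name `resolveCmdShimTarget` is hypothetical. It takes the shim's text rather than reading the file, so it makes no assumptions about the launcher's I/O.

```typescript
// Sketch only (hypothetical helper, not the project's resolver): given the path and
// text of an npm-style .cmd shim, return the script it would launch, so the script can
// be spawned directly with node instead of going through cmd.exe.
// Assumption: the shim references its target as a quoted "%dp0%\<relative path>".
function resolveCmdShimTarget(cmdPath: string, cmdContents: string): string | null {
  // %dp0% expands to the directory that contains the .cmd file.
  const lastSep = Math.max(cmdPath.lastIndexOf("\\"), cmdPath.lastIndexOf("/"));
  const shimDir = lastSep >= 0 ? cmdPath.slice(0, lastSep) : ".";

  // Matches quoted paths such as "%dp0%\node_modules\pkg\bin\cli.js" (case-insensitive).
  const quotedDp0Path = /"%dp0%\\([^"]+)"/gi;
  let target: string | null = null;
  let match: RegExpExecArray | null;
  while ((match = quotedDp0Path.exec(cmdContents)) !== null) {
    const relative = match[1] ?? "";
    // Shims also probe for a bundled "%dp0%\node.exe": that is the runtime, not the script.
    if (relative === "" || relative.toLowerCase() === "node.exe") continue;
    // Assumption: the last script reference is the one actually executed.
    target = relative;
  }
  // Substitute %dp0% with the shim's directory.
  return target === null ? null : `${shimDir}\\${target}`;
}
```

For example, a shim at `C:\tools\bin\pkg.cmd` whose last line runs `"%dp0%\node_modules\pkg\bin\cli.js" %*` resolves to `C:\tools\bin\node_modules\pkg\bin\cli.js`. Resolving to the script first matters because recent Node.js releases refuse to spawn `.bat`/`.cmd` files unless `shell: true` is set, so spawning `node` with the resolved path is one way to keep a `shell: false` launch on Windows.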
Target: all these combinations must PASS on all 3 OSes.
| Agent + Model | Ubuntu | macOS | Windows |
|---|---|---|---|
| codex + gpt-5.5 | PASS | PASS | PASS |
| claude-code + gpt-5.5 | PASS | PASS | PASS |
| gemini-cli + gemini-3.5-flash | PASS | PASS | PASS |
| pi + gpt-5.5 | PASS | PASS | PASS |
| hermes + DeepSeek-V4-Pro | PASS | PASS | SKIPPED — ConPTY >60 min (hermes Windows BP needs native stdin) |
| genty + gpt-5.5 | PASS | PASS | PASS |
Target: all these combinations must PASS on all 3 OSes.
| Agent + Model | Ubuntu | macOS | Windows |
|---|---|---|---|
| codex + gpt-5.5 | PASS | PASS | PASS |
| claude-code + gpt-5.5 | PASS | PASS | PASS |
| gemini-cli + gemini-3.5-flash | PASS | PASS | PASS |
| pi + gpt-5.5 | PASS | PASS | PASS |
| hermes + DeepSeek-V4-Pro | PASS | PASS | PASS |
| genty + gpt-5.5 | PASS | PASS | PASS |
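This page only marks a cell PASS after a real passing run, so checking a target table reduces to finding every cell that is not literally PASS. A minimal TypeScript sketch of that check is below; `findTargetGaps` and `Gap` are hypothetical names, not part of the repo's tooling, and it assumes one header row, one `|---|` separator row and no `|` characters inside cells.

```typescript
// Sketch (hypothetical helper): check a status table against the target
// "all these combinations must PASS on all 3 OSes". Only a literal "PASS" counts;
// FAIL, SKIPPED, "---" and empty cells are all reported as gaps.
type Gap = { combo: string; os: string; status: string };

function findTargetGaps(markdownTable: string): Gap[] {
  const rows = markdownTable
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("|"));

  // "| a | b | c |" -> ["a", "b", "c"]
  const cells = (row: string): string[] =>
    row.replace(/^\|/, "").replace(/\|$/, "").split("|").map((cell) => cell.trim());

  const [headerRow, ...bodyRows] = rows;
  if (headerRow === undefined) return [];
  const osColumns = cells(headerRow).slice(1); // e.g. ["Ubuntu", "macOS", "Windows"]

  const gaps: Gap[] = [];
  for (const row of bodyRows) {
    if (/^\|[\s:|-]+\|?$/.test(row)) continue; // skip the |---|---| separator row
    const [combo = "", ...statuses] = cells(row);
    osColumns.forEach((os, i) => {
      const status = statuses[i] ?? "";
      if (status !== "PASS") gaps.push({ combo, os, status });
    });
  }
  return gaps;
}
```

Run over the first target table above, it would report a single gap: hermes + DeepSeek-V4-Pro on Windows, the ConPTY skip. Treating `---` and SKIPPED as gaps rather than neutral keeps the check strict: a combination only meets the target once every OS column shows an actual PASS.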