QA Evidence

Tal Muskal edited this page Jun 21, 2026 · 287 revisions

QA Evidence — Live Stack Test Matrix

Last updated: 2026-06-20

#936 RESOLVED (2026-06-11). The genty live-stack is fixed at the root across 5 layers:

  • (1) real tool-calling wired into genty-core (58cc4cc27);
  • (2) the "Effect failed" JSON-parse spin on the in-process orchestration path (aa5637b12+a39848b6b);
  • (3) agent non-convergence plus the decisive toolless-delegated-worker bug, found via a local Azure repro (33a59a10e);
  • (4)/(5) the run now anchors on <workspace>/.a5c/runs so completion-proof is discoverable; the genty CLI was eagerly defaulting the runs dir to global, which defeated the workspace anchor (270de90c3+fe1aa68f8).

Result: genty · vanilla NI · gpt-5.5 is GREEN on all 3 OSes (Ubuntu / macOS / Windows; all 7 checks pass). The other 5 NI models (gpt-5.4-mini, sonnet, gemini-3.5-flash, gemini-3.1-pro, DeepSeek) were dispatched across all 3 OSes and all FAIL on the file-creation check ONLY. Orchestration is fine (run-completion, completion-proof and tool-calling all pass): the model emits the content as ~12k-char text but never writes the target file. That is a model-adherence gap, NOT #936, and is tracked in #956. genty BI/BP/interactive cells are not yet dispatched.
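The runs-dir anchoring in layers (4)/(5) can be illustrated with a minimal sketch. It assumes the fix amounts to defaulting lazily: an explicit runs dir is honoured only when one was actually supplied, otherwise the workspace anchor wins. The function name and parameters are hypothetical; the real genty CLI options are not shown on this page.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_runs_dir(workspace: str, explicit_runs_dir: Optional[str] = None) -> PurePosixPath:
    """Pick the runs directory, preferring the workspace anchor.

    Hypothetical helper: it only models the behaviour described in the text,
    where an eager global default must not override <workspace>/.a5c/runs.
    """
    if explicit_runs_dir:
        # The caller asked for a specific location, so respect it.
        return PurePosixPath(explicit_runs_dir)
    # No explicit choice: anchor on the workspace so completion-proof is discoverable.
    return PurePosixPath(workspace) / ".a5c" / "runs"


print(resolve_runs_dir("/home/ci/project"))  # /home/ci/project/.a5c/runs
```

The point of the sketch is the ordering of the checks: a default that is filled in before this function runs would look like an explicit choice and defeat the anchor.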

Relabel (2026-06-13): all 114 stale SKIPPED [#936] genty cells were re-marked "— (#936 fixed; redispatch)". #936 is closed, so these cells are no longer blocked by an open bug; they are simply not yet redispatched. Expectation on redispatch: gpt-5.5 cells pass; weak-model NI cells are capped by #956 (file-creation/journal-events ceiling); genty BI/BP-mode cells use a different path (genty call --process) and are unverified. Cells are flipped to PASS only on a real passing run, never inferred.

Relabel (2026-06-16): the 41 hermes-Windows SKIPPED [#468] cells were re-pointed to #856 — #468 (the cross-platform -z headless 0-output bug) is closed/fixed (hermes passes on Ubuntu/macOS), but hermes-Windows is genuinely blocked by the OPEN #856: hermes accepts prompts only via stdin and Windows prompt_toolkit needs a real console → ConPTY (>60 min overhead) or stdin never arrives. So these cells stay SKIPPED, now against the correct open root cause. Fix needs an upstream hermes --prompt/headless mode (or accept the skip, like cursor/copilot).

Verified 2026-06-10 via run 27259214362 (complete): 12 PASS / 11 FAIL. One regression: Vanilla NI · DeepSeek-V4-Pro · pi · Ubuntu flipped PASS→FAIL — reproducible (failed 2×, ~3 min; SIGTERM/timeout with tool calls returning "(no output)"; NI-mode-specific, BI passes) → tracked in #954. All other failures were known-skipped genty (#936) / antigravity (#945). The late-finishing PASS cells (hermes/gpt-5.5 + pi/DeepSeek-V4-Pro, bp/create interactive) are untracked combos (matrix tracks hermes+DeepSeek for BP/Create; hermes Windows SKIPs are #856 ConPTY) — no cell change. Credit cells await Anthropic billing (#485).
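A PASS→FAIL flip like the one above can be detected by comparing two snapshots of the matrix. The sketch below is only an illustration: the key shape (mode, model, agent, OS) and the snapshot dictionaries are assumptions, not the format the workflow actually emits.

```python
from typing import Dict, List, Tuple

# (mode, model, agent, os): an illustrative key shape, not the wiki's own schema.
Cell = Tuple[str, str, str, str]


def find_regressions(before: Dict[Cell, str], after: Dict[Cell, str]) -> List[Cell]:
    """Return cells that were PASS in the earlier snapshot and FAIL in the later one."""
    return sorted(
        cell
        for cell, status in after.items()
        if before.get(cell) == "PASS" and status == "FAIL"
    )


# Illustrative snapshots built from two cells mentioned on this page.
before = {
    ("Vanilla NI", "DeepSeek-V4-Pro", "pi", "Ubuntu"): "PASS",
    ("Vanilla NI", "DeepSeek-V4-Pro", "codex", "Ubuntu"): "PASS",
}
after = {
    ("Vanilla NI", "DeepSeek-V4-Pro", "pi", "Ubuntu"): "FAIL",
    ("Vanilla NI", "DeepSeek-V4-Pro", "codex", "Ubuntu"): "PASS",
}

regressions = find_regressions(before, after)
print(regressions)
```

Cells that are new in the later snapshot are ignored, so a first-time FAIL is not reported as a regression.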

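Legend: PASS = link to job; FAIL = failing run; SKIPPED #N = not run, blocked by issue #N; BLOCKED (credits) = awaiting Anthropic billing (#485); --- or — = not yet tested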
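The table cells combine a state with an optional issue reference (for example SKIPPED #856 or BLOCKED (credits) #485). A minimal sketch of normalising such a cell is shown here; the UNTESTED label and the CellStatus type are names made up for this example.

```python
import re
from typing import NamedTuple, Optional


class CellStatus(NamedTuple):
    state: str            # PASS, FAIL, SKIPPED, BLOCKED, MODEL-FLAKY or UNTESTED
    issue: Optional[int]  # issue number referenced in the cell, if any


def parse_cell(text: str) -> CellStatus:
    """Split one matrix cell into a state and an optional issue number."""
    text = text.strip()
    issue_match = re.search(r"#(\d+)", text)
    issue = int(issue_match.group(1)) if issue_match else None
    state_match = re.match(r"(PASS|FAIL|SKIPPED|BLOCKED|MODEL-FLAKY)", text)
    # Anything that does not start with a known state counts as not yet tested.
    state = state_match.group(1) if state_match else "UNTESTED"
    return CellStatus(state, issue)


print(parse_cell("SKIPPED #856"))
print(parse_cell("BLOCKED (credits) #485"))
print(parse_cell("---"))
```

A cell that starts with a dash but still mentions an issue, such as the genty redispatch marker, comes back as UNTESTED with the issue number attached.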

Published live-stack GREEN (2026-06-20). The matrix below tracks the source-build workflow (live-stack.yml). Separately, the published workflow (live-stack-published.yml) — which installs the real npm-published packages + documented marketplace plugin — is now fully green for the gpt-5.5 vanilla-NI lanes on Ubuntu (run 27870032695, all jobs success): claude-code · codex · pi Published vanilla-NI PASS, plus Documented Install (claude-code + codex) PASS (the #960 documented plugin marketplace add … @staging install + hook-load). Root-cause fixes landed this round:

  • #960 native plugin install — removed a fabricated BeforePromptBuild claude hook key that broke hook load (066218fc5); claude @<channel> ref + codex url-source schema. documented_install verifies it every run.
  • claude harness-CLI install flaky (~3/4 red) — @anthropic-ai/claude-code postinstall spawn sh ENOENT → half-installed → launch fails. Fix: NPM_CONFIG_SCRIPT_SHELL=/bin/bash + guaranteed system PATH for the install subprocess (d4df91fc7), plus install-resilience + non-fatal-install + install-check-trusts-model-reached.
  • pi harness-CLI install — catalog node pi:0-78-1 mis-set to install:github-release (empty target); corrected to install:npm + @earendil-works/pi-coding-agent (260347880).
  • CI signal — empty interactive/BP matrices now skip instead of failing the run (b362bbfbd).

Broader published lanes (2026-06-20) surfaced, and then FIXED, two separate pre-existing blockers (not the harness-CLI install):

  • (a) macOS: every job died at vitest startup with Cannot find module '@rolldown/binding-darwin-arm64' (the vitest-4 rolldown native binding is skipped by npm optional-deps bug #4828). The fix installs the per-runner rolldown binding explicitly in both deps steps (9398fe082) → macOS now fully GREEN (run 27881427274: documented_install claude+codex + vanilla claude/codex/pi all pass).
  • (b) BP/predefined claude-code interactive (Ubuntu): babysitter harness:install-plugin died with spawn babysitter ENOENT because the BP setup re-installed babysitter-sdk from LOCAL unbuilt source, clobbering the global babysitter bin with a dangling dist/ symlink. The fix gates local-source installs behind LIVE_STACK_PUBLISHED_PACKAGES so the published globals are used (e9f6595c4) → BP claude lane now GREEN (run 27881158028), making it a genuine published-package runtime test of the plugin (not the prior local-source false-confidence path).
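Two of the fixes above are environment-driven: the install subprocess gets NPM_CONFIG_SCRIPT_SHELL=/bin/bash plus a guaranteed system PATH, and local-source installs are gated behind LIVE_STACK_PUBLISHED_PACKAGES. The sketch below models both as pure functions. The list of system directories and the values treated as "flag unset" are assumptions, the helper names are hypothetical, and the PATH handling is POSIX-only.

```python
from typing import Dict, Mapping

# Assumed set of system directories; the real workflow may guarantee a different list.
SYSTEM_PATH_DIRS = ["/usr/local/bin", "/usr/bin", "/bin"]


def build_install_env(base: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of the environment prepared for the npm install subprocess."""
    env = dict(base)
    # Give npm postinstall scripts a shell that is known to exist.
    env["NPM_CONFIG_SCRIPT_SHELL"] = "/bin/bash"
    entries = [p for p in env.get("PATH", "").split(":") if p]
    for directory in SYSTEM_PATH_DIRS:
        if directory not in entries:
            entries.append(directory)
    env["PATH"] = ":".join(entries)
    return env


def use_local_source(env: Mapping[str, str]) -> bool:
    """Install from local source only when the published-packages flag is not set.

    Treating "", "0" and "false" as unset is an assumption made for this sketch.
    """
    return env.get("LIVE_STACK_PUBLISHED_PACKAGES", "") in ("", "0", "false")


env = build_install_env({"PATH": "/opt/node/bin"})
print(env["PATH"])
print(use_local_source({"LIVE_STACK_PUBLISHED_PACKAGES": "1"}))
```

Appending the system directories after the caller's entries keeps any runner-specific tool locations first while still making sh and bash resolvable.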

Windows (2026-06-20): the published CLIs (claude/codex/adapters) are .cmd shims the child-process spawner couldn't resolve without a shell → every Windows job died spawn <cmd> ENOENT. Fix routes them through a shell on Windows (279628ee5). Result (run 27882765161): Windows documented_install (claude+codex) now GREEN — so the #960 documented install is validated on all 3 OSes. Windows vanilla lanes initially failed file-creation only (single turn, proxy=1) — root-caused (#996) to a prompt-delivery regression: shell:true passes args UNQUOTED to cmd.exe, splitting --prompt "Write about Homer's Odyssey …" at the first space so the agent saw only "Write". Fix routes .cmd shims via cmd.exe /d /s /c with shell:false (335a7cc7d). Result (run 27893680969): Windows fully GREEN — documented_install (claude+codex) + vanilla (claude/codex/pi). #996 closed. The published live-stack is now GREEN on all three OSes (Ubuntu/macOS/Windows).
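The prompt-delivery regression in #996 can be reproduced on paper. The sketch below contrasts an unquoted, space-joined command line with an explicit cmd.exe argv in which each argument stays one entry. It is written in Python for illustration only and runs no process; the shim name claude.cmd is a hypothetical stand-in, and how the real harness quotes the final command line is not shown on this page.

```python
from typing import List


def naive_shell_command(command: str, args: List[str]) -> str:
    """Mimic the broken path: arguments joined with spaces and no quoting."""
    return " ".join([command, *args])


def build_windows_argv(shim: str, args: List[str]) -> List[str]:
    """Mimic the fix: invoke cmd.exe explicitly and keep each argument as one argv entry."""
    return ["cmd.exe", "/d", "/s", "/c", shim, *args]


prompt = "Write about Homer's Odyssey"

# Broken: once the string is re-split on whitespace, --prompt is followed by one word.
broken = naive_shell_command("claude.cmd", ["--prompt", prompt]).split()
print(broken[2])  # Write

# Fixed: the prompt survives as a single argument.
fixed = build_windows_argv("claude.cmd", ["--prompt", prompt])
print(fixed[-1])  # Write about Homer's Odyssey
```

Keeping the arguments as a list until the last possible moment is what lets the process-spawning layer apply its own quoting instead of relying on the shell to guess where the prompt ends.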

Vanilla Non-Interactive (NI)

gpt-5.5 (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli PASS PASS PASS
hermes PASS PASS SKIPPED #856 (FAIL)
genty PASS PASS PASS
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- FAIL

gpt-5.4-mini (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli PASS PASS PASS
hermes PASS PASS SKIPPED #856
genty FAIL #956 FAIL #956 FAIL #956
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode FAIL --- ---

claude-sonnet-4-6 (Anthropic)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli PASS PASS PASS
hermes BLOCKED (credits) #485 --- SKIPPED #856
genty FAIL #956 FAIL #956 FAIL #956
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode PASS --- ---

gemini-3.5-flash (Google)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli PASS PASS PASS
hermes PASS PASS SKIPPED #856
genty FAIL #956 FAIL #956 FAIL #956
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- FAIL

gemini-3.1-pro-preview (Google)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli PASS PASS PASS
hermes PASS PASS SKIPPED #856
genty FAIL #956 FAIL #956 FAIL #956
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli --- --- ---
copilot-cli --- --- ---
opencode --- --- ---

DeepSeek-V4-Pro (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi MODEL-FLAKY #954 (DeepSeek empty-turn after tool result; transport/codec proven sound, gpt-5.5 passes 100% on the same path; passes macOS/Windows) PASS PASS
gemini-cli PASS PASS PASS
hermes PASS PASS SKIPPED #856 (FAIL)
genty FAIL #956 FAIL #956 FAIL #956
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode PASS PASS PASS

Vanilla Bridged-Interactive (BI)

gpt-5.5 (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli PASS PASS PASS
hermes PASS PASS SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- FAIL

gpt-5.4-mini (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS ---
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli PASS PASS PASS
hermes PASS PASS SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

claude-sonnet-4-6 (Anthropic)

Agent Ubuntu macOS Windows
claude-code PASS BLOCKED (credits) #485 PASS
codex PASS BLOCKED (credits) #485 ---
pi PASS BLOCKED (credits) #485 PASS
gemini-cli BLOCKED (credits) #485 --- ---
hermes BLOCKED (credits) #485 --- SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode BLOCKED (credits) #485 --- ---

gemini-3.5-flash (Google)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli PASS PASS PASS
hermes PASS PASS SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gemini-3.1-pro-preview (Google)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli --- --- ---
hermes --- --- ---
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli --- --- ---
copilot-cli --- --- ---
opencode --- --- ---

DeepSeek-V4-Pro (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS ---
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli PASS PASS PASS
hermes PASS PASS SKIPPED #856 (FAIL)
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode PASS --- ---

BP/Predefined — Interactive

gpt-5.5 (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli FAIL --- ---
hermes PASS PASS SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode FAIL --- ---

gpt-5.4-mini (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code FAIL --- ---
codex FAIL PASS ---
pi FAIL --- ---
gemini-cli FAIL --- ---
hermes FAIL --- SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode FAIL --- ---

claude-sonnet-4-6 (Anthropic)

Agent Ubuntu macOS Windows
claude-code PASS --- ---
codex PASS --- ---
pi BLOCKED (credits) #485 --- ---
gemini-cli --- --- ---
hermes BLOCKED (credits) #485 --- SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gemini-3.5-flash (Google)

Agent Ubuntu macOS Windows
claude-code PASS PASS ---
codex PASS PASS PASS
pi FAIL --- ---
gemini-cli --- PASS ---
hermes --- --- SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gemini-3.1-pro-preview (Google)

Agent Ubuntu macOS Windows
claude-code PASS --- ---
codex PASS --- ---
codex PASS --- ---
gemini-cli --- --- ---
hermes --- --- ---
cursor-cli --- --- ---
copilot-cli --- --- ---
opencode --- --- ---

DeepSeek-V4-Pro (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS --- PASS
codex PASS --- ---
pi PASS PASS ---
gemini-cli --- --- ---
hermes PASS PASS SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

BP/Predefined — Bridged-Hooks

gpt-5.5 (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli FAIL --- ---
hermes --- --- SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode FAIL --- ---

gpt-5.4-mini (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code FAIL --- ---
codex FAIL PASS ---
pi FAIL --- ---
gemini-cli FAIL --- ---
hermes FAIL --- SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

claude-sonnet-4-6 (Anthropic)

Agent Ubuntu macOS Windows
claude-code BLOCKED (credits) #485 --- ---
codex BLOCKED (credits) #485 --- ---
pi BLOCKED (credits) #485 --- ---
gemini-cli --- --- ---
hermes BLOCKED (credits) #485 --- SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gemini-3.5-flash (Google)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS ---
pi --- --- ---
gemini-cli --- PASS ---
hermes --- --- SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gemini-3.1-pro-preview (Google)

Agent Ubuntu macOS Windows
claude-code --- --- ---
codex PASS PASS ---
pi --- --- ---
gemini-cli --- --- ---
hermes --- --- ---
cursor-cli --- --- ---
copilot-cli --- --- ---
opencode --- --- ---

DeepSeek-V4-Pro (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code --- --- ---
codex PASS PASS ---
pi --- PASS ---
gemini-cli --- --- ---
hermes --- --- SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

BP/Create — Interactive

gpt-5.5 (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS ---
gemini-cli --- --- ---
hermes PASS PASS SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gpt-5.4-mini (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code FAIL --- ---
codex PASS --- ---
pi PASS --- ---
gemini-cli --- --- ---
hermes FAIL --- SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

claude-sonnet-4-6 (Anthropic)

Agent Ubuntu macOS Windows
claude-code BLOCKED (credits) #485 --- ---
codex BLOCKED (credits) #485 --- ---
pi BLOCKED (credits) #485 --- ---
gemini-cli --- --- ---
hermes BLOCKED (credits) #485 --- SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gemini-3.5-flash (Google)

Agent Ubuntu macOS Windows
claude-code PASS --- ---
codex PASS --- ---
pi FAIL --- ---
gemini-cli --- --- ---
hermes FAIL --- SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gemini-3.1-pro-preview (Google)

Agent Ubuntu macOS Windows
claude-code PASS PASS ---
codex PASS PASS ---
pi --- --- ---
gemini-cli PASS PASS ---
hermes --- --- ---
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli --- --- ---
copilot-cli --- --- ---
opencode --- --- ---

DeepSeek-V4-Pro (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS --- PASS
codex PASS PASS PASS
pi PASS --- ---
gemini-cli --- --- ---
hermes PASS --- SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

BP/Create — Bridged-Hooks

gpt-5.5 (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS ---
codex PASS PASS PASS
pi PASS --- ---
gemini-cli --- --- ---
hermes --- --- SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gpt-5.4-mini (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code FAIL --- ---
codex FAIL --- ---
pi PASS --- ---
gemini-cli --- --- ---
hermes --- --- SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

claude-sonnet-4-6 (Anthropic)

Agent Ubuntu macOS Windows
claude-code BLOCKED (credits) #485 --- ---
codex BLOCKED (credits) #485 --- ---
pi BLOCKED (credits) #485 --- ---
gemini-cli --- --- ---
hermes --- --- SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gemini-3.5-flash (Google)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex --- --- ---
pi --- --- ---
gemini-cli --- --- ---
hermes --- --- SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gemini-3.1-pro-preview (Google)

Agent Ubuntu macOS Windows
claude-code --- --- ---
codex --- --- ---
pi --- --- ---
gemini-cli PASS --- ---
hermes --- --- ---
cursor-cli --- --- ---
copilot-cli --- --- ---
opencode --- --- ---

DeepSeek-V4-Pro (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code --- PASS ---
codex --- --- ---
pi PASS PASS ---
gemini-cli --- --- ---
hermes --- --- SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

BP/Resume — Interactive

gpt-5.5 (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS --- PASS
gemini-cli --- --- ---
hermes PASS --- SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gpt-5.4-mini (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS --- ---
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli --- --- ---
hermes --- --- SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

claude-sonnet-4-6 (Anthropic)

Agent Ubuntu macOS Windows
claude-code PASS --- ---
codex PASS --- ---
pi PASS --- ---
gemini-cli --- --- ---
hermes --- --- SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gemini-3.5-flash (Google)

Agent Ubuntu macOS Windows
claude-code PASS --- ---
codex FAIL --- ---
pi PASS --- ---
gemini-cli --- --- ---
hermes --- --- SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gemini-3.1-pro-preview (Google)

Agent Ubuntu macOS Windows
claude-code PASS --- ---
codex --- --- ---
pi PASS --- ---
gemini-cli --- --- ---
hermes --- --- ---
cursor-cli --- --- ---
copilot-cli --- --- ---
opencode --- --- ---

DeepSeek-V4-Pro (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS --- ---
gemini-cli --- --- ---
hermes --- --- SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

BP/Resume — Bridged-Hooks

gpt-5.5 (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS ---
pi PASS PASS PASS
gemini-cli --- --- ---
hermes --- --- SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gpt-5.4-mini (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS --- ---
codex PASS --- ---
pi PASS PASS ---
gemini-cli --- --- ---
hermes --- --- SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

claude-sonnet-4-6 (Anthropic)

Agent Ubuntu macOS Windows
claude-code PASS --- ---
codex PASS --- ---
pi PASS --- ---
gemini-cli --- --- ---
hermes --- --- SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gemini-3.5-flash (Google)

Agent Ubuntu macOS Windows
claude-code PASS --- ---
codex --- --- ---
pi --- --- ---
gemini-cli --- --- ---
hermes --- --- SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gemini-3.1-pro-preview (Google)

Agent Ubuntu macOS Windows
claude-code PASS --- ---
codex --- --- ---
pi PASS --- ---
gemini-cli --- --- ---
hermes --- --- ---
cursor-cli --- --- ---
copilot-cli --- --- ---
opencode --- --- ---

DeepSeek-V4-Pro (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code --- PASS PASS
codex PASS --- PASS
pi PASS --- ---
gemini-cli --- --- ---
hermes --- --- SKIPPED #856
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

Tula Agent (Internal Harness)

Tula uses its internal agent-core → agent-runtime → agent-platform stack to call models directly. Launched via amux launch tula <provider>. Status: Proxy chain works (agent-core → transport-mux → Azure foundry), but tula's yolo command creates a babysitter process from the prompt instead of executing a simple file-write task. The live-stack test prompt needs adaptation for tula's process-oriented workflow.

gpt-5.5 (Azure Foundry)

Agent Ubuntu macOS Windows
genty PASS PASS ---

gpt-5.4-mini (Azure Foundry)

Agent Ubuntu macOS Windows
genty PASS PASS ---

claude-sonnet-4-6 (Anthropic)

Agent Ubuntu macOS Windows
genty --- --- SKIPPED

gemini-3.5-flash (Google)

Agent Ubuntu macOS Windows
genty --- --- SKIPPED

gemini-3.1-pro-preview (Google)

Agent Ubuntu macOS Windows
genty — (#936 fixed; redispatch) — (#936 fixed; redispatch) — (#936 fixed; redispatch)

DeepSeek-V4-Pro (Azure Foundry)

Agent Ubuntu macOS Windows
genty PASS PASS ---

Issues Status

Issue Summary Status
#258 gemini-cli file write (superseded by #341) Closed
#308 macOS BI PTY fallback FIXED
#311 Windows BP fixture setup FIXED
#312 BP/Resume hooks check FIXED
#313 BP claude hooks-mux FIXED
#339 claude-code BI intermittent FIXED (PR #427)
#340 BP bridged-hooks logs missing FIXED (a1f2d66)
#341 gemini-cli NI --yolo missing FIXED (9ecb285)
#368 BP/Create mode fails for claude+pi FIXED (PR #428)
#436 Cross-provider proxy fails pi+gemini with sonnet Closed
#856 hermes stdin + proxy routing PARTIAL — gemini-flash PASS, foundry needs hermes provider config fix
#482 gemini streaming tool schemas Merged PR #510
#483 gemini-cli NI proxy auth FIXED fbea902 — pinned to 0.43.0, verified all 3 OS
#484 BP/Create process generation Merged PR #506
#485 Sonnet — Anthropic credit exhaustion BLOCKED — needs Anthropic billing top-up
#486 gemini-cli BI PTY fallback Open
#487 mini BP model routing FIXED PR #493
#488 proxy response loop (Responses tool calls) Merged PR #492/#525
#489 DeepSeek BP timeout Merged PR #511
#490 hooks-mux shim resolution macOS/Windows FIXED PR #494 (verified all 3 OS)
#491 BI Windows mini/DeepSeek too slow for 600s timeout WONTFIX — performance characteristic, not a bug

Key Fixes Applied (staging branch, 2026-05-23)

Commit Fix
c72fb2b Test harness: shell: false on Windows (root cause of all Windows failures)
3a96afe Test harness: node -e mkdirSync for cross-platform dir creation
3f9dd43 Launch: restore direct .exe spawn for Bun binaries on Windows
2bafe47 Transport-mux: preserve tool_calls in OpenAI chat codec normalization
2a158d9 Transport-mux: add tool-call support to openAiChatStreamResponse
2dc3cb4 CI: remove agent skill dirs from workspace before live-stack tests
3a7a61c Launch: fix .cmd-to-.js resolution with %dp0% substitution
17463c1 Launch: resolve wrapper scripts on macOS/Linux for node-pty spawn
3ed3a18 CI: add gpt-5.4-mini model key to live-stack matrix
aeb77e1 Launch: bridge-interactive child_process fallback (output parsing + prompt injection)
98adc38 Test harness: cross-platform BP fixture setup (bash→node), shell:true for Windows
09a5cc8 Test harness: hooks-mux optional in interactive mode
5cf62d0 Launch: BI fallback prompt-in-args + SDK shell:true on Windows
cebff73 Bridge-hooks: invoke hooks-mux instead of babysitter directly
a1f2d66 CI: hooks-mux link pointed to dist/index.js (no-op) — fixed to dist/cli/main.js
9ecb285 Atlas: gemini-cli --yolo launch config for auto-approval (root cause of #341)
ca98429 Atlas: hermes --yolo launch config (was --auto-approve, wrong flag)
25ef6dd Atlas+catalog: genty agent as adapters-launchable harness with live-stack support

Primary Full Tests — BP/Create Interactive

Target: all these combinations must PASS on all 3 OSes.

Agent + Model Ubuntu macOS Windows
codex + gpt-5.5 PASS PASS PASS
claude-code + gpt-5.5 PASS PASS PASS
gemini-cli + gemini-3.5-flash PASS PASS PASS
pi + gpt-5.5 PASS PASS PASS
hermes + DeepSeek-V4-Pro PASS PASS SKIPPED — ConPTY >60 min (hermes Windows BP needs native stdin)
genty + gpt-5.5 PASS PASS PASS
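The "must PASS on all 3 OSes" target above can be checked mechanically. This is a minimal sketch that lists every combination and OS whose cell is not PASS; the row data is copied from two rows of the table, and the function name and data layout are assumptions made for the example.

```python
from typing import Dict, List, Tuple

OSES = ("Ubuntu", "macOS", "Windows")


def unmet_targets(rows: Dict[str, Tuple[str, str, str]]) -> List[Tuple[str, str]]:
    """Return (combination, os) pairs whose cell is anything other than PASS."""
    return [
        (combo, os_name)
        for combo, cells in rows.items()
        for os_name, cell in zip(OSES, cells)
        if cell != "PASS"
    ]


rows = {
    "codex + gpt-5.5": ("PASS", "PASS", "PASS"),
    "hermes + DeepSeek-V4-Pro": ("PASS", "PASS", "SKIPPED"),
}

print(unmet_targets(rows))
```

An empty list means the target is met; a SKIPPED cell counts as unmet here, which matches the stated target but is stricter than treating accepted skips as neutral.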

Primary Full Tests — BP/Resume Interactive

Target: all these combinations must PASS on all 3 OSes.

Agent + Model Ubuntu macOS Windows
codex + gpt-5.5 PASS PASS PASS
claude-code + gpt-5.5 PASS PASS PASS
gemini-cli + gemini-3.5-flash PASS PASS PASS
pi + gpt-5.5 PASS PASS PASS
hermes + DeepSeek-V4-Pro PASS PASS PASS
genty + gpt-5.5 PASS PASS PASS
