QA Evidence

Tal Muskal edited this page May 30, 2026 · 287 revisions

QA Evidence — Live Stack Test Matrix

Last updated: 2026-05-30

Legend: PASS = link to job, — = not yet tested
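
The cells in the matrices below can be tallied mechanically. This is a minimal sketch, assuming each cell starts with one of the status words that appear on this page (PASS, blocked, SKIPPED, or — for not yet tested), optionally followed by an issue reference such as #468. The names classify and tally are hypothetical helpers, not part of the test harness, and any trailing text after the issue number (for example a FAIL job link label) is ignored.

```python
import re
from collections import Counter

# Assumed cell grammar, inferred from the tables on this page:
# a status word, optionally followed by whitespace and "#<issue number>".
CELL_RE = re.compile(r"^(PASS|blocked|SKIPPED|—)(?:\s+#(\d+))?", re.IGNORECASE)


def classify(cell):
    """Return (status, issue) for one matrix cell; issue is None if absent."""
    match = CELL_RE.match(cell.strip())
    if match is None:
        return ("UNKNOWN", None)
    status = match.group(1).upper()
    if status == "—":
        # The legend defines "—" as "not yet tested".
        status = "UNTESTED"
    issue = int(match.group(2)) if match.group(2) else None
    return (status, issue)


def tally(cells):
    """Count how many cells fall under each status."""
    return Counter(classify(cell)[0] for cell in cells)
```

For example, tally(["PASS", "PASS", "blocked #468"]) counts two PASS cells and one BLOCKED cell, and classify("SKIPPED #562") yields the issue number 562 alongside the status.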

Vanilla Non-Interactive (NI)

gpt-5.5 (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | blocked #468 FAIL |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gpt-5.4-mini (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

claude-sonnet-4-6 (Anthropic)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | SKIPPED #485 FAIL | SKIPPED #485 FAIL | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gemini-3.5-flash (Google)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

DeepSeek-V4-Pro (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | blocked #468 FAIL |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

Vanilla Bridged-Interactive (BI)

gpt-5.5 (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gpt-5.4-mini (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | blocked #491 FAIL |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | blocked #483 | PASS |
| hermes | PASS | PASS | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

claude-sonnet-4-6 (Anthropic)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | SKIPPED #485 FAIL | PASS |
| codex | SKIPPED #485 FAIL | SKIPPED #485 FAIL | SKIPPED #485 FAIL |
| pi | PASS | SKIPPED #485 FAIL | PASS |
| gemini-cli | SKIPPED #485 FAIL | blocked #483 | SKIPPED #485 FAIL |
| hermes | blocked #468 | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gemini-3.5-flash (Google)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

DeepSeek-V4-Pro (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | blocked #491 FAIL |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | blocked #468 — FAIL |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

BP/Predefined — Interactive

gpt-5.5 (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | PASS | PASS | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gpt-5.4-mini (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | blocked #487 FAIL | blocked #487 FAIL | blocked #487 FAIL |
| codex | blocked #487 FAIL | blocked #487 FAIL | blocked #487 FAIL |
| pi | blocked #487 FAIL | blocked #487 FAIL | blocked #487 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | blocked #468 | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

claude-sonnet-4-6 (Anthropic)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | SKIPPED #485 FAIL | SKIPPED #485 | SKIPPED #485 |
| codex | SKIPPED #485 FAIL | SKIPPED #485 FAIL | SKIPPED #485 FAIL |
| pi | SKIPPED #485 FAIL | SKIPPED #485 FAIL | SKIPPED #485 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | blocked #468 | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gemini-3.5-flash (Google)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | SKIPPED #563 FAIL FAIL |
| codex | PASS | PASS | PASS |
| pi | SKIPPED #563 FAIL FAIL | SKIPPED #563 FAIL FAIL | SKIPPED #563 FAIL FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | blocked #468 | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

DeepSeek-V4-Pro (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | blocked #489 FAIL | PASS |
| codex | PASS | blocked #489 FAIL | blocked #489 FAIL |
| pi | blocked #489 FAIL | blocked #489 FAIL | blocked #489 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | blocked #468 | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

BP/Predefined — Bridged-Hooks

gpt-5.5 (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | blocked #468 | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gpt-5.4-mini (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | blocked #487 FAIL | blocked #487 FAIL | blocked #487 FAIL |
| codex | blocked #487 FAIL | blocked #487 FAIL | blocked #487 FAIL |
| pi | blocked #487 FAIL | blocked #487 FAIL | blocked #487 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | blocked #468 | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

claude-sonnet-4-6 (Anthropic)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| codex | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| pi | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | blocked #468 | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gemini-3.5-flash (Google)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | SKIPPED #563 FAIL |
| pi | SKIPPED #563 FAIL FAIL | SKIPPED #563 FAIL FAIL | SKIPPED #563 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | blocked #468 | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

DeepSeek-V4-Pro (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | blocked #489 FAIL | blocked #489 FAIL | blocked #489 FAIL |
| codex | PASS | blocked #489 FAIL | blocked #489 FAIL |
| pi | blocked #489 FAIL | PASS | blocked #489 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | blocked #468 | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

BP/Create — Interactive

gpt-5.5 (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | blocked #484 FAIL | blocked #484 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | blocked #468 | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gpt-5.4-mini (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | blocked #487 FAIL | blocked #487 FAIL | blocked #487 FAIL |
| codex | blocked #487 FAIL | blocked #487 FAIL | blocked #487 FAIL |
| pi | blocked #487 FAIL | blocked #487 FAIL | blocked #487 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | blocked #468 | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

claude-sonnet-4-6 (Anthropic)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| codex | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| pi | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | blocked #468 | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gemini-3.5-flash (Google)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| pi | SKIPPED #563 FAIL FAIL | SKIPPED #563 FAIL FAIL | SKIPPED #563 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | blocked #468 | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

DeepSeek-V4-Pro (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | blocked #484 FAIL | blocked #484 FAIL | PASS |
| codex | PASS | PASS | PASS |
| pi | blocked #484 FAIL | blocked #484 FAIL | blocked #484 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | PASS | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

BP/Create — Bridged-Hooks

gpt-5.5 (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | blocked #484 FAIL |
| codex | PASS | PASS | PASS |
| pi | PASS | blocked #484 FAIL | blocked #484 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | blocked #468 | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gpt-5.4-mini (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | blocked #487 FAIL | blocked #487 FAIL | blocked #487 FAIL |
| codex | blocked #487 FAIL | blocked #487 FAIL | blocked #487 FAIL |
| pi | blocked #487 FAIL | blocked #487 FAIL | blocked #487 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | blocked #468 | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

claude-sonnet-4-6 (Anthropic)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| codex | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| pi | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | blocked #468 | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gemini-3.5-flash (Google)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| pi | SKIPPED #563 FAIL FAIL | SKIPPED #563 FAIL FAIL | SKIPPED #563 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | blocked #468 | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

DeepSeek-V4-Pro (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | blocked #484 FAIL | PASS | blocked #484 FAIL |
| codex | blocked #489 FAIL | blocked #484 FAIL | blocked #484 FAIL |
| pi | PASS | PASS | blocked #484 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | blocked #468 | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

BP/Resume — Interactive

gpt-5.5 (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | blocked #489 FAIL | PASS |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | PASS | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gpt-5.4-mini (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | blocked #487 FAIL | blocked #487 FAIL | blocked #487 FAIL |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | blocked #468 | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

claude-sonnet-4-6 (Anthropic)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| codex | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| pi | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | blocked #468 | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gemini-3.5-flash (Google)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | SKIPPED #563 FAIL FAIL | SKIPPED #563 FAIL FAIL |
| codex | SKIPPED #563 FAIL FAIL | SKIPPED #563 FAIL FAIL | SKIPPED #563 FAIL FAIL |
| pi | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | blocked #468 | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

DeepSeek-V4-Pro (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | blocked #489 FAIL | blocked #489 FAIL | blocked #489 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | blocked #468 | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

BP/Resume — Bridged-Hooks

gpt-5.5 (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | blocked #490 FAIL |
| pi | PASS | PASS | PASS |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | blocked #468 | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gpt-5.4-mini (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | blocked #487 FAIL | blocked #487 FAIL | blocked #487 FAIL |
| codex | SKIPPED #563 FAIL | blocked #490 FAIL | blocked #490 FAIL |
| pi | PASS | PASS | SKIPPED #563 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | blocked #468 | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

claude-sonnet-4-6 (Anthropic)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| codex | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| pi | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | blocked #468 | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gemini-3.5-flash (Google)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | SKIPPED #563 FAIL FAIL | SKIPPED #563 FAIL |
| codex | SKIPPED #563 FAIL FAIL | SKIPPED #563 FAIL FAIL | SKIPPED #563 FAIL |
| pi | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | blocked #468 | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

DeepSeek-V4-Pro (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | blocked #489 FAIL | PASS | PASS |
| codex | PASS | blocked #489 FAIL | PASS |
| pi | PASS | blocked #489 FAIL | blocked #489 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | blocked #468 | blocked #468 | blocked #468 |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

Omni Agent (Internal Harness)

Omni uses its internal agent-core → agent-runtime → agent-platform stack to call models directly. It is launched via amux launch omni <provider>.

Status: the proxy chain works (agent-core → transport-mux → Azure Foundry), but omni's yolo command creates a babysitter process from the prompt instead of executing the simple file-write task. The live-stack test prompt needs to be adapted to omni's process-oriented workflow.

gpt-5.5 (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| omni | PASS | PASS | blocked #615 FAIL |

gpt-5.4-mini (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| omni | PASS | PASS | blocked #528 |

claude-sonnet-4-6 (Anthropic)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| omni | SKIPPED #485 FAIL | blocked #528 | SKIPPED |

gemini-3.5-flash (Google)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| omni | blocked #615 FAIL | blocked #528 | SKIPPED |

DeepSeek-V4-Pro (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| omni | PASS | PASS | blocked #528 |

Issues Status

| Issue | Summary | Status |
| --- | --- | --- |
| #258 | gemini-cli file write (superseded by #341) | Closed |
| #308 | macOS BI PTY fallback | FIXED |
| #311 | Windows BP fixture setup | FIXED |
| #312 | BP/Resume hooks check | FIXED |
| #313 | BP claude hooks-mux | FIXED |
| #339 | claude-code BI intermittent | FIXED (PR #427) |
| #340 | BP bridged-hooks logs missing | FIXED (a1f2d66) |
| #341 | gemini-cli NI --yolo missing | FIXED (9ecb285) |
| #368 | BP/Create mode fails for claude+pi | FIXED (PR #428) |
| #436 | Cross-provider proxy fails pi+gemini with sonnet | Closed |
| #468 | hermes stdin + proxy routing | PARTIAL: gemini-flash PASS, foundry needs hermes provider config fix |
| #482 | gemini streaming tool schemas | Merged PR #510 |
| #483 | gemini-cli NI proxy auth | FIXED fbea902 (pinned to 0.43.0, verified all 3 OS) |
| #484 | BP/Create process generation | Merged PR #506 |
| #485 | Sonnet: Anthropic credit exhaustion | BLOCKED: needs Anthropic billing top-up |
| #486 | gemini-cli BI PTY fallback | Open |
| #487 | mini BP model routing | FIXED PR #493 |
| #488 | proxy response loop (Responses tool calls) | Merged PR #492/#525 |
| #489 | DeepSeek BP timeout | Merged PR #511 |
| #490 | hooks-mux shim resolution macOS/Windows | FIXED PR #494 (verified all 3 OS) |
| #491 | BI Windows mini/DeepSeek too slow for 600s timeout | WONTFIX: performance characteristic, not a bug |

Key Fixes Applied (staging branch, 2026-05-23)

| Commit | Fix |
| --- | --- |
| c72fb2b | Test harness: shell: false on Windows (root cause of all Windows failures) |
| 3a96afe | Test harness: node -e mkdirSync for cross-platform dir creation |
| 3f9dd43 | Launch: restore direct .exe spawn for Bun binaries on Windows |
| 2bafe47 | Transport-mux: preserve tool_calls in OpenAI chat codec normalization |
| 2a158d9 | Transport-mux: add tool-call support to openAiChatStreamResponse |
| 2dc3cb4 | CI: remove agent skill dirs from workspace before live-stack tests |
| 3a7a61c | Launch: fix .cmd-to-.js resolution with %dp0% substitution |
| 17463c1 | Launch: resolve wrapper scripts on macOS/Linux for node-pty spawn |
| 3ed3a18 | CI: add gpt-5.4-mini model key to live-stack matrix |
| aeb77e1 | Launch: bridge-interactive child_process fallback (output parsing + prompt injection) |
| 98adc38 | Test harness: cross-platform BP fixture setup (bash→node), shell:true for Windows |
| 09a5cc8 | Test harness: hooks-mux optional in interactive mode |
| 5cf62d0 | Launch: BI fallback prompt-in-args + SDK shell:true on Windows |
| cebff73 | Bridge-hooks: invoke hooks-mux instead of babysitter directly |
| a1f2d66 | CI: hooks-mux link pointed to dist/index.js (no-op), fixed to dist/cli/main.js |
| 9ecb285 | Atlas: gemini-cli --yolo launch config for auto-approval (root cause of #341) |
| ca98429 | Atlas: hermes --yolo launch config (was --auto-approve, wrong flag) |
| 25ef6dd | Atlas+catalog: omni agent as amux-launchable harness with live-stack support |