
QA Evidence

Tal Muskal edited this page Jun 1, 2026 · 287 revisions

QA Evidence — Live Stack Test Matrix

Last updated: 2026-05-30

Legend: PASS = link to job, — = not yet tested
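The cell notation used in the tables below can be read mechanically. The following is a minimal sketch, assuming that `SKIPPED #N` means the cell was skipped because of issue #N and that any trailing `FAIL` text is a link to a failing job (the legend does not define either). The `parseCell` helper and its types are hypothetical and are not part of the test harness.

```typescript
// Hypothetical classifier for one matrix cell; names are illustrative only.
type CellStatus = "PASS" | "FAIL" | "SKIPPED" | "UNTESTED" | "UNKNOWN";

interface Cell {
  status: CellStatus;
  issue?: number; // issue number after "#", when the cell carries one
}

function parseCell(raw: string): Cell {
  const text = raw.trim();
  // The legend marks untested cells with a dash; some tables use "---".
  if (text === "" || /^(\u2014|-{1,3})$/.test(text)) {
    return { status: "UNTESTED" };
  }
  const issueMatch = text.match(/#(\d+)/);
  const issue = issueMatch ? Number(issueMatch[1]) : undefined;
  // Assumption: a cell that starts with SKIPPED counts as skipped even when
  // it also carries FAIL link text.
  if (text.startsWith("SKIPPED")) return { status: "SKIPPED", issue };
  if (text.startsWith("PASS")) return { status: "PASS" };
  if (text.startsWith("FAIL")) return { status: "FAIL", issue };
  return { status: "UNKNOWN" };
}
```

Under these assumptions, `parseCell("SKIPPED #562FAIL")` yields a skipped cell tied to issue 562, and a value such as `pending` falls through to `UNKNOWN` rather than being guessed at.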

Vanilla Non-Interactive (NI)

gpt-5.5 (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli PASS PASS PASS
hermes PASS PASS SKIPPED #468FAIL
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

gpt-5.4-mini (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli PASS PASS PASS
hermes PASS PASS SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

claude-sonnet-4-6 (Anthropic)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli PASS PASS PASS
hermes SKIPPED #485FAIL SKIPPED #485FAIL SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode PASS SKIPPED #561FAIL SKIPPED #561FAIL

gemini-3.5-flash (Google)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli PASS PASS PASS
hermes PASS PASS SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

DeepSeek-V4-Pro (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli PASS PASS PASS
hermes PASS PASS SKIPPED #468FAIL
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode PASS PASS PASS

Vanilla Bridged-Interactive (BI)

gpt-5.5 (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli PASS PASS PASS
hermes PASS PASS SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

gpt-5.4-mini (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS SKIPPED #491FAIL
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli PASS PASS PASS
hermes PASS PASS SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

claude-sonnet-4-6 (Anthropic)

Agent Ubuntu macOS Windows
claude-code PASS SKIPPED #485FAIL PASS
codex PASS SKIPPED #485FAIL SKIPPED #485FAIL
pi PASS SKIPPED #485FAIL PASS
gemini-cli SKIPPED #485FAIL SKIPPED #483 SKIPPED #485FAIL
hermes SKIPPED #468 SKIPPED #468 SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

gemini-3.5-flash (Google)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli PASS PASS PASS
hermes PASS PASS SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

DeepSeek-V4-Pro (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS SKIPPED #491FAIL
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli PASS PASS PASS
hermes PASS PASS SKIPPED #468 — FAIL
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode PASS SKIPPED #561FAIL SKIPPED #561FAIL

BP/Predefined — Interactive

gpt-5.5 (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes PASS PASS SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

gpt-5.4-mini (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code SKIPPED #487FAIL SKIPPED #487FAIL SKIPPED #487FAIL
codex SKIPPED #487FAIL SKIPPED #487FAIL SKIPPED #487FAIL
pi SKIPPED #487FAIL SKIPPED #487FAIL SKIPPED #487FAIL
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes SKIPPED #468 SKIPPED #468 SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

claude-sonnet-4-6 (Anthropic)

Agent Ubuntu macOS Windows
claude-code PASS SKIPPED #485 SKIPPED #485
codex PASS SKIPPED #485FAIL SKIPPED #485FAIL
pi SKIPPED #485FAIL SKIPPED #485FAIL SKIPPED #485FAIL
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes SKIPPED #468 SKIPPED #468 SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

gemini-3.5-flash (Google)

Agent Ubuntu macOS Windows
claude-code PASS PASS SKIPPED #563FAILFAIL
codex PASS PASS PASS
pi SKIPPED #563FAILFAIL SKIPPED #563FAILFAIL SKIPPED #563FAILFAIL
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes SKIPPED #468 SKIPPED #468 SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

DeepSeek-V4-Pro (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS SKIPPED #489FAIL PASS
codex PASS SKIPPED #489FAIL SKIPPED #489FAIL
pi PASS SKIPPED #489FAIL SKIPPED #489FAIL
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes PASS PASS SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

BP/Predefined — Bridged-Hooks

gpt-5.5 (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes SKIPPED #468 SKIPPED #468 SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

gpt-5.4-mini (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code SKIPPED #487FAIL SKIPPED #487FAIL SKIPPED #487FAIL
codex SKIPPED #487FAIL SKIPPED #487FAIL SKIPPED #487FAIL
pi SKIPPED #487FAIL SKIPPED #487FAIL SKIPPED #487FAIL
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes SKIPPED #468 SKIPPED #468 SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

claude-sonnet-4-6 (Anthropic)

Agent Ubuntu macOS Windows
claude-code SKIPPED #485 SKIPPED #485 SKIPPED #485
codex SKIPPED #485 SKIPPED #485 SKIPPED #485
pi SKIPPED #485 SKIPPED #485 SKIPPED #485
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes SKIPPED #468 SKIPPED #468 SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

gemini-3.5-flash (Google)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS SKIPPED #563FAIL
pi SKIPPED #563FAILFAIL SKIPPED #563FAILFAIL SKIPPED #563FAIL
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes SKIPPED #468 SKIPPED #468 SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

DeepSeek-V4-Pro (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code SKIPPED #489FAIL SKIPPED #489FAIL SKIPPED #489FAIL
codex PASS SKIPPED #489FAIL SKIPPED #489FAIL
pi SKIPPED #489FAIL PASS SKIPPED #489FAIL
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes SKIPPED #468 SKIPPED #468 SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

BP/Create — Interactive

gpt-5.5 (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS SKIPPED #484FAIL SKIPPED #484FAIL
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes PASS SKIPPED #468 SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

gpt-5.4-mini (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code SKIPPED #487FAIL SKIPPED #487FAIL SKIPPED #487FAIL
codex PASS SKIPPED #487FAIL SKIPPED #487FAIL
pi SKIPPED #487FAIL SKIPPED #487FAIL SKIPPED #487FAIL
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes SKIPPED #468 SKIPPED #468 SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

claude-sonnet-4-6 (Anthropic)

Agent Ubuntu macOS Windows
claude-code SKIPPED #485 SKIPPED #485 SKIPPED #485
codex SKIPPED #485 SKIPPED #485 SKIPPED #485
pi SKIPPED #485 SKIPPED #485 SKIPPED #485
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes SKIPPED #468 SKIPPED #468 SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

gemini-3.5-flash (Google)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS SKIPPED #563FAIL SKIPPED #563FAIL
pi SKIPPED #563FAILFAIL SKIPPED #563FAILFAIL SKIPPED #563FAIL
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes SKIPPED #468 SKIPPED #468 SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

DeepSeek-V4-Pro (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code SKIPPED #484FAIL SKIPPED #484FAIL PASS
codex PASS PASS PASS
pi SKIPPED #484FAIL SKIPPED #484FAIL SKIPPED #484FAIL
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes PASS SKIPPED #468 SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

BP/Create — Bridged-Hooks

gpt-5.5 (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS SKIPPED #484FAIL
codex PASS PASS PASS
pi PASS SKIPPED #484FAIL SKIPPED #484FAIL
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes SKIPPED #468 SKIPPED #468 SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

gpt-5.4-mini (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code SKIPPED #487FAIL SKIPPED #487FAIL SKIPPED #487FAIL
codex SKIPPED #487FAIL SKIPPED #487FAIL SKIPPED #487FAIL
pi SKIPPED #487FAIL SKIPPED #487FAIL SKIPPED #487FAIL
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes SKIPPED #468 SKIPPED #468 SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

claude-sonnet-4-6 (Anthropic)

Agent Ubuntu macOS Windows
claude-code SKIPPED #485 SKIPPED #485 SKIPPED #485
codex SKIPPED #485 SKIPPED #485 SKIPPED #485
pi SKIPPED #485 SKIPPED #485 SKIPPED #485
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes SKIPPED #468 SKIPPED #468 SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

gemini-3.5-flash (Google)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
pi SKIPPED #563FAILFAIL SKIPPED #563FAILFAIL SKIPPED #563FAIL
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes SKIPPED #468 SKIPPED #468 SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

DeepSeek-V4-Pro (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code SKIPPED #484FAIL PASS SKIPPED #484FAIL
codex SKIPPED #489FAIL SKIPPED #484FAIL SKIPPED #484FAIL
pi PASS PASS SKIPPED #484FAIL
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes SKIPPED #468 SKIPPED #468 SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

BP/Resume — Interactive

gpt-5.5 (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS SKIPPED #489FAIL PASS
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes PASS SKIPPED #468 SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

gpt-5.4-mini (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code SKIPPED #487FAIL SKIPPED #487FAIL SKIPPED #487FAIL
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes SKIPPED #468 SKIPPED #468 SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

claude-sonnet-4-6 (Anthropic)

Agent Ubuntu macOS Windows
claude-code SKIPPED #485 SKIPPED #485 SKIPPED #485
codex SKIPPED #485 SKIPPED #485 SKIPPED #485
pi SKIPPED #485 SKIPPED #485 SKIPPED #485
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes SKIPPED #468 SKIPPED #468 SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

gemini-3.5-flash (Google)

Agent Ubuntu macOS Windows
claude-code PASS SKIPPED #563FAILFAIL SKIPPED #563FAILFAIL
codex SKIPPED #563FAILFAIL SKIPPED #563FAILFAIL SKIPPED #563FAILFAIL
pi SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes SKIPPED #468 SKIPPED #468 SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

DeepSeek-V4-Pro (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi SKIPPED #489FAIL SKIPPED #489FAIL SKIPPED #489FAIL
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes SKIPPED #468 SKIPPED #468 SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

BP/Resume — Bridged-Hooks

gpt-5.5 (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS SKIPPED #490FAIL
pi PASS PASS PASS
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes SKIPPED #468 SKIPPED #468 SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

gpt-5.4-mini (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code SKIPPED #487FAIL SKIPPED #487FAIL SKIPPED #487FAIL
codex SKIPPED #563FAIL SKIPPED #490FAIL SKIPPED #490FAIL
pi PASS PASS SKIPPED #563FAIL
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes SKIPPED #468 SKIPPED #468 SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

claude-sonnet-4-6 (Anthropic)

Agent Ubuntu macOS Windows
claude-code SKIPPED #485 SKIPPED #485 SKIPPED #485
codex SKIPPED #485 SKIPPED #485 SKIPPED #485
pi SKIPPED #485 SKIPPED #485 SKIPPED #485
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes SKIPPED #468 SKIPPED #468 SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

gemini-3.5-flash (Google)

Agent Ubuntu macOS Windows
claude-code PASS SKIPPED #563FAILFAIL SKIPPED #563FAIL
codex SKIPPED #563FAILFAIL SKIPPED #563FAILFAIL SKIPPED #563FAIL
pi SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes SKIPPED #468 SKIPPED #468 SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

DeepSeek-V4-Pro (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code SKIPPED #489FAIL PASS PASS
codex PASS SKIPPED #489FAIL PASS
pi PASS SKIPPED #489FAIL SKIPPED #489FAIL
gemini-cli SKIPPED #563FAIL SKIPPED #563FAIL SKIPPED #563FAIL
hermes SKIPPED #468 SKIPPED #468 SKIPPED #468
cursor-cli SKIPPED #562FAIL SKIPPED #562FAIL SKIPPED #562FAIL
copilot-cli SKIPPED #560FAIL SKIPPED #560FAIL SKIPPED #560FAIL
opencode SKIPPED #561FAIL SKIPPED #561FAIL SKIPPED #561FAIL

Omni Agent (Internal Harness)

Omni uses its internal agent-core → agent-runtime → agent-platform stack to call models directly. It is launched via amux launch omni <provider>.

Status: the proxy chain works (agent-core → transport-mux → Azure Foundry), but omni's yolo command creates a babysitter process from the prompt instead of executing a simple file-write task. The live-stack test prompt needs to be adapted to omni's process-oriented workflow.

gpt-5.5 (Azure Foundry)

Agent Ubuntu macOS Windows
omni PASS PASS SKIPPED #615FAIL

gpt-5.4-mini (Azure Foundry)

Agent Ubuntu macOS Windows
omni PASS PASS SKIPPED #528

claude-sonnet-4-6 (Anthropic)

Agent Ubuntu macOS Windows
omni SKIPPED #485FAIL SKIPPED #528 SKIPPED

gemini-3.5-flash (Google)

Agent Ubuntu macOS Windows
omni SKIPPED #615FAIL SKIPPED #528 SKIPPED

DeepSeek-V4-Pro (Azure Foundry)

Agent Ubuntu macOS Windows
omni PASS PASS SKIPPED #528

Issues Status

| Issue | Summary | Status |
| --- | --- | --- |
| #258 | gemini-cli file write (superseded by #341) | Closed |
| #308 | macOS BI PTY fallback | FIXED |
| #311 | Windows BP fixture setup | FIXED |
| #312 | BP/Resume hooks check | FIXED |
| #313 | BP claude hooks-mux | FIXED |
| #339 | claude-code BI intermittent | FIXED (PR #427) |
| #340 | BP bridged-hooks logs missing | FIXED (a1f2d66) |
| #341 | gemini-cli NI --yolo missing | FIXED (9ecb285) |
| #368 | BP/Create mode fails for claude+pi | FIXED (PR #428) |
| #436 | Cross-provider proxy fails pi+gemini with sonnet | Closed |
| #468 | hermes stdin + proxy routing | PARTIAL: gemini-flash PASS, foundry needs hermes provider config fix |
| #482 | gemini streaming tool schemas | Merged PR #510 |
| #483 | gemini-cli NI proxy auth | FIXED (fbea902; pinned to 0.43.0, verified all 3 OS) |
| #484 | BP/Create process generation | Merged PR #506 |
| #485 | Sonnet: Anthropic credit exhaustion | BLOCKED: needs Anthropic billing top-up |
| #486 | gemini-cli BI PTY fallback | Open |
| #487 | mini BP model routing | FIXED (PR #493) |
| #488 | proxy response loop (Responses tool calls) | Merged PR #492/#525 |
| #489 | DeepSeek BP timeout | Merged PR #511 |
| #490 | hooks-mux shim resolution macOS/Windows | FIXED (PR #494, verified all 3 OS) |
| #491 | BI Windows mini/DeepSeek too slow for 600s timeout | WONTFIX: performance characteristic, not a bug |

Key Fixes Applied (staging branch, 2026-05-23)

| Commit | Fix |
| --- | --- |
| c72fb2b | Test harness: shell: false on Windows (root cause of all Windows failures) |
| 3a96afe | Test harness: node -e mkdirSync for cross-platform dir creation |
| 3f9dd43 | Launch: restore direct .exe spawn for Bun binaries on Windows |
| 2bafe47 | Transport-mux: preserve tool_calls in OpenAI chat codec normalization |
| 2a158d9 | Transport-mux: add tool-call support to openAiChatStreamResponse |
| 2dc3cb4 | CI: remove agent skill dirs from workspace before live-stack tests |
| 3a7a61c | Launch: fix .cmd-to-.js resolution with %dp0% substitution |
| 17463c1 | Launch: resolve wrapper scripts on macOS/Linux for node-pty spawn |
| 3ed3a18 | CI: add gpt-5.4-mini model key to live-stack matrix |
| aeb77e1 | Launch: bridge-interactive child_process fallback (output parsing + prompt injection) |
| 98adc38 | Test harness: cross-platform BP fixture setup (bash→node), shell:true for Windows |
| 09a5cc8 | Test harness: hooks-mux optional in interactive mode |
| 5cf62d0 | Launch: BI fallback prompt-in-args + SDK shell:true on Windows |
| cebff73 | Bridge-hooks: invoke hooks-mux instead of babysitter directly |
| a1f2d66 | CI: hooks-mux link pointed to dist/index.js (no-op); fixed to dist/cli/main.js |
| 9ecb285 | Atlas: gemini-cli --yolo launch config for auto-approval (root cause of #341) |
| ca98429 | Atlas: hermes --yolo launch config (was --auto-approve, wrong flag) |
| 25ef6dd | Atlas+catalog: omni agent as amux-launchable harness with live-stack support |
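Two of the test-harness fixes above (cross-platform directory creation through Node instead of a shell command, and spawning with `shell: false`) can be illustrated with standard Node.js APIs. This is only a sketch of the general technique, not the code from those commits. The helper names `ensureDir` and `runWithoutShell` and the `fixture-` prefix are assumptions made for illustration.

```typescript
import { existsSync, mkdirSync, mkdtempSync } from "node:fs";
import { spawnSync } from "node:child_process";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Create a nested directory through Node rather than `mkdir -p`,
// which is not available in cmd.exe.
function ensureDir(dir: string): string {
  mkdirSync(dir, { recursive: true }); // does not throw if dir already exists
  return dir;
}

// Spawn an executable with shell: false so the argument array reaches the
// program as-is instead of being re-parsed by a platform shell.
function runWithoutShell(
  command: string,
  args: string[],
): { status: number | null; stdout: string } {
  const result = spawnSync(command, args, { shell: false, encoding: "utf8" });
  return { status: result.status, stdout: result.stdout };
}

// Illustrative fixture layout under the OS temp directory.
const root = mkdtempSync(join(tmpdir(), "fixture-"));
const nested = ensureDir(join(root, "a", "b", "c"));
console.log(existsSync(nested));
```

A call such as `runWithoutShell(process.execPath, ["-e", "console.log(1)"])` would run the current Node binary directly. Whether a given agent launcher needs `shell: false` or `shell: true` on Windows depends on the binary being spawned, as the differing commits in the table suggest.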

Primary Full Tests — BP/Create Interactive

Target: all these combinations must PASS on all 3 OS.

| Agent + Model | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| codex + gpt-5.5 | PASS | PASS | PASS |
| claude-code + claude-sonnet-4-6 | FAIL (Anthropic API error) | FAIL | FAIL |
| gemini-cli + gemini-3.5-flash | PASS | PASS | PASS |
| pi + gpt-5.5 | PASS | PASS | PASS |
| hermes + DeepSeek-V4-Pro | PASS | PASS | FAIL (CI token generation fails on Windows) |
| omni + gpt-5.5 | FAIL (agent-core+gpt-5.5 can't follow babysitter tool protocol) | FAIL | FAIL |
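The target stated for this table (every combination must PASS on all 3 OS) amounts to a simple check over the rows. The sketch below shows one way to express it. The `Row` type and `failingCombos` function are hypothetical, and the two sample rows are copied from the table above with the failure reason dropped.

```typescript
type Os = "Ubuntu" | "macOS" | "Windows";

interface Row {
  combo: string; // "agent + model" label
  results: Record<Os, string>;
}

// Return the combinations that miss the target on at least one OS.
function failingCombos(rows: Row[]): string[] {
  const osList: Os[] = ["Ubuntu", "macOS", "Windows"];
  return rows
    .filter((row) => !osList.every((os) => row.results[os] === "PASS"))
    .map((row) => row.combo);
}

const sampleRows: Row[] = [
  {
    combo: "codex + gpt-5.5",
    results: { Ubuntu: "PASS", macOS: "PASS", Windows: "PASS" },
  },
  {
    combo: "hermes + DeepSeek-V4-Pro",
    results: { Ubuntu: "PASS", macOS: "PASS", Windows: "FAIL" },
  },
];

const blocked = failingCombos(sampleRows);
console.log(blocked);
```

An empty result would mean the target is met for the rows supplied. Any value other than an exact `PASS` (including `pending` or `---`) counts as not meeting the target in this sketch.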

Primary Full Tests — BP/Resume Interactive

Target: all these combinations must PASS on all 3 OS.

| Agent + Model | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| codex + gpt-5.5 | pending | --- | --- |
| claude-code + claude-sonnet-4-6 | FAIL (credits) | FAIL (credits) | --- |
| gemini-cli + gemini-3.5-flash | FAIL | FAIL | FAIL |
| pi + gpt-5.5 | FAIL | FAIL | --- |
| hermes + DeepSeek-V4-Pro | FAIL | FAIL | --- |
| omni + gpt-5.5 | FAIL | FAIL | FAIL |
