
QA Evidence

Tal Muskal edited this page Jun 2, 2026 · 287 revisions

QA Evidence — Live Stack Test Matrix

Last updated: 2026-05-30

Legend: PASS = link to job, --- = not yet tested, SKIPPED #N = skipped, see issue #N under Issues Status
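Cells in the matrices follow a small grammar: PASS, ---, or SKIPPED with an optional issue number. A minimal sketch of how a script might classify one cell; `parseCell` and the returned field names are hypothetical and not part of the project's tooling:

```javascript
// Hypothetical helper: classify one matrix cell according to the legend.
// Anything after the issue number (for example a trailing FAIL) is ignored.
function parseCell(cell) {
  const text = cell.trim();
  if (text === "---") return { status: "untested", issue: null };
  if (text.startsWith("PASS")) return { status: "pass", issue: null };
  const match = /^SKIPPED(?:\s+#(\d+))?/.exec(text);
  if (match) {
    return { status: "skipped", issue: match[1] ? Number(match[1]) : null };
  }
  return { status: "unknown", issue: null };
}

console.log(parseCell("SKIPPED #468"));
```

Cells that carry free text, such as "pending" in the Primary Full Tests tables, fall through to "unknown" in this sketch.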

Vanilla Non-Interactive (NI)

gpt-5.5 (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | SKIPPED #468 FAIL |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gpt-5.4-mini (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

claude-sonnet-4-6 (Anthropic)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | SKIPPED #485 FAIL | SKIPPED #485 FAIL | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | PASS | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gemini-3.5-flash (Google)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

DeepSeek-V4-Pro (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | SKIPPED #468 FAIL |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | PASS | PASS | PASS |

Vanilla Bridged-Interactive (BI)

gpt-5.5 (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gpt-5.4-mini (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | SKIPPED #491 FAIL |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

claude-sonnet-4-6 (Anthropic)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | SKIPPED #485 FAIL | PASS |
| codex | PASS | SKIPPED #485 FAIL | SKIPPED #485 FAIL |
| pi | PASS | SKIPPED #485 FAIL | PASS |
| gemini-cli | SKIPPED #485 FAIL | SKIPPED #483 | SKIPPED #485 FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gemini-3.5-flash (Google)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

DeepSeek-V4-Pro (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | SKIPPED #491 FAIL |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | SKIPPED #468 FAIL |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | PASS | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

BP/Predefined — Interactive

gpt-5.5 (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | PASS | PASS | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gpt-5.4-mini (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | SKIPPED #487 FAIL | SKIPPED #487 FAIL | SKIPPED #487 FAIL |
| codex | SKIPPED #487 FAIL | SKIPPED #487 FAIL | SKIPPED #487 FAIL |
| pi | SKIPPED #487 FAIL | SKIPPED #487 FAIL | SKIPPED #487 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

claude-sonnet-4-6 (Anthropic)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | SKIPPED #485 | SKIPPED #485 |
| codex | PASS | SKIPPED #485 FAIL | SKIPPED #485 FAIL |
| pi | SKIPPED #485 FAIL | SKIPPED #485 FAIL | SKIPPED #485 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gemini-3.5-flash (Google)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | SKIPPED #563 FAIL FAIL |
| codex | PASS | PASS | PASS |
| pi | SKIPPED #563 FAIL FAIL | SKIPPED #563 FAIL FAIL | SKIPPED #563 FAIL FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

DeepSeek-V4-Pro (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | SKIPPED #489 FAIL | PASS |
| codex | PASS | SKIPPED #489 FAIL | SKIPPED #489 FAIL |
| pi | PASS | SKIPPED #489 FAIL | SKIPPED #489 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | PASS | PASS | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

BP/Predefined — Bridged-Hooks

gpt-5.5 (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gpt-5.4-mini (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | SKIPPED #487 FAIL | SKIPPED #487 FAIL | SKIPPED #487 FAIL |
| codex | SKIPPED #487 FAIL | SKIPPED #487 FAIL | SKIPPED #487 FAIL |
| pi | SKIPPED #487 FAIL | SKIPPED #487 FAIL | SKIPPED #487 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

claude-sonnet-4-6 (Anthropic)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| codex | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| pi | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gemini-3.5-flash (Google)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | SKIPPED #563 FAIL |
| pi | SKIPPED #563 FAIL FAIL | SKIPPED #563 FAIL FAIL | SKIPPED #563 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

DeepSeek-V4-Pro (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | SKIPPED #489 FAIL | SKIPPED #489 FAIL | SKIPPED #489 FAIL |
| codex | PASS | SKIPPED #489 FAIL | SKIPPED #489 FAIL |
| pi | SKIPPED #489 FAIL | PASS | SKIPPED #489 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

BP/Create — Interactive

gpt-5.5 (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | SKIPPED #484 FAIL | SKIPPED #484 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | PASS | SKIPPED #468 | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gpt-5.4-mini (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | SKIPPED #487 FAIL | SKIPPED #487 FAIL | SKIPPED #487 FAIL |
| codex | PASS | SKIPPED #487 FAIL | SKIPPED #487 FAIL |
| pi | SKIPPED #487 FAIL | SKIPPED #487 FAIL | SKIPPED #487 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

claude-sonnet-4-6 (Anthropic)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| codex | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| pi | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gemini-3.5-flash (Google)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| pi | SKIPPED #563 FAIL FAIL | SKIPPED #563 FAIL FAIL | SKIPPED #563 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

DeepSeek-V4-Pro (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | SKIPPED #484 FAIL | SKIPPED #484 FAIL | PASS |
| codex | PASS | PASS | PASS |
| pi | SKIPPED #484 FAIL | SKIPPED #484 FAIL | SKIPPED #484 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | PASS | SKIPPED #468 | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

BP/Create — Bridged-Hooks

gpt-5.5 (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | SKIPPED #484 FAIL |
| codex | PASS | PASS | PASS |
| pi | PASS | SKIPPED #484 FAIL | SKIPPED #484 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gpt-5.4-mini (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | SKIPPED #487 FAIL | SKIPPED #487 FAIL | SKIPPED #487 FAIL |
| codex | SKIPPED #487 FAIL | SKIPPED #487 FAIL | SKIPPED #487 FAIL |
| pi | SKIPPED #487 FAIL | SKIPPED #487 FAIL | SKIPPED #487 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

claude-sonnet-4-6 (Anthropic)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| codex | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| pi | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gemini-3.5-flash (Google)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| pi | SKIPPED #563 FAIL FAIL | SKIPPED #563 FAIL FAIL | SKIPPED #563 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

DeepSeek-V4-Pro (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | SKIPPED #484 FAIL | PASS | SKIPPED #484 FAIL |
| codex | SKIPPED #489 FAIL | SKIPPED #484 FAIL | SKIPPED #484 FAIL |
| pi | PASS | PASS | SKIPPED #484 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

BP/Resume — Interactive

gpt-5.5 (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | SKIPPED #489 FAIL | PASS |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | PASS | SKIPPED #468 | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gpt-5.4-mini (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | SKIPPED #487 FAIL | SKIPPED #487 FAIL | SKIPPED #487 FAIL |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

claude-sonnet-4-6 (Anthropic)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| codex | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| pi | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gemini-3.5-flash (Google)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | SKIPPED #563 FAIL FAIL | SKIPPED #563 FAIL FAIL |
| codex | SKIPPED #563 FAIL FAIL | SKIPPED #563 FAIL FAIL | SKIPPED #563 FAIL FAIL |
| pi | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

DeepSeek-V4-Pro (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | SKIPPED #489 FAIL | SKIPPED #489 FAIL | SKIPPED #489 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

BP/Resume — Bridged-Hooks

gpt-5.5 (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | SKIPPED #490 FAIL |
| pi | PASS | PASS | PASS |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gpt-5.4-mini (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | SKIPPED #487 FAIL | SKIPPED #487 FAIL | SKIPPED #487 FAIL |
| codex | SKIPPED #563 FAIL | SKIPPED #490 FAIL | SKIPPED #490 FAIL |
| pi | PASS | PASS | SKIPPED #563 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

claude-sonnet-4-6 (Anthropic)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| codex | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| pi | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

gemini-3.5-flash (Google)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | PASS | SKIPPED #563 FAIL FAIL | SKIPPED #563 FAIL |
| codex | SKIPPED #563 FAIL FAIL | SKIPPED #563 FAIL FAIL | SKIPPED #563 FAIL |
| pi | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

DeepSeek-V4-Pro (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| claude-code | SKIPPED #489 FAIL | PASS | PASS |
| codex | PASS | SKIPPED #489 FAIL | PASS |
| pi | PASS | SKIPPED #489 FAIL | SKIPPED #489 FAIL |
| gemini-cli | SKIPPED #563 FAIL | SKIPPED #563 FAIL | SKIPPED #563 FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| tula | --- | --- | --- |
| cursor-cli | SKIPPED #562 FAIL | SKIPPED #562 FAIL | SKIPPED #562 FAIL |
| copilot-cli | SKIPPED #560 FAIL | SKIPPED #560 FAIL | SKIPPED #560 FAIL |
| opencode | SKIPPED #561 FAIL | SKIPPED #561 FAIL | SKIPPED #561 FAIL |

Tula Agent (Internal Harness)

Tula calls models directly through its internal agent-core → agent-runtime → agent-platform stack. It is launched via amux launch tula <provider>. Status: the proxy chain works (agent-core → transport-mux → Azure Foundry), but tula's yolo command creates a babysitter process from the prompt instead of executing the simple file-write task, so the live-stack test prompt needs to be adapted to tula's process-oriented workflow.

gpt-5.5 (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| tula | PASS | PASS | SKIPPED #615 FAIL |

gpt-5.4-mini (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| tula | PASS | PASS | SKIPPED #528 |

claude-sonnet-4-6 (Anthropic)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| tula | SKIPPED #485 FAIL | SKIPPED #528 | SKIPPED |

gemini-3.5-flash (Google)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| tula | SKIPPED #615 FAIL | SKIPPED #528 | SKIPPED |

DeepSeek-V4-Pro (Azure Foundry)

| Agent | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| tula | PASS | PASS | SKIPPED #528 |

Issues Status

| Issue | Summary | Status |
| --- | --- | --- |
| #258 | gemini-cli file write (superseded by #341) | Closed |
| #308 | macOS BI PTY fallback | FIXED |
| #311 | Windows BP fixture setup | FIXED |
| #312 | BP/Resume hooks check | FIXED |
| #313 | BP claude hooks-mux | FIXED |
| #339 | claude-code BI intermittent | FIXED (PR #427) |
| #340 | BP bridged-hooks logs missing | FIXED (a1f2d66) |
| #341 | gemini-cli NI --yolo missing | FIXED (9ecb285) |
| #368 | BP/Create mode fails for claude+pi | FIXED (PR #428) |
| #436 | Cross-provider proxy fails pi+gemini with sonnet | Closed |
| #468 | hermes stdin + proxy routing | PARTIAL: gemini-flash PASS, foundry needs hermes provider config fix |
| #482 | gemini streaming tool schemas | Merged PR #510 |
| #483 | gemini-cli NI proxy auth | FIXED fbea902 (pinned to 0.43.0, verified all 3 OS) |
| #484 | BP/Create process generation | Merged PR #506 |
| #485 | Sonnet: Anthropic credit exhaustion | BLOCKED: needs Anthropic billing top-up |
| #486 | gemini-cli BI PTY fallback | Open |
| #487 | mini BP model routing | FIXED PR #493 |
| #488 | proxy response loop (Responses tool calls) | Merged PR #492/#525 |
| #489 | DeepSeek BP timeout | Merged PR #511 |
| #490 | hooks-mux shim resolution macOS/Windows | FIXED PR #494 (verified all 3 OS) |
| #491 | BI Windows mini/DeepSeek too slow for 600s timeout | WONTFIX: performance characteristic, not a bug |

Key Fixes Applied (staging branch, 2026-05-23)

| Commit | Fix |
| --- | --- |
| c72fb2b | Test harness: shell: false on Windows (root cause of all Windows failures) |
| 3a96afe | Test harness: node -e mkdirSync for cross-platform dir creation |
| 3f9dd43 | Launch: restore direct .exe spawn for Bun binaries on Windows |
| 2bafe47 | Transport-mux: preserve tool_calls in OpenAI chat codec normalization |
| 2a158d9 | Transport-mux: add tool-call support to openAiChatStreamResponse |
| 2dc3cb4 | CI: remove agent skill dirs from workspace before live-stack tests |
| 3a7a61c | Launch: fix .cmd-to-.js resolution with %dp0% substitution |
| 17463c1 | Launch: resolve wrapper scripts on macOS/Linux for node-pty spawn |
| 3ed3a18 | CI: add gpt-5.4-mini model key to live-stack matrix |
| aeb77e1 | Launch: bridge-interactive child_process fallback (output parsing + prompt injection) |
| 98adc38 | Test harness: cross-platform BP fixture setup (bash→node), shell:true for Windows |
| 09a5cc8 | Test harness: hooks-mux optional in interactive mode |
| 5cf62d0 | Launch: BI fallback prompt-in-args + SDK shell:true on Windows |
| cebff73 | Bridge-hooks: invoke hooks-mux instead of babysitter directly |
| a1f2d66 | CI: hooks-mux link pointed to dist/index.js (no-op); fixed to dist/cli/main.js |
| 9ecb285 | Atlas: gemini-cli --yolo launch config for auto-approval (root cause of #341) |
| ca98429 | Atlas: hermes --yolo launch config (was --auto-approve, wrong flag) |
| 25ef6dd | Atlas+catalog: tula agent as amux-launchable harness with live-stack support |
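Two of the test-harness fixes rely on standard Node.js techniques: spawning a child process without an intermediate shell (shell: false) and creating directories through node -e with mkdirSync instead of a platform-specific mkdir command. A minimal sketch of that combination follows. It is not the harness's actual code; `makeFixtureDir` and the temp path are hypothetical, and the real settings changed across commits (a later entry mentions shell:true on Windows).

```javascript
const { spawnSync } = require("child_process");
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: create a (possibly nested) directory by running
// `node -e` as a child process. With shell: false the arguments go straight
// to the node binary, so cmd.exe, PowerShell, and bash quoting never apply.
function makeFixtureDir(dir) {
  // With `node -e <script> <arg>`, the extra argument arrives as process.argv[1].
  const script = "require('fs').mkdirSync(process.argv[1], { recursive: true })";
  const result = spawnSync(process.execPath, ["-e", script, dir], { shell: false });
  if (result.error) throw result.error;
  if (result.status !== 0) {
    throw new Error(`mkdir failed for ${dir}: ${String(result.stderr)}`);
  }
  return dir;
}

// Usage: create a nested directory under the OS temp dir and confirm it exists.
const demoDir = makeFixtureDir(path.join(os.tmpdir(), "qa-evidence-sketch", "fixtures", "bp"));
console.log(demoDir, fs.existsSync(demoDir));
```

Because recursive: true makes mkdirSync succeed when the directory already exists, the helper can be called repeatedly for the same path.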

Primary Full Tests — BP/Create Interactive

Target: all these combinations must PASS on all 3 OS.

| Agent + Model | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| codex + gpt-5.5 | PASS | PASS | PASS |
| claude-code + gpt-5.5 | PASS | PASS | PASS |
| gemini-cli + gemini-3.5-flash | PASS | PASS | PASS |
| pi + gpt-5.5 | PASS | PASS | PASS |
| hermes + DeepSeek-V4-Pro | PASS | PASS | SKIPPED: ConPTY >60 min (hermes Windows BP needs native stdin) |
| tula + gpt-5.5 | PASS | PASS | PASS |
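The target above (every combination must PASS on all 3 OS) can be checked mechanically. A minimal sketch, using two rows copied from this table; the row shape and the `unmetTargets` helper are hypothetical, not part of the project's tooling:

```javascript
// Two rows copied from the table above; statuses are shortened to the first word.
const primaryRows = [
  { combo: "codex + gpt-5.5", ubuntu: "PASS", macos: "PASS", windows: "PASS" },
  { combo: "hermes + DeepSeek-V4-Pro", ubuntu: "PASS", macos: "PASS", windows: "SKIPPED" },
];

// Hypothetical helper: list every (combination, OS) pair that is not PASS.
function unmetTargets(rows) {
  const unmet = [];
  for (const row of rows) {
    for (const platform of ["ubuntu", "macos", "windows"]) {
      if (row[platform] !== "PASS") {
        unmet.push(`${row.combo} on ${platform}: ${row[platform]}`);
      }
    }
  }
  return unmet;
}

console.log(unmetTargets(primaryRows));
// prints [ 'hermes + DeepSeek-V4-Pro on windows: SKIPPED' ]
```

An empty result would mean the target is met for the rows supplied.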

Primary Full Tests — BP/Resume Interactive

Target: all these combinations must PASS on all 3 OS.

| Agent + Model | Ubuntu | macOS | Windows |
| --- | --- | --- | --- |
| codex + gpt-5.5 | PASS | PASS | PASS |
| claude-code + gpt-5.5 | PASS | PASS | PASS |
| gemini-cli + gemini-3.5-flash | PASS | PASS | pending |
| pi + gpt-5.5 | PASS | PASS | PASS |
| hermes + DeepSeek-V4-Pro | PASS | FAIL (macOS install) | pending |
| tula + gpt-5.5 | --- | --- | --- |
