
QA Evidence

Tal Muskal edited this page Jun 13, 2026 · 287 revisions

QA Evidence — Live Stack Test Matrix

Last updated: 2026-06-11

#936 RESOLVED (2026-06-11). The genty live-stack failure is fixed at the root across 5 layers: (1) real tool-calling wired into genty-core (58cc4cc27); (2) the "Effect failed" JSON-parse spin on the in-process orchestration path (aa5637b12+a39848b6b); (3) agent non-convergence plus the decisive toolless-delegated-worker bug, found via a local Azure repro (33a59a10e); (4)/(5) the run now anchors on <workspace>/.a5c/runs so completion-proof is discoverable, where previously the genty CLI eagerly defaulted the runs dir to global and defeated the workspace anchor (270de90c3+fe1aa68f8). Result: genty · vanilla NI · gpt-5.5 is GREEN on all 3 OSes (Ubuntu / macOS / Windows; all 7 checks pass). The other 5 NI models (gpt-5.4-mini, sonnet, gemini-3.5-flash, gemini-3.1-pro, DeepSeek) were dispatched across all 3 OSes and all FAIL on the file-creation check ONLY. Orchestration is fine (run-completion, completion-proof, and tool-calling all pass), but the model emits the content as ~12k-char text and never writes the target file. That is a model-adherence gap, NOT #936, and is tracked in #956. genty BI/BP/interactive cells are not yet dispatched.
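The runs-dir anchoring in layers (4)/(5) can be illustrated with a minimal sketch. It is not the genty implementation: the option names, the `resolveRunsDir` helper, and the example paths are all hypothetical, and only the `<workspace>/.a5c/runs` location comes from the note above. The point it shows is that a default filled in too early looks like an explicit choice, so the workspace branch is never reached.

```typescript
// Hypothetical option names; the real genty CLI flags are not shown on this page.
interface RunsDirOptions {
  explicitRunsDir?: string; // set only when the user actually passed a runs dir
  workspaceDir?: string;    // workspace root, if one was detected
  globalRunsDir: string;    // last-resort global location (assumed path)
}

// Resolve lazily: explicit value first, then the workspace anchor, then global.
function resolveRunsDir(opts: RunsDirOptions): string {
  if (opts.explicitRunsDir !== undefined) {
    return opts.explicitRunsDir;
  }
  if (opts.workspaceDir !== undefined) {
    // Strip trailing separators; forward slashes keep the sketch platform-neutral.
    return `${opts.workspaceDir.replace(/[\\/]+$/, "")}/.a5c/runs`;
  }
  return opts.globalRunsDir;
}

// Workspace anchor wins when nothing was passed explicitly.
const anchored = resolveRunsDir({
  workspaceDir: "/work/project",
  globalRunsDir: "/home/user/.a5c/runs",
});

// The failure mode described above: the caller fills in the global default
// eagerly, so the resolver never reaches the workspace branch.
const eagerlyDefaulted = resolveRunsDir({
  explicitRunsDir: "/home/user/.a5c/runs",
  workspaceDir: "/work/project",
  globalRunsDir: "/home/user/.a5c/runs",
});
```

In this sketch `anchored` resolves under the workspace, while `eagerlyDefaulted` resolves to the global directory even though a workspace is present.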

Verified 2026-06-10 via run 27259214362 (complete): 12 PASS / 11 FAIL. One regression: Vanilla NI · DeepSeek-V4-Pro · pi · Ubuntu flipped PASS→FAIL — reproducible (failed 2×, ~3 min; SIGTERM/timeout with tool calls returning "(no output)"; NI-mode-specific, BI passes) → tracked in #954. All other failures were known-skipped genty (#936) / antigravity (#945). The late-finishing PASS cells (hermes/gpt-5.5 + pi/DeepSeek-V4-Pro, bp/create interactive) are untracked combos (matrix tracks hermes+DeepSeek for BP/Create; hermes #468 SKIPs are vanilla-foundry) — no cell change. Credit cells await Anthropic billing (#485).
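A PASS→FAIL flip like the one noted above can be detected by diffing two matrix snapshots. The following is a small sketch under stated assumptions: the `mode|model|agent|os` key format, the `Matrix` type, and the sample values are illustrative only and are not taken from the CI tooling or from real run data.

```typescript
type Status = "PASS" | "FAIL" | "SKIPPED" | "BLOCKED" | "UNTESTED";

// Key format is an assumption for this sketch: "mode|model|agent|os".
type Matrix = Record<string, Status>;

// Return the cells that were PASS in the previous run and FAIL in the current one.
// Cells that were already FAIL, SKIPPED, or untested are not counted as regressions.
function findRegressions(previous: Matrix, current: Matrix): string[] {
  return Object.keys(current)
    .filter((cell) => previous[cell] === "PASS" && current[cell] === "FAIL")
    .sort();
}

// Illustrative snapshots, not real run data.
const previousRun: Matrix = {
  "NI|DeepSeek-V4-Pro|pi|Ubuntu": "PASS",
  "NI|DeepSeek-V4-Pro|pi|macOS": "PASS",
  "NI|gpt-5.5|genty|Ubuntu": "SKIPPED",
};
const currentRun: Matrix = {
  "NI|DeepSeek-V4-Pro|pi|Ubuntu": "FAIL",
  "NI|DeepSeek-V4-Pro|pi|macOS": "PASS",
  "NI|gpt-5.5|genty|Ubuntu": "FAIL",
};

const regressions = findRegressions(previousRun, currentRun);
```

Here only the pi · Ubuntu cell is reported, because the genty cell was never PASS in the earlier snapshot and so counts as a known failure, not a regression.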

Legend: PASS = link to job; --- = not yet tested; FAIL #N, SKIPPED #N, and BLOCKED (credits) #N = tracked in issue #N
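For anyone post-processing this page, the cell notation can be read mechanically. The sketch below is an assumption about how one might parse a single cell. `parseCell` and the `Cell` type are hypothetical names, and the set of statuses is simply what appears in the tables on this page (PASS, FAIL, SKIPPED, BLOCKED, and --- for untested).

```typescript
interface Cell {
  status: "PASS" | "FAIL" | "SKIPPED" | "BLOCKED" | "UNTESTED";
  issue?: number; // issue number when the cell carries a "#N" reference
}

// Parse one matrix cell such as "PASS", "FAIL #956", or "BLOCKED (credits) #485".
function parseCell(text: string): Cell {
  const trimmed = text.trim();
  if (trimmed === "---" || trimmed === "") {
    return { status: "UNTESTED" };
  }
  // The leading word decides the status.
  const statusMatch = /^(PASS|FAIL|SKIPPED|BLOCKED)\b/.exec(trimmed);
  if (statusMatch === null) {
    throw new Error(`Unrecognized cell: ${text}`);
  }
  const status = statusMatch[1] as Cell["status"];
  // The first "#N" in the cell, if any, is taken as the tracking issue.
  const issueMatch = /#(\d+)/.exec(trimmed);
  return issueMatch === null
    ? { status }
    : { status, issue: Number(issueMatch[1]) };
}

const skipped = parseCell("SKIPPED #945");
const blocked = parseCell("BLOCKED (credits) #485");
const untested = parseCell("---");
```

Keying on the leading word keeps the parser tolerant of trailing annotations after the issue number.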

Vanilla Non-Interactive (NI)

gpt-5.5 (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli PASS PASS PASS
hermes PASS PASS SKIPPED #468 (FAIL)
genty PASS PASS PASS
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- FAIL

gpt-5.4-mini (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli PASS PASS PASS
hermes PASS PASS SKIPPED #468
genty FAIL #956 FAIL #956 FAIL #956
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode FAIL --- ---

claude-sonnet-4-6 (Anthropic)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli PASS PASS PASS
hermes BLOCKED (credits) #485 --- SKIPPED #468
genty FAIL #956 FAIL #956 FAIL #956
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode PASS --- ---

gemini-3.5-flash (Google)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli PASS PASS PASS
hermes PASS PASS SKIPPED #468
genty FAIL #956 FAIL #956 FAIL #956
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- FAIL

gemini-3.1-pro-preview (Google)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli PASS PASS PASS
hermes PASS PASS SKIPPED #468
genty FAIL #956 FAIL #956 FAIL #956
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli --- --- ---
copilot-cli --- --- ---
opencode --- --- ---

DeepSeek-V4-Pro (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi FAIL #954 PASS PASS
gemini-cli PASS PASS PASS
hermes PASS PASS SKIPPED #468 (FAIL)
genty FAIL #956 FAIL #956 FAIL #956
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode PASS PASS PASS

Vanilla Bridged-Interactive (BI)

gpt-5.5 (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli PASS PASS PASS
hermes PASS PASS SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- FAIL

gpt-5.4-mini (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS ---
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli PASS PASS PASS
hermes PASS PASS SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

claude-sonnet-4-6 (Anthropic)

Agent Ubuntu macOS Windows
claude-code PASS BLOCKED (credits) #485 PASS
codex PASS BLOCKED (credits) #485 ---
pi PASS BLOCKED (credits) #485 PASS
gemini-cli BLOCKED (credits) #485 --- ---
hermes BLOCKED (credits) #485 --- SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode BLOCKED (credits) #485 --- ---

gemini-3.5-flash (Google)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli PASS PASS PASS
hermes PASS PASS SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gemini-3.1-pro-preview (Google)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli --- --- ---
hermes --- --- ---
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli --- --- ---
copilot-cli --- --- ---
opencode --- --- ---

DeepSeek-V4-Pro (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS ---
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli PASS PASS PASS
hermes PASS PASS SKIPPED #468 (FAIL)
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode PASS --- ---

BP/Predefined — Interactive

gpt-5.5 (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli FAIL --- ---
hermes PASS PASS SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode FAIL --- ---

gpt-5.4-mini (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code FAIL --- ---
codex FAIL PASS ---
pi FAIL --- ---
gemini-cli FAIL --- ---
hermes FAIL --- SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode FAIL --- ---

claude-sonnet-4-6 (Anthropic)

Agent Ubuntu macOS Windows
claude-code PASS --- ---
codex PASS --- ---
pi BLOCKED (credits) #485 --- ---
gemini-cli --- --- ---
hermes BLOCKED (credits) #485 --- SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gemini-3.5-flash (Google)

Agent Ubuntu macOS Windows
claude-code PASS PASS ---
codex PASS PASS PASS
pi FAIL --- ---
gemini-cli --- PASS ---
hermes --- --- SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gemini-3.1-pro-preview (Google)

Agent Ubuntu macOS Windows
claude-code PASS --- ---
codex PASS --- ---
gemini-cli --- --- ---
hermes --- --- ---
cursor-cli --- --- ---
copilot-cli --- --- ---
opencode --- --- ---

DeepSeek-V4-Pro (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS --- PASS
codex PASS --- ---
pi PASS PASS ---
gemini-cli --- --- ---
hermes PASS PASS SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

BP/Predefined — Bridged-Hooks

gpt-5.5 (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli FAIL --- ---
hermes --- --- SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode FAIL --- ---

gpt-5.4-mini (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code FAIL --- ---
codex FAIL PASS ---
pi FAIL --- ---
gemini-cli FAIL --- ---
hermes FAIL --- SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

claude-sonnet-4-6 (Anthropic)

Agent Ubuntu macOS Windows
claude-code BLOCKED (credits) #485 --- ---
codex BLOCKED (credits) #485 --- ---
pi BLOCKED (credits) #485 --- ---
gemini-cli --- --- ---
hermes BLOCKED (credits) #485 --- SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gemini-3.5-flash (Google)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS ---
pi --- --- ---
gemini-cli --- PASS ---
hermes --- --- SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gemini-3.1-pro-preview (Google)

Agent Ubuntu macOS Windows
claude-code --- --- ---
codex PASS PASS ---
pi --- --- ---
gemini-cli --- --- ---
hermes --- --- ---
cursor-cli --- --- ---
copilot-cli --- --- ---
opencode --- --- ---

DeepSeek-V4-Pro (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code --- --- ---
codex PASS PASS ---
pi --- PASS ---
gemini-cli --- --- ---
hermes --- --- SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

BP/Create — Interactive

gpt-5.5 (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS PASS ---
gemini-cli --- --- ---
hermes PASS PASS SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gpt-5.4-mini (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code FAIL --- ---
codex PASS --- ---
pi PASS --- ---
gemini-cli --- --- ---
hermes FAIL --- SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

claude-sonnet-4-6 (Anthropic)

Agent Ubuntu macOS Windows
claude-code BLOCKED (credits) #485 --- ---
codex BLOCKED (credits) #485 --- ---
pi BLOCKED (credits) #485 --- ---
gemini-cli --- --- ---
hermes BLOCKED (credits) #485 --- SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gemini-3.5-flash (Google)

Agent Ubuntu macOS Windows
claude-code PASS --- ---
codex PASS --- ---
pi FAIL --- ---
gemini-cli --- --- ---
hermes FAIL --- SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gemini-3.1-pro-preview (Google)

Agent Ubuntu macOS Windows
claude-code PASS PASS ---
codex PASS PASS ---
pi --- --- ---
gemini-cli PASS PASS ---
hermes --- --- ---
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli --- --- ---
copilot-cli --- --- ---
opencode --- --- ---

DeepSeek-V4-Pro (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS --- PASS
codex PASS PASS PASS
pi PASS --- ---
gemini-cli --- --- ---
hermes PASS --- SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

BP/Create — Bridged-Hooks

gpt-5.5 (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS ---
codex PASS PASS PASS
pi PASS --- ---
gemini-cli --- --- ---
hermes --- --- SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gpt-5.4-mini (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code FAIL --- ---
codex FAIL --- ---
pi PASS --- ---
gemini-cli --- --- ---
hermes --- --- SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

claude-sonnet-4-6 (Anthropic)

Agent Ubuntu macOS Windows
claude-code BLOCKED (credits) #485 --- ---
codex BLOCKED (credits) #485 --- ---
pi BLOCKED (credits) #485 --- ---
gemini-cli --- --- ---
hermes --- --- SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gemini-3.5-flash (Google)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex --- --- ---
pi --- --- ---
gemini-cli --- --- ---
hermes --- --- SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gemini-3.1-pro-preview (Google)

Agent Ubuntu macOS Windows
claude-code --- --- ---
codex --- --- ---
pi --- --- ---
gemini-cli PASS --- ---
hermes --- --- ---
cursor-cli --- --- ---
copilot-cli --- --- ---
opencode --- --- ---

DeepSeek-V4-Pro (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code --- PASS ---
codex --- --- ---
pi PASS PASS ---
gemini-cli --- --- ---
hermes --- --- SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

BP/Resume — Interactive

gpt-5.5 (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS --- PASS
gemini-cli --- --- ---
hermes PASS --- SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gpt-5.4-mini (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS --- ---
codex PASS PASS PASS
pi PASS PASS PASS
gemini-cli --- --- ---
hermes --- --- SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

claude-sonnet-4-6 (Anthropic)

Agent Ubuntu macOS Windows
claude-code PASS --- ---
codex PASS --- ---
pi PASS --- ---
gemini-cli --- --- ---
hermes --- --- SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gemini-3.5-flash (Google)

Agent Ubuntu macOS Windows
claude-code PASS --- ---
codex FAIL --- ---
pi PASS --- ---
gemini-cli --- --- ---
hermes --- --- SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gemini-3.1-pro-preview (Google)

Agent Ubuntu macOS Windows
claude-code PASS --- ---
codex --- --- ---
pi PASS --- ---
gemini-cli --- --- ---
hermes --- --- ---
cursor-cli --- --- ---
copilot-cli --- --- ---
opencode --- --- ---

DeepSeek-V4-Pro (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS PASS
pi PASS --- ---
gemini-cli --- --- ---
hermes --- --- SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

BP/Resume — Bridged-Hooks

gpt-5.5 (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS PASS PASS
codex PASS PASS ---
pi PASS PASS PASS
gemini-cli --- --- ---
hermes --- --- SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gpt-5.4-mini (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code PASS --- ---
codex PASS --- ---
pi PASS PASS ---
gemini-cli --- --- ---
hermes --- --- SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

claude-sonnet-4-6 (Anthropic)

Agent Ubuntu macOS Windows
claude-code PASS --- ---
codex PASS --- ---
pi PASS --- ---
gemini-cli --- --- ---
hermes --- --- SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gemini-3.5-flash (Google)

Agent Ubuntu macOS Windows
claude-code PASS --- ---
codex --- --- ---
pi --- --- ---
gemini-cli --- --- ---
hermes --- --- SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

gemini-3.1-pro-preview (Google)

Agent Ubuntu macOS Windows
claude-code PASS --- ---
codex --- --- ---
pi PASS --- ---
gemini-cli --- --- ---
hermes --- --- ---
cursor-cli --- --- ---
copilot-cli --- --- ---
opencode --- --- ---

DeepSeek-V4-Pro (Azure Foundry)

Agent Ubuntu macOS Windows
claude-code --- PASS PASS
codex PASS --- PASS
pi PASS --- ---
gemini-cli --- --- ---
hermes --- --- SKIPPED #468
genty SKIPPED #936 SKIPPED #936 SKIPPED #936
antigravity SKIPPED #945 SKIPPED #945 SKIPPED #945
cursor-cli SKIPPED #562 (FAIL) SKIPPED #562 (FAIL) SKIPPED #562 (FAIL)
copilot-cli SKIPPED #560 (FAIL) SKIPPED #560 (FAIL) SKIPPED #560 (FAIL)
opencode --- --- ---

Tula Agent (Internal Harness)

Tula uses its internal agent-core → agent-runtime → agent-platform stack to call models directly, and is launched via amux launch tula <provider>. Status: the proxy chain works (agent-core → transport-mux → Azure Foundry), but tula's yolo command creates a babysitter process from the prompt instead of executing the simple file-write task, so the live-stack test prompt needs adaptation for tula's process-oriented workflow.

gpt-5.5 (Azure Foundry)

Agent Ubuntu macOS Windows
genty PASS PASS ---

gpt-5.4-mini (Azure Foundry)

Agent Ubuntu macOS Windows
genty PASS PASS ---

claude-sonnet-4-6 (Anthropic)

Agent Ubuntu macOS Windows
genty --- --- SKIPPED

gemini-3.5-flash (Google)

Agent Ubuntu macOS Windows
genty --- --- SKIPPED

gemini-3.1-pro-preview (Google)

Agent Ubuntu macOS Windows
genty SKIPPED #936 SKIPPED #936 SKIPPED #936

DeepSeek-V4-Pro (Azure Foundry)

Agent Ubuntu macOS Windows
genty PASS PASS ---

Issues Status

Issue Summary Status
#258 gemini-cli file write (superseded by #341) Closed
#308 macOS BI PTY fallback FIXED
#311 Windows BP fixture setup FIXED
#312 BP/Resume hooks check FIXED
#313 BP claude hooks-mux FIXED
#339 claude-code BI intermittent FIXED (PR #427)
#340 BP bridged-hooks logs missing FIXED (a1f2d66)
#341 gemini-cli NI --yolo missing FIXED (9ecb285)
#368 BP/Create mode fails for claude+pi FIXED (PR #428)
#436 Cross-provider proxy fails pi+gemini with sonnet Closed
#468 hermes stdin + proxy routing PARTIAL — gemini-flash PASS, foundry needs hermes provider config fix
#482 gemini streaming tool schemas Merged PR #510
#483 gemini-cli NI proxy auth FIXED fbea902 — pinned to 0.43.0, verified all 3 OS
#484 BP/Create process generation Merged PR #506
#485 Sonnet — Anthropic credit exhaustion BLOCKED — needs Anthropic billing top-up
#486 gemini-cli BI PTY fallback Open
#487 mini BP model routing FIXED PR #493
#488 proxy response loop (Responses tool calls) Merged PR #492/#525
#489 DeepSeek BP timeout Merged PR #511
#490 hooks-mux shim resolution macOS/Windows FIXED PR #494 (verified all 3 OS)
#491 BI Windows mini/DeepSeek too slow for 600s timeout WONTFIX — performance characteristic, not a bug

Key Fixes Applied (staging branch, 2026-05-23)

Commit Fix
c72fb2b Test harness: shell: false on Windows (root cause of all Windows failures)
3a96afe Test harness: node -e mkdirSync for cross-platform dir creation
3f9dd43 Launch: restore direct .exe spawn for Bun binaries on Windows
2bafe47 Transport-mux: preserve tool_calls in OpenAI chat codec normalization
2a158d9 Transport-mux: add tool-call support to openAiChatStreamResponse
2dc3cb4 CI: remove agent skill dirs from workspace before live-stack tests
3a7a61c Launch: fix .cmd-to-.js resolution with %dp0% substitution
17463c1 Launch: resolve wrapper scripts on macOS/Linux for node-pty spawn
3ed3a18 CI: add gpt-5.4-mini model key to live-stack matrix
aeb77e1 Launch: bridge-interactive child_process fallback (output parsing + prompt injection)
98adc38 Test harness: cross-platform BP fixture setup (bash→node), shell:true for Windows
09a5cc8 Test harness: hooks-mux optional in interactive mode
5cf62d0 Launch: BI fallback prompt-in-args + SDK shell:true on Windows
cebff73 Bridge-hooks: invoke hooks-mux instead of babysitter directly
a1f2d66 CI: hooks-mux link pointed to dist/index.js (no-op) — fixed to dist/cli/main.js
9ecb285 Atlas: gemini-cli --yolo launch config for auto-approval (root cause of #341)
ca98429 Atlas: hermes --yolo launch config (was --auto-approve, wrong flag)
25ef6dd Atlas+catalog: genty agent as adapters-launchable harness with live-stack support

Primary Full Tests — BP/Create Interactive

Target: all these combinations must PASS on all 3 OS.

Agent + Model Ubuntu macOS Windows
codex + gpt-5.5 PASS PASS PASS
claude-code + gpt-5.5 PASS PASS PASS
gemini-cli + gemini-3.5-flash PASS PASS PASS
pi + gpt-5.5 PASS PASS PASS
hermes + DeepSeek-V4-Pro PASS PASS SKIPPED — ConPTY >60 min (hermes Windows BP needs native stdin)
genty + gpt-5.5 PASS PASS PASS
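The target stated above (every combination PASS on all 3 OS) can be checked mechanically. This is a minimal sketch with hypothetical names (`Row`, `unmetTargets`). The two sample rows are copied from the table above, and treating anything other than PASS as unmet is an assumption of the sketch.

```typescript
type OsName = "Ubuntu" | "macOS" | "Windows";
type Row = { combo: string } & Record<OsName, string>;

// List every combo/OS pair whose status is anything other than PASS.
function unmetTargets(rows: Row[]): string[] {
  const osNames: OsName[] = ["Ubuntu", "macOS", "Windows"];
  const unmet: string[] = [];
  for (const row of rows) {
    for (const os of osNames) {
      if (row[os] !== "PASS") {
        unmet.push(`${row.combo} on ${os}: ${row[os]}`);
      }
    }
  }
  return unmet;
}

// Two rows from the BP/Create Interactive table above.
const createInteractiveRows: Row[] = [
  { combo: "codex + gpt-5.5", Ubuntu: "PASS", macOS: "PASS", Windows: "PASS" },
  { combo: "hermes + DeepSeek-V4-Pro", Ubuntu: "PASS", macOS: "PASS", Windows: "SKIPPED" },
];

const unmet = unmetTargets(createInteractiveRows);
```

With these rows the only unmet target is hermes + DeepSeek-V4-Pro on Windows, which matches the SKIPPED cell in the table.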

Primary Full Tests — BP/Resume Interactive

Target: all these combinations must PASS on all 3 OS.

Agent + Model Ubuntu macOS Windows
codex + gpt-5.5 PASS PASS PASS
claude-code + gpt-5.5 PASS PASS PASS
gemini-cli + gemini-3.5-flash PASS PASS PASS
pi + gpt-5.5 PASS PASS PASS
hermes + DeepSeek-V4-Pro PASS PASS PASS
genty + gpt-5.5 PASS PASS PASS
