QA Evidence
Tal Muskal edited this page Jun 1, 2026 · 287 revisions
Last updated: 2026-05-30
Legend: PASS = link to job, — = not yet tested
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | SKIPPED #468 — FAIL |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | SKIPPED #485 — FAIL | SKIPPED #485 — FAIL | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | PASS | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | SKIPPED #468 — FAIL |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | PASS | PASS | PASS |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | SKIPPED #491 — FAIL |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | SKIPPED #485 — FAIL | PASS |
| codex | PASS | SKIPPED #485 — FAIL | SKIPPED #485 — FAIL |
| pi | PASS | SKIPPED #485 — FAIL | PASS |
| gemini-cli | SKIPPED #485 — FAIL | SKIPPED #483 | SKIPPED #485 — FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | SKIPPED #491 — FAIL |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | PASS | PASS | PASS |
| hermes | PASS | PASS | SKIPPED #468 — FAIL |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | PASS | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | PASS | PASS | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | SKIPPED #487 — FAIL | SKIPPED #487 — FAIL | SKIPPED #487 — FAIL |
| codex | SKIPPED #487 — FAIL | SKIPPED #487 — FAIL | SKIPPED #487 — FAIL |
| pi | SKIPPED #487 — FAIL | SKIPPED #487 — FAIL | SKIPPED #487 — FAIL |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | SKIPPED #485 | SKIPPED #485 |
| codex | PASS | SKIPPED #485 — FAIL | SKIPPED #485 — FAIL |
| pi | SKIPPED #485 — FAIL | SKIPPED #485 — FAIL | SKIPPED #485 — FAIL |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | SKIPPED #563 — FAIL |
| codex | PASS | PASS | PASS |
| pi | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | SKIPPED #489 — FAIL | PASS |
| codex | PASS | SKIPPED #489 — FAIL | SKIPPED #489 — FAIL |
| pi | PASS | SKIPPED #489 — FAIL | SKIPPED #489 — FAIL |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | PASS | PASS | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | SKIPPED #487 — FAIL | SKIPPED #487 — FAIL | SKIPPED #487 — FAIL |
| codex | SKIPPED #487 — FAIL | SKIPPED #487 — FAIL | SKIPPED #487 — FAIL |
| pi | SKIPPED #487 — FAIL | SKIPPED #487 — FAIL | SKIPPED #487 — FAIL |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| codex | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| pi | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | SKIPPED #563 — FAIL |
| pi | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | SKIPPED #489 — FAIL | SKIPPED #489 — FAIL | SKIPPED #489 — FAIL |
| codex | PASS | SKIPPED #489 — FAIL | SKIPPED #489 — FAIL |
| pi | SKIPPED #489 — FAIL | PASS | SKIPPED #489 — FAIL |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | SKIPPED #484 — FAIL | SKIPPED #484 — FAIL |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | PASS | SKIPPED #468 | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | SKIPPED #487 — FAIL | SKIPPED #487 — FAIL | SKIPPED #487 — FAIL |
| codex | PASS | SKIPPED #487 — FAIL | SKIPPED #487 — FAIL |
| pi | SKIPPED #487 — FAIL | SKIPPED #487 — FAIL | SKIPPED #487 — FAIL |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| codex | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| pi | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| pi | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | SKIPPED #484 — FAIL | SKIPPED #484 — FAIL | PASS |
| codex | PASS | PASS | PASS |
| pi | SKIPPED #484 — FAIL | SKIPPED #484 — FAIL | SKIPPED #484 — FAIL |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | PASS | SKIPPED #468 | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | SKIPPED #484 — FAIL |
| codex | PASS | PASS | PASS |
| pi | PASS | SKIPPED #484 — FAIL | SKIPPED #484 — FAIL |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | SKIPPED #487 — FAIL | SKIPPED #487 — FAIL | SKIPPED #487 — FAIL |
| codex | SKIPPED #487 — FAIL | SKIPPED #487 — FAIL | SKIPPED #487 — FAIL |
| pi | SKIPPED #487 — FAIL | SKIPPED #487 — FAIL | SKIPPED #487 — FAIL |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| codex | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| pi | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| pi | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | SKIPPED #484 — FAIL | PASS | SKIPPED #484 — FAIL |
| codex | SKIPPED #489 — FAIL | SKIPPED #484 — FAIL | SKIPPED #484 — FAIL |
| pi | PASS | PASS | SKIPPED #484 — FAIL |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | PASS | SKIPPED #489 — FAIL | PASS |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | PASS | SKIPPED #468 | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | SKIPPED #487 — FAIL | SKIPPED #487 — FAIL | SKIPPED #487 — FAIL |
| codex | PASS | PASS | PASS |
| pi | PASS | PASS | PASS |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| codex | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| pi | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| codex | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| pi | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | PASS |
| pi | SKIPPED #489 — FAIL | SKIPPED #489 — FAIL | SKIPPED #489 — FAIL |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | PASS | PASS |
| codex | PASS | PASS | SKIPPED #490 — FAIL |
| pi | PASS | PASS | PASS |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | SKIPPED #487 — FAIL | SKIPPED #487 — FAIL | SKIPPED #487 — FAIL |
| codex | SKIPPED #563 — FAIL | SKIPPED #490 — FAIL | SKIPPED #490 — FAIL |
| pi | PASS | PASS | SKIPPED #563 — FAIL |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| codex | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| pi | SKIPPED #485 | SKIPPED #485 | SKIPPED #485 |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | PASS | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| codex | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| pi | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| claude-code | SKIPPED #489 — FAIL | PASS | PASS |
| codex | PASS | SKIPPED #489 — FAIL | PASS |
| pi | PASS | SKIPPED #489 — FAIL | SKIPPED #489 — FAIL |
| gemini-cli | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL | SKIPPED #563 — FAIL |
| hermes | SKIPPED #468 | SKIPPED #468 | SKIPPED #468 |
| cursor-cli | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL | SKIPPED #562 — FAIL |
| copilot-cli | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL | SKIPPED #560 — FAIL |
| opencode | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL | SKIPPED #561 — FAIL |
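The cells in the matrices above follow a small, regular shape: a status word, an optional issue reference, and an optional note after the dash. A minimal Python sketch of reading one table row into structured values might look like the following. The `Cell` class, `parse_cell`, `parse_row`, and the regular expression are hypothetical helpers inferred from the cells on this page, not part of amux or its test harness, and the sketch makes no claim about what a trailing `FAIL` note means.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern inferred from the cells on this page, not an official schema.
# \u2014 is the long dash that separates the issue reference from the note.
CELL_RE = re.compile(
    r"^(PASS|FAIL|SKIPPED|pending)(?:\s+#(\d+))?(?:\s+\u2014\s+(.*))?$"
)


@dataclass
class Cell:
    status: str           # "PASS", "FAIL", "SKIPPED", "pending", or "unknown"
    issue: Optional[int]  # issue number after "#", if the cell has one
    note: Optional[str]   # free text after the dash, if any


def parse_cell(text: str) -> Cell:
    """Split one matrix cell into status, issue number, and note."""
    text = text.strip()
    match = CELL_RE.match(text)
    if match is None:
        # Anything unrecognised (for example "---") is kept as a note.
        return Cell(status="unknown", issue=None, note=text or None)
    status, issue, note = match.groups()
    return Cell(status, int(issue) if issue else None, note)


def parse_row(line: str) -> tuple[str, list[Cell]]:
    """Return the agent name and the parsed cells of one markdown table row."""
    parts = [p.strip() for p in line.strip().strip("|").split("|")]
    return parts[0], [parse_cell(p) for p in parts[1:]]


agent, cells = parse_row("| hermes | PASS | PASS | SKIPPED #468 \u2014 FAIL |")
# agent == "hermes"; cells[2] == Cell(status="SKIPPED", issue=468, note="FAIL")
```

Keeping the note as free text means cells with longer explanations still parse without special cases.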
Tula uses its internal agent-core → agent-runtime → agent-platform stack to call models directly. It is launched via `amux launch tula <provider>`.
Status: the proxy chain works (agent-core → transport-mux → Azure foundry), but tula's `yolo` command creates a babysitter process from the prompt instead of executing a simple file-write task. The live-stack test prompt needs to be adapted to tula's process-oriented workflow.
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| tula | PASS | PASS | SKIPPED #615 — FAIL |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| tula | PASS | PASS | SKIPPED #528 |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| tula | SKIPPED #485 — FAIL | SKIPPED #528 | SKIPPED |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| tula | SKIPPED #615 — FAIL | SKIPPED #528 | SKIPPED |
| Agent | Ubuntu | macOS | Windows |
|---|---|---|---|
| tula | PASS | PASS | SKIPPED #528 |
| Issue | Summary | Status |
|---|---|---|
| #258 | gemini-cli file write (superseded by #341) | Closed |
| #308 | macOS BI PTY fallback | FIXED |
| #311 | Windows BP fixture setup | FIXED |
| #312 | BP/Resume hooks check | FIXED |
| #313 | BP claude hooks-mux | FIXED |
| #339 | claude-code BI intermittent | FIXED (PR #427) |
| #340 | BP bridged-hooks logs missing | FIXED (a1f2d66) |
| #341 | gemini-cli NI --yolo missing | FIXED (9ecb285) |
| #368 | BP/Create mode fails for claude+pi | FIXED (PR #428) |
| #436 | Cross-provider proxy fails pi+gemini with sonnet | Closed |
| #468 | hermes stdin + proxy routing | PARTIAL — gemini-flash PASS, foundry needs hermes provider config fix |
| #482 | gemini streaming tool schemas | Merged PR #510 |
| #483 | gemini-cli NI proxy auth | FIXED fbea902 — pinned to 0.43.0, verified all 3 OS |
| #484 | BP/Create process generation | Merged PR #506 |
| #485 | Sonnet — Anthropic credit exhaustion | BLOCKED — needs Anthropic billing top-up |
| #486 | gemini-cli BI PTY fallback | Open |
| #487 | mini BP model routing | FIXED PR #493 |
| #488 | proxy response loop (Responses tool calls) | Merged PR #492/#525 |
| #489 | DeepSeek BP timeout | Merged PR #511 |
| #490 | hooks-mux shim resolution macOS/Windows | FIXED PR #494 (verified all 3 OS) |
| #491 | BI Windows mini/DeepSeek too slow for 600s timeout | WONTFIX — performance characteristic, not a bug |
| Commit | Fix |
|---|---|
| c72fb2b | Test harness: shell: false on Windows (root cause of all Windows failures) |
| 3a96afe | Test harness: node -e mkdirSync for cross-platform dir creation |
| 3f9dd43 | Launch: restore direct .exe spawn for Bun binaries on Windows |
| 2bafe47 | Transport-mux: preserve tool_calls in OpenAI chat codec normalization |
| 2a158d9 | Transport-mux: add tool-call support to openAiChatStreamResponse |
| 2dc3cb4 | CI: remove agent skill dirs from workspace before live-stack tests |
| 3a7a61c | Launch: fix .cmd-to-.js resolution with %dp0% substitution |
| 17463c1 | Launch: resolve wrapper scripts on macOS/Linux for node-pty spawn |
| 3ed3a18 | CI: add gpt-5.4-mini model key to live-stack matrix |
| aeb77e1 | Launch: bridge-interactive child_process fallback (output parsing + prompt injection) |
| 98adc38 | Test harness: cross-platform BP fixture setup (bash→node), shell:true for Windows |
| 09a5cc8 | Test harness: hooks-mux optional in interactive mode |
| 5cf62d0 | Launch: BI fallback prompt-in-args + SDK shell:true on Windows |
| cebff73 | Bridge-hooks: invoke hooks-mux instead of babysitter directly |
| a1f2d66 | CI: hooks-mux link pointed to dist/index.js (no-op) — fixed to dist/cli/main.js |
| 9ecb285 | Atlas: gemini-cli --yolo launch config for auto-approval (root cause of #341) |
| ca98429 | Atlas: hermes --yolo launch config (was --auto-approve, wrong flag) |
| 25ef6dd | Atlas+catalog: tula agent as amux-launchable harness with live-stack support |
Target: all these combinations must PASS on all 3 OS.
| Agent + Model | Ubuntu | macOS | Windows |
|---|---|---|---|
| codex + gpt-5.5 | PASS | PASS | PASS |
| claude-code + gpt-5.5 | PASS | PASS | pending |
| gemini-cli + gemini-3.5-flash | PASS | PASS | PASS |
| pi + gpt-5.5 | PASS | PASS | PASS |
| hermes + DeepSeek-V4-Pro | PASS | PASS | FAIL — CI token generation fails on Windows |
| tula + gpt-5.5 | FAIL — agent-core+gpt-5.5 can't follow babysitter tool protocol | FAIL | FAIL |
Target: all these combinations must PASS on all 3 OS.
| Agent + Model | Ubuntu | macOS | Windows |
|---|---|---|---|
| codex + gpt-5.5 | pending | --- | --- |
| claude-code + gpt-5.5 | pending | --- | --- |
| gemini-cli + gemini-3.5-flash | FAIL | FAIL | FAIL |
| pi + gpt-5.5 | FAIL | FAIL | --- |
| hermes + DeepSeek-V4-Pro | FAIL | FAIL | --- |
| tula + gpt-5.5 | FAIL | FAIL | FAIL |
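The target stated above (every combination must PASS on all 3 OS) can be checked mechanically against a table in this format. The sketch below is one possible way to do that in Python. `unmet_targets` and `OSES` are hypothetical names, and the sketch assumes that only a cell reading exactly `PASS` counts as meeting the target, so `pending`, `---`, and any `FAIL` variant are all reported as unmet.

```python
OSES = ("Ubuntu", "macOS", "Windows")


def unmet_targets(table: str) -> dict[str, list[str]]:
    """Map each 'agent + model' row to the OS columns that are not exactly PASS."""
    unmet: dict[str, list[str]] = {}
    for line in table.strip().splitlines():
        parts = [p.strip() for p in line.strip().strip("|").split("|")]
        if len(parts) != len(OSES) + 1:
            continue  # not a four-column row of the matrix
        name, cells = parts[0], parts[1:]
        # Skip the header row and the |---|---| separator row.
        if name == "Agent + Model" or set(name) <= {"-"}:
            continue
        failing = [os_name for os_name, cell in zip(OSES, cells) if cell != "PASS"]
        if failing:
            unmet[name] = failing
    return unmet


# Rows copied from the first target table above.
TABLE = """
| Agent + Model | Ubuntu | macOS | Windows |
|---|---|---|---|
| codex + gpt-5.5 | PASS | PASS | PASS |
| claude-code + gpt-5.5 | PASS | PASS | pending |
| pi + gpt-5.5 | PASS | PASS | PASS |
"""

remaining = unmet_targets(TABLE)
# remaining == {"claude-code + gpt-5.5": ["Windows"]}
```

An empty result would mean the table meets the target. Anything else lists the combinations and operating systems that still need work.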