Update: omni fails — agent-core+gpt-5.5 can't follow babysitter tool protocol
Update: omni still fails (agent-core can't drive SDK loop even with 5 stalls)
Update: pi PASS on all 3 OS (13/18 total)
Update: gemini+pi PASS on Windows (12/18 total)
Update: gemini-cli PASS on Ubuntu (10/18 total)
Update: gemini+pi PASS on macOS with create-mode upgrade fix
Update primary matrix: codex 3/3, hermes 2/3, pending gemini+pi re-runs
Update primary matrix: 3/6 PASS on Ubuntu (codex, claude, hermes)
Add Primary Full Tests — BP/Resume Interactive section
Update primary matrix: 4/5 PASS + add omni row
Update primary matrix: codex+gpt55 and pi+gpt55 BP/Create Ubuntu PASS
Add primary full tests section: BP/Create Interactive target matrix
Update wiki: codex sonnet BP/Pred Ubuntu PASS
Update wiki: claude-code sonnet BP/Pred Ubuntu PASS
Update wiki: claude-code + codex sonnet NI+BI Ubuntu PASS (#485 billing recharged)
Garden v2: convert 184 blocked cells to SKIPPED — all issues closed, fixes merged
Update wiki: gemini-cli BI macOS gemini-flash+mini PASS (#483 verified)
Update wiki: opencode DeepSeek NI Windows PASS
Update wiki: 4 opencode cells flipped to PASS (#561 fix verified)
Revert "Garden: change all 185 blocked cells to SKIPPED — all blocking issues now closed" This reverts commit 689d2b056679b1f82518178b0764675997f4b08c.
Garden: change all 185 blocked cells to SKIPPED — all blocking issues now closed
Update wiki: hermes gpt55 BP/Create Ubuntu PASS
Update wiki: hermes DeepSeek BP/Pred macOS + codex DeepSeek BP/Pred Ubuntu PASS
Update wiki: pi DeepSeek BP/Predefined Ubuntu PASS
Update wiki: hermes DeepSeek BP/Predefined Ubuntu PASS
Update wiki: codex BP/Create mini Ubuntu PASS
Update wiki: hermes BP pred macOS gpt55 + BP resume Ubuntu gpt55 PASS
Update wiki: hermes macOS vanilla fully PASS + hermes BP create deepseek Ubuntu PASS
Update wiki: 5 hermes BI cells + pi create gpt55 flipped to PASS
Revert: restore fixable issues to blocked — only truly human-action issues are SKIPPED