History / QA Evidence

Revisions

  • QA: add gemini-3.1-pro-preview BI results (claude/codex/pi Ubuntu PASS), update date to 2026-06-06

    @tmuskal tmuskal committed Jun 6, 2026
  • QA: pi/gemini-3.1-pro-preview Ubuntu NI PASS (14/14 vanilla NI complete)

    @tmuskal tmuskal committed Jun 5, 2026
  • QA: add gemini-3.1-pro-preview Vanilla NI results (13/14 PASS)

    @tmuskal tmuskal committed Jun 5, 2026
  • Add gemini-3.1-pro-preview (Google) sections to all test matrices

    @tmuskal tmuskal committed Jun 4, 2026
  • Add antigravity rows to QA Evidence matrix (all sections, not yet tested)

    @tmuskal tmuskal committed Jun 4, 2026
  • Update QA Evidence: date to 2026-06-04, tula→genty rename

    @tmuskal tmuskal committed Jun 4, 2026
  • BP/Resume: ALL 18/18 PASS! Tula 3/3 PASS — full matrix complete!

    @tmuskal tmuskal committed Jun 2, 2026
  • BP/Resume: hermes 3/3 PASS! (15/18 — only tula pending)

    @tmuskal tmuskal committed Jun 2, 2026
  • BP/Resume: gemini 3/3 PASS! (12/15 — hermes macOS+Windows pending)

    @tmuskal tmuskal committed Jun 2, 2026
  • BP/Resume: gemini PASS macOS! Build fix works. (11/15)

    @tmuskal tmuskal committed Jun 2, 2026
  • BP/Resume: hermes PASS Ubuntu! All 5 agents pass on Ubuntu now.

    @tmuskal tmuskal committed Jun 2, 2026
  • BP/Resume: gemini Ubuntu PASS, macOS install failures for gemini+hermes

    @tmuskal tmuskal committed Jun 2, 2026
  • Update BP/Resume: gemini PASS Ubuntu! Evidence ReferenceError was root cause.

    @tmuskal tmuskal committed Jun 2, 2026
  • Update BP/Resume: 9/15 PASS (codex+claude+pi 3/3; gemini+hermes 0/3)

    @tmuskal tmuskal committed Jun 1, 2026
  • Hermes Windows BP: SKIPPED — ConPTY >60 min, needs native stdin support. 17/18 PASS.

    @tmuskal tmuskal committed Jun 1, 2026
  • Hermes Windows: known ConPTY limitation (>45 min for BP tasks). 17/18 PASS.

    @tmuskal tmuskal committed Jun 1, 2026
  • Update: hermes Windows switched to predefined mode (create too slow for ConPTY)

    @tmuskal tmuskal committed Jun 1, 2026
  • Update BP/Resume: codex 3/3, claude 2/3, pi 1/3 (7/15 PASS)

    @tmuskal tmuskal committed Jun 1, 2026
  • Tula 3/3 PASS! (17/18 BP/Create — hermes Windows only remaining)

    @tmuskal tmuskal committed Jun 1, 2026
  • TULA PASS on Ubuntu! (17/18 BP/Create) — autonomous host loop works

    @tmuskal tmuskal committed Jun 1, 2026
  • Add tula to all Vanilla and BP matrix tables (alongside hermes, codex, pi, etc.)

    @tmuskal tmuskal committed Jun 1, 2026
  • Update: BP/Resume codex+claude PASS Ubuntu; BP/Create 16/18

    @tmuskal tmuskal committed Jun 1, 2026
  • Update: claude+gpt-5.5 PASS all 3 OS! (16/18 BP/Create)

    @tmuskal tmuskal committed Jun 1, 2026
  • Update: claude+gpt-5.5 PASS macOS (15/18 BP/Create)

    @tmuskal tmuskal committed Jun 1, 2026
  • Update: claude+gpt-5.5 PASS on Ubuntu (14/18 BP/Create); BP/Resume switched to gpt-5.5

    @tmuskal tmuskal committed Jun 1, 2026
  • Switch claude-code to gpt-5.5 in primary matrix (Anthropic credits depleted)

    @tmuskal tmuskal committed Jun 1, 2026
  • rename: omni → tula in wiki pages

    @tmuskal tmuskal committed Jun 1, 2026
  • Update: BP/Resume failing across board; hermes Windows CI token issue noted

    @tmuskal tmuskal committed Jun 1, 2026
  • Update: omni fails — agent-core+gpt-5.5 can't follow babysitter tool protocol

    @tmuskal tmuskal committed Jun 1, 2026
  • Update: omni still fails (agent-core can't drive SDK loop even with 5 stalls)

    @tmuskal tmuskal committed May 31, 2026