History for QA Evidence · a5c-ai/babysitter Wiki

QA: add gemini-3.1-pro-preview BI results (claude/codex/pi Ubuntu PASS), update date to 2026-06-06

tmuskal committed Jun 6, 2026

4c8d079

QA: pi/gemini-3.1-pro-preview Ubuntu NI PASS (14/14 vanilla NI complete)

tmuskal committed Jun 5, 2026

6648d82

QA: add gemini-3.1-pro-preview Vanilla NI results (13/14 PASS)

tmuskal committed Jun 5, 2026

472ed73

Add gemini-3.1-pro-preview (Google) sections to all test matrices

tmuskal committed Jun 4, 2026

553ee7e

Add antigravity rows to QA Evidence matrix (all sections, not yet tested)

tmuskal committed Jun 4, 2026

7884f07

Update QA Evidence: date to 2026-06-04, tula→genty rename

tmuskal committed Jun 4, 2026

e7baea9

BP/Resume: ALL 18/18 PASS! Tula 3/3 PASS — full matrix complete!

tmuskal committed Jun 2, 2026

31074f7

BP/Resume: hermes 3/3 PASS! (15/18 — only tula pending)

tmuskal committed Jun 2, 2026

a872fd9

BP/Resume: gemini 3/3 PASS! (12/15 — hermes macOS+Windows pending)

tmuskal committed Jun 2, 2026

e6d409b

BP/Resume: gemini PASS macOS! Build fix works. (11/15)

tmuskal committed Jun 2, 2026

43c7fa5

BP/Resume: hermes PASS Ubuntu! All 5 agents pass on Ubuntu now.

tmuskal committed Jun 2, 2026

159f6e8

BP/Resume: gemini Ubuntu PASS, macOS install failures for gemini+hermes

tmuskal committed Jun 2, 2026

efd3d27

Update BP/Resume: gemini PASS Ubuntu! Evidence ReferenceError was root cause.

tmuskal committed Jun 2, 2026

97ee24c

Update BP/Resume: 9/15 PASS (codex+claude+pi 3/3; gemini+hermes 0/3)

tmuskal committed Jun 1, 2026

470611b

Hermes Windows BP: SKIPPED — ConPTY >60 min, needs native stdin support. 17/18 PASS.

tmuskal committed Jun 1, 2026

fb8b508

Hermes Windows: known ConPTY limitation (>45 min for BP tasks). 17/18 PASS.

tmuskal committed Jun 1, 2026

a84dabf

Update: hermes Windows switched to predefined mode (create too slow for ConPTY)

tmuskal committed Jun 1, 2026

71ad7f2

Update BP/Resume: codex 3/3, claude 2/3, pi 1/3 (7/15 PASS)

tmuskal committed Jun 1, 2026

648816e

Tula 3/3 PASS! (17/18 BP/Create — hermes Windows only remaining)

tmuskal committed Jun 1, 2026

7d8ac20

TULA PASS on Ubuntu! (17/18 BP/Create) — autonomous host loop works

tmuskal committed Jun 1, 2026

8677b2e

Add tula to all Vanilla and BP matrix tables (alongside hermes, codex, pi, etc.)

tmuskal committed Jun 1, 2026

3008c07

Update: BP/Resume codex+claude PASS Ubuntu; BP/Create 16/18

tmuskal committed Jun 1, 2026

65dceba

Update: claude+gpt-5.5 PASS all 3 OS! (16/18 BP/Create)

tmuskal committed Jun 1, 2026

178117f

Update: claude+gpt-5.5 PASS macOS (15/18 BP/Create)

tmuskal committed Jun 1, 2026

952e3b9

Update: claude+gpt-5.5 PASS on Ubuntu (14/18 BP/Create); BP/Resume switched to gpt-5.5

tmuskal committed Jun 1, 2026

88cd00b

Switch claude-code to gpt-5.5 in primary matrix (Anthropic credits depleted)

tmuskal committed Jun 1, 2026

8727954

rename: omni → tula in wiki pages

tmuskal committed Jun 1, 2026

2421c3b

Update: BP/Resume failing across board; hermes Windows CI token issue noted

tmuskal committed Jun 1, 2026

7ee7569

Update: omni fails — agent-core+gpt-5.5 can't follow babysitter tool protocol

tmuskal committed Jun 1, 2026

e66a454

Update: omni still fails (agent-core can't drive SDK loop even with 5 stalls)

tmuskal committed May 31, 2026

c2c727a
