v3 retrospective
Note
Status: historical record, closed
Period: 2026-04-19 → 2026-05-23
Paired with releases: agentm v3.0.0 + crickets v1.0.0
This document is the focused-survey retrospective of the V3 arc — the work that took agentm from v1.0.0 (Codex-removal sweep) to v3.0.0 (V3 close-out), and crickets from inception (v0.5.0 split-out) to v1.0.0 (public-API commitment).
It exists for posterity (when the next arc starts, this is the source-of-truth for "what V3 was"), for future maintainers (so the design themes are read once not re-derived), and for the operator's later vault archive of V3 material.
V3 begins with plan #0 (the Codex-removal sweep that shipped harness v1.0.0 on 2026-05-11) and ends with this plan (the V3 close-out that ships harness v3.0.0 + toolkit v1.0.0). Every plan in between executed under the phase-gated workflow established in harness ADR 0001 and the toolkit-split architecture established in harness ADR 0006 / toolkit ADR 0001.
Pre-V3 work (harness v0.x) is out of scope. So is anything in the AgentMemoryV4 roadmap — that's the next arc, not this one.
Plans: 23 total. 22 archived under .harness/PLAN.archive.*.md, plus this active plan (#14) which closes V3.
Paired releases: 12 pairs (harness × toolkit), in chronological order:
| Harness | Toolkit | Date | Theme | Substantive side |
|---|---|---|---|---|
| v2.0.0 | v0.5.0 | 2026-05-12 | crickets repo split (BREAKING migration) | both |
| v2.1.0 | v0.6.0 | 2026-05-13 | evaluator sub-agent + /review §3b | both |
| v2.2.0 | v0.7.0 | 2026-05-14 | base hooks: kill-switch / steer / commit-on-stop | both |
| v2.3.0 | v0.8.0 | 2026-05-15 | /design skill v1 + /release §1b | both |
| v2.3.1 | v0.8.1 | 2026-05-16 | external-review-handoff option (dogfood patch) | both |
| v2.4.0 | v0.9.0 | 2026-05-17 | Gemini-CLI host removal | toolkit-substantive |
| v2.4.1 | v0.9.2 | 2026-05-20 | local-only embeddings + BGE-large default | toolkit-substantive |
| v2.4.2 | v0.10.0 | 2026-05-22 | MemoryVault Discovery + Mining | toolkit-substantive |
| v2.4.3 | v0.11.0 | 2026-05-22 | diataxis-author skill | toolkit-substantive |
| v2.5.0 | v0.11.1 | 2026-05-22 | auto-context into harness phases | harness-substantive |
| v2.6.0 | v0.12.0 | 2026-05-23 | evidence-tracker hook for /work | both |
| v2.6.1 | v0.13.0 | 2026-05-23 | quality-gates bundle | toolkit-substantive |
V3 closes with paired pair #13 — harness v3.0.0 + toolkit v1.0.0.
Primitives shipped (toolkit):
- 1 sub-agent: `evaluator`.
- 4 hooks: `kill-switch`, `steer`, `commit-on-stop`, `evidence-tracker`.
- 3 skills: `memory` (with `/memory save|recall|evolve|reflect|index-skills|discover-skills|adapt-skills|watchlist`), `design`, `diataxis-author`.
- 1 real-substance bundle: `quality-gates` (packages the evaluator + 4 hooks via sibling-reference dispatch).
Architecture decisions: 7 harness ADRs (0001–0007) + 9 toolkit ADRs (0001–0010 with an intentional gap at 0005). All accepted; three carry dated amendments (toolkit 0001 / 0002 / 0004).
Versioning shape: harness V3 release matches the AgentMemory implementation V-versioning (V3 = merged Obsidian+GDrive vault auto-loaded into every phase). Toolkit v1.0.0 commits to a stable public API surface: bundle/manifest schema + installer flags + bundles/ namespace + the 11 customization kinds. Internal surface (scripts/, lib/install/) remains pre-1.0 in spirit.
Paired-release cadence. Twelve consecutive paired releases established that harness changes and toolkit changes ship together — even when one side is doc-only, the paired CHANGELOG entry on the other side keeps version cadences readable. Toolkit-first ordering (toolkit release notes URL-link the harness release, then the harness release URL-links the toolkit release) became the locked convention.
Sibling-reference over copy-with-parity. Plan #10 shipped the quality-gates bundle after an operator-driven mid-plan pivot from COPY to sibling-reference. A bundle is now a manifest pointing at standalone primitives; the installer resolves contents: entries against the toolkit's standalone layout. Net effect: zero file duplication, zero drift surface, single source of truth (ADR 0010).
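As a minimal sketch of the sibling-reference idea, the function below resolves a bundle's `contents:` entries against the toolkit's standalone layout instead of copying files. The function name, the relative-path entry format, and the fail-on-missing behavior are assumptions made for illustration; the real manifest schema and installer are not reproduced here.

```python
from pathlib import Path


def resolve_bundle_contents(toolkit_root: Path, contents: list[str]) -> list[Path]:
    """Map manifest entries to the standalone primitives they point at.

    Hypothetical helper: entries are assumed to be paths relative to the
    toolkit root. Nothing is copied, so there is no second copy to drift.
    """
    resolved: list[Path] = []
    missing: list[str] = []
    for entry in contents:
        candidate = toolkit_root / entry
        if candidate.exists():
            resolved.append(candidate)
        else:
            missing.append(entry)
    if missing:
        # Assumed policy: a dangling reference is an install-time error.
        raise FileNotFoundError(f"bundle references missing primitives: {missing}")
    return resolved
```

Because the bundle holds only references, updating a standalone primitive updates every bundle that names it.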
Evidence-tracking default-FAIL contract. Plan #9 shipped a hook that blocks [ ] → [x] flips in PLAN.md unless the agent has demonstrably Read a spec-shaped file in this session. Hybrid resolver: heuristic (look for **Evidence:** task-body annotations or files named in the task) + per-task override + explicit opt-out with mandatory rationale (ADR 0009).
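The default-FAIL contract can be sketched roughly as follows. This is an illustration under stated assumptions, not the shipped hook: the `Task` dataclass, its field names, and the rule that reading at least one named file is sufficient are all hypothetical, and only the `**Evidence:**` annotation half of the heuristic is modeled (not the "files named in the task" half).

```python
import re
from dataclasses import dataclass, field

# Matches a task-body annotation such as "**Evidence:** docs/spec.md, docs/api.md".
EVIDENCE_RE = re.compile(r"\*\*Evidence:\*\*\s*(.+)")


@dataclass
class Task:
    body: str
    override_paths: list[str] = field(default_factory=list)  # per-task override
    opted_out: bool = False                                   # explicit opt-out
    opt_out_rationale: str = ""                               # mandatory when opted out


def required_evidence(task: Task) -> list[str]:
    """Per-task override wins; otherwise fall back to the annotation heuristic."""
    if task.override_paths:
        return list(task.override_paths)
    paths: list[str] = []
    for match in EVIDENCE_RE.finditer(task.body):
        paths.extend(part.strip(" `") for part in match.group(1).split(","))
    return [p for p in paths if p]


def may_flip_to_done(task: Task, files_read: set[str]) -> bool:
    """Return True only if the [ ] -> [x] flip is justified."""
    if task.opted_out:
        # An opt-out without a rationale is rejected.
        return bool(task.opt_out_rationale.strip())
    needed = required_evidence(task)
    if not needed:
        return False  # default-FAIL: nothing resolvable means no flip
    # Assumption: one matching read is enough evidence.
    return any(path in files_read for path in needed)
```

The key design point is the `return False` branch: when the resolver cannot determine what evidence applies, the flip is blocked rather than allowed.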
Auto-recall in every harness phase. Plan #8 wired harness_memory.py recall into /setup, /plan, /work, /review, /release, /bugfix at natural boundaries. Self-modulating offer-save (confidence-thresholded prompt). Cursor-tracked promotion (.harness/.promoted-progress-cursor) for plan-done end-of-plan reflection (ADR 0007).
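One way to picture cursor-tracked promotion is a small file that records how far into the progress log the last promotion got, so each reflection pass only sees new entries. The sketch below assumes the cursor file stores a single line count and that the progress log is line-oriented; both are illustrative guesses, not the actual format of `.harness/.promoted-progress-cursor`.

```python
from pathlib import Path


def unpromoted_entries(progress: Path, cursor: Path) -> list[str]:
    """Return progress lines written since the last promotion (hypothetical format)."""
    lines = progress.read_text(encoding="utf-8").splitlines()
    start = 0
    if cursor.exists():
        raw = cursor.read_text(encoding="utf-8").strip()
        start = int(raw) if raw else 0
    # Clamp in case the progress file was truncated or rotated.
    start = min(start, len(lines))
    return lines[start:]


def advance_cursor(progress: Path, cursor: Path) -> int:
    """Record that everything currently in the progress log has been promoted."""
    count = len(progress.read_text(encoding="utf-8").splitlines())
    # write_bytes keeps the cursor file LF-only on every OS.
    cursor.write_bytes(f"{count}\n".encode("utf-8"))
    return count
```

Calling `unpromoted_entries` at plan-done and again at release would let both triggers share one cursor without promoting the same entry twice.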
Sub-letter spec amendment convention. Adding §5b instead of inserting a new §6 preserves integer §-numbering — incoming wiki refs that cite "§N" stay valid. Established in plan #3, reinforced across plans #4 / #8 / #9. Line-range anchors still need manual updating, but the §-level contract is stable.
.py-sidecar installer pattern. Plan #9 introduced the convention that a hook can ship a Python helper alongside its .sh/.ps1 entry script. The installer extension was ~7 lines per OS; the integrity check had to be extended in parallel to allow .py files in .claude/hooks/. Pattern now reusable for future hooks that need stdlib-only Python helpers.
Permeable A3 write boundary. MemoryVault writes default to personal-private/ but agents can write anywhere on explicit operator instruction or after confirmation. Read is universal; write is constrained-by-default. Established plan #7a part 4; reinforced by the adapt-don't-import sub-agent write allowlist in plan #7b.
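The read-universal / write-constrained rule can be expressed as a small policy check. This is a sketch under assumptions: the function name and the `operator_approved` flag are hypothetical, and the real boundary may consider more than path containment.

```python
from pathlib import Path


def write_allowed(target: Path, vault_root: Path, operator_approved: bool = False) -> bool:
    """Allow writes under personal-private/ by default; anywhere else needs approval."""
    default_zone = (vault_root / "personal-private").resolve()
    resolved = target.resolve()
    # Resolving first means "personal-private/../elsewhere" cannot slip through.
    if resolved == default_zone or default_zone in resolved.parents:
        return True
    # Outside the default zone: only on explicit instruction or confirmation.
    return operator_approved
```

There is deliberately no matching `read_allowed` check, since reads are universal.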
These surfaced multiple times and are now part of the muscle memory:
- `LC_ALL=C sort` for deterministic line order. macOS uses case-insensitive collation vs. Linux byte-order; breaks SHA-256 byte-identity in `lib/install/.checksums.txt`.
- `$host` is a read-only PowerShell built-in. Loop-variable collisions in installers; always rename.
- Git Bash on Windows ships `sha256sum`, not `shasum`. Runtime detection with fallback in `lib-parity` checks.
- Git `autocrlf=true` on Windows breaks SHA-256 byte-identity. `.gitattributes` forcing LF + sed pattern normalizing binary-mode and text-mode `sha256sum` output.
- `Path.__str__()` returns native separators on Windows. Use `Path.as_posix()` for display output and cross-platform comparisons.
- `ConvertTo-Json` single-element array unwrap. A single hook event stored as object instead of `List[object]` breaks Claude Code's hook loader schema; use `ConvertFrom-Json -AsHashtable` throughout.
- `Start-Process -ArgumentList` splits multi-word args. Switched to `& python3` direct invocation.
- Windows Python `cp1252` stdout encoding can't encode `→` or em-dashes. `sys.stdout.reconfigure(encoding='utf-8')` at module load; inline `python3 -c open(file)` needs explicit `encoding='utf-8'`.
- `Join-Path` constructs strings but doesn't `mkdir`. Bash's `mktemp -d` hides this; Windows CI catches it.
- `Path.write_text` translates `\n` → `\r\n` on Windows. pwsh `(?m)^${org}$` regex won't match CRLF lines; switch to `write_bytes` for LF-only.
- PII scanner false-positives on synthetic identities. Synthetic test emails + SSH-form URLs + file-path-shaped strings in fixtures trip the scanner. Allowlist must be maintained; even narrative describing prior scrubs trips the scanner (v2.5.0 hit itself, fixed forward).
- Sub-letter spec amendments preserve §-numbering. §5b not §6; preserves external "§N" refs.
- Wake-on-CI: never mark `[x]` speculatively. Push → schedule ~90s wake → close out only when CI is green across 3-OS. Six distinct Windows-only failures caught this way.
- Operator-driven mid-plan pivots are normal. Build the ADR around the locked decision, not the first sketch.
- ADR numbering is per-repo, not global. Harness ADRs 0001–0007; toolkit ADRs 0001–0010 (gap at 0005). Conflating numbering breaks cross-references at scale.
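Several of the Python-side lessons above can be collected into a few small helpers. The sketch below is illustrative only: the helper names are hypothetical, and the manifest line format (`<digest>  <name>`) simply mirrors `sha256sum`-style output rather than the project's actual `.checksums.txt` layout.

```python
import hashlib
import sys
from pathlib import Path, PurePath


def configure_stdout() -> None:
    """Force UTF-8 stdout so characters like '→' survive a cp1252 console."""
    # Guarded because a test runner may replace sys.stdout with another stream type.
    if hasattr(sys.stdout, "reconfigure"):
        sys.stdout.reconfigure(encoding="utf-8")


def write_lf(path: Path, text: str) -> None:
    """Write text with LF endings on every OS (write_text would emit CRLF on Windows)."""
    path.write_bytes(text.encode("utf-8"))


def display_path(path: PurePath) -> str:
    """Render a path with forward slashes regardless of the host OS."""
    return path.as_posix()


def checksum_manifest(entries: dict[str, bytes]) -> str:
    """Build a manifest sorted in byte order, the same order LC_ALL=C sort gives."""
    lines = []
    for name in sorted(entries, key=lambda n: n.encode("utf-8")):
        digest = hashlib.sha256(entries[name]).hexdigest()
        lines.append(f"{digest}  {name}")
    return "\n".join(lines) + "\n"
```

Sorting on the encoded bytes rather than the string's locale collation is what keeps the manifest, and therefore its own hash, identical across macOS, Linux, and Windows.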
Six plans had substantive mid-flight design changes initiated by the operator:
- Plan #7a part 4 — A3 permeable boundary. Initial sketch was "memory writes only to vault." Revised to "writes default to vault; explicit operator instruction or confirmation unlocks anywhere." Locked the read-universal / write-constrained pattern.
- Plan #8 Q4/Q5 mid-flight revision. Flat-ask offer-save → self-modulating (confidence-thresholded). Release-only promotion → dual-trigger cursor-tracked (plan-done in `/work` + tail-scan in `/release`).
- Plan #10 Q1 COPY → sibling-reference. Operator's "wait why did we make a bunch of copies?" caught a design smell: the sibling-reference alternative had been labeled "invasive" but actually cost ~50 lines per OS. Net: -1992 lines, zero ongoing maintenance burden, single source of truth.
- Plan #11 explicit deferral. Operator chose option 1 (defer formally with 4 revisit triggers) rather than build speculatively. First explicit "skip this item formally" decision in the arc.
- Plan #9 Q1 evidence resolver. Hybrid heuristic + override + explicit opt-out instead of strict path-list.
- Plan #14 (this plan) HLD constraint. Operator-locked: HLD must stand on its own, no overt external prior-art references; operator review-and-approve gate before commit/push.
Pattern: don't anchor on existing precedent; sanity-check "invasive alternative" framings against actual cost.
- #11 Wake-from-state ⏸️ deferred 2026-05-23. Implicit `state on disk` already covers ~95%; no real crash has lost recoverable data through the 10-plan arc. Revisit triggers: (i) real crash loses data; (ii) #26 ships and reshapes WHERE state lives; (iii) #28 ships and absorbs the wake surface; (iv) #24 cross-device hits state-recovery friction.
- #16 Personal-knowledge consolidation: gated on V4 vault architecture. Out of scope for V3: the V3 vault tree is now stable, but the comprehensive content-curation pass is a 2-3 session co-creation project, not a code-shipping plan.
- #18 Local-only embeddings — already shipped mid-flight as a plan; no longer "deferred." Listed here for completeness.
V4 carries the memory line forward; non-V4 backlog stays on the slimmed ROADMAP. The full V4 design space lives in the AgentMemory evolution HLD (shipped alongside this retrospective) and the new .harness/ROADMAP-AgentMemoryV4.md. Headlines:
- Vault-backed harness state (#26). Move `PLAN.md` / `progress.md` / `ROADMAP.md` / `project.json` into the vault. Universal cross-repo development. Reshapes the state model that #11 (wake-from-state) was originally going to formalize.
- FRIDAY-style natural-extension-of-memory (#28). Agent + harness + memory as a personal-knowledge-management OS. Vision item; absorbs explicit-wake-surface in favor of higher-abstraction "open the file for X."
- AgentMemory evolution audit (#25). Read prior-art memory architectures (Karpathy LLM Wiki + GBrain + others); 4-bucket classification (adopt as-is / adopt with adaptation / deliberately reject / incompatible).
- Cross-surface AgentMemory protocol (#22). Configure-don't-build vault access for Claude.ai / Gemini / Antigravity / etc.
- Auto-orchestration (#23). Chain MemoryVault skills (recall + reflect + idea-ledger + index-skills + discover-skills + adapt-skills + watchlist + promote) into natural automatic invocations.
Non-V4 frontiers: #17 Antigravity 2.0 + CLI host support; #19 Ideas.md format redesign; #21 harness self-audit skill; #24 portable harness (Claude Code Web / NAS Docker remote); #30 public-consumption-ready release.
Cross-references:
- `agent-memory-evolution`: the AgentMemory evolution HLD (V1 → V4), shipped alongside this retrospective in plan #14
- `memoryvault`: the parent MemoryVault design (V3 implementation)
- `agentm/.harness/ROADMAP-AgentMemoryV4.md`: V4 roadmap (lands in plan #14 task 3)
- `agentm/.harness/ROADMAP.archive.20260523-v3-complete.md`: full V3-era ROADMAP snapshot (operator-local; `.harness/` is gitignored, archive preserved for eventual vault migration)