
agentm hld

github-actions[bot] edited this page Jun 21, 2026 · 17 revisions

title: AgentM — High Level Design
status: launched
seeded: 2026-06-19
approved: 2026-06-20
kind: design
scope: arc
area: agentm
governs:
  • scripts
  • harness
succeeds: wiki/designs/memory-os-architecture.md
children:
  • children/agentm-memory-system.md
  • children/agentm-experience-and-dreaming.md
  • children/agentm-opinions-and-gates.md
  • children/agentm-personas.md

Note

LAUNCHED (2026-06-20). The live agentm parent HLD, lifted into tracked wiki/designs/ in AG Phase 2, succeeding the existing memory-os-architecture.md (the HLD crickets points up at, now superseded with a forward-pointer to this doc). Framed around the four Foundations pillars; the deep mechanics of each live in the child designs under children/ (still status: seeded — they get their own passes later, at which point those children/… links resolve). Built on design-doc Appendix B. Diagrams are hand-authored vector images under diagrams/, matching Foundations.

AgentM — the part of the assistant that's yours

A useful assistant feels like an extension of you — it remembers, it has opinions, it knows how you like to work (the Foundations make that case). agentm is the part that makes it true. It's the person: the stateful core that carries the memory, builds experience, forms opinions, and wears different hats for different jobs. Everything that persists lives here. The capabilities it uses to get work done — planning, building, reviewing — are tools (crickets); agentm is who picks them up, and who remembers what happened once they're set down.

This doc is about agentm's insides: the four pillars it's built from, the components that make each one real, and how they fit together. The beliefs underneath it live up in the Foundations.

What agentm is for

agentm carries the goals the whole system shares — continuity, trust, control, durability (see the Foundations) — and adds one that's its own:

  • Growth — agentm doesn't only persist, it accumulates. More memory, sharper opinions, eventually more hats to wear. The tools stay fixed; the person gets better with use.

Everything below serves that: a durable place to keep what's learned, sound judgment about how to use it, and a way to get better with every session.

The four pillars

agentm-the-person is built from four pillars — the same four the Foundations name: Experience, Memory, Opinions, and Personas. Each is an idea, and each is made real by a handful of named components. This doc names those components, shows how they fit, and flags where each touches crickets; the deep mechanics of each pillar live in a child design, linked at the end of its section so this stays readable.

The shape of it — the person and its four pillars, the tools, and the foundation they share:

Figure: How agentm and crickets relate: the stateful person — its four pillars Experience, Memory, Opinions, Personas — running on the shared foundation, with the stateless tools drawing on Memory, following Opinions, and wielded by Personas.

Each pillar, and the components that make it real:

Figure: The four pillars and the components that make each one real: Experience (reflection, scheduled learning, heat policy, idea incubation, import watchlist, the scheduler, dreaming), Memory (memory engine, storage seam, resolution plane, backends, write protocol, recall loop, harness-state I/O, MCP server), Opinions (what done/good/efficient looks like, how we engineer), Personas (the persona tier, rememberer + the roster, the persona gate). How each pillar relates to crickets is described per pillar below.

Memory — what it has learned

The durable record: everything agentm knows, kept on disk so it survives a session ending. This is the largest pillar and the ground the other three stand on. Its discipline is one port — every caller reaches storage the same way, through a single seam.

Components:

  • Memory engine — the verbs (save · recall · forget) and the cross-cutting logic that lives exactly once (idempotency, content-hash CAS, soft-delete, token-budgeted recall, link integrity).
  • Storage seam — the one port to disk: a StorageBackend contract, a registry, an opaque Locator, and a three-tier storage taxonomy (source / shared-abstracts / local-index).
  • Resolution plane — finds the store without naming it: the config, the vault_path() resolver, the selector, and two independent capability mechanisms (a request-guard and an availability-query).
  • Backends — interchangeable adapters behind the seam: device-local (agentm's default) and obsidian-vault (a crickets plugin — the live vault). The seam is open: more backends can be added by implementing the same contract (a different store, or a different sync layer), with nothing above the seam changing.
  • Write protocol — concurrency-safe writes: atomic-write + a per-vault mutex + content-hash compare-and-swap, coordinated by primitives with no daemon.
  • Recall loop — two hooks (session-start always-load + a per-prompt five-step search), token-budgeted, over a device-local vector index.
  • Harness-state I/O — plan/progress/feature state, backend-aware, reaching disk the same way memory does.
  • MCP server — an opt-in inbound adapter so external clients can reach the same engine.
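The write protocol in the list above combines three primitives. A minimal sketch of how they could fit together follows, under stated assumptions: `cas_write`, `content_hash`, and `StaleWriteError` are hypothetical names rather than agentm's API, and a `threading.Lock` stands in for the per-vault mutex (it only guards a single process; coordinating across processes would need something like a lock file).

```python
import hashlib
import os
import tempfile
import threading
from pathlib import Path
from typing import Dict, Optional

# Hypothetical: one lock per vault root. This guards one process only.
_vault_locks: Dict[str, threading.Lock] = {}
_registry_lock = threading.Lock()


class StaleWriteError(Exception):
    """Raised when the file changed since the caller last read it."""


def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def _lock_for(vault: Path) -> threading.Lock:
    with _registry_lock:
        return _vault_locks.setdefault(str(vault), threading.Lock())


def cas_write(vault: Path, name: str, new: bytes, expected_hash: Optional[str]) -> str:
    """Write `new` only if the current content still hashes to `expected_hash`.

    `expected_hash=None` means "the file must not exist yet".
    Returns the hash of the content that was written.
    """
    target = vault / name
    with _lock_for(vault):
        # Compare-and-swap: refuse to overwrite content the caller has not seen.
        current = content_hash(target.read_bytes()) if target.exists() else None
        if current != expected_hash:
            raise StaleWriteError(f"{name}: expected {expected_hash}, found {current}")
        # Atomic write: fill a temp file in the same directory, then rename over.
        fd, tmp = tempfile.mkstemp(dir=vault)
        try:
            with os.fdopen(fd, "wb") as handle:
                handle.write(new)
                handle.flush()
                os.fsync(handle.fileno())
            os.replace(tmp, target)
        except BaseException:
            if os.path.exists(tmp):
                os.unlink(tmp)
            raise
    return content_hash(new)
```

A caller reads a note, remembers its hash, and passes that hash back on the next write; a second writer holding an older hash gets `StaleWriteError` instead of silently clobbering the newer content.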

How they fit: everything points inward to the seam — engine → resolution → seam → backend — and nothing in the substrate depends on a backend or a tool below it.
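The inward-pointing chain can be pictured with a small sketch. Only the names `StorageBackend` and `Locator` come from this doc; the method names (`put`, `get`), the registry functions, and the in-memory adapter are assumptions made for illustration, not the contract in the Memory System design.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Protocol


@dataclass(frozen=True)
class Locator:
    """Opaque handle: callers pass it around but never parse it."""
    _token: str


class StorageBackend(Protocol):
    """Hypothetical shape of the seam contract."""
    def put(self, key: str, body: str) -> Locator: ...
    def get(self, locator: Locator) -> Optional[str]: ...


class InMemoryBackend:
    """Stand-in adapter; a device-local or vault adapter would fill the same slot."""
    def __init__(self) -> None:
        self._items: Dict[str, str] = {}

    def put(self, key: str, body: str) -> Locator:
        self._items[key] = body
        return Locator(key)

    def get(self, locator: Locator) -> Optional[str]:
        return self._items.get(locator._token)


# Registry: adapters register by name, so nothing above the seam imports one.
_REGISTRY: Dict[str, Callable[[], StorageBackend]] = {}


def register_backend(name: str, factory: Callable[[], StorageBackend]) -> None:
    _REGISTRY[name] = factory


def resolve_backend(name: str) -> StorageBackend:
    """Resolution step: turn a configured name into a live backend."""
    if name not in _REGISTRY:
        raise LookupError(f"no backend registered as {name!r}")
    return _REGISTRY[name]()


class MemoryEngine:
    """The verbs live here and talk only to the contract."""
    def __init__(self, backend: StorageBackend) -> None:
        self._backend = backend

    def save(self, key: str, body: str) -> Locator:
        return self._backend.put(key, body)

    def recall(self, locator: Locator) -> Optional[str]:
        return self._backend.get(locator)


register_backend("in-memory", InMemoryBackend)
engine = MemoryEngine(resolve_backend("in-memory"))
where = engine.save("prefs/editor", "prefers small diffs")
```

The engine never names a concrete adapter: swapping the store means registering a different factory under the configured name, with nothing above the seam changing.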

Where it touches crickets: the obsidian-vault backend ships as a crickets plugin, depending one-way up on the seam; harness-state is shared with crickets' phase tools; and the availability-query is the runtime half of crickets' enhances: composition.

Detail — the seam contract, the write protocol, the recall loop, the storage-serving layers, the memory layers, and the V5-14 as-built/target gap — in the Memory System design.
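As a rough illustration of what "token-budgeted" recall could mean in practice, the sketch below packs ranked hits under a budget. It is an assumption-laden stand-in: the function names are hypothetical, the 4-characters-per-token estimate is a crude heuristic rather than a real tokenizer, and the five-step search itself is not modelled.

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    # Assumption: a rough 4-characters-per-token heuristic, not a real tokenizer.
    return max(1, len(text) // 4)


def pack_within_budget(ranked: List[Tuple[float, str]], budget: int) -> List[str]:
    """Take hits in descending score order, skipping any that would overflow."""
    chosen: List[str] = []
    spent = 0
    for _score, text in sorted(ranked, key=lambda hit: hit[0], reverse=True):
        cost = estimate_tokens(text)
        if spent + cost > budget:
            continue  # a smaller, lower-ranked hit may still fit
        chosen.append(text)
        spent += cost
    return chosen
```

Skipping an oversized hit instead of stopping at it is one design choice among several; stopping at the first overflow would preserve strict rank order at the cost of leaving budget unused.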

Experience — what's worked before, and what's worth knowing

How the person gets better, in two directions. Backward: it learns from its own past — every finished session leaves something behind. Forward: on a schedule (when configured), it goes out and learns from the world — approved sources, feeds, the web — and surfaces what's worth knowing back to you.

Components:

  • Reflection (backward) — mines a finished session's transcript for durable preferences, workflows, and fixes (a reflect.py engine + a Stop-event hook + an idle-recovery hook).
  • Scheduled learning (forward — largely designed) — a periodic, opt-in pass that pulls from approved sources (RSS, the web, named repos) to mine ideas for improving the agent, then surfaces them to you to accept or pass on. The import watchlist (adapt-don't-import: a rubric plus a judge sub-agent) is one element of this — the part that screens external skills worth borrowing.
  • The scheduler (the cron element) — what lets forward learning, and other upkeep, run on a schedule rather than only as in-session hooks.
  • Heat policy — curates which memories load every session, promoting frequently-hit ones and demoting cold ones.
  • Idea incubation — captures a half-formed idea as a skeleton a researcher sub-agent later fills.
  • Dreaming (designed, not built) — a scheduled whole-corpus consolidation pass; the design is locked, the build is still ahead.

How they fit: backward learning runs as session hooks (reflection → Memory / incubation / watchlist); forward learning runs on the scheduler (out to approved sources → surfaced to you); heat curation keeps the always-load set lean; dreaming (future) would consolidate the whole corpus.
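Heat curation can be sketched as a single pass over hit counts. The thresholds and field names below are hypothetical placeholders (the real heat thresholds are in the child design); the sketch only shows the promote/demote shape described above.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical thresholds, chosen only for illustration.
PROMOTE_AT = 3  # recent hits that earn an always-load slot
DEMOTE_AT = 0   # recent hits at or below this lose the slot


@dataclass
class MemoryHeat:
    name: str
    recent_hits: int
    always_load: bool


def curate(entries: List[MemoryHeat]) -> Dict[str, List[str]]:
    """Flip the always-load flag where warranted and report what moved."""
    moved: Dict[str, List[str]] = {"promoted": [], "demoted": []}
    for entry in entries:
        if not entry.always_load and entry.recent_hits >= PROMOTE_AT:
            entry.always_load = True
            moved["promoted"].append(entry.name)
        elif entry.always_load and entry.recent_hits <= DEMOTE_AT:
            entry.always_load = False
            moved["demoted"].append(entry.name)
    return moved
```

Entries that sit between the two thresholds keep whatever status they already have, which avoids flapping in and out of the always-load set.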

Where it touches crickets: lightly — the sub-agents run inside the harness, and forward learning reaches out to sources rather than into crickets. The Experience pillar keeps working on a bare agentm.

Detail — backward vs. forward learning, the scheduler, the approved-source pipeline, the import watchlist, the heat thresholds, incubation, and the full dream-mode design — in the Experience & Dreaming design.

Opinions — how things should go

agentm's opinions are abstract, named buckets of opinionated knowledge — what good work looks like, captured once and given a name. They stay deliberately abstract: an opinion doesn't reach into any tool, and it isn't a tool. A capability asks for an opinion by name and uses it to inform what it does. One opinion can serve many tools, and an opinion can sharpen over time without touching a single tool.

The surfaces — the named opinions agentm holds today, or means to:

  • What "done" looks like — completeness: is the work actually finished? The most deterministic surface; it confirms against a checklist. Its implementation is the check battery — the gates that must pass — plus the written conventions for shape.
  • What "good" looks like — quality: is the work well done? Confirmed by adversarial review — a fresh, skeptical pass that assumes there are flaws and goes looking for them.
  • What's efficient — cost: do the work cheaply when you can, without giving up too much quality. Don't spend tokens (or time) the job doesn't need.
  • How we engineer — process: the phase discipline, how bugs get fixed (the bugfix track), and how to size the approach to the work — a small change needs only a plan, a large one needs a design, a huge one needs an architecture pass before any design.

How they fit: each surface is a named bucket a capability can ask for; together they answer "should this proceed, and is it good enough?" at the moments that matter. The buckets are independent — a tool might consult done and efficient but not good.
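The request-by-name idea can be sketched as a lookup that returns knowledge and nothing else. This registry is designed, not built, so everything here is an assumption: the `OPINIONS` mapping, the key names, and the `ask` function are hypothetical, and the entries merely paraphrase the surfaces listed above.

```python
from typing import Dict, List

# Hypothetical registry: opinion name -> the knowledge handed back to a caller.
OPINIONS: Dict[str, List[str]] = {
    "done": [
        "every gate in the check battery passes",
        "the written conventions for shape are met",
    ],
    "good": ["a fresh adversarial review found no unresolved flaw"],
    "efficient": ["no tokens or time spent that the job does not need"],
    "how-we-engineer": [
        "small change: a plan",
        "large change: a design",
        "huge change: an architecture pass before any design",
    ],
}


def ask(*names: str) -> Dict[str, List[str]]:
    """Return only the opinions asked for; unknown names fail loudly."""
    unknown = [name for name in names if name not in OPINIONS]
    if unknown:
        raise KeyError(f"unknown opinion(s): {unknown}")
    # Copies, so a caller cannot edit the shared opinion by accident.
    return {name: list(OPINIONS[name]) for name in names}


# A tool that consults "done" and "efficient" but not "good".
guidance = ask("done", "efficient")
```

The opinion never calls into the tool; the tool decides what to do with what it gets back, which is what lets one opinion serve many tools.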

Where it touches crickets: by request, not by wiring. A crickets tool names the opinion it needs (done, good, efficient, or the engineering process) and the substrate hands back the opinionated knowledge; the tool stays free to act on it. (The check battery does run inside crickets' review/release phases, and the phase commands are crickets per ADR 0011, but those are one surface's implementation, not the pillar.)

Detail — each surface, the request-by-name model, the gate inventory behind "done," the adversarial-review contract behind "good," the efficiency budget, and the system-sizing ladder (plan → design → architecture) — in the Opinions design.

Personas — the hats it wears

The top tier: a persona is a stance the person takes for a job — a named "who" that composes capabilities, leans on the Opinions it needs, and stands on the memory underneath. Define it once; launch it several ways.

A persona declares:

  • its stance — the cross-capability judgment it makes (this is also what tells a persona apart from a plain tool);
  • what it composes — the capabilities (crickets tools) it wields, named by enhances:;
  • which Opinions it leans on — the Engineer leans on what "done" looks like; the Reviewer on what "good" looks like;
  • how it's adopted — either explicitly launched in a mode it declares, or automatically adopted when the work calls for it. The launch modes (not every persona supports every mode) are sub-agent (scoped, returns a result), interactive session (you talk to it in that stance), loop (a cadence, /loop), and goal (autonomous toward a goal, /goal). Automatic adoption happens in one of two ways: a crickets workflow puts the fitting persona on for a step (the work phase adopts the Engineer, the review phase the Reviewer), or agentm detects the need mid-conversation and adopts it on the spot.

Memory is the pseudo-persona beneath them all. rememberer isn't a peer in the roster — it sits under every persona, giving each one the memory it stands on. Every other persona is, in effect, "Memory + a stance."

The roster (the person's hats — Memory and the Planner seed exist today; the rest are designed):

| Persona | Stance | Leans on | Natural modes |
|---|---|---|---|
| Memory (pseudo) | keep the record true, under all personas | | always-on |
| Planner (TPM) | turn intent into a plan and a board | how we engineer | loop · sub-agent |
| Architect | shape the broad picture across systems — the HLD/architecture pass | how we engineer | interactive · goal |
| Designer | design a single system in depth before it's built | how we engineer | interactive · goal |
| Tech-Lead | hold the technical bar across the work | done · good | interactive · sub-agent |
| Engineer (worker) | build the thing | done · efficient | goal · interactive |
| Reviewer | assume it's broken, find the flaw | good | sub-agent |
| Operator | run things and report — makes no changes | | loop · sub-agent |
| Troubleshooter / SRE | diagnose failures in complex systems | good · how we engineer | interactive · sub-agent |
| Researcher | go learn what we don't know | what's worth knowing | goal · loop |
| Maintainer | keep the house clean (deps, docs, drift) | done | loop |

Scope separates Architect from Designer: the Architect zooms out (the broad picture across systems — an HLD/architecture pass), the Designer zooms in (one system's design). They are the top two rungs of the engineering-sizing ladder — plan → design → architecture.

There is no separate "role." A role is a persona. crickets provides the tools and the packages that bundle them; a package named after the persona that usually wields it (a "worker" bundle) can look like a role, but it's only a correlation of tools. The stance lives in the agentm persona; the tools live in crickets. (This resolves the persona-vs-role question deferred in design-doc §9.6.)

The persona gate (check-personas.py) keeps a persona honest: it hard-requires only substrate-native primitives (requires ⊆ substrate) and carries no always-load weight — so it composes capabilities without becoming a layer everything else depends on.
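The two checks the gate enforces can be sketched as set arithmetic. This is not the contents of check-personas.py: the `Persona` fields, the `SUBSTRATE` primitive names, and `gate_violations` are hypothetical, chosen only to show `requires ⊆ substrate` and the no-always-load rule.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical substrate primitive names, for illustration only.
SUBSTRATE: Set[str] = {"memory.save", "memory.recall", "memory.forget"}


@dataclass
class Persona:
    name: str
    requires: Set[str]                               # hard dependencies
    enhances: Set[str] = field(default_factory=set)  # optional tools, by name
    always_load: bool = False


def gate_violations(persona: Persona) -> List[str]:
    """Return every reason the persona fails the gate (empty list means pass)."""
    problems = [
        f"requires non-substrate primitive: {item}"
        for item in sorted(persona.requires - SUBSTRATE)  # requires must be a subset
    ]
    if persona.always_load:
        problems.append("carries always-load weight")
    return problems
```

Note that `enhances` is never checked against the substrate: naming an optional tool is allowed, while hard-requiring one is not.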

Where it touches crickets: a persona composes crickets tools by name (enhances:) — lighting up richer behavior when a tool is present and degrading gracefully when it's absent — while hard-requiring only the substrate. And a crickets workflow can adopt a persona for a step (the Engineer for the work phase, the Reviewer for review), so the right stance and the right tool meet at the moment of use.
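Graceful degradation under `enhances:` might look like the sketch below, where a baseline always runs and optional tools add to it when present. The tool names (`lint`, `security-scan`) and the `review` function are hypothetical, and a plain dictionary stands in for the availability-query.

```python
from typing import Callable, Dict, List

Tool = Callable[[str], List[str]]


def review(diff: str, available: Dict[str, Tool]) -> List[str]:
    """Baseline behaviour always runs; optional tools add findings when present."""
    findings = [f"reviewed {len(diff.splitlines())} line(s)"]
    for tool_name in ("lint", "security-scan"):  # hypothetical enhances: names
        tool = available.get(tool_name)
        if tool is not None:  # absent tool: carry on with less, never fail
            findings.extend(tool(diff))
    return findings
```

Calling `review(diff, {})` still returns the baseline finding, which mirrors the claim that a bare agentm stays whole with no tools bolted on.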

Detail — each persona's composition, the launch-mode mechanics, the role-is-a-persona resolution, and the cross-capability-judgment discriminator — in the Personas design.

How the pillars fit together

The four feed each other. Experience writes into Memory and, over time, sharpens Opinions; Opinions inform what Experience keeps and what any tool does, on request; Personas sit on top and wield all three plus the tools. Memory is the ground the other three stand on — lose it and the person forgets, and the rest has nothing to act on.

One rule holds across all four: the dependency arrow points one way. The pillars and their components rest on the substrate; crickets tools reach up into the pillars — drawing on Memory, asking Opinions by name, wielded by Personas — and the substrate reaches for nothing below it. A bare agentm — all four pillars, no tools bolted on — is whole on its own.

References

The component-level sources now live in each pillar's child design (linked above). This parent keeps the high-level map.

Child designs (children/)

  • Memory System — the seam, backends, write protocol, recall loop, storage layers, V5-14
  • Experience & Dreaming — reflection, heat, incubation, adapt-watchlist, dreaming
  • Opinions & Gates — the check battery, conventions, phase discipline
  • Personas — the persona tier, the gate, the two personas

Anchors

  • design-doc Appendix B — the ratified agentm Overview this HLD expands (the input spec, not a sibling HLD)
  • Foundations HLD — the four pillars and shared beliefs, inherited by reference
  • wiki/designs/memory-os-architecture.md (v0.1) — the superseded predecessor this HLD succeeds (its "device-local default; vault is a backing plugin" framing predates the V5-7/V5-8 fail-loud change)
  • wiki/designs/agent-memory-evolution.md — the V1→V8 arc context

Amendment log

2026-06-20 — authored, reviewed, and finalized.

Authored 2026-06-19 from the ratified Overview (design-doc Appendix B) and a read-only grounding sweep (components, memory-layers, lifecycle, storage-serving), then restructured through operator review around the four Foundations pillars — Experience · Memory · Opinions · Personas. The parent stays high-level: each pillar names its components and where it touches crickets, and the in-the-weeds mechanics were migrated, not deleted, into four seeded child designs under children/ (memory-system · experience-and-dreaming · opinions-and-gates · personas). Diagrams are hand-authored vector SVGs.

The review rounds settled the model. Opinions = four named, abstract surfaces a tool requests by name — what done looks like (the check battery is its implementation), what good looks like (adversarial review), what's efficient (a budget with a quality floor), and how we engineer (the phase discipline + the plan→design→architecture sizing ladder). Experience = backward (reflection from past sessions) + forward (scheduled, opt-in learning from approved sources), with a scheduler. Personas = a full model: a persona declares a stance + composition + the Opinions it leans on + its launch modes (sub-agent / interactive / loop / goal), and may be adopted explicitly or automatically; Memory is the pseudo-persona beneath all; the Coordinator is renamed Planner; the roster includes the Architect/Designer split by scope. "Role" is retired — a role is a persona, while crickets provides tools + packages — resolving design-doc §9.6.

Honesty calls: forward learning, the scheduler, the request-by-name Opinion registry, the persona roster + adoption modes, and the MCP-server-as-seam-client storage convergence (V5-14) are designed, not built. Approved 2026-06-20; status stays proposed until the Phase-1 lift flips it to launched, and the four children stay status: seeded for their own passes. Re-audit triggers: flip status at the lift; flip each designed component to as-built as it ships; give every child its own voice/structure pass.

2026-06-20 — lifted + launched (AG Phase 2, A0/A1). Lifted into tracked wiki/designs/, flipped status: proposed → launched, and superseded the predecessor memory-os-architecture.md with a forward-pointer (its basename preserved so crickets' up-links resolve). Stamped the AG governance frontmatter: kind: design, scope: arc, area: agentm, governs: [scripts, harness] — deliberately broad until the four children/ lift narrower governs: patterns in Phase 3, at which point most-specific-wins refines resolution automatically. Now resolvable by governs_resolver.py. Re-audit trigger satisfied: status flipped at the lift.
