The capture + feedback (RL) layer for 8090's Software Factory.
Every AI coding session creates knowledge — the why, the dead-ends, the gotchas, the intent. Today it evaporates the moment the session ends. CAPSULE captures it, scores it, versions it into enterprise skills, and feeds it back so the next developer or agent inherits full context instantly.
CAPSULE turns a finished coding session into a Capsule — a compressed, scored record of what was
learned — distilled locally (Ollama qwen2.5-coder:14b), judged by an LLM scorer, stored in
portable Backboard memory, versioned into a local skill registry, and promoted at end-of-day into a
versioned enterprise skills registry. It's a reinforcement-learning loop for the enterprise SDLC: every
session makes the next one cheaper and better. There's also an in-app agent chat so you can feel the
warm-start directly — talk to a local model that already remembers.
CAPSULE is built on a single thesis: context dies at the session boundary. That one root cause is the source of all four "Build for Builders" themes — and CAPSULE solves them together:
| Theme | How CAPSULE serves it |
|---|---|
| Handoff | The capsule is the handoff artifact. The in-app agent chat lets the next dev/agent inherit it and keep going — warm, not cold. |
| Productivity & flow | Warm-start injection (real Backboard retrieval) + measured token savings — a new session starts oriented, not from scratch. |
| Code quality & confidence | Provenance trail (every skill version → the capsule/finding that produced it) + a measured agentic-CI gate before a version publishes. |
| Junior developer | Each capsule's distilled finding coaches the dev; loading an enterprise skill into the chat hands a junior the senior's distilled knowledge. |
coding session
│ ambient capture (Stop-hook → queue → watcher) ~/.claude/*.jsonl
▼
CAPTURE compress the real transcript (src/lib/capture.ts)
▼
DISTILL LOCAL Ollama qwen2.5-coder:14b · CHUNKED map-reduce for big
sessions so the WHOLE session is distilled (src/lib/cerebras.ts)
▼
SCORE LLM-JUDGE transfer score (scoreCapsuleLLM, heuristic fallback)
+ noveltyLLM (src/lib/scorer.ts)
▼
GATE keep if transferScore ≥ threshold OR novelty ≥ 80
▼
BACKBOARD store the DISTILLED briefing only — live memory
(X-API-Key, send_to_llm:false) (src/lib/backboard.ts)
▼
LOCAL REGISTRY write the skill bump to ~/.capsule/local-registry on branch
`local-deepak` — REAL local git commit, no push
(src/lib/local-registry.ts)
▼
END-OF-DAY PROMOTE one CI-gated PR `local-deepak → master`
(scripts/eod-promote.ts)
▼
ENTERPRISE master published after agentic CI + review
(github.com/aptsalt/capsule-enterprise-skills)
│
└──────────────► token savings = the RL reward, feeding the next session
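The GATE stage above can be read as a single predicate. The following is a minimal sketch, not the project's actual code: the `CapsuleScores` shape, the `passesGate` name, and the assumption that both scores are on a 0-100 scale are illustrative; only the `transferScore ≥ threshold OR novelty ≥ 80` rule comes from the diagram.

```typescript
// Hypothetical shape: field names and the 0-100 scale are assumptions.
interface CapsuleScores {
  transferScore: number; // LLM-judged transfer score
  novelty: number;       // noveltyLLM score
}

// The novelty override value (80) is taken from the GATE stage in the diagram.
const NOVELTY_OVERRIDE = 80;

// Keep a capsule if it transfers well enough OR is novel enough on its own.
function passesGate(scores: CapsuleScores, threshold: number): boolean {
  return scores.transferScore >= threshold || scores.novelty >= NOVELTY_OVERRIDE;
}
```

The OR means a low-transfer but highly novel capsule is still stored, so a rare finding is not discarded just because it applies narrowly.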
Multi-developer at scale: upgrades are promoted as PRs, tested by an agentic CI (multi-sample A/B vs the current version), deduped when two devs find the same thing, and conflict-resolved (do/undo) by measured reward + recency. The registry also has an opposite pole — purge/retire — so it never silts up.
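One way to picture the dedup and do/undo rules is the sketch below. Everything in it is an assumption for illustration: the `Proposal` fields, the idea of a `findingKey` fingerprint, keeping the earliest duplicate, and the `epsilon` tie band are not taken from the registry's schema. Only the ordering "measured reward first, recency second" reflects the description above.

```typescript
// All field names are illustrative assumptions, not the registry's schema.
interface Proposal {
  skillId: string;
  author: string;
  findingKey: string; // hypothetical normalized fingerprint of the finding
  reward: number;     // measured reward from the A/B run, e.g. mean tokens saved
  createdAt: number;  // epoch milliseconds
}

// Dedup: when two developers propose the same finding for the same skill,
// keep only the earliest proposal (an assumed policy).
function dedupe(proposals: Proposal[]): Proposal[] {
  const byKey = new Map<string, Proposal>();
  for (const p of proposals) {
    const key = `${p.skillId}:${p.findingKey}`;
    const seen = byKey.get(key);
    if (seen === undefined || p.createdAt < seen.createdAt) byKey.set(key, p);
  }
  return Array.from(byKey.values());
}

// Do/undo conflict: prefer the higher measured reward; if the rewards are
// within `epsilon` of each other, prefer the more recent proposal.
function resolveConflict(a: Proposal, b: Proposal, epsilon = 0): Proposal {
  if (Math.abs(a.reward - b.reward) > epsilon) return a.reward > b.reward ? a : b;
  return a.createdAt >= b.createdAt ? a : b;
}
```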
A real chat panel (it lives in RightPanel, rendered with react-markdown) lets you talk to a local
Ollama agent that has Backboard memory — the whole point of CAPSULE made tangible:
- Warm by default: when context is on, the latest capsule briefing is injected as ground truth and relevant tenant memory (prior capsules + past chat turns) is recalled from Backboard, so the agent answers oriented instead of cold.
- Skills composer: a `Skills ▾` dropdown (8090 categories: Requirements · Blueprints · Work Orders · Feedback · General; see `src/lib/skillCatalog.ts`) loads enterprise skills into the chat's system prompt.
- Context inspector: `POST /api/chat/context` shows exactly what the agent would see and what touches Backboard, without running a generation. The observability seam for the whole feature.
- Durable + capturable: every conversation is saved under `~/.relay/chats/<id>.json` with an LLM-generated title, so a chat can later be distilled into a capsule just like a Claude Code session.
Routes: POST /api/chat (streamed) · POST /api/chat/context · GET /api/chats · POST /api/chats/save.
Libs: src/lib/chatContext.ts · chats.ts · skillCatalog.ts. Shipped as PR #1 (merged).
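To make the warm-start and context-inspector ideas concrete, here is a rough sketch of how a context preview might be assembled without running a generation. It is not the code in `src/lib/chatContext.ts`: the `ChatContextInput` and `ChatContextPreview` types, the `previewContext` name, the section headings, and the rule that Backboard is touched only when context is on are all assumptions.

```typescript
// Hypothetical input: what the chat has available before a generation runs.
interface ChatContextInput {
  contextOn: boolean;                      // the "context is on" toggle
  briefing?: string;                       // latest capsule briefing, if any
  recalled: string[];                      // memory recalled for this tenant
  skills: { id: string; body: string }[];  // skills picked in the composer
}

// Hypothetical output: what an inspector could show.
interface ChatContextPreview {
  systemPrompt: string;
  touchesBackboard: boolean;
}

function previewContext(input: ChatContextInput): ChatContextPreview {
  const parts: string[] = [];
  if (input.contextOn && input.briefing) {
    parts.push(`## Latest capsule briefing (ground truth)\n${input.briefing}`);
  }
  if (input.contextOn && input.recalled.length > 0) {
    const lines = input.recalled.map((m) => `- ${m}`).join("\n");
    parts.push(`## Recalled memory\n${lines}`);
  }
  // Skills are loaded regardless of the context toggle (an assumption).
  for (const s of input.skills) {
    parts.push(`## Skill: ${s.id}\n${s.body}`);
  }
  return { systemPrompt: parts.join("\n\n"), touchesBackboard: input.contextOn };
}
```

Building the prompt in a pure function like this is what would let an inspector route and the streamed chat route share one code path, so the preview cannot drift from what the agent actually sees.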
npm install
# Local model (primary distiller + chat agent) — install Ollama, then:
ollama pull qwen2.5-coder:14b
# Optional: live Backboard memory + Cerebras cloud distill
cp .env.example .env.local # add BACKBOARD_API_KEY (and optionally CEREBRAS_API_KEY)
npm run dev   # http://localhost:3010

- No keys required to run: distillation falls back local-only (Ollama → heuristic), and Backboard falls back to a local JSON store under `~/.relay`.
- With `BACKBOARD_API_KEY`, capsules and chat turns are written to live Backboard memory (`app.backboard.io/api`, `X-API-Key`, `send_to_llm:false`).
- Env keys (all optional): `BACKBOARD_API_KEY`, `CEREBRAS_API_KEY`, `OLLAMA_URL`, `RELAY_OLLAMA_MODEL`.
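The fallback behaviour described in the list above can be sketched as a pure function over the environment. This is an illustration, not the project's configuration code: the `RuntimeConfig` shape and `resolveConfig` name are hypothetical, and the defaults are assumptions (`http://localhost:11434` is Ollama's standard port; the default model is the one named in this README).

```typescript
type Env = Record<string, string | undefined>;
type Distiller = "ollama" | "cerebras" | "heuristic";

// Hypothetical resolved configuration.
interface RuntimeConfig {
  ollamaUrl: string;
  ollamaModel: string;
  distillers: Distiller[];            // tried in order until one succeeds
  memory: "backboard" | "local-json"; // local-json = store under ~/.relay
}

function resolveConfig(env: Env): RuntimeConfig {
  // Local Ollama is always first; Cerebras joins only when a key is present;
  // the heuristic distiller is the last resort so the app works offline.
  const distillers: Distiller[] = ["ollama"];
  if (env.CEREBRAS_API_KEY) distillers.push("cerebras");
  distillers.push("heuristic");

  return {
    ollamaUrl: env.OLLAMA_URL ?? "http://localhost:11434",   // assumed default
    ollamaModel: env.RELAY_OLLAMA_MODEL ?? "qwen2.5-coder:14b", // assumed default
    distillers,
    memory: env.BACKBOARD_API_KEY ? "backboard" : "local-json",
  };
}
```

Calling `resolveConfig(process.env)` once at startup would keep the "no keys required" guarantee in one place instead of scattering key checks across pipeline stages.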
docker compose up # app + a containerized Ollama
docker compose exec ollama ollama pull qwen2.5-coder:14b # one-time model pull
# → http://localhost:3010

Docker shows up in three places: the app container (`Dockerfile`), the whole stack (`docker-compose.yml`, app + Ollama), and the handoff devcontainer (`.devcontainer/`). A capsule ships its runtime, not just its notes. See `docs/TECH-STACK.html`.
Sessions are captured automatically as they close — no manual button.
- Stop hook (in `~/.claude/settings.json`) fires the instant a session turn finishes and runs the fast, non-blocking enqueuer:

      { "Stop": [ { "hooks": [ { "type": "command", "command": "cmd /c start /b node \"C:/Users/deepc/relay/scripts/capture-enqueue.js\"" } ] } ] }

  `capture-enqueue.js` only appends the just-finished transcript path to a queue and exits almost immediately. It never calls a model, never touches Backboard, and never runs git, so it cannot slow Claude Code down.
- Watcher (long-running, out-of-band) drains the queue and also scans `~/.claude/projects` for sessions gone idle (~10 min = "closed"), dedups against a persistent `processed.json`, and runs the real pipeline (capture → distill → score → gate → store → bump) for each genuinely new session:

      cd C:/Users/deepc/relay && npx tsx scripts/capture-watcher.ts

See the `OPERATIONS` block at the bottom of `scripts/capture-watcher.ts` for Task Scheduler setup + how to disable. Manual single-session capture: `npm run capsule` (`scripts/make-capsule.ts`).
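The watcher's selection step (drain the queue, add idle sessions, skip anything already processed) could look roughly like the sketch below. It is deliberately a pure function with no file access, and it is not the code in `scripts/capture-watcher.ts`: the `SessionFile` shape, the `selectSessionsToProcess` name, and the use of last-modified time as the idle signal are assumptions. Only the roughly 10 minute idle window and the dedup against processed sessions come from the description above.

```typescript
// Hypothetical view of a transcript found while scanning ~/.claude/projects.
interface SessionFile {
  path: string;
  lastModifiedMs: number; // epoch milliseconds of the last write
}

const IDLE_MS = 10 * 60 * 1000; // ~10 minutes without writes counts as closed

function selectSessionsToProcess(
  queued: string[],        // paths appended by the Stop hook enqueuer
  scanned: SessionFile[],  // transcripts found by the periodic scan
  processed: Set<string>,  // paths already recorded as processed
  nowMs: number,
  idleMs: number = IDLE_MS,
): string[] {
  const idle = scanned
    .filter((s) => nowMs - s.lastModifiedMs >= idleMs)
    .map((s) => s.path);

  // Queue entries come first, then idle sessions; each path is emitted once.
  const seen = new Set<string>(processed);
  const result: string[] = [];
  for (const path of [...queued, ...idle]) {
    if (seen.has(path)) continue;
    seen.add(path);
    result.push(path);
  }
  return result;
}
```

Keeping the selection free of I/O makes the "never process a session twice" rule easy to test on its own, separate from reading the queue file or `processed.json`.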
A Next.js 15 App Router app. Two halves: a real server-side pipeline (src/lib + src/app/api) and a
single-page workspace UI. Each pipeline stage degrades gracefully — the demo must work offline with no
keys.
src/lib/
| File | Role |
|---|---|
| `capture.ts` | Read + compress real `~/.claude` session transcripts (server-only). |
| `cerebras.ts` | Distiller: local Ollama primary → Cerebras optional → heuristic. Chunked map-reduce for big sessions. |
| `scorer.ts` | LLM-judge transfer score (`scoreCapsuleLLM`, heuristic fallback) + `noveltyLLM`. |
| `backboard.ts` | Live Backboard memory: assistant-per-tenant, thread-per-project, `retrieveMemory`. |
| `chatContext.ts` · `chats.ts` · `skillCatalog.ts` | The in-app agent chat: shared context builder, durable chat store, composer menu. |
| `promote.ts` | Live, on-demand promotion of a capsule into a proposed enterprise skill version (staged, not force-merged). |
| `local-registry.ts` | The local half of the loop: `bumpSkillLocal` writes SKILL.md + CHANGELOG + a real git commit on `local-deepak`. |
| `eval.ts` | The real eval harness: multi-sample paired A/B (mean ± stdev, consistent-direction, real token counts) + regression check. |
| `purge.ts` | Skill retirement: active → deprecated → archived → purged with a PURGE-LEDGER. Dry-run default. |
| `metrics.ts` | Dashboard roll-up computed from real entities (no hand-set numbers). |
| `data.ts` | The generated dataset (`data.mock.ts` is the seeded backup). |
| `capsule.ts` · `selectors.ts` · `store.ts` (Zustand) · `types.ts` · `docs.ts` | Type backbone, selectors, UI state, doc model. |
src/app/api/ — capsule · capsules · sessions · skills · graph · inherit · promote ·
chat · chat/context · chats · chats/save. POST /api/capsule runs the full
capture→distill→score→store flow.
src/components/ — TopBar · Sidebar · DocumentEditor · RightPanel (the agent chat) · ForceGraph ·
SkillCard · ui.tsx, plus panels/ (KnowledgeGraph · Skills · Versions · AbTrials · Capture · Inherit).
scripts/ — capture-enqueue.js (Stop-hook) · capture-watcher.ts (ambient) · eod-promote.ts
(end-of-day PR) · purge-skills.ts (retire) · eval-ab.ts · build-real-dataset.ts · make-capsule.ts.
Stack: Next.js 15 (App Router) · React 19 · TypeScript (strict) · Tailwind v4 · Zustand 5 · react-markdown. Local Ollama primary, Cerebras optional, live Backboard.
The published skills live in a separate, public repo: github.com/aptsalt/capsule-enterprise-skills.
- `master` = the enterprise head: 28 skills (13 capsule-distilled + 15 popular engineering seeds).
- `dee` / `ven` / `saim` = personal/local developer repos, each pinning a unique role-aligned set of 5.
- Promotion is by PR + agentic CI (multi-sample A/B + regression), recorded in `MERGE-LEDGER.md` and governed by `PROMOTION.md`. Multi-dev reconciliation is real: dedup (ML-001) and do/undo conflict (ML-002) are in the ledger.
- Purge/retire mirrors promotion in reverse, ledgered in `PURGE-LEDGER.md`.
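The retirement path (active → deprecated → archived → purged, dry-run by default) is a small state machine. The sketch below is illustrative only and is not the code in `src/lib/purge.ts`: the `planRetire` name, the `RetireStep` shape, and the ledger row format are hypothetical.

```typescript
type SkillStatus = "active" | "deprecated" | "archived" | "purged";

// Each status has exactly one successor; "purged" is terminal.
const NEXT_STATUS: Record<SkillStatus, SkillStatus | null> = {
  active: "deprecated",
  deprecated: "archived",
  archived: "purged",
  purged: null,
};

interface RetireStep {
  skillId: string;
  from: SkillStatus;
  to: SkillStatus;
  applied: boolean;   // false means dry run: the caller should write nothing
  ledgerLine: string; // hypothetical row format for PURGE-LEDGER.md
}

// Plans one step down the retirement path. `apply` defaults to false so a
// bare call only reports what would happen.
function planRetire(skillId: string, current: SkillStatus, apply = false): RetireStep {
  const to = NEXT_STATUS[current];
  if (to === null) throw new Error(`${skillId} is already purged`);
  const mode = apply ? "applied" : "dry-run";
  return {
    skillId,
    from: current,
    to,
    applied: apply,
    ledgerLine: `| ${skillId} | ${current} → ${to} | ${mode} |`,
  };
}
```

Moving one status at a time, rather than jumping straight to purged, leaves a ledger entry at every stage, which mirrors how promotion is recorded step by step.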
Pull a pinned, reproducible version:
capsule pull skill/<id>@<ver>   # exact version
capsule pull skill/<id>         # latest on master

The pipeline is real: CAPSULE reads your real sessions, distills + scores + stores capsules locally / in live Backboard, bumps a real local git registry, and opens a real enterprise PR. But the labels matter:
- Scoring is LLM-judged, not trained: `scoreCapsuleLLM` asks the local model; it is not a learned reward model.
- Eval is a multi-sample measured proxy: mean ± stdev over real Ollama token counts with a consistent-direction signal, not a t-test or p-value claim.
- A thin layer is derived: novelty/importance heuristics, non-A/B reuse estimates, and requirements/work-order scaffolding.

`docs/DATA-REALITY.html` is the canonical, line-by-line what's-real-vs-derived breakdown.
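For readers who want to see what "mean ± stdev with a consistent-direction signal" amounts to, here is a minimal sketch of a paired A/B summary. It is not the harness in `src/lib/eval.ts`: the `PairedSample` and `AbSummary` types and the `summarizeAb` name are assumptions, and the sample standard deviation (n - 1 denominator) is one reasonable choice rather than a confirmed detail.

```typescript
// One paired run: the same prompt with the current and the candidate skill.
interface PairedSample {
  baselineTokens: number;  // tokens used with the current version
  candidateTokens: number; // tokens used with the proposed version
}

interface AbSummary {
  meanSaving: number;           // mean of (baseline - candidate)
  stdev: number;                // sample standard deviation of the savings
  consistentDirection: boolean; // every pair moved the same way
}

function summarizeAb(samples: PairedSample[]): AbSummary {
  if (samples.length < 2) throw new Error("need at least two paired samples");
  const savings = samples.map((s) => s.baselineTokens - s.candidateTokens);
  const mean = savings.reduce((a, b) => a + b, 0) / savings.length;
  const variance =
    savings.reduce((acc, d) => acc + (d - mean) ** 2, 0) / (savings.length - 1);
  // A descriptive signal only; this makes no significance claim.
  const consistentDirection =
    savings.every((d) => d > 0) || savings.every((d) => d < 0);
  return { meanSaving: mean, stdev: Math.sqrt(variance), consistentDirection };
}
```

For example, savings of 10, 20, and 15 tokens give a mean of 15 and a sample standard deviation of 5, with a consistent direction because every pair improved.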
CAPSULE captured its own build session: capsule CAP-SESSION-1a6fcc9b (session 1a6fcc9b, project
relay), distilled locally on qwen2.5-coder:14b, became skill/ui-modularity@1.0.0 and was promoted
into enterprise master (PR #4). Independently, dee promoted rest-api-design@1.0.1 +
oauth2-jwt-auth@1.0.1. The loop closed on itself — that is the proof it runs.
Open any of these in a browser:
| Doc | What it is |
|---|---|
| `docs/CAPSULE-LAUNCH.html` | The launch site: every feature with screenshots + video |
| `docs/DEMO-SCRIPT.html` · `docs/DEMO-SCRIPT.md` | The 3-minute pitch, mapped to the four themes + Q&A cheat-sheet |
| `docs/PITCH.html` | The pitch deck |
| `docs/RL-LOOP.html` | The full RL-loop architecture diagram |
| `docs/TECH-STACK.html` | Full stack, where local LLM + Cerebras live in code, Docker, cloud roadmap |
| `docs/BACKEND.html` | The working backend architecture (pipeline + APIs + Backboard) |
| `docs/MULTI-DEV.html` | Multi-dev flow: promotion, agentic CI, dedup, do/undo conflict |
| `docs/AGENTIC-VS-MANUAL.html` | The two capsule-creation flows, side by side |
| `docs/FEATURES.html` | Plain-language explainer of every feature |
| `docs/REPO-FLOW.html` | Enterprise registry vs personal repo flow |
| `docs/DATA-REALITY.html` | Honest what's-real-vs-derived breakdown (canonical) |
| `docs/ARCHITECTURE.html` · `docs/ARCHITECTURE.md` | Architecture notes |
| `docs/VALUE.md` · `docs/MEMORY-MODEL.md` | Engineering notes |
| `docs/UX-AUDIT.html` · `docs/UX-AUDIT.md` · `docs/RELAY.html` | UX audit + legacy RELAY note |
| `docs/README.md` | Docs index |
| `docs/factory.html` · `docs/factory-v1.html` · `docs/index.html` | Earlier standalone HTML prototypes |
- Hosted Backboard tenants: per-org assistant isolation + SSO, so a real team shares one warm memory.
- Trained reward model: replace the LLM-judge with a model fine-tuned on accepted-vs-rejected capsules.
- Real agentic CI runner: move the multi-sample A/B into GitHub Actions on the enterprise repo.
- `capsule` CLI: first-class `capsule pull / status / promote` instead of git plumbing.
- IDE surface: warm-start injection inside the editor, not just the in-app chat.
Deepak Singh Kandari · github.com/aptsalt