Skip to content

aptsalt/capsule

Repository files navigation

CAPSULE

The capture + feedback (RL) layer for 8090's Software Factory.

Every AI coding session creates knowledge — the why, the dead-ends, the gotchas, the intent. Today it evaporates the moment the session ends. CAPSULE captures it, scores it, versions it into enterprise skills, and feeds it back so the next developer or agent inherits full context instantly.

CAPSULE turns a finished coding session into a Capsule — a compressed, scored record of what was learned — distilled locally (Ollama qwen2.5-coder:14b), judged by an LLM scorer, stored in portable Backboard memory, versioned into a local skill registry, and promoted at end-of-day into a versioned enterprise skills registry. It's a reinforcement-learning loop for the enterprise SDLC: every session makes the next one cheaper and better. There's also an in-app agent chat so you can feel the warm-start directly — talk to a local model that already remembers.


One root cause → four hackathon themes

CAPSULE is built on a single thesis: context dies at the session boundary. That one root cause is the source of all four "Build for Builders" themes — and CAPSULE solves them together:

Theme How CAPSULE serves it
Handoff The capsule is the handoff artifact. The in-app agent chat lets the next dev/agent inherit it and keep going — warm, not cold.
Productivity & flow Warm-start injection (real Backboard retrieval) + measured token savings — a new session starts oriented, not from scratch.
Code quality & confidence Provenance trail (every skill version → the capsule/finding that produced it) + a measured agentic-CI gate before a version publishes.
Junior developer Each capsule's distilled finding coaches the dev; loading an enterprise skill into the chat hands a junior the senior's distilled knowledge.

The full RL loop

   coding session
        │  ambient capture (Stop-hook → queue → watcher)   ~/.claude/*.jsonl
        ▼
   CAPTURE              compress the real transcript (src/lib/capture.ts)
        ▼
   DISTILL             LOCAL Ollama qwen2.5-coder:14b · CHUNKED map-reduce for big
                        sessions so the WHOLE session is distilled (src/lib/cerebras.ts)
        ▼
   SCORE               LLM-JUDGE transfer score (scoreCapsuleLLM, heuristic fallback)
                        + noveltyLLM                            (src/lib/scorer.ts)
        ▼
   GATE                keep if transferScore ≥ threshold  OR  novelty ≥ 80
        ▼
   BACKBOARD           store the DISTILLED briefing only — live memory
                        (X-API-Key, send_to_llm:false)         (src/lib/backboard.ts)
        ▼
   LOCAL REGISTRY      write the skill bump to ~/.capsule/local-registry on branch
                        `local-deepak` — REAL local git commit, no push
                        (src/lib/local-registry.ts)
        ▼
   END-OF-DAY PROMOTE  one CI-gated PR `local-deepak → master`
                        (scripts/eod-promote.ts)
        ▼
   ENTERPRISE master   published after agentic CI + review
                        (github.com/aptsalt/capsule-enterprise-skills)
        │
        └──────────────►  token savings = the RL reward, feeding the next session

Multi-developer at scale: upgrades are promoted as PRs, tested by an agentic CI (multi-sample A/B vs the current version), deduped when two devs find the same thing, and conflict-resolved (do/undo) by measured reward + recency. The registry also has an opposite pole — purge/retire — so it never silts up.


In-app agent chat + skills composer

A real chat panel (it lives in RightPanel, rendered with react-markdown) lets you talk to a local Ollama agent that has Backboard memory — the whole point of CAPSULE made tangible:

  • Warm by default — when context is on, the latest capsule briefing is injected as ground truth and relevant tenant memory (prior capsules + past chat turns) is recalled from Backboard, so the agent answers oriented instead of cold.
  • Skills composer — a Skills ▾ dropdown (8090 categories: Requirements · Blueprints · Work Orders · Feedback · General, see src/lib/skillCatalog.ts) loads enterprise skills into the chat's system prompt.
  • Context inspectorPOST /api/chat/context shows exactly what the agent would see and what touches Backboard, without running a generation. The observability seam for the whole feature.
  • Durable + capturable — every conversation is saved under ~/.relay/chats/<id>.json with an LLM-generated title, so a chat can later be distilled into a capsule just like a Claude Code session.

Routes: POST /api/chat (streamed) · POST /api/chat/context · GET /api/chats · POST /api/chats/save. Libs: src/lib/chatContext.ts · chats.ts · skillCatalog.ts. Shipped as PR #1 (merged).


Quickstart

npm install
# Local model (primary distiller + chat agent) — install Ollama, then:
ollama pull qwen2.5-coder:14b
# Optional: live Backboard memory + Cerebras cloud distill
cp .env.example .env.local   # add BACKBOARD_API_KEY (and optionally CEREBRAS_API_KEY)
npm run dev                  # http://localhost:3010
  • No keys required to run: distillation falls back local-only (Ollama → heuristic), Backboard → local JSON store under ~/.relay.
  • With BACKBOARD_API_KEY, capsules and chat turns are written to live Backboard memory (app.backboard.io/api, X-API-Key, send_to_llm:false).
  • Env keys (all optional): BACKBOARD_API_KEY, CEREBRAS_API_KEY, OLLAMA_URL, RELAY_OLLAMA_MODEL.

Or run the whole stack with Docker

docker compose up                                          # app + a containerized Ollama
docker compose exec ollama ollama pull qwen2.5-coder:14b   # one-time model pull
# → http://localhost:3010

Docker shows up in three places: the app container (Dockerfile), the whole stack (docker-compose.yml, app + Ollama), and the handoff devcontainer (.devcontainer/) — a capsule ships its runtime, not just its notes. See docs/TECH-STACK.html.


Ambient capture (Stop-hook + watcher)

Sessions are captured automatically as they close — no manual button.

  1. Stop hook (in ~/.claude/settings.json) fires the instant a session turn finishes and runs the fast, non-blocking enqueuer:

    { "Stop": [ { "hooks": [
      { "type": "command",
        "command": "cmd /c start /b node \"C:/Users/deepc/relay/scripts/capture-enqueue.js\"" }
    ] } ] }

    capture-enqueue.js only appends the just-finished transcript path to a queue and exits in microseconds — it never calls a model, never touches Backboard, never does git, so it can never slow Claude Code down.

  2. Watcher (long-running, out-of-band) drains the queue and also scans ~/.claude/projects for sessions gone idle (~10 min = "closed"), dedups against a persistent processed.json, and runs the real pipeline (capture → distill → score → gate → store → bump) for each genuinely new session:

    cd C:/Users/deepc/relay && npx tsx scripts/capture-watcher.ts

    See the OPERATIONS block at the bottom of scripts/capture-watcher.ts for Task Scheduler setup + how to disable. Manual single-session capture: npm run capsule (scripts/make-capsule.ts).


Architecture

A Next.js 15 App Router app. Two halves: a real server-side pipeline (src/lib + src/app/api) and a single-page workspace UI. Each pipeline stage degrades gracefully — the demo must work offline with no keys.

src/lib/

File Role
capture.ts Read + compress real ~/.claude session transcripts (server-only).
cerebras.ts Distiller — local Ollama primary → Cerebras optional → heuristic. Chunked map-reduce for big sessions.
scorer.ts LLM-judge transfer score (scoreCapsuleLLM, heuristic fallback) + noveltyLLM.
backboard.ts Live Backboard memory — assistant-per-tenant, thread-per-project, retrieveMemory.
chatContext.ts · chats.ts · skillCatalog.ts The in-app agent chat: shared context builder, durable chat store, composer menu.
promote.ts Live, on-demand promotion of a capsule into a proposed enterprise skill version (staged, not force-merged).
local-registry.ts The local half of the loop — bumpSkillLocal writes SKILL.md + CHANGELOG + a real git commit on local-deepak.
eval.ts The real eval harness — multi-sample paired A/B (mean ± stdev, consistent-direction, real token counts) + regression check.
purge.ts Skill retirement — active → deprecated → archived → purged with a PURGE-LEDGER. Dry-run default.
metrics.ts Dashboard roll-up computed from real entities (no hand-set numbers).
data.ts The generated dataset (data.mock.ts is the seeded backup).
capsule.ts · selectors.ts · store.ts (Zustand) · types.ts · docs.ts Type backbone, selectors, UI state, doc model.

src/app/api/capsule · capsules · sessions · skills · graph · inherit · promote · chat · chat/context · chats · chats/save. POST /api/capsule runs the full capture→distill→score→store flow.

src/components/ — TopBar · Sidebar · DocumentEditor · RightPanel (the agent chat) · ForceGraph · SkillCard · ui.tsx, plus panels/ (KnowledgeGraph · Skills · Versions · AbTrials · Capture · Inherit).

scripts/capture-enqueue.js (Stop-hook) · capture-watcher.ts (ambient) · eod-promote.ts (end-of-day PR) · purge-skills.ts (retire) · eval-ab.ts · build-real-dataset.ts · make-capsule.ts.

Stack: Next.js 15 (App Router) · React 19 · TypeScript (strict) · Tailwind v4 · Zustand 5 · react-markdown. Local Ollama primary, Cerebras optional, live Backboard.


Enterprise registry + multi-dev link

The published skills live in a separate, public repo: github.com/aptsalt/capsule-enterprise-skills.

  • master = the enterprise head: 28 skills (13 capsule-distilled + 15 popular engineering seeds).
  • dee / ven / saim = personal/local developer repos, each pinning a unique role-aligned set of 5.
  • Promotion is by PR + agentic CI (multi-sample A/B + regression), recorded in MERGE-LEDGER.md and governed by PROMOTION.md. Multi-dev reconciliation is real: dedup (ML-001) and do/undo conflict (ML-002) are in the ledger.
  • Purge/retire mirrors promotion in reverse, ledgered in PURGE-LEDGER.md.

Pull a pinned, reproducible version:

capsule pull skill/<id>@<ver>     # exact version
capsule pull skill/<id>           # latest on master

What's real (honesty note)

The pipeline is real: CAPSULE reads your real sessions, distills + scores + stores capsules locally / in live Backboard, bumps a real local git registry, and opens a real enterprise PR. But the labels matter:

  • Scoring is LLM-judged, not trainedscoreCapsuleLLM asks the local model; it is not a learned reward model.
  • Eval is a multi-sample measured proxy — mean ± stdev over real Ollama token counts with a consistent-direction signal, not a t-test or p-value claim.
  • A thin layer is derived — novelty/importance heuristics, non-A/B reuse estimates, and requirements/work-order scaffolding.

docs/DATA-REALITY.html is the canonical, line-by-line what's-real-vs-derived breakdown.

Self-capture proof

CAPSULE captured its own build session: capsule CAP-SESSION-1a6fcc9b (session 1a6fcc9b, project relay), distilled locally on qwen2.5-coder:14b, became skill/ui-modularity@1.0.0 and was promoted into enterprise master (PR #4). Independently, dee promoted rest-api-design@1.0.1 + oauth2-jwt-auth@1.0.1. The loop closed on itself — that is the proof it runs.


Documentation (/docs)

Open any of these in a browser:

Doc What it is
docs/CAPSULE-LAUNCH.html The launch site — every feature with screenshots + video
docs/DEMO-SCRIPT.html · docs/DEMO-SCRIPT.md The 3-minute pitch, mapped to the four themes + Q&A cheat-sheet
docs/PITCH.html The pitch deck
docs/RL-LOOP.html The full RL-loop architecture diagram
docs/TECH-STACK.html Full stack, where local LLM + Cerebras live in code, Docker, cloud roadmap
docs/BACKEND.html The working backend architecture (pipeline + APIs + Backboard)
docs/MULTI-DEV.html Multi-dev flow: promotion, agentic CI, dedup, do/undo conflict
docs/AGENTIC-VS-MANUAL.html The two capsule-creation flows, side by side
docs/FEATURES.html Plain-language explainer of every feature
docs/REPO-FLOW.html Enterprise registry vs personal repo flow
docs/DATA-REALITY.html Honest what's-real-vs-derived breakdown (canonical)
docs/ARCHITECTURE.html · docs/ARCHITECTURE.md Architecture notes
docs/VALUE.md · docs/MEMORY-MODEL.md Engineering notes
docs/UX-AUDIT.html · docs/UX-AUDIT.md · docs/RELAY.html UX audit + legacy RELAY note
docs/README.md Docs index
docs/factory.html · docs/factory-v1.html · docs/index.html Earlier standalone HTML prototypes

Roadmap

  • Hosted Backboard tenants — per-org assistant isolation + SSO, so a real team shares one warm memory.
  • Trained reward model — replace the LLM-judge with a model fine-tuned on accepted-vs-rejected capsules.
  • Real agentic CI runner — move the multi-sample A/B into GitHub Actions on the enterprise repo.
  • capsule CLI — first-class capsule pull / status / promote instead of git plumbing.
  • IDE surface — warm-start injection inside the editor, not just the in-app chat.

Deepak Singh Kandari · github.com/aptsalt

About

CAPSULE — the capture+feedback (RL) layer for 8090 Software Factory: local-model capsule distillation, live Backboard memory, versioned enterprise skills, multi-dev flow.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages