Persistent local memory for AI coding agents. Your agent remembers every session, learns from its mistakes, and gets smarter the longer you work with it.
Claude Code, Cursor, Codex, and every other MCP client forget everything when a session ends. LLM Wiki Memory fixes that: it captures your conversations, compiles them into durable project knowledge and lessons your agent applies next time, and recalls the right context through a local MCP server. Memory lives on your machine as plain Markdown in an LLM wiki versioned in git, searched with local embeddings, and consolidated offline while you sleep.
No RAG stack. No vector database. No Docker. No cloud. Install with one prompt and your agent never starts from zero again.
Paste this one-liner into your AI coding agent. It covers both a fresh install and an update of an existing one. The full procedure lives in AI-INSTALL-PROMPT.md; the agent fetches and follows it:
Set up llm-wiki-memory in this project: fetch https://raw.githubusercontent.com/ctxr-dev/llm-wiki-memory/main/AI-INSTALL-PROMPT.md and follow it EXACTLY (it covers fresh install and update; if already installed, the same file is local at @.llm-wiki-memory/src/AI-INSTALL-PROMPT.md).
Or run it yourself — fresh install:
git clone https://github.com/ctxr-dev/llm-wiki-memory ./.llm-wiki-memory/src
./.llm-wiki-memory/src/bootstrap.sh # add --commit-memory to commit the wiki
./.llm-wiki-memory/src/bootstrap.sh --schedule daily # optional: hourly cron / launchd
Update an existing install:
git -C .llm-wiki-memory/src fetch origin
# Runbooks you have NOT applied yet — READ THESE FIRST, oldest → newest:
git -C .llm-wiki-memory/src diff --name-only HEAD origin/main -- docs/releases | grep 'update-prompt\.md$' | sort
git -C .llm-wiki-memory/src merge --ff-only origin/main
( cd .llm-wiki-memory/src && npm install --no-audit --no-fund )
./.llm-wiki-memory/src/bootstrap.sh # idempotent; runbooks may add one-shot steps + verification
The bootstrap is idempotent: re-running preserves your edits to .env and your rule files.
What bootstrap does (8 steps)
- Installs dependencies in ./.llm-wiki-memory/src.
- Auto-detects the LLM provider: claude CLI → codex CLI → ANTHROPIC_API_KEY → OPENAI_API_KEY → MEMORY_LLM_BASE_URL → ollama at :11434 → mock (with a stderr warning).
- Writes ./.llm-wiki-memory/settings/.env (preserves your edits on re-run).
- Merges hooks into .claude/settings.json and the stdio server into .mcp.json.
- Renders vendor-neutral configs into .agents/ and discipline rules into .agents/rules/, .claude/skills/, .claude/rules/, .cursor/rules/.
- Materialises the hosted wiki at ./.llm-wiki-memory/wiki (with the layout template that declares consolidate: refine | none per category) and validates it.
- Adds /.llm-wiki-memory to .gitignore (--commit-memory commits the wiki instead).
- Optionally installs the hourly compile + consolidate cron via a wrapper script (--schedule daily).
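The provider auto-detection order from the steps above can be sketched as a simple first-match cascade. This is a minimal illustration, not the bootstrap's actual code: the `probe` object, its field names, and the returned provider identifiers are assumptions made for the example.

```javascript
// Hypothetical sketch of the detection cascade. `probe` and the returned
// identifiers are illustrative; bootstrap.sh gathers this information itself.
function detectProvider(probe) {
  const env = probe.env || {};
  if (probe.hasClaudeCli) return "claude";
  if (probe.hasCodexCli) return "codex";
  if (env.ANTHROPIC_API_KEY) return "anthropic";
  if (env.OPENAI_API_KEY) return "openai";
  if (env.MEMORY_LLM_BASE_URL) return "openai-compatible";
  if (probe.ollamaReachable) return "ollama"; // e.g. something answering on :11434
  // Last resort: warn on stderr so the mock provider never goes unnoticed.
  console.error("warning: no LLM provider detected, falling back to mock");
  return "mock";
}
```

The first match wins, so an installed CLI takes precedence over an API key that happens to be set in the environment.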
Register with a non-Claude client
./.llm-wiki-memory/src/scripts/mcp-config.sh cursor # .cursor/mcp.json
./.llm-wiki-memory/src/scripts/mcp-config.sh codex # ~/.codex/config.toml
./.llm-wiki-memory/src/scripts/mcp-config.sh claude-desktop # claude_desktop_config.json
./.llm-wiki-memory/src/scripts/mcp-config.sh all
Everything lives in a local .llm-wiki-memory/ folder. No vector DB, no container, no API service to run.
Every memory is a markdown leaf with full history, maintained by @ctxr/skill-llm-wiki. Every change commits itself to the wiki's own repo with what, when, and why in the message (one commit per save, flush, compile, or consolidate run), so git log alone explains how your memory evolved. Disable via wiki.autoCommit; your project repo is never touched.
Self-improvement lessons save only with explicit user consent. Three layers of enforcement: discipline instructions, a Claude Code hook enabled by default (disable via gate.claudeHookEnabled), and an airtight MCP server-side gate (covers Cursor, Codex, generic clients).
Long sessions are chunked and distilled in pieces (header-aware → paragraph fallback → hard cut), so a 100K-char transcript never single-passes its way into a CLI timeout. Failed runs persist a full-body stash + structured audit; one cli.mjs redistill retries with no data loss.
A YAML-declared provider chain (anthropic API → openai API → claude CLI → codex CLI → cursor CLI) and per-provider model fallback lists let a deprecated model or a missing CLI cascade automatically — without inlining model names in code.
An hourly cron + a search-driven orchestrator deduplicate near-identical leaves, archive stale entries, and optionally rewrite bodies via the same LLM the rest of the pipeline uses. Never hard-deletes; always reversible.
Health is judged per entity, not per run: every cron tick keeps a slim attempt entry (last consolidate.attemptsKeep runs) plus a full sharded log under state/logs/<yyyy>/<mm>/ for deep diagnosis. A failure that resolves on a later tick stays silent; an entity still failing after consolidate.escalateAfterAttempts consecutive runs (or one error signature recurring across many entities) escalates into a redacted skeleton issue report at issues/<yyyy>/<mm>/<dd>/<signature>.<version>.md that your next session surfaces and offers to investigate — ready to copy upstream or turn into a fix PR.
Transformer embeddings rank queries on-device (default Xenova/bge-large-en-v1.5). One setting swaps in a lighter model — or falls back to a lexical scorer with no model download.
Every category declares its consolidation eligibility in <wiki>/.layout/layout.yaml (consolidate: refine | none). No magic defaults — author intent is always in plain view.
Paste one prompt into your agent or run one script. Idempotent.
RAG memory stacks are powerful but heavy: a vector database, a container, an embedding service, ongoing ops. For small and medium projects that overhead is rarely worth it, yet you still want the agent to remember everything and improve itself across sessions.
llm-wiki-memory gives you that loop with a local hosted wiki as the substrate. Every category stays a nested tree (never a flat pile of files): non-daily categories nest by the metadata facets you search by; daily by date; an additional subject axis scatters leaves by what they're about. Git history and validation come free, and the tree stays readable by humans. Recall runs on local embeddings — nothing leaves your machine.
%%{init: {"theme":"base","flowchart":{"curve":"linear"},"themeVariables":{"lineColor":"#00B8C4","primaryColor":"#0D0D14","primaryTextColor":"#FCEE0A","primaryBorderColor":"#FCEE0A","secondaryColor":"#16161E","tertiaryColor":"#16161E","clusterBkg":"#16161E","clusterBorder":"#00B8C4","edgeLabelBackground":"#0D0D14","textColor":"#00B8C4"}}}%%
flowchart TD
S[AI session]
S -- "pre/post-compact, session-end hooks" --> FL[flush: extract typed atoms]
S -- "ExitPlanMode hook" --> PL[plans tree]
FL --> DA[daily tree]
DA -- "hourly cron-job + session-start hook" --> CMP[compile: promote daily atoms]
CMP --> KSI[knowledge + self_improvement trees]
CMP -. supersedes daily source .-> DA
KSI -- "hourly cron-job + skill rule" --> CN[consolidate: search-driven refinement]
CN --> MG[dedup + LLM merge near-duplicates]
CN --> RF[staleness + LLM semantic refresh]
CN --> HK[orphan / compress / GC / index]
MG --> KSI
RF --> KSI
HK --> KSI
CR[hourly cron tick] --> LG[state/.consolidate-attempts.log]
LG -. "cron-health surfaces unresolved errors" .-> S
AG[Agent recall calls] --> EM[embed.mjs: local embeddings]
EM --> KSI
AG --> PL
The loop in one sentence: session hooks capture typed atoms into daily/; the hourly cron promotes them into knowledge/ and self_improvement/ (compile) and then refines those trees over time (consolidate); every recall hits the same embedding index; every cron attempt logs its outcome so the next session can surface unresolved failures.
The flush worker (PostCompact / SessionEnd hooks) chunks oversized transcripts and runs each chunk through a provider/model chain. A clean "nothing durable" verdict writes no leaf at all (the breadcrumb log keeps visibility); a partial or total failure preserves the full body to a stash so cli.mjs redistill can re-attempt later with no data loss.
%%{init: {"theme":"base","flowchart":{"curve":"linear"},"themeVariables":{"lineColor":"#00B8C4","primaryColor":"#0D0D14","primaryTextColor":"#FCEE0A","primaryBorderColor":"#FCEE0A","secondaryColor":"#16161E","tertiaryColor":"#16161E","clusterBkg":"#16161E","clusterBorder":"#00B8C4","edgeLabelBackground":"#0D0D14","textColor":"#00B8C4"}}}%%
flowchart TD
SRC["source.body<br/>(redacted, ≤MAX_CHARS)"]
SRC --> CK{"size > chunk<br/>threshold?"}
CK -- no --> SP[single-pass distill]
CK -- yes --> CH["chunk by:<br/>1. ### User/Assistant headers<br/>2. paragraph breaks<br/>3. hard cut (last resort)"]
CH --> MAP["map: distill each chunk<br/>via provider chain"]
MAP --> RED["reduce: LLM merge atoms<br/>(depth-capped, deterministic fallback)"]
SP --> WR["write daily leaf<br/>+ audit frontmatter"]
RED --> WR
MAP -.->|any chunk failed| STASH["state/failed-distill-*.json<br/>(full body + audit)"]
MAP -.->|all chunks failed| RAW["raw-fallback leaf<br/>(FULL body, fenced as UNTRUSTED)"]
STASH -.->|"cli.mjs redistill"| CH
The audit fields recorded on every leaf — chunks_total, chunks_succeeded, failed_chunks, provider_chain_tried, final_provider — make every distillation reproducible from frontmatter alone. Redistilled leaves carry redistilled_from, redistill_attempts, and original_outcome.
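The three-tier chunking described above (headers first, then paragraph breaks, then a hard cut) can be sketched as follows. This is a simplified illustration under stated assumptions: the function names and the `maxChars` parameter are hypothetical, header sections are not packed together, and the cut is not surrogate-safe as the real chunker is said to be.

```javascript
// Tier 3: fixed-size slices, used only when nothing else fits.
function hardCut(text, maxChars) {
  const out = [];
  for (let i = 0; i < text.length; i += maxChars) {
    out.push(text.slice(i, i + maxChars));
  }
  return out;
}

// Tier 2: greedily pack paragraphs until the next one would overflow.
function splitParagraphs(section, maxChars) {
  const out = [];
  let current = "";
  for (const para of section.split(/\n{2,}/)) {
    if (para.length > maxChars) {
      if (current) { out.push(current); current = ""; }
      out.push(...hardCut(para, maxChars));
      continue;
    }
    const joined = current ? current + "\n\n" + para : para;
    if (joined.length > maxChars) { out.push(current); current = para; }
    else current = joined;
  }
  if (current) out.push(current);
  return out;
}

// Tier 1: split at "### User" / "### Assistant" headers, then fall through.
function chunkTranscript(body, maxChars) {
  if (body.length <= maxChars) return [body]; // single-pass distill
  const sections = body.split(/(?=^### (?:User|Assistant)\b)/m);
  return sections.flatMap((s) =>
    s.length <= maxChars ? [s] : splitParagraphs(s, maxChars)
  );
}
```

Each resulting chunk would then go through the provider chain on its own, which is what keeps one oversized transcript from turning into one oversized request.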
Self-improvement lessons are propose-then-confirm: the agent NEVER calls save_lesson (or save_to_dataset(dataset="self_improvement", ...) / write_memory(datasetId="self_improvement", ...)) on its own. It proposes the save in chat, waits for an explicit user yes in the same turn, then calls the tool with userRequested: true. The server refuses gated writes without the flag.
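Three enforcement layers, defence-in-depth: the discipline instructions (sent at initialize), the Claude Code hook (gate.claudeHookEnabled), and the MCP server-side gate (gate.selfImprovementEnabled).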
Reconciliation: layers are independent and additive. Any one of them can refuse a save. The model can NOT bypass them: it can't suppress the discipline (sent at initialize), can't disable the Claude Code hook from inside a tool call, and can't forge the userRequested flag (the only legitimate-bypass path is the internal withSystemMaintenance async frame that consolidate uses for its own bookkeeping — entered only by the orchestrator's own code, never by a client request body).
Knowledge, plans, investigations, daily, and tracker-issue writes are not gated — their routing rules apply directly. Set gate.selfImprovementEnabled: false in settings.yaml to disable the server-side check as an operator escape hatch (the other two layers still apply). Set gate.claudeHookEnabled: false to disable the Claude Code hook the same way: it exits with no decision and the normal permission flow applies.
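The server-side check described above can be pictured as a small predicate over the tool name and its arguments. This is a sketch only: the function names, the `settings` object shape, and the refusal message are assumptions, while the tool names, the `userRequested` flag, and `gate.selfImprovementEnabled` come from the text.

```javascript
// True when the call would write into the self_improvement dataset.
function isGatedWrite(tool, args) {
  if (tool === "save_lesson") return true;
  if (tool === "save_to_dataset") return args.dataset === "self_improvement";
  if (tool === "write_memory") return args.datasetId === "self_improvement";
  return false;
}

// Hypothetical gate: refuses gated writes that lack userRequested: true.
function checkGate(tool, args = {}, settings = {}) {
  const enabled = settings?.gate?.selfImprovementEnabled !== false;
  if (!enabled || !isGatedWrite(tool, args)) return { allowed: true };
  if (args.userRequested === true) return { allowed: true };
  return {
    allowed: false,
    reason: "self-improvement writes require explicit user consent",
  };
}
```

Only the literal boolean `true` passes, so a truthy string or a missing flag is refused in this sketch.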
The consolidate orchestrator runs hourly via the cron (chained after compile) and at session end via a hook-less skill rule. It walks the layout-declared consolidate: refine categories and refines each leaf against its similarity cluster.
%%{init: {"theme":"base","flowchart":{"curve":"linear"},"themeVariables":{"lineColor":"#00B8C4","primaryColor":"#0D0D14","primaryTextColor":"#FCEE0A","primaryBorderColor":"#FCEE0A","secondaryColor":"#16161E","tertiaryColor":"#16161E","clusterBkg":"#16161E","clusterBorder":"#00B8C4","edgeLabelBackground":"#0D0D14","textColor":"#00B8C4"}}}%%
flowchart TD
A["active leaves in layout-declared<br/>'consolidate: refine' categories"]
A --> B{"per-leaf loop"}
B -- "(1) LOCAL EMBEDDING<br/>searchMemoryFiltered + cosine" --> C["similarity cluster<br/>top-K above scoreThreshold"]
C --> D["dedup-by-sha256<br/>(deterministic, exact body hash)"]
C --> E["dedup-by-lesson-key<br/>(deterministic, error_pattern key)"]
C --> F["dedup-by-cosine<br/>(deterministic, cosine ≥ 0.97)"]
D --> G[" (keeper, loser) candidates "]
E --> G
F --> G
G -- "(2) LLM CALL (if provider available)" --> H["llm-merge-near-duplicates<br/>LLM rewrites keeper body from both inputs"]
G -- "LLM disabled / unreachable" --> I["archive loser as-is (deterministic fallback)"]
H --> I
I --> J{more leaves?}
J -- yes --> B
J -- no --> K["corpus passes"]
K --> L["staleness-flag<br/>(deterministic, atom-type + age)"]
L -- "(3) LLM CALL (if provider available)" --> M["llm-semantic-refresh<br/>LLM: keep / rewrite / archive"]
L -- "LLM disabled" --> N["leave flag for next run"]
M --> O["prune-orphan-leaves<br/>(deterministic, no inbound link + old)"]
N --> O
O --> P["compress-archived<br/>(deterministic, preserves sha256)"]
P --> Q["housekeeping<br/>prune-empty-ancestors / gc-embeddings / index-rebuild"]
(1) Local embedding lights up only inside the per-leaf cluster lookup. The bge model runs on-device; nothing leaves your machine to find which leaves are similar. Cosine similarity (a pure math op) then ranks the cluster — also local.
(2) LLM call · merge near-duplicates runs once per (keeper, loser) pair found by any of the three dedup passes — but only when an LLM provider is reachable. The LLM sees both bodies + frontmatter, decides whether to merge them into one fresher body or leave the keeper as-is. If the provider is missing or the call fails, consolidate falls back to "archive the loser unchanged" so the run never blocks.
(3) LLM call · semantic refresh runs once per stale-flagged leaf, capped at consolidate.refreshMaxPerRun. The LLM sees the leaf + its current cluster context and chooses keep / rewrite / archive. The deterministic staleness-flag pass nominates candidates; the LLM only acts when it can.
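To make the dedup-by-cosine pass concrete, here is a minimal sketch of cosine similarity and a threshold-based pair search. The 0.97 threshold comes from the diagram; the leaf shape (`id`, `vector`) and the rule that the earlier leaf is the keeper are assumptions for illustration, since the text does not say how the keeper is chosen.

```javascript
// Cosine similarity of two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Returns (keeper, loser) candidates whose similarity meets the threshold.
// Assumption: the leaf that appears first in the list is kept.
function nearDuplicatePairs(leaves, threshold = 0.97) {
  const pairs = [];
  for (let i = 0; i < leaves.length; i++) {
    for (let j = i + 1; j < leaves.length; j++) {
      const score = cosine(leaves[i].vector, leaves[j].vector);
      if (score >= threshold) {
        pairs.push({ keeper: leaves[i].id, loser: leaves[j].id, score });
      }
    }
  }
  return pairs;
}
```

The all-pairs loop is quadratic; the orchestrator described above avoids that cost by comparing each leaf only against its top-K similarity cluster.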
Why each pass:
A memory store that only ever GROWS becomes a graveyard. Bug root-causes get fixed permanently. Feedback rules get reversed. Pattern-gotchas survive an API rename and start pointing at functions that no longer exist. Without a way to revisit aged knowledge, recall starts surfacing leaves that contradict the current codebase — and your agent confidently gives advice that was correct two quarters ago.
consolidate's answer is a deliberate two-step pipeline. The cheap deterministic step nominates candidates; the expensive LLM step judges them.
%%{init: {"theme":"base","flowchart":{"curve":"linear"},"themeVariables":{"lineColor":"#00B8C4","primaryColor":"#0D0D14","primaryTextColor":"#FCEE0A","primaryBorderColor":"#FCEE0A","secondaryColor":"#16161E","tertiaryColor":"#16161E","clusterBkg":"#16161E","clusterBorder":"#00B8C4","edgeLabelBackground":"#0D0D14","textColor":"#00B8C4"}}}%%
flowchart TD
A["all active leaves in<br/>refine-eligible categories"]
A --> B{"atom_type eligible<br/>(self-improvement-lesson / bug-root-cause /<br/>feedback-rule / pattern-gotcha)<br/>AND last_recalled_at > N months?"}
B -- "no" --> SKIP["leave as-is"]
B -- "yes (deterministic, ~1ms/leaf)" --> F["memory.stale = true<br/>(reversible — clears on next recall)"]
F --> CAP{"first N of stale-flagged<br/>(sorted by last_recalled_at desc;<br/>cap = consolidate.refreshMaxPerRun)"}
CAP -- "overflow" --> CARRY["carries to next hourly tick"]
CAP -- "within cap" --> LLM["LLM reads leaf body<br/>+ current similarity cluster<br/>(local embeddings provide the cluster)"]
LLM --> K["keep: still relevant<br/>→ clear stale flag"]
LLM --> R["rewrite: rule still applies,<br/>specifics drifted<br/>→ replace body, stamp last_refreshed_at"]
LLM --> AR["archive: obsolete<br/>→ status:archived, reversible via enable_document"]
LLM --> FB["fallback (provider unreachable<br/>or schema invalid after retries)<br/>→ leave flag, retry next tick"]
Step 1 — staleness-flag (deterministic). Pure file-metadata rule: atom_type in the eligible set + max(last_recalled_at, frontmatter.updated) older than consolidate.staleAfterMonths (default 6). No LLM, no body inspection — just a flag. It also flips OFF: a single recall hit on a previously-stale leaf clears the flag on the next run, so freshly-relevant content un-flags itself automatically.
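The staleness rule in Step 1 reduces to a metadata comparison, sketched below. The eligible atom types and the six-month default come from the text; the leaf object shape, the helper names, and the choice to treat a leaf with no timestamps as not stale are assumptions.

```javascript
const ELIGIBLE_ATOM_TYPES = new Set([
  "self-improvement-lesson",
  "bug-root-cause",
  "feedback-rule",
  "pattern-gotcha",
]);

// Returns a Date `months` calendar months before `now` (UTC).
function monthsBefore(now, months) {
  const d = new Date(now.getTime());
  d.setUTCMonth(d.getUTCMonth() - months);
  return d;
}

// Pure metadata check: no body inspection, no LLM.
function isStale(leaf, now, staleAfterMonths = 6) {
  if (!ELIGIBLE_ATOM_TYPES.has(leaf.atom_type)) return false;
  const stamps = [leaf.last_recalled_at, leaf.updated]
    .filter(Boolean)
    .map((s) => Date.parse(s));
  if (stamps.length === 0) return false; // assumption: undated leaves are left alone
  const newest = Math.max(...stamps);
  return newest < monthsBefore(now, staleAfterMonths).getTime();
}
```

Because the newest of the two timestamps is compared, a single recent recall is enough to make the same function return false on the next run, which matches the self-clearing behaviour described above.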
Step 2: llm-semantic-refresh (LLM, capped, runs on the stale-flagged subset only). For each candidate, the LLM sees the leaf's body, its frontmatter, and a small bundle of currently-active leaves on the same topic (the similarity cluster, pulled via local embeddings with no network call). It returns one of four outcomes: keep, rewrite, archive, or fallback (provider unreachable or schema invalid after retries, in which case the flag stays and the next tick retries).
Why an LLM, and not a deterministic rule? The flag is structural ("when was this leaf last touched?"); the verdict is semantic ("is what this leaf SAYS still true?"). No deterministic rule can read a bug-root-cause body and decide whether the bug was fixed in v1.4.2; no rule can tell that a pattern-gotcha about an apply factory still applies after a team-wide migration to def resource(...) smart constructors. Reading the leaf body in current context and producing a trinary decision (keep / rewrite / archive) is exactly the kind of judgment an LLM does well — and exactly what a deterministic policy can't reach without becoming either too aggressive ("archive everything aged" — loses live knowledge) or too timid ("never touch anything" — the wiki ages into noise).
Why capped per run? consolidate.refreshMaxPerRun (default 25) bounds the LLM call budget per hourly tick. A corpus with 100 stale-flagged leaves makes 25 calls this hour, 25 the next, and so on — steady progress without billing surprises. Recently-recalled leaves are processed first (they're more likely to be load-bearing in active work), so the budget always lands on the highest-leverage candidates.
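The per-run cap can be sketched as a sort followed by a slice. The default of 25 and the most-recently-recalled-first ordering come from the text; the function name and the leaf shape are hypothetical.

```javascript
// Picks this tick's LLM budget from the stale-flagged leaves.
// Most recently recalled leaves go first; the rest carry to the next tick.
function selectRefreshBatch(staleLeaves, refreshMaxPerRun = 25) {
  const sorted = [...staleLeaves].sort(
    (a, b) => Date.parse(b.last_recalled_at) - Date.parse(a.last_recalled_at)
  );
  return {
    batch: sorted.slice(0, refreshMaxPerRun),
    carried: sorted.slice(refreshMaxPerRun),
  };
}
```

With 100 stale-flagged leaves and the default cap, four consecutive ticks drain the backlog, assuming no new leaves are flagged in between.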
Why opt-out exists. Set consolidate.llmPassesEnabled: false in settings.yaml to keep the deterministic flag but skip the LLM verdict. The flag still gets set; nothing acts on it. Useful for cost-sensitive setups, sealed environments, or running consolidate purely for dedup + housekeeping. You can flip it back on later — the flags accumulated in the meantime become this-run's working set.
Net effect on the wiki's shape.
- Recall keeps finding correct, current advice instead of two-year-old reruns.
- Leaf count plateaus instead of growing forever (archives count toward "compressed", not "live").
- Knowledge that's still right is left alone (keep); knowledge that drifted is updated in place (rewrite); knowledge that's obsolete moves out of the active set (archive) but stays recoverable.
- Every change is reversible: the wiki is its own git repo, and consolidate uses disableDocument exclusively. There is no deleteDocument path inside the orchestrator; the user is the only one who can hard-delete, and only via the explicit MCP tool.
- The next hourly tick reads the now-cleaner corpus, so the cluster quality for dedup + refresh compounds: less noise to dedup against, sharper similarity scores, fewer false positives, more confident verdicts.
Every category in <wiki>/.layout/layout.yaml must say consolidate: refine or consolidate: none — no defaults applied. consolidate: none categories (plans, investigations, daily by default — owned by other lifecycles) are never walked by per-leaf passes. The orchestrator refuses to run with a clear error envelope if any category lacks the field.
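The "no defaults" rule amounts to a strict validation pass before any leaf is walked. The sketch below assumes the layout has already been parsed into a plain object keyed by category name; that shape, the function name, and the error wording are illustrative, not the actual layout.yaml schema.

```javascript
// Validates that every category explicitly declares its consolidate mode.
// `categories` is a hypothetical parsed form: { name: { consolidate: "refine" | "none" } }.
function validateLayout(categories) {
  const errors = [];
  const refine = [];
  for (const [name, cfg] of Object.entries(categories)) {
    const mode = cfg ? cfg.consolidate : undefined;
    if (mode !== "refine" && mode !== "none") {
      errors.push(`category "${name}" must declare consolidate: refine | none`);
    } else if (mode === "refine") {
      refine.push(name);
    }
  }
  // Refuse to run at all if any category is undeclared.
  return errors.length > 0 ? { ok: false, errors } : { ok: true, refine };
}
```

Returning an error envelope instead of silently defaulting keeps the author's intent visible, which is the point of the rule.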
Each hourly cron tick runs cli.mjs cron-job. Logging is two-tier: a slim attempt entry (timings, exit codes, totals, a pointer to the full log) appends to state/.consolidate-attempts.log (last consolidate.attemptsKeep runs), and the complete record of the run — redacted stdout/stderr plus the full per-entity consolidate report — lands at state/logs/<yyyy>/<mm>/cron-<ts>.json, pruned after consolidate.fullLogRetentionDays. The internal --if-due throttle bounds the heavy lifting to once per consolidate.intervalDays. When daily docs are pending but no LLM provider is reachable, compile exits 69 (EX_UNAVAILABLE): the tick records a FAILED attempt (so cron-health flips healthy:false immediately and self-clears on the next good tick) while consolidate's deterministic passes still run. The scheduled job's PATH is baked by bootstrap (your login PATH plus well-known CLI install dirs), and provider spawns append the same dirs at runtime, so launchd/cron's minimal PATH can no longer hide the provider CLIs.
Health is judged per ENTITY across runs, not per tick: a failure that a later tick resolves stays silent, while an entity still failing after consolidate.escalateAfterAttempts consecutive attempts — or one error signature recurring across several distinct entities, which smells like a code bug — escalates. Provider availability itself is tracked the same way: persistent provider-unavailable compile aborts and consolidate LLM-skips accrue as the synthetic entities system:compile-llm-providers / system:consolidate-llm-providers and escalate after the same threshold; the first healthy tick resolves the episode. Escalation deterministically writes a redacted skeleton issue report to issues/<yyyy>/<mm>/<dd>/<signature>.<version>.md (episodes version on recurrence; resolution flips status: resolved in place, files are never auto-pruned). The SessionStart hook (cli.mjs cron-health for hook-less agents) surfaces open escalations with a one-line summary and the newest report path, and offers to investigate; copy the report to the llm-wiki-memory issues or use it to draft a fix PR.
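The per-entity escalation rule can be sketched as a scan over attempt records, oldest to newest. Everything about the record shape (`entity`, `ok`, `signature`) and the `signatureEntityThreshold` option is an assumption made for the example; only `escalateAfterAttempts` and the two escalation conditions come from the text.

```javascript
// history: attempt records ordered oldest -> newest (hypothetical shape).
function findEscalations(history, opts) {
  const { escalateAfterAttempts, signatureEntityThreshold } = opts;
  const streaks = new Map(); // entity -> { count, signature } of the open failure streak
  for (const attempt of history) {
    if (attempt.ok) {
      streaks.delete(attempt.entity); // a later success resolves the episode silently
      continue;
    }
    const prev = streaks.get(attempt.entity);
    streaks.set(attempt.entity, {
      count: (prev ? prev.count : 0) + 1,
      signature: attempt.signature,
    });
  }
  const entities = [];
  const bySignature = new Map(); // signature -> set of entities still failing with it
  for (const [entity, streak] of streaks) {
    if (streak.count >= escalateAfterAttempts) entities.push(entity);
    if (!bySignature.has(streak.signature)) bySignature.set(streak.signature, new Set());
    bySignature.get(streak.signature).add(entity);
  }
  const signatures = [...bySignature]
    .filter(([, set]) => set.size >= signatureEntityThreshold)
    .map(([signature]) => signature);
  return { entities, signatures };
}
```

In this sketch an entity that fails once and then succeeds never appears in the result, while one signature shared by several entities escalates even if each of them has failed only once.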
Deterministic passes produce byte-identical state across two runs on the same wiki + frozen clock. LLM passes are reproducible via MEMORY_LLM_MOCK_FILE / MEMORY_LLM_MOCK_RESPONSE for tests. Locking is shared with compile.mjs, so they never race; the cron-job wrapper sequences them.
Never hard-deletes — every archival uses disableDocument (status flip), recoverable via enable_document.
Hook-driven auto-capture is Claude Code only; every other client gets the same MCP tools + the same discipline. Hook-less clients invoke cli.mjs cron-health at session start (per the rule rendered into .agents/rules/) to surface unresolved cron failures.
The LLM provider that extracts typed atoms during capture / compile / consolidate is set in .llm-wiki-memory/settings/.env and is independent of the client:
openai-compatible covers ollama, vLLM, lm-studio, llama.cpp server, and litellm proxies — point MEMORY_LLM_BASE_URL at a local endpoint and OPENAI_API_KEY becomes optional on loopback / RFC1918. The provider is auto-detected at install; explicit --provider or a user-edited settings/settings.yaml chain always wins.
Provider chain + model fallback are declared in ./.llm-wiki-memory/settings/settings.yaml (materialised by bootstrap). Each API provider has a models: [...] list tried newest-first on model_not_found / 404 errors; the cross-provider chain: [...] advances on timeout / unavailable. CLI providers (claude / codex / cursor) defer to whatever their binary is logged into — model names live ONLY in YAML, never in code.
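The two-level fallback described above (models within a provider, then providers within the chain) can be sketched as nested loops. This is a synchronous simplification with hypothetical names: the `chain` entry shape, the `call` callback, and the `err.code === "model_not_found"` convention are assumptions, and any other error is treated here as a reason to advance to the next provider.

```javascript
// chain: [{ name, models?: [...] }], tried in order.
// call(providerName, model): performs the request or throws.
function runWithFallback(chain, call) {
  const tried = [];
  for (const provider of chain) {
    // CLI providers carry no model list; they use whatever the binary is logged into.
    const models = provider.models && provider.models.length ? provider.models : [null];
    for (const model of models) {
      tried.push(model ? `${provider.name}:${model}` : provider.name);
      try {
        const output = call(provider.name, model);
        return { output, provider: provider.name, model, tried };
      } catch (err) {
        if (err.code === "model_not_found") continue; // next model, same provider
        break; // timeout / unavailable: move on to the next provider
      }
    }
  }
  throw new Error("all providers failed: " + tried.join(", "));
}
```

The `tried` list plays the role of the provider_chain_tried audit field mentioned earlier: it records every attempt, not just the one that succeeded.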
Settings live in two files in ./.llm-wiki-memory/settings/:
- .env: secrets, provider switches, deployment paths, workspace identity, test seams. Things that genuinely need shell precedence. See templates/env.example.
- settings.yaml: every other knob, nested by concern: consolidate, flush, hook, embed, recall, compile, gc, gate, providers, crossCuttingAreas. See templates/settings.yaml.
The .env file's strict subset overrides the YAML where it overlaps (e.g. MEMORY_LLM_PROVIDER collapses the YAML chain). As of the 2026-06-03 v2 release, every MEMORY_* env var that's NOT on the strict allow-list is a silent no-op — application config moved into settings.yaml. The runbook covers the migration.
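The precedence rule can be illustrated for the one overlap the text names, MEMORY_LLM_PROVIDER collapsing the YAML chain. The function name, the allow-list contents passed in, and the `MEMORY_EXAMPLE_KNOB` variable in the usage note are hypothetical; the real allow-list is defined by the project.

```javascript
// yamlChain: provider chain from settings.yaml.
// env: environment variables; allowList: Set of honoured MEMORY_* keys.
function resolveProviderChain(yamlChain, env, allowList) {
  // MEMORY_* keys outside the allow-list are ignored (a silent no-op).
  const ignored = Object.keys(env).filter(
    (key) => key.startsWith("MEMORY_") && !allowList.has(key)
  );
  const forced = allowList.has("MEMORY_LLM_PROVIDER")
    ? env.MEMORY_LLM_PROVIDER
    : undefined;
  return { chain: forced ? [forced] : yamlChain, ignored };
}
```

Calling it with `{ MEMORY_LLM_PROVIDER: "claude", MEMORY_EXAMPLE_KNOB: "1" }` and an allow-list containing only the first key yields a one-element chain and reports the second key as ignored; the sketch returns the ignored keys purely to make the no-op visible.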
Strict-subset .env keys:
Highlights from settings.yaml:
Full schema
See templates/settings.yaml for the complete annotated set with every knob in each of the nine config sections plus the top-level crossCuttingAreas list.
Choosing an embedding model
Recall ranks queries with an on-device transformers.js model, set by embed.model in settings.yaml. The default Xenova/bge-large-en-v1.5 gives the best routing quality; lighter models trade some accuracy for a much smaller download. Sizes below are the quantized ONNX weights transformers.js downloads by default (full-precision is ≈ 4× larger), lightest first:
Set a lighter model in settings.yaml:
embed:
  model: Xenova/bge-small-en-v1.5
Changing the model invalidates the embedding cache automatically. Stay within the MiniLM / BGE / GTE / mxbai families: they're mean-pooled with no query prefix, which is how this engine embeds. Prefix-based models (e5, nomic) underperform here because the engine doesn't add the query: / search_document: prefixes they expect.
cd .llm-wiki-memory/src
# Inspect what consolidate WOULD do (no mutations).
node scripts/cli.mjs consolidate --dry-run --force --json | jq
# Run consolidate for real (bypass the daily throttle).
node scripts/cli.mjs consolidate --force --json | jq '.totals'
# Full cron-job (compile + consolidate + attempt log entry).
node scripts/cli.mjs cron-job
# Inspect cron health (what SessionStart shows you on a failure).
node scripts/cli.mjs cron-health | jq
# Inspect the per-run report + the attempt log history.
cat ../state/.consolidate.json | jq
cat ../state/.consolidate-attempts.log | jq -s 'reverse | .[:5]'
# The classic ops trio.
node scripts/cli.mjs init # materialise or repair the wiki shell
node scripts/cli.mjs validate # skill-llm-wiki validate
node scripts/cli.mjs heal # classify state and name the next command
# Recall / search from the terminal.
node scripts/cli.mjs recall "<query>"
node scripts/cli.mjs search "<query>"
# Resolved paths + LLM provider + skill location.
node scripts/cli.mjs where
# Recover a failed distillation. Reads either the stash (from a recent
# failure) or the in-leaf raw fallback (for older leaves with no stash).
node scripts/cli.mjs redistill --leaf <path> # one daily leaf
node scripts/cli.mjs redistill --session <id> # newest stash for a session
node scripts/cli.mjs redistill --all # every pending stash
Schedule the hourly cron (or remove it):
./.llm-wiki-memory/src/bootstrap.sh --schedule daily # cron on Linux, launchd on macOS, hourly
./.llm-wiki-memory/src/bootstrap.sh --schedule off # remove
The cron entry calls a generated wrapper (state/cron-daily.sh), which is safe across workspaces whose paths contain single-quotes, percents, or spaces.
Architecture (responsibility matrix)
Full per-concern responsibility split (this package vs the underlying engine) and known smells: ARCHITECTURE.md.
npm test # unit suite
npm run test:e2e # full lifecycle against the real skill-llm-wiki CLI (LLM stubbed)
905 tests in total. The unit suite covers the chunker (header/paragraph/hard-cut boundaries, surrogate-safe cuts), the provider+model chain (model-not-found iteration, cross-provider fallback, provenance accumulation), the map-reduce flow (depth cap, shrink check, partial-failure stash, in-leaf recovery), the redistill CLI, the wiki auto-commit layer (batching, repo-safety probe, injection guards), the entity-level self-healing pipeline (escalations, episode-versioned issue reports, log retention, and provider-availability tracking: compile's EX_UNAVAILABLE exit, synthetic system: entities, the hybrid cron PATH builder), word-boundary truncation, the facet vocabulary collector, and the LLM-only cosine merge band. The e2e suite builds a wiki from scratch in a temp directory and asserts genesis, daily capture, lesson + knowledge + plan + investigation absorption, compile promotion + dedup, recall, tree-growth integrity, and idempotency, all against the real skill-llm-wiki CLI with mocked LLM responses.
Node 20 or newer, and git. No Docker, no Python. The embedding model downloads on first recall (set embed.backend: lexical in settings.yaml to skip it entirely).