Releases: RodCor/kimetsu
v2.0.0
v2.0.0 — Never explore twice
The biggest token sink is RE-EXPLORATION — the agent re-deriving what the brain
(or the repo) already knows. v2.0 attacks it from three flagship directions and
adds a pluggable storage tier. Backward-compatible: existing project.toml and
brain.db files upgrade in place (schema v3 → v6, automatic on open).
Flagship — Session warm-start (never re-establish your work)
ADDED
- Episodic work-resume. A new work_episode record (event-sourced,
schema v5) captures your working state at SessionEnd: the task, what you
did, what FAILED and why (dead-ends), open threads, and the working
hypothesis. It is scoped per-repo (one live episode per repo). kimetsu resume
prints it; kimetsu checkpoint [note] saves one manually. Distilled via
the cheap model when configured, else a rule-based summary (never blocks).
Episodes are LOCAL-ONLY and never sync/export.
- Project digest. kimetsu brain digest [--refresh] assembles a compact
(~400-token) digest (repo manifest + top-usefulness memories + current
focus), cached at .kimetsu/digest.md by content hash and refreshed on
git/corpus drift. (The digest is currently rule-based; a cheap-model
distillation hook-point is present but not yet wired.)
- SessionStart injection. A new kimetsu brain session-start-hook emits
the digest + resume as additionalContext, gated by [broker] warm_start
(default on). Wired for Claude Code; other hosts' SessionStart context
surface is not yet verified and is intentionally left unwired.
Flagship — The active brain (never wait on the brain)
ADDED
- kimetsu brain ask "<question>". Terminal Q&A answered entirely from
the brain: full retrieval + a grounded answer composed by a LOCAL/cheap
model, with memory-id citations. Zero frontier tokens; works offline. Falls
back to distiller credentials, then to verbatim top-capsules when no model
is configured. Grounded-only: refuses rather than hallucinating when memory
has no answer. A --json flag is available; an answer marked helpful counts
like a citation.
- kimetsu_brain_answer MCP tool. A read-only synthesis tool the host
agent can call mid-task ("what do we know about X?") for a grounded, cited
brief instead of re-discovering.
- Command recall fast-path. "How do I…" queries surface matching
command-kind memories runnable-line-first.
- Answer-grade injection. Very-high-confidence capsules (score ≥
[broker] answer_grade_min_score, default 0.92) are prefixed "Verified
answer from project memory:" so the model can act in one turn. This is
render-time only (ranking is untouched) and is suppressed when the memory
was recently a floor-drop regret.
- Proactive pre-fetch. Opt-in [broker] proactive_prefetch (default off)
warms trajectory-relevant memories at PreToolUse.
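
The answer-grade rule above is a render-time decoration, which a short sketch may make concrete. This is only an illustration under stated assumptions: the `Capsule` type, its `recent_regret` flag, and the `render` function are hypothetical names, not the project's API. Only the prefix string and the 0.92 default come from the notes.

```python
from dataclasses import dataclass

# Default taken from the notes: [broker] answer_grade_min_score = 0.92
ANSWER_GRADE_MIN_SCORE = 0.92
PREFIX = "Verified answer from project memory:"


@dataclass
class Capsule:
    memory_id: str
    text: str
    score: float
    # Hypothetical flag: the memory was recently a floor-drop regret.
    recent_regret: bool = False


def render(capsules, min_score=ANSWER_GRADE_MIN_SCORE):
    """Render capsules in the order given; only the text is decorated."""
    lines = []
    for capsule in capsules:
        answer_grade = capsule.score >= min_score and not capsule.recent_regret
        lines.append(f"{PREFIX} {capsule.text}" if answer_grade else capsule.text)
    return lines
```

Because the function never reorders its input, ranking stays whatever the retrieval stage produced, which matches the "render-time only" wording.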
Flagship — Memory → skill synthesis (never re-derive a solution)
ADDED
- Skill synthesis. A memory cited ≥3 times (or a tight cluster) becomes a
synthesis candidate; kimetsu brain skills [--review] drafts an executable
skill from it, grounded strictly in the cited memories, and, on explicit
accept, installs it into the host-native skill dir with provenance back to
the source memory ids. Propose-only (never auto-installed); flagged stale
when a source memory is superseded. Schema v6 (skill_proposals).
Local-model independence
ADDED
- Ollama as a first-class provider (provider = "ollama": OpenAI-compat,
localhost:11434/v1 default, no key) and a single optional [cheap_model]
config that all cheap-model consumers resolve (distiller, consolidation,
digest, resume, skill draft, ask). Back-compatible with
[learning.distiller]; OPTIONAL everywhere: every consumer degrades
gracefully when no model is configured. kimetsu doctor probes the local
endpoint. New guide: docs/LOCAL-MODELS.md (fully-local = zero external
calls).
Pluggable storage backends
ADDED
- RetrievalBackend trait with [storage] backend = "flat" | "graph-lite" | "graph"
(default flat). The broker stays backend-agnostic; switching
backends re-projects from the event log.
- Graph-lite (Tier 1): a typed-edge projection (memory_edges, schema
v4) with 1–2 hop expansion blended as graph-provenance candidates (a strict
superset of flat, so no recall loss; the broker still filters). supersedes
edges populate now; episode-sourced edge types are reserved.
- Petgraph (Tier 2, remote-only) behind the graph feature (off in local
lean/embeddings builds; petgraph is never compiled there). In-memory graph
with centrality / shortest-path / community-detection helpers, plus a
cross-backend benchmark harness. Spike verdict: an embedded graph DB
(Kùzu/Cozo) is not justified through ~100k memories.
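
To illustrate why 1–2 hop expansion over typed edges yields a superset of the flat candidates, here is a minimal breadth-first sketch. It is an assumption-laden illustration, not the project's implementation: the `(source, edge_type, target)` tuple shape, the `expand` function, and the choice to ignore edge direction are all hypothetical.

```python
from collections import deque


def expand(seeds, edges, max_hops=2):
    """Return {memory_id: hop_distance} for seeds plus neighbours within max_hops."""
    adjacency = {}
    for src, _edge_type, dst in edges:
        adjacency.setdefault(src, []).append(dst)
        # Assumption: expansion follows edges in both directions.
        adjacency.setdefault(dst, []).append(src)

    found = {seed: 0 for seed in seeds}  # seeds are always kept (superset of flat)
    queue = deque(found)
    while queue:
        node = queue.popleft()
        if found[node] == max_hops:
            continue  # do not walk past the hop limit
        for neighbour in adjacency.get(node, []):
            if neighbour not in found:
                found[neighbour] = found[node] + 1
                queue.append(neighbour)
    return found
```

Every seed survives with distance 0, so the expanded set can only add candidates; filtering remains the broker's job, as the notes state.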
Continuous self-tuning + ROI v2
ADDED
- Re-tune triggers (≥50 new memories since last tune, or elevated regret
rate) surfaced via kimetsu brain tune --status + a Stop-hook one-liner;
proposed, never auto-applied. Model re-selection advisor
(tune --models) with reindex/download cost stated. Regret-driven
objective: the tune objective now penalizes floor configs that produced
regrets. ROI v2: per-memory ROI (roi --top), output-token accounting,
and warm-start (digest_served/resume_served) savings attribution.
Personal brain sync
ADDED
- Event-log replication (not file copying).
kimetsu brain sync export --since <cursor> / import move durable
memory-lifecycle events as portable JSONL with per-event idempotency;
telemetry, raw queries, and local-only episodes are excluded by an
allowlist; redaction is respected. [sync] dir drives a server-less
directory protocol (Dropbox/Syncthing/NAS) with per-machine batches and
per-source cursors; --dry-run and --status throughout. Object-storage
backends (e.g. S3) are deferred.
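
A rough sketch of allowlisted, idempotent event replication follows. It assumes a hypothetical event shape with `id`, `seq`, and `type` fields and an in-memory `store` dict; the real on-disk format and cursor semantics are not described in these notes. Only the event names `memory.cited`, `memory.superseded`, and `context.served` appear in the release text.

```python
import json

# Assumption: an illustrative allowlist built from event names in the notes.
SYNCABLE = {"memory.cited", "memory.superseded"}


def export_events(events, since=0):
    """Serialize allowlisted events with seq > since; return (jsonl, new_cursor)."""
    picked = [e for e in events if e["seq"] > since and e["type"] in SYNCABLE]
    # The cursor advances past skipped events too, so they are not rescanned.
    cursor = max((e["seq"] for e in events), default=since)
    jsonl = "\n".join(json.dumps(e, sort_keys=True) for e in picked)
    return jsonl, max(cursor, since)


def import_events(jsonl, store):
    """Apply each event once, keyed by its id; return how many were new."""
    applied = 0
    for line in jsonl.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        if event["id"] in store:
            continue  # already applied: re-importing a batch is a no-op
        store[event["id"]] = event
        applied += 1
    return applied
```

Keying on the event id is what makes a repeated import harmless, which matters for a directory protocol where the same batch file may be seen more than once.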
Hardening & release
FIXED
- Analytics/doctor active counts, reindex, peek/undo, and memory list now
consistently exclude superseded rows; config set / tune --apply preserve
toml comments + unknown keys (toml_edit); dropped-capsule + proactive-state
sidecars write atomically. Cursor/Gemini MCP schemas verified against live
docs.
- Release pipeline: a version-guard job fails in seconds on a
tag/workspace-version mismatch (and the built binary must self-report the
tag), closing the gap that produced the v1.5.0 botch; core GitHub Actions
bumped to Node-24-ready versions; scripts/bump-version.sh codifies the
one-step version bump.
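
The version-guard idea can be sketched as a pure comparison. This is a hypothetical illustration, not the actual CI job: the function name and its three string inputs are assumptions, and how the real job obtains the workspace version or the binary's self-report is not shown in these notes.

```python
def check_version(tag, workspace_version, reported_version):
    """Return mismatch messages; an empty list means the guard passes."""
    # Assumption: release tags look like "v2.0.0" and versions like "2.0.0".
    expected = tag[1:] if tag.startswith("v") else tag
    problems = []
    if workspace_version != expected:
        problems.append(f"workspace version {workspace_version} != tag {tag}")
    if reported_version != expected:
        problems.append(f"binary reports {reported_version}, expected {expected}")
    return problems
```

With the v1.5.0 situation described in these notes (tag v1.5.0, artifacts self-reporting 1.0.0), `check_version("v1.5.0", "1.0.0", "1.0.0")` returns two problems, so a guard of this shape would have failed before publishing.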
KNOWN LIMITATIONS
- SessionStart warm-start is wired for Claude Code only; the digest is
rule-based pending the cheap-model distillation wiring; full retrieval-
quality and first-turn-token benchmarks require the embeddings + Docker
environment and were not run in CI; remote-bench process isolation (#23)
remains open.
v1.5.1
v1.5.1 — version-stamp re-release
Identical feature set to v1.5.0. The v1.5.0 artifacts were built before the
workspace version bump, so their binaries self-report 1.0.0 (which also
broke the crates.io publish — kimetsu-core@1.0.0 already existed). npm
forbids reusing a published version, so the corrected release ships as
v1.5.1. If you installed v1.5.0 from npm or the GitHub release, update.
FIXED
- Workspace and inter-crate versions stamped correctly (kimetsu --version
now reports the release version; crates.io publish unblocked).
v1.5.0
Warning
Superseded by v1.5.1 — these artifacts were built before the workspace version bump and self-report 1.0.0. Identical feature set; install v1.5.1 instead.
v1.5.0 — pays for itself
ADDED
- Telemetry capture. Raw query text is now stored in context.served
telemetry events when [learning] store_queries = true (default; set
false to revert to the pre-v1.5 query-hash-only behavior). A
session_id field is also written when the host provides one (Claude Code
hooks; absent for hosts that do not emit it). Dropped-capsule sidecar
(~/.kimetsu/cache/<hash>/dropped-recent.json) records capsules that
were floor-filtered out so that a later citation of one is detected as a
retrieval.regret event, feeding the self-tuning loop. All telemetry
stays on-machine; nothing is exported.
- ROI ledger: kimetsu brain roi. Conservative per-kind token-savings
estimates (failure_pattern=1500, command=400, convention=300, fact=500,
preference=200 tokens per citation) minus brain-injection overhead give a
net-positive / net-negative verdict. Dollar estimates are shown when the
active model is recognized from a built-in price table (Claude 3/4, GPT-4/5
families, Bedrock routing prefixes) or when [model] price_per_mtok is set
in project.toml. Use --window 7d|30d|all and --json for scripting. The
Stop hook appends a per-session savings sentence when ≥1 citation occurred;
zero-citation sessions are silent. Calibration methodology and honest
limitations: docs/ROI-METHODOLOGY.md.
- Token budget: render-time capsule compression and session dedupe.
Two [broker] toggles, both default true:
  - compress_capsules: capsule summaries are compressed at render time
    (strips [tags: ...] / (context: ...) annotations, caps at 3
    sentences). Ranking is never affected; this runs only after retrieval
    and reranking. Set false to inject full memory text.
  - session_dedupe: the UserPromptSubmit hook skips capsules whose
    handle was already injected earlier in the same session (tracked via
    the proactive-state sidecar). Soft policy: falls back to injecting all
    capsules when dedupe would produce an empty set. This addresses the
    pre-v1.5 behavior where the main hook re-injected the same top capsule
    on every prompt of a long session.
- Self-Tuning Brain: kimetsu_brain_cite MCP tool + kimetsu brain tune.
kimetsu_brain_cite is a new write-gated MCP tool that records a
memory.cited event from inside an MCP session, closing the ground-truth
gap when the model leans on a memory but doesn't explicitly call
cite_memory. Personal eval set: tuneset builds positive eval cases from
context.served + citation joins (exact session_id or ±30-minute window
fallback). kimetsu brain tune --status reports case count and kind
coverage. kimetsu brain tune (dry-run by default) sweeps
broker.min_lexical_coverage ∈ {0.3, 0.4, 0.5, 0.6} ×
broker.min_semantic_score ∈ {-1.0 (AUTO), 0.0, 0.25, 0.35, 0.45} against
the production embedder; --apply writes only the floor parameters (not the
reranker; that change is recommended separately); --revert restores the
previous tune-history entry. A holdout guardrail (deterministic 20% split)
prevents writing a config that regresses the holdout objective.
- Consolidation: kimetsu brain consolidate + kimetsu brain triage.
Schema migrated to v3 (superseded_by column + index on memories).
brain consolidate (Story 3.1, default): brute-force cosine scan within the
same embedding model; clusters at ≥ 0.92 cosine (configurable with
the --threshold flag) are merged: survivor keeps its id and text, members
get superseded_by set and a memory.superseded event written (rebuild-safe);
citations are reassigned to the survivor. The --distill mode (Story 3.2):
looser clusters (0.75–0.85 cosine band, ≥3 memories, ≥1 shared tag) are fed
to the configured distiller; result lands as a memory proposal for human
review; prints clusters and exits 0 when no distiller is configured.
brain triage (Story 3.3): interactive per-item keep / prune / skip of
memories below a usefulness and age threshold (--score-floor 0.2,
and --age-days 30); --prune-all --yes for batch non-interactive pruning.
- Reach: export redact, Cursor + Gemini CLI installers, CI embeddings job.
kimetsu brain export --redact strips the (context: …) segment from
exported memory text; --redact-tags (requires --redact) additionally
strips the [tags: …] prefix. Both flags are useful for sharing brains
without leaking workspace-specific file paths or tag metadata.
kimetsu plugin install cursor and kimetsu plugin install gemini-cli
write MCP config (.cursor/mcp.json / .gemini/settings.json) and an
always-on guidance file (.cursor/rules/kimetsu-brain/rule.md for workspace
installs; GEMINI.md merged into the project root or ~/.gemini/GEMINI.md
for global installs). Neither host has a UserPromptSubmit-style hook
system, so MCP + the guidance file are the complete integration surface.
Note: Cursor and Gemini CLI config schemas were inferred from their public
docs as of June 2026 and have not been verified against live host builds;
treat these installers as a starting-point and check the generated files
before committing them. CI: a new test-embeddings job (ubuntu-only, with
the embeddings feature enabled via --features embeddings, HuggingFace +
fastembed cache) runs alongside the existing lean test matrix.
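
The ROI ledger arithmetic described for kimetsu brain roi can be sketched in a few lines. The per-kind estimates are the ones listed in these notes; the function name, the input shapes, and the treatment of an exactly-zero net as net-negative are assumptions made for illustration only.

```python
# Per-citation token-savings estimates, as listed in the release notes.
SAVINGS_PER_CITATION = {
    "failure_pattern": 1500,
    "command": 400,
    "convention": 300,
    "fact": 500,
    "preference": 200,
}


def roi(citations, injected_tokens, price_per_mtok=None):
    """citations: {kind: count}; injected_tokens: brain-injection overhead."""
    gross = sum(SAVINGS_PER_CITATION[kind] * n for kind, n in citations.items())
    net = gross - injected_tokens
    result = {
        "gross": gross,
        "overhead": injected_tokens,
        "net": net,
        # Assumption: a net of exactly zero is reported as net-negative.
        "verdict": "net-positive" if net > 0 else "net-negative",
    }
    if price_per_mtok is not None:
        # Dollar estimate only when a price per million tokens is known.
        result["dollars"] = net / 1_000_000 * price_per_mtok
    return result
```

For example, two failure_pattern citations and one command citation against 1000 injected tokens give a gross of 3400 and a net of 2400 tokens, a net-positive verdict.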
CHANGED
- kimetsu.mcp_write_tools gate now covers kimetsu_brain_cite alongside
the existing write tools (same env / config / default-true logic for the
local stdio server; remote server remains env-only, default-deny).
- Stop hook output includes a per-session token-savings line when the brain
was cited at least once that session.
- brain.db schema advanced from v2 → v3 (automatic on first
read-write open; sidecar backup taken per the existing migration policy).
- brain export gains --redact / --redact-tags flags (no behavior
change for existing callers).
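
The write-tool gate's precedence (env when set, then config, then default, with the remote server staying env-only and default-deny) can be sketched as follows. This is an illustration under assumptions: the function name and dict-shaped inputs are hypothetical, and treating any set value other than "1" as "disable" is a guess at how the env var works in the off direction.

```python
def write_tools_enabled(env, config, remote=False):
    """Resolve the gate: env (when set) > config > default."""
    raw = env.get("KIMETSU_MCP_ENABLE_WRITE_TOOLS")
    if raw is not None and raw != "":
        # Assumption: "1" enables; any other set value disables.
        return raw == "1"
    if remote:
        # Remote server: env-only, default-deny; project config is ignored.
        return False
    # Local stdio server: config-driven, default true.
    return config.get("kimetsu", {}).get("mcp_write_tools", True)
```

Ignoring the project config on the remote path reflects the reasoning given elsewhere in these notes: a cloned repo's project.toml is untrusted input.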
FIXED
- Tune sweep now runs against the production embedder (not the Noop embedder
used in tests), so floor calibration is on real vectors.
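
For orientation, the tune sweep in this release is a 4 × 5 grid with a holdout guardrail, which might look roughly like the sketch below. The grid values come from the notes; the hash-based split, the `objective(config, cases)` callback, and the function names are hypothetical stand-ins for whatever the tool actually does.

```python
import hashlib
from itertools import product

# Grid values as listed in the notes (-1.0 means AUTO).
COVERAGE = [0.3, 0.4, 0.5, 0.6]
SEMANTIC = [-1.0, 0.0, 0.25, 0.35, 0.45]


def is_holdout(case_id):
    """Deterministic split: roughly one case in five lands in the holdout."""
    digest = hashlib.sha256(case_id.encode("utf-8")).digest()
    return digest[0] % 5 == 0


def tune(case_ids, objective, current):
    """Pick the best grid point on train; keep current if the holdout regresses."""
    train = [c for c in case_ids if not is_holdout(c)]
    holdout = [c for c in case_ids if is_holdout(c)]
    best = max(product(COVERAGE, SEMANTIC), key=lambda cfg: objective(cfg, train))
    if objective(best, holdout) < objective(current, holdout):
        return current  # guardrail: never write a config that regresses holdout
    return best
```

Hashing the case id, rather than sampling randomly, keeps the split stable across runs, so a case never moves between train and holdout.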
v1.0.0
v1.0.0 — durable migrations, analytics, semantic retrieval, proactive recall
ADDED
- Remote cross-encoder rerank stage. kimetsu-remote serve now applies a
cross-encoder reranker to every kimetsu_brain_context call (--reranker,
default jina-reranker-v1-tiny-en, operator-level; "off" disables; any
curated or HuggingFace ONNX id accepted). The default was chosen by the
100-memory benchmark: jina-tiny MRR 0.931 vs 0.914 for TinyBERT on the local
bench; the remote path has no hook-latency budget so the fastest effective
reranker wins. Benchmark lift with reranker: jina-v2-base-code MRR 0.904 →
0.906, bge-small MRR 0.901 → 0.909 (production floors active).
- Model-aware AUTO semantic floor + kimetsu-remote benchmark.
kimetsu brain bench --remote boots a real kimetsu-remote server per embedder,
drives the 100-case dataset over HTTP MCP (sequential + concurrent), and
reports quality/latency/throughput/server-RSS. Its first run caught a
real bug: the 0.35 cosine floor (calibrated on bge) was KILLING relevant
jina-v2 results on every production path (remote MRR 0.90 -> 0.77).
broker.min_semantic_score now defaults to -1.0 = AUTO: 0.35 on
bge-family models, disabled elsewhere (jina-v2's own precision keeps noise
low); explicit values still win. Confirmed: jina-v2 remote recovered to
MRR 0.906 / recall@4 0.939 on the 100-memory corpus (with the server
reranker: ~416ms/request, ~5 rps at concurrency 4, ~1.2GB peak RSS).
- Benchmark-chosen retrieval defaults: jina-v2-base-code + TinyBERT.
kimetsu brain bench (100 real-memory cases) drove the defaults:
embedder jina-v2-base-code (recovers oblique queries bge-small never
pools; ~4x less off-topic noise) + reranker ms-marco-tinybert-l-2-v2
(~43ms, within noise of the best MRR). Existing brains need
kimetsu brain reindex after upgrading (vector dims 384 -> 768); set
embedder.model / embedder.reranker back to taste and re-judge with
kimetsu brain bench. Lean-RAM alternative: bge-small + tinybert.
- Local MCP write tools enabled by default (kimetsu.mcp_write_tools).
The brain's own workflow tells the agent to record lessons
(kimetsu_brain_record, the Stop-hook harvest cue), but the
privileged-write gate default-denied unless
KIMETSU_MCP_ENABLE_WRITE_TOOLS=1 was in the MCP server's env, so every
session ended with the agent goaded into a blocked call. The gate is now
config-driven for the LOCAL stdio server: kimetsu.mcp_write_tools
(default true), personalizable via
kimetsu config set kimetsu.mcp_write_tools false. Precedence: the env
var when set always wins (both directions) > config > default. The
REMOTE server is unchanged (env-only, default-deny) because a cloned
repo's project.toml is untrusted input and must never enable writes.
- Cross-encoder reranking (opt-in) + retrieval eval harness + 300ms
hook budget. The warm daemon can apply a final cross-encoder rerank
stage: over-fetch a 12-capsule pool, score each (query, memory) pair
jointly with a fastembed reranker ([embedder] reranker = a curated id;
local default ms-marco-tinybert-l-2-v2; "off" disables), drop below a
0.30 relevance floor, truncate to the cap.
Every knob was chosen by measurement: jina-reranker-v1-tiny-en
(default) beat the larger turbo model on both quality and speed in the
head-to-head benchmark, a pool of 6 matches pool-12 quality exactly at
half the latency, and summaries must stay FULL (snippet truncation
cratered recall@4 from 0.83 to 0.66, worse than FTS). With those
settings a warm rerank answers inside the 300ms hook budget on a real
brain (~265ms measured); slower machines degrade gracefully to
floored-FTS for that turn, or set reranker = "off". Backing the
default path, the cosine floor: broker.min_semantic_score is
now ON by default (0.35) and actually wired (it previously defaulted
to 0.0 and was never populated from config in any production path). The
hook's daemon budget tightens 750ms → 300ms; a warm semantic answer
fits with ~70ms to spare, and misses fall back to floored FTS. Daemon
spawn hygiene for the lazy path: the entrypoint now binds the socket
BEFORE loading models (a redundant spawn exits in ms, not seconds), and
on Windows the hook clears HANDLE_FLAG_INHERIT on its std handles before
spawning so the long-lived daemon can never hold the harness's stdout
pipe open (previously the first prompt of a session could stall until
the host's hook timeout). New kimetsu brain eval [--fixture ...]
measures recall@2/4 + MRR across fts / semantic / semantic+rerank modes
against a committed fixture (fixtures/eval-retrieval.json), so ranking
changes are measurable: baseline shows semantic recall@4 0.90 / MRR 0.91
vs FTS 0.72 / 0.81. The --rerankers <ids> option benchmarks cross-encoders
head-to-head (quality + per-query latency) and --pool sweeps pool
sizes; non-curated models load as user-defined ONNX from any HuggingFace
repo via hf-hub. Benchmark results: jina-tiny recall@4 0.896 / MRR 0.938
at ~44ms per query (pool 6) vs turbo 0.833 / 0.875 at ~137ms (pool 12);
ms-marco TinyBERT-L-2 is ~5× faster still (8.5ms) but its quality
(0.715) merely matches FTS.
- Warm embedder daemon: semantic recall at hook time. The
UserPromptSubmit context-hook can now match memories by meaning, not
just lexically. A single per-user daemon (kimetsu brain embed-daemon,
keyed by embedder model) loads the ONNX model once and serves full
embedding/ANN retrieval to the hook over a local socket / named pipe
(interprocess); the hook is a thin client with a ≤300ms budget and a
hard fall-back to floored-FTS, so the prompt is never blocked.
kimetsu brain warm (wired to each harness's startup hook) pre-warms it so a
running session never pays a cold model load. One model in RAM regardless
of how many projects/agents are active. Config: [embedder] daemon and
warm_on_start toggle it; [embedder] model (and kimetsu brain model set)
pick the model. KIMETSU_EMBED_DAEMON=0 is a runtime kill switch.
Embeddings builds only; lean builds keep the floored-FTS hook. New
kimetsu brain daemon status|stop to inspect/control it.
- Durable schema migrations. brain.db now migrates forward
automatically on open via a versioned, forward-only runner (each
migration applied in one transaction). The DB is backed up to a
brain.db.bak-* sidecar before any version-advancing migration
(skipped for empty brains; the three newest backups are kept).
Read-only opens of an un-migrated brain degrade gracefully; an
event-upcast seam keeps old traces replayable. The project.toml
config version is decoupled from the DB schema version so the
schema can evolve without breaking existing projects.
- Proof-of-value analytics. New kimetsu brain insights command
and kimetsu_brain_insights MCP tool: retrieval hit-rate &
skip-rate, citation rate, proposal acceptance rate, usefulness
trend, harvest yield, corpus health, and token economy, computed
over a configurable recent-runs window. A new context.served
event records every retrieval (hit or miss); context.injected
now carries injected-token counts.
- Semantic retrieval (usearch HNSW ANN). On the embeddings build, an
approximate-nearest-neighbour index (usearch HNSW) finds memories whose
meaning matches the query even with no shared words, at O(log N) per
query, so retrieval stays fast as the corpus grows. The index is
candidate generation only; final ranking is an exact cosine rerank over the
stored f32 vectors, so the index can be quantized (f16 by default,
i8/f32 via KIMETSU_ANN_QUANTIZATION) for a large RAM saving with
negligible quality loss. Retrieval is sharpened with embedding-MMR
(collapses true paraphrase duplicates) and an absolute semantic-relevance
floor (genuinely off-topic queries return nothing). Capsule caps are
config-driven (default 8) and injected tokens drop while the relevant
capsule is preserved. The lean (FTS-only) build is unchanged.
- Scales to ~1M memories. A million-memory corpus runs on modest
hardware: ~1.8 s p99 semantic retrieval and ~3 GB RAM at 1M (f16
default; ~2.8 GB with KIMETSU_ANN_QUANTIZATION=i8). Both retrieval and
conflict-detection-on-write are O(log N) via the HNSW index, with no
brute-force vector scan. Bulk ingest batches embedding; the index builds in
parallel across cores, maintains itself incrementally, persists a sidecar
so a restarted server loads instead of rebuilding, and Kimetsu Remote
pre-warms each repo's index on startup.
- Proactive & cost-shrinking recall (the agent brain). Before the
first implementation attempt, a tight retrieval surfaces a "Known
pitfalls" block (failure patterns / conventions): proactive, not
just post-failure. Tasks are classified (Debug / Feature / Refactor /
Docs / Investigation) to route recall by kind. A per-run recall
ledger deduplicates capsules across stages (rendered once,
back-referenced after), and the long tail is injected as one-line
headlines the agent expands on demand via a new expand_capsule
tool, so brain overhead shrinks in relative terms as tasks grow
(an adaptive sublinear, per-run-capped budget).
- kimetsu config edit and kimetsu run abort are now fully
implemented: config edit opens $EDITOR on project.toml and
re-validates on save; run abort cleanly finalizes a dangling run.
No stub subcommands ship.
- kimetsu doctor --selftest proves the brain pipeline works
end-to-end (ingest → retrieve → record) without needing a live
model...
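
The rerank stage described in this release (over-fetch a pool, score each pair, drop below a relevance floor, truncate to the cap) can be sketched generically. The pool size 12, floor 0.30, and cap 8 are the figures quoted in these notes; `rerank`, `toy_score`, and the word-overlap scorer are hypothetical stand-ins, since a real cross-encoder is out of scope for a short example.

```python
def toy_score(query, text):
    """Stand-in scorer (word-overlap ratio); NOT a cross-encoder."""
    q, t = set(query.lower().split()), set(text.lower().split())
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def rerank(query, candidates, score_pair=toy_score, pool=12, floor=0.30, cap=8):
    """Score the first `pool` candidates, drop those under `floor`, keep `cap`."""
    pooled = candidates[:pool]  # over-fetched pool from the earlier stage
    scored = [(score_pair(query, text), text) for text in pooled]
    kept = [(score, text) for score, text in scored if score >= floor]
    kept.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [text for _, text in kept[:cap]]
```

Note that the full candidate text is passed to the scorer, in line with the observation above that truncating summaries hurt recall.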
v0.9.0
v0.9.0 — auto-harvested memories + SessionEnd distiller
ADDED
- Credentialed SessionEnd distiller (opt-in). A second, deterministic
memory-harvest path alongside the v0.9.0 in-agent harvester.
kimetsu plugin install claude-code and kimetsu plugin install codex now run an
interactive wizard (on a TTY; skip with --no-setup, force with the
--setup-harvest flag) that configures a cheap distiller
model: Anthropic (recommended claude-haiku-4-5), OpenAI (recommended
gpt-5.4-mini), or compatible endpoints via ANTHROPIC_BASE_URL /
OPENAI_BASE_URL. The key + base URL are written to a gitignored .env;
the selection lands in [learning.distiller]
in project.toml. A new SessionEnd hook for Claude Code runs
kimetsu brain session-end-hook; Codex uses its supported Stop hook with
the --distill-on-stop flag. When enabled + credentialed, the distiller
reads the transcript with that model and records lessons through the
confidence-gated propose_or_merge_memory. When the distiller is enabled,
the Stop hook's end-of-session cue is suppressed (the distiller owns
end-of-session; the mid-session PostToolUse resolved-failure cue stays).
AnthropicProvider gained an optional base URL for the LiteLLM case, and
the distiller gained an OpenAI Responses API provider. With --scope global
the wizard configures the distiller once in ~/.kimetsu/ (config + .env); that
global distiller then runs in every project and records into the user brain
(~/.kimetsu/brain.db, available everywhere), with a per-project distiller
taking precedence when one is configured.
- Auto-harvested memories (in-agent). The hooks now drive memory
generation, not just retrieval. When a command that failed earlier in the
session succeeds (PostToolUse), or a non-trivial session ends with nothing
recorded (Stop), Kimetsu emits a [kimetsu-harvest] cue telling the agent to
dispatch a new background kimetsu-memory-harvester subagent (installed
at .claude/agents/ for Claude Code and .codex/agents/ for Codex, pinned
to a cheap model). The subagent distills 0–3
generalizable lessons and records them through the confidence-gated
kimetsu_brain_record path: no separate API key or kimetsu-side model
credentials, billed in-agent at the cheap model's rate, non-blocking. Cues
are throttled (at most ~once per resolved failure / once per session) and can
be disabled with [learning] auto_harvest = false in project.toml.
CHANGED
- Cleaner help & tool menus. Stripped internal version-history prefixes
(v0.x:, MP-…:) from --help text and the MCP tool/argument descriptions
so menus read as plain present-tense descriptions. (Internal code comments
are unchanged.)
FIXED
- Stop hook read the wrong field. The Stop hook counted
kimetsu_brain_record calls from a non-existent inline transcript array;
Claude Code actually passes a transcript_path to a JSONL file. It now reads
the JSONL (with the inline array kept as a fallback) and counts both the bare
and MCP-namespaced (mcp__kimetsu__kimetsu_brain_record) tool names, so the
"lessons recorded" banner and end-of-session cue actually fire. The JSONL is
now streamed line-by-line so a long session's multi-MB transcript never
lands in memory at once.
- BOM-tolerant install. kimetsu plugin install now strips a leading
UTF-8 BOM before parsing an existing settings.json / hooks.json /
config.toml / .mcp.json, so a config saved by a BOM-emitting editor
(e.g. older Windows Notepad) no longer fails with "expected value at line 1
column 1".
- Install polish. The installer now merges Kimetsu's guidance into an
existing CLAUDE.md (workspace .claude/CLAUDE.md or the global
~/.claude/CLAUDE.md) inside <!-- kimetsu:begin --> / <!-- kimetsu:end -->
markers, appending and upgrading in place, never overwriting the user's
content. The --force flag no longer overwrites CLAUDE.md (the whole install
is idempotent and non-destructive; the flag is retained only for
compatibility). A --scope global on the workspace-only kimetsu target warns
instead of silently doing nothing; --workspace is canonicalized leniently so
a global install doesn't fail on a missing workspace path.
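
The marker-based CLAUDE.md merge described above can be illustrated with a small string-level sketch. The marker strings are the ones quoted in the notes; the `merge_guidance` function and its exact whitespace handling are assumptions, not the installer's code.

```python
BEGIN = "<!-- kimetsu:begin -->"
END = "<!-- kimetsu:end -->"


def merge_guidance(existing, guidance):
    """Append a marked block, or replace the marked block already present."""
    block = f"{BEGIN}\n{guidance.strip()}\n{END}"
    start = existing.find(BEGIN)
    end = existing.find(END, start + len(BEGIN)) if start != -1 else -1
    if start != -1 and end != -1:
        # Upgrade in place: only the text between the markers changes.
        return existing[:start] + block + existing[end + len(END):]
    if not existing.strip():
        return block + "\n"
    # First install: keep the user's content and append the block.
    return existing.rstrip("\n") + "\n\n" + block + "\n"
```

Running the merge twice with the same guidance returns the same text, which is the idempotence property the notes call out; user content outside the markers is never touched.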
v0.8.4
v0.8.4 — non-destructive plugin install + global scope
ADDED
- Global plugin install. kimetsu plugin install <target> --scope global
installs the Kimetsu surface into the user's home for every session
(~/.claude/ + ~/.claude.json (mcpServers) for Claude Code, and
~/.codex/ for Codex) instead of the workspace. The --scope flag defaults to
workspace (the prior behavior). Also exposed as the scope argument on
the kimetsu_plugin_install MCP tool.
FIXED
- Hook install no longer clobbers existing hooks. kimetsu plugin install
now merges its hooks into existing Claude settings.json / Codex
hooks.json instead of replacing them. Hooks you already have, even on the
same events Kimetsu uses (UserPromptSubmit, PreToolUse, …), are
preserved, with Kimetsu's group added alongside. Re-running install is
idempotent (no duplicate groups) and the MCP config + generated docs refresh
without requiring --force.
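
A non-destructive, idempotent hook merge of the kind described above might be sketched as follows. The nested `hooks` / event / group layout mirrors the general shape of a Claude Code settings.json, but the `merge_hooks` function, the equality-based duplicate check, and the command strings in the usage note are assumptions for illustration.

```python
import copy


def merge_hooks(settings, kimetsu_groups):
    """Return new settings with each Kimetsu hook group added at most once."""
    merged = copy.deepcopy(settings)  # never mutate the caller's settings
    hooks = merged.setdefault("hooks", {})
    for event, group in kimetsu_groups.items():
        groups = hooks.setdefault(event, [])
        # Assumption: a group counts as "already installed" if an equal one exists.
        if group not in groups:
            groups.append(copy.deepcopy(group))
    return merged
```

For instance, merging a placeholder group such as `{"hooks": [{"type": "command", "command": "kimetsu-hook-placeholder"}]}` into settings that already hold a user's own UserPromptSubmit hook leaves the user's group first and adds the new one alongside; a second merge changes nothing.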
v0.8.3
v0.8.3 — npm distribution
ADDED
- npm distribution. Kimetsu now publishes to npm: npm install -g kimetsu-ai
installs the prebuilt native binary for your platform, no Rust toolchain
required. Uses the esbuild/turbo model: per-platform packages
(@kimetsu-ai/linux-x64, @kimetsu-ai/darwin-x64, @kimetsu-ai/darwin-arm64,
@kimetsu-ai/win32-x64) selected via optionalDependencies, with a thin
bin/cli.js launcher that execs the matching binary. No postinstall, so it
works under npm install --ignore-scripts. The semantic build is fetched on
demand when KIMETSU_NPM_FLAVOR=embeddings is set. Published from the
existing release pipeline (publish-npm job, gated on the PUBLISH_NPM
repo variable + NPM_TOKEN secret, mirroring the crates.io gate). Sources
live in npm/. Installs the kimetsu command (also available as
kimetsu-ai).
FIXED
- Windows npm package. The publish-npm job now extracts the Windows
archive with 7z instead of unzip (PowerShell Compress-Archive uses
backslash path separators that unzip flattens), and publishing is
idempotent so a re-run after a partial failure skips already-published
versions.
v0.8.2
v0.8.2 — npm distribution
ADDED
- npm distribution. Kimetsu now publishes to npm: npm install -g kimetsu-ai
installs the prebuilt native binary for your platform, no Rust toolchain
required. Uses the esbuild/turbo model: per-platform packages
(@kimetsu-ai/linux-x64, @kimetsu-ai/darwin-x64, @kimetsu-ai/darwin-arm64,
@kimetsu-ai/win32-x64) selected via optionalDependencies, with a thin
bin/cli.js launcher that execs the matching binary. No postinstall, so it
works under npm install --ignore-scripts. The semantic build is fetched on
demand when KIMETSU_NPM_FLAVOR=embeddings is set. Published from the
existing release pipeline (publish-npm job, gated on the PUBLISH_NPM
repo variable + NPM_TOKEN secret, mirroring the crates.io gate). Sources
live in npm/. Installs the kimetsu command (also available as
kimetsu-ai).
v0.8.1
v0.8.1 — npm distribution
ADDED
- npm distribution. Kimetsu now publishes to npm: npm install -g kimetsu
installs the prebuilt native binary for your platform, no Rust toolchain
required. Uses the esbuild/turbo model: per-platform packages
(@kimetsu/linux-x64, @kimetsu/darwin-x64, @kimetsu/darwin-arm64,
@kimetsu/win32-x64) selected via optionalDependencies, with a thin
bin/cli.js launcher that execs the matching binary. No postinstall, so it
works under npm install --ignore-scripts. The semantic build is fetched on
demand when KIMETSU_NPM_FLAVOR=embeddings is set. Published from the
existing release pipeline (new publish-npm job, gated on the PUBLISH_NPM
repo variable + NPM_TOKEN secret, mirroring the crates.io gate). Sources
live in npm/.
v0.8.0
v0.8.0 — proactive recall, selectable embedding model, full MCP control
The release that makes the brain proactive and gives the agent (and user)
full control over it from inside Claude Code / Codex.
ADDED
- Proactive recall (mid-work). New PreToolUse / PostToolUse Bash hooks
surface a relevant memory while the agent works, not just on prompt:
  - after a failed Bash command, surface a matching failure_pattern /
    command fix;
  - before a risky Bash command, warn from a matching failure_pattern;
  - on a repeated failing command (loop), lower the score floor and bypass
    the throttle so help arrives sooner.

  Retrieval is lexical-FTS-only (no embedding-model load), gated by a high
  score floor (0.45; loop 0.35), capped at one capsule, with per-session
  dedupe + a refractory throttle. Token cost stays ~0 (silent when nothing
  qualifies). Wired into both Claude Code (.claude/settings.json) and Codex
  (.codex/hooks.json) with a matcher: "Bash"; opt out with
  kimetsu plugin install --no-proactive (or proactive:false over MCP).
  Per-session state lives in <repo>/.kimetsu/proactive/<session_id>.json
  and is garbage-collected after 7 days.
- Selectable embedding model. New [embedder] table in project.toml
(precedence: KIMETSU_BRAIN_EMBEDDER env > config > default). Inspect and
change it with kimetsu brain model list / kimetsu brain model set <id>
(the latter writes the config and re-embeds the corpus with the new model).
Curated built-ins: bge-small-en-v1.5 (384d, default), bge-m3 (1024d),
jina-v2-base-code (768d).
- Full MCP control surface. New tools so an agent can manage the brain
without leaving Claude Code / Codex: kimetsu_brain_model_list /
kimetsu_brain_model_set (re-embeds in-process with the new model),
kimetsu_brain_reindex, kimetsu_brain_memory_search (FTS over memory
text), kimetsu_brain_conflict_resolve, kimetsu_brain_prune, and
kimetsu_brain_config_show. kimetsu_brain_memory_list and
..._memory_proposals gained limit / offset pagination.
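
The proactive-recall gating in this release (score floor 0.45, or 0.35 in a loop; one capsule at most; per-session dedupe; a refractory throttle that loops bypass) can be sketched as a single decision function. The floors come from the notes; the function name, the `state` dict layout, and the 60-second refractory window are hypothetical.

```python
def proactive_pick(candidates, state, now, looping=False, refractory_s=60.0):
    """candidates: [(memory_id, score)]. Return one memory id or None."""
    floor = 0.35 if looping else 0.45  # floors quoted in the release notes
    last_emit = state.get("last_emit", float("-inf"))
    # Assumption: a 60 s refractory window; a detected loop bypasses it.
    if not looping and now - last_emit < refractory_s:
        return None
    seen = state.setdefault("seen", set())  # per-session dedupe
    eligible = [(score, mid) for mid, score in candidates
                if score >= floor and mid not in seen]
    if not eligible:
        return None  # silent when nothing qualifies
    _, best_id = max(eligible)  # cap: a single capsule
    seen.add(best_id)
    state["last_emit"] = now
    return best_id
```

Returning None in every non-qualifying case is what keeps the token cost near zero: the hook only emits when a single, unseen, above-floor memory exists.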
CHANGED
- reindex can now run against an explicit embedder
(reindex_all_with_embedder), so a model switch re-embeds with the chosen
model regardless of the process's cached default embedder.
FIXED
- Test isolation. Tests created project roots under the system temp dir;
on a machine whose $HOME is itself a git repo, ProjectPaths::discover
climbed to $HOME and made parallel tests share one brain.db +
project.lock. Each test root now gets its own git boundary, so plain
cargo test is hermetic.