v0.2.0 — same answers, measured

PDgit12 released this 11 Jun 18:47
· 51 commits to main since this release

The credibility release: knitbrain now proves the answers survive, not just that tokens shrink.

Headline numbers (real transcripts, reproduce with one command)

  • 48.7% of ALL tool-result tokens saved (3.33M tokens, 63 real sessions; 55.4% on sizable blocks ≥400 chars) — npx knitbrain profile
  • Answer-preservation gates PASS on 4,500+ real blocks (npx knitbrain evals): error-fidelity 100%, summary-fidelity 100%, identifier-fidelity 99.4%, round-trip lossless 100%, never-expand 100%
  • Holding the fidelity line cost ~1pp of savings vs 0.1.x — paid deliberately, both numbers published
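
To make the gate idea concrete, here is a minimal sketch of how a deterministic identifier-fidelity check could work without an LLM judge. It is not knitbrain's implementation: the function names, the identifier regex, and the 0.99 floor are all assumptions chosen for illustration.

```typescript
// Hypothetical sketch: none of these names or thresholds come from knitbrain's source.

/** Collect identifier-like tokens (3+ chars, starting with a letter or underscore). */
function extractIdentifiers(text: string): Set<string> {
  const matches = text.match(/[A-Za-z_][A-Za-z0-9_]{2,}/g) ?? [];
  return new Set<string>(matches);
}

/** Fraction of the original's identifiers that still appear after compression. */
function identifierFidelity(original: string, compressed: string): number {
  const before = extractIdentifiers(original);
  if (before.size === 0) return 1; // nothing to lose
  const after = extractIdentifiers(compressed);
  let kept = 0;
  before.forEach((id) => {
    if (after.has(id)) kept++;
  });
  return kept / before.size;
}

/** A gate passes when the score meets its floor (the floor is an assumed value). */
function gatePasses(score: number, floor: number): boolean {
  return score >= floor;
}

const original = "TypeError in parseConfig at loadUser: userId undefined";
const compressed = "TypeError parseConfig loadUser userId undefined";
const score = identifierFidelity(original, compressed);
console.log(score, gatePasses(score, 0.99));
```

Because the check is pure string comparison, the same input always gives the same score, so a runner built on it could fail a CI job by exiting non-zero whenever a gate falls below its floor.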

New

  • knitbrain evals — deterministic answer-preservation suite (no LLM judge), exit 1 on any gate failure
  • Search / log / diff handlers — grep dumps collapse per-file, test logs keep every error + summary, diffs keep headers with ±counts; error and result lines are never elided by any handler
  • Real CacheAligner — volatile lines moved out of the system prefix; Anthropic cache_control breakpoints inserted only when the client has none; stacks with compression because compression is deterministic
  • 11 AST grammars — Go, Rust, Java, C++, C#, Ruby, PHP, Bash join TS/TSX/Python
  • knitbrain learn — offline failure mining with success correlation; writes precision-gated, secret-redacted corrections into CLAUDE.md
  • Puppeteer agents — complex tasks materialize pre-briefed, scope-guarded .claude/agents/*.md files (same persistence model as skills); creation events are team-board and hub visible
  • Plan-mode adherence — complex verdicts carry an explicit ENTER-PLAN-MODE imperative; the full closed-loop protocol rides the MCP handshake (zero per-project setup, any MCP client)
  • Library API: import { createOptimizer } from "knitbrain"
  • Platforms — Claude Code, Cursor, VS Code + Copilot agent mode, Windsurf written natively by knitbrain setup; Codex / Copilot CLI / Zed snippets; knitbrain prompt for everything else
  • Real-shape CI bench — fixture mix mirrors the profiled burn distribution with per-shape regression floors; best-case fixtures clearly labeled, never quoted as real-world
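
The log-handler rule above (keep every error and summary line, elide the rest) can be sketched as follows. This is an illustration only: the regular expressions, the marker text, and the `compressLog` name are assumptions, not knitbrain's actual handler.

```typescript
// Hypothetical sketch; patterns and marker format are assumptions, not knitbrain's.
const ERROR_LINE = /\b(error|fail(ed|ure)?|exception|panic)\b/i;
const SUMMARY_LINE = /^\s*(tests?|suites?|passed|failed|total|summary)\b/i;

/** Keep error and summary lines verbatim; replace runs of other lines with a count. */
function compressLog(log: string): string {
  const out: string[] = [];
  let elided = 0;

  // Emit one marker for the run of skipped lines seen so far.
  const flush = (): void => {
    if (elided > 0) {
      out.push(`[... ${elided} line(s) elided ...]`);
      elided = 0;
    }
  };

  for (const line of log.split("\n")) {
    if (ERROR_LINE.test(line) || SUMMARY_LINE.test(line)) {
      flush();
      out.push(line); // error and summary lines are never elided
    } else {
      elided++;
    }
  }
  flush();

  // Never expand: fall back to the original if the result is not shorter.
  const result = out.join("\n");
  return result.length < log.length ? result : log;
}
```

The function has no randomness or external state, so identical input always produces identical output, which is the property the CacheAligner bullet relies on when it says compression stacks with caching.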

Verified

196 tests · per-tool live E2E (25/25 tools over stdio) · production audit 50/50 (cold clone → packed install → tools + proxy + hook + dashboard + hub) · cold-registry check: npx knitbrain@0.2.0 profile → 48.6%, evals → PASS