raggity v0.9.0 — performance overhaul
Everything faster, nothing lost — retrieval results are bit-identical to v0.8.0 (reused stored vectors are byte-equal to fresh embeds; every change is behind regression tests; 374 tests, 0 warnings).
Measured on the same machine (v0.8.0 → v0.9.0):
rag askretrieval stage: removes ~2.8s/query of redundant CPU (dedup now reuses the vectors already returned by the store instead of re-embedding all 30 candidates)- No-op
rag ingest: 1898ms → 248ms (7.7×) — mtime+size fast-path (manifest v2, auto-migrating), and a no-op ingest no longer loads the embedding model or opens the store at all.rag watchbenefits on every save. rag status: 1935ms → 1045ms — lazy component construction; status never loads the two ML models.- CLI import: 0.387s → 0.152s;
rag --help: 522ms → 296ms — the Claude Agent SDK (full MCP stack) now imports only on first LLM use.
Streaming: the web UI / session SSE path now streams true incremental token deltas (previously it delivered the whole answer as one chunk — and the Claude backend never token-streamed at all). SSE responses add :ok flush + Cache-Control: no-cache + X-Accel-Buffering: no (proxy-safe) and newline-safe framing.
LLM efficiency: setting_sources=[] (your CLAUDE.md / settings no longer leak into every call), the SDK's per-call claude -v subprocess spawn is skipped, max_turns=1; parallel GraphRAG extraction (retrieval.graph_concurrency, default 8); LLM-judge does 1 call per row instead of 2; optional transform-output caching; the answer-cache lock no longer serializes concurrent generation.
Server/multi-tenant: per-tenant instances now share the embedding + reranker models (thread-safe; ~0.8s + one model-copy of RAM saved per tenant). Embed cache moved from whole-file JSON to SQLite (auto-migrates; a 50k-vector cache was 193MB JSON rewritten in full per batch).
Ingest: atomic manifest writes, batched source deletes (~170× on bulk changes), connector ingests are now incremental (hash-diff + scoped pruning), optimize/ANN skipped on no-op runs.
No API changes; caches remain opt-in; claude-agent-sdk>=0.2 floor. Default single-user local behavior unchanged.