Skip to content

raggity v0.9.0 — performance overhaul

Choose a tag to compare

@IxMxAMAR IxMxAMAR released this 03 Jul 14:10

Everything faster, nothing lost — retrieval results are bit-identical to v0.8.0 (reused stored vectors are byte-equal to fresh embeds; every change is behind regression tests; 374 tests, 0 warnings).

Measured on the same machine (v0.8.0 → v0.9.0):

  • rag ask retrieval stage: removes ~2.8s/query of redundant CPU (dedup now reuses the vectors already returned by the store instead of re-embedding all 30 candidates)
  • No-op rag ingest: 1898ms → 248ms (7.7×) — mtime+size fast-path (manifest v2, auto-migrating), and a no-op ingest no longer loads the embedding model or opens the store at all. rag watch benefits on every save.
  • rag status: 1935ms → 1045ms — lazy component construction; status never loads the two ML models.
  • CLI import: 0.387s → 0.152s; rag --help: 522ms → 296ms — the Claude Agent SDK (full MCP stack) now imports only on first LLM use.

Streaming: the web UI / session SSE path now streams true incremental token deltas (previously it delivered the whole answer as one chunk — and the Claude backend never token-streamed at all). SSE responses add :ok flush + Cache-Control: no-cache + X-Accel-Buffering: no (proxy-safe) and newline-safe framing.

LLM efficiency: setting_sources=[] (your CLAUDE.md / settings no longer leak into every call), the SDK's per-call claude -v subprocess spawn is skipped, max_turns=1; parallel GraphRAG extraction (retrieval.graph_concurrency, default 8); LLM-judge does 1 call per row instead of 2; optional transform-output caching; the answer-cache lock no longer serializes concurrent generation.

Server/multi-tenant: per-tenant instances now share the embedding + reranker models (thread-safe; ~0.8s + one model-copy of RAM saved per tenant). Embed cache moved from whole-file JSON to SQLite (auto-migrates; a 50k-vector cache was 193MB JSON rewritten in full per batch).

Ingest: atomic manifest writes, batched source deletes (~170× on bulk changes), connector ingests are now incremental (hash-diff + scoped pruning), optimize/ANN skipped on no-op runs.

No API changes; caches remain opt-in; claude-agent-sdk>=0.2 floor. Default single-user local behavior unchanged.