dikw-core v0.6.5

Latest

github-actions released this 28 Jun 15:10

f8c1045

0.6.5 — Default scaffold ships Gitee embed + rerank; eval cache keys retrieval config and surfaces absolute relevance scores

Added

Eval rows surface absolute relevance scores for OOD calibration.
(#249) Each retrieval-eval per-query and negative row now carries
top1_score (the top hit's score) and top1_vec_cosine — the
reranker/fusion-independent raw top-1 vector cosine, captured by an
eval-internal HybridSearcher.top_vector_cosine probe (the production
search() path and its ranking are untouched). Fusion scores (RRF is
rank-based; combsum/combmnz are per-leg min-max normalized) can't carry an
absolute magnitude, so expect_none / out-of-distribution (OOD) robustness was
previously immeasurable from rank order alone. The absolute cosine makes it
observable: high for a query the corpus covers, low for an OOD query. The
vector probe is skipped for pure-bm25 ablations so they stay embedding-free.
A score-based OOD metric is deferred; this release only surfaces the signals.
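
To illustrate why rank-based fusion hides OOD queries, here is a minimal Python sketch. It is not dikw-core code: the helper names, the toy vectors, and k=60 are all assumptions chosen for illustration. Two queries produce the same ranking, so their RRF contributions are identical, while the raw top-1 cosine separates them.

```python
import math


def rrf_leg_scores(ranked_ids, k=60):
    """RRF contribution of one leg: 1 / (k + rank). Depends on rank only."""
    return {doc_id: 1.0 / (k + rank)
            for rank, doc_id in enumerate(ranked_ids, start=1)}


def cosine(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rank_by_cosine(query, docs):
    """Return (doc ids ordered by cosine, raw top-1 cosine)."""
    scored = sorted(((cosine(query, vec), doc_id)
                     for doc_id, vec in docs.items()), reverse=True)
    return [doc_id for _, doc_id in scored], scored[0][0]


# Toy corpus and queries (hypothetical values).
DOCS = {"d1": [1.0, 0.0, 0.0], "d2": [0.6, 0.8, 0.0]}
covered_ids, covered_top1 = rank_by_cosine([0.9, 0.1, 0.0], DOCS)
ood_ids, ood_top1 = rank_by_cosine([0.1, 0.0, 1.0], DOCS)

# Same ranking, so the fused scores cannot tell the two queries apart...
print(rrf_leg_scores(covered_ids) == rrf_leg_scores(ood_ids))
# ...but the absolute top-1 cosine can.
print(round(covered_top1, 3), round(ood_top1, 3))
```

The same argument applies to per-leg min-max normalization: the top hit is rescaled to the same value regardless of how close it actually is to the query.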

Changed

Default scaffold ships Gitee embed + rerank; unified rerank/embed
degrade-logging. dikw init (config.default_config) now defaults the
embedder to Gitee bge-m3 (dim 1024) and ships a Gitee
bge-reranker-v2-m3 reranker, both keyed by one GITEE_API_KEY, so a
fresh base reranks out of the box (OpenAI has no /rerank endpoint, so the
prior OpenAI embedding default couldn't pair a matching reranker). The LLM
default stays Anthropic. Read-path resilience is now uniformly observable.
In hybrid mode, a transient query-embedding failure degrades the query to
FTS-only (vec leg dropped) instead of returning a 500; single-leg
vector/bm25 ablations re-raise for eval purity. Transient rerank and
embed-batch-skip degrades now log at ERROR (previously WARNING), since a
configured leg failed. An enabled-but-unconfigured reranker, and a write
path that defers embedding (no embedder wired, or version drift), each log
a WARNING so the silently-off leg is visible. Permanent provider errors
(401/403/404, bad key/model) still fail fast on both the read path (→ 500)
and the write path (ingest aborts); the fail-fast-on-misconfig invariant is
unchanged.
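
The read-path policy described above could be sketched roughly as follows. This is an illustration only: `plan_legs`, the two exception classes, and the logger name are hypothetical and not taken from dikw-core, and the log level used for the query-embedding degrade is an assumption.

```python
import logging

logger = logging.getLogger("dikw.sketch")  # hypothetical logger name


class TransientProviderError(Exception):
    """Hypothetical: timeouts, rate limits, 5xx responses."""


class PermanentProviderError(Exception):
    """Hypothetical: 401/403/404, bad key or model."""


def plan_legs(mode, embed_query, query):
    """Decide which retrieval legs run; return (legs, query_vector).

    mode is one of "hybrid", "vector", "bm25".
    """
    if mode == "bm25":
        # Pure-bm25 runs never call the embedder.
        return ["bm25"], None
    try:
        vec = embed_query(query)
    except TransientProviderError:
        if mode != "hybrid":
            # Single-leg vector ablation: re-raise so eval numbers stay pure.
            raise
        # Assumed log level; the degrade itself is what the notes describe.
        logger.error("query embedding failed; degrading to FTS-only")
        return ["bm25"], None
    # PermanentProviderError is deliberately not caught: misconfig fails fast.
    legs = ["bm25", "vector"] if mode == "hybrid" else ["vector"]
    return legs, vec
```

Keeping the permanent-error class out of the `except` clause is what preserves the fail-fast behavior in this sketch: only errors classified as transient can trigger a degrade.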

Fixed

Eval snapshot cache keys the ingest-time tokenizer and reads query-time
retrieval config live. (#250) The eval corpus-snapshot cache key omitted
RetrievalConfig. Under the default --cache read_write, changing any
retrieval knob (rrf_k / weights / fusion / rerank_enabled / graph_*)
and re-running silently hit the stale snapshot and reused the previous
config. There was no error, only wrong numbers, and it hit precisely the
retrieval-ablation workflow.
The cache key now includes the only ingest-time retrieval field,
cjk_tokenizer (a change forces re-ingest); every search-time knob is read
from the live config on each _run_queries, so ablations sharing one
cache_root are now both fast and correct. A defensive guard re-raises if a
cache hit's baked tokenizer ever disagrees with the live one.
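
A rough sketch of the keying rule described in this fix, assuming a simplified config. The `RetrievalConfig` fields shown are a subset named in the notes, while `snapshot_cache_key`, `check_cache_hit`, the corpus fingerprint, and the tokenizer values are hypothetical stand-ins rather than the project's actual API.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalConfig:
    """Simplified stand-in; only some of the knobs named in the notes."""
    cjk_tokenizer: str = "tokenizer_a"  # ingest-time field (placeholder value)
    rrf_k: int = 60                     # search-time knob
    fusion: str = "rrf"                 # search-time knob
    rerank_enabled: bool = True         # search-time knob


def snapshot_cache_key(corpus_fingerprint, cfg):
    """Key the snapshot on the corpus plus the only ingest-time field."""
    payload = {"corpus": corpus_fingerprint,
               "cjk_tokenizer": cfg.cjk_tokenizer}
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def check_cache_hit(baked_tokenizer, cfg):
    """Defensive guard: a hit must agree with the live tokenizer."""
    if baked_tokenizer != cfg.cjk_tokenizer:
        raise RuntimeError(
            f"snapshot tokenizer {baked_tokenizer!r} != live "
            f"{cfg.cjk_tokenizer!r}; re-ingest required")


base = RetrievalConfig()
ablation = RetrievalConfig(rrf_k=20, fusion="combsum", rerank_enabled=False)
retokenized = RetrievalConfig(cjk_tokenizer="tokenizer_b")

# Search-time ablations share one snapshot; a tokenizer change does not.
print(snapshot_cache_key("corpus-v1", base) == snapshot_cache_key("corpus-v1", ablation))
print(snapshot_cache_key("corpus-v1", base) == snapshot_cache_key("corpus-v1", retokenized))
```

Because the search-time knobs never enter the key, they have to be read from the live config on every query run; baking them into the snapshot is what produced the stale results.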
