Skip to content

dikw-core v0.6.5

Latest

Choose a tag to compare

@github-actions github-actions released this 28 Jun 15:10
· 1 commit to main since this release
f8c1045

0.6.5 — Default scaffold ships Gitee embed + rerank; eval cache keys retrieval config and surfaces absolute relevance scores

Added

  • Eval rows surface absolute relevance scores for OOD calibration.
    (#249) Each retrieval-eval per-query and negative row now carries
    top1_score (the top hit's score) and top1_vec_cosine — the
    reranker/fusion-independent raw top-1 vector cosine, captured by an
    eval-internal HybridSearcher.top_vector_cosine probe (the production
    search() path and its ranking are untouched). Fusion scores (RRF is
    rank-based; combsum/combmnz are per-leg min-max normalized) can't carry an
    absolute magnitude, so expect_none / out-of-distribution robustness was
    previously immeasurable from rank order alone; the absolute cosine makes it
    observable (covered query high, OOD query low). The vector probe is skipped
    for pure-bm25 ablations so they stay embedding-free. A score-based OOD
    metric is deferred — this release only surfaces the signals.

Changed

  • Default scaffold ships Gitee embed + rerank; unified rerank/embed
    degrade-logging.
    dikw init (config.default_config) now defaults the
    embedder to Gitee bge-m3 (dim 1024) and ships a Gitee
    bge-reranker-v2-m3 reranker, both keyed by one GITEE_API_KEY, so a
    fresh base reranks out of the box (OpenAI has no /rerank endpoint, so the
    prior OpenAI embedding default couldn't pair a matching reranker). The LLM
    default stays Anthropic. Read-path resilience is now uniformly observable: a
    transient query-embedding failure degrades the hybrid query to FTS-only
    (vec leg dropped) instead of 500-ing — hybrid mode only, single-leg
    vector/bm25 ablations re-raise for eval purity; transient rerank /
    embed-batch-skip degrades now log at ERROR (a configured leg that failed,
    was WARNING); an enabled-but-unconfigured reranker and a write path that
    defers embedding (no embedder wired / version drift) each log a WARNING so
    the silently-off leg is visible. Permanent provider errors (401/403/404,
    bad key/model) still fail fast on both the read path (→ 500) and the write
    path (ingest aborts) — the fail-fast-on-misconfig invariant is unchanged.

Fixed

  • Eval snapshot cache keys the ingest-time tokenizer and reads query-time
    retrieval config live.
    (#250) The eval corpus-snapshot cache key omitted
    RetrievalConfig, so under the default --cache read_write changing any
    retrieval knob (rrf_k / weights / fusion / rerank_enabled / graph_*)
    and re-running silently hit the stale snapshot and reused the previous
    config — no error, wrong numbers, exactly on the retrieval-ablation workflow.
    The cache key now includes the only ingest-time retrieval field,
    cjk_tokenizer (a change forces re-ingest); every search-time knob is read
    from the live config on each _run_queries, so ablations sharing one
    cache_root are now both fast and correct. A defensive guard re-raises if a
    cache hit's baked tokenizer ever disagrees with the live one.