dikw-core v0.6.2

github-actions released this 25 Jun 13:27
7946fb3

0.6.2 — optional cross-encoder reranking; atomic writes + concurrency hardening; repositioned as a self-managed knowledge engine

Changed

  • Docs: repositioned as a self-managed knowledge-base engine. The README and the
    user-facing docs now frame dikw-core as a client/server engine whose server
    authoritatively manages the base's directory tree, index, and database (distinct
    from Obsidian), with knowledge persisted in an open, portable Markdown format
    (same lineage as Karpathy's LLM-Wiki pattern and Google's Open Knowledge Format).
    docs/** no longer frames the base as "an Obsidian vault"; the on-disk format is
    described as open Markdown that any editor can open. No engine behavior, on-disk
    layout, or invariant changed.
  • README overhauled. It is restructured around a documentation index, a
    client/server + DIKW mermaid diagram, and container/Ruff/status badges, and it
    now documents the previously missing dikw client delete / dikw client wisdom
    write verbs. The maintainer release mechanics moved out of the README into
    docs/releasing.md. The header is now centered with the OpenDIKW logo
    (.github/assets/opendikw-avatar.png).

Added

  • Optional cross-encoder reranking in the retrieval pipeline. A new
    RerankProvider seam (providers/base.py), a build_reranker factory, and
    OpenAICompatReranker (providers/rerank.py, speaking the Jina/Cohere-compatible
    /rerank wire shape shared by Gitee AI, SiliconFlow, Jina, and Cohere) add a
    rerank stage to HybridSearcher.search between RRF fusion and the top-K
    truncation: the top retrieval.rerank_candidate_k (default 40) fused
    candidates are scored by (query, chunk) relevance, re-ordered, then cut to
    the query limit. This recovers precision@k from the recall pool without
    changing recall, and it touches neither the storage adapters nor retrieve's
    response shape. Configure via provider.rerank / rerank_model /
    rerank_base_url / rerank_api_key_env (+ rerank_timeout_seconds /
    rerank_batch_size) and retrieval.rerank_enabled / rerank_candidate_k.
    The candidate window is split into batches of rerank_batch_size per /rerank
    call so it respects per-vendor document caps (Gitee: ≤25). Reranking is on as
    soon as a reranker is configured (rerank_enabled defaults to true); a base that
    configures no reranker runs no rerank leg. A reranker is a deterministic scoring
    model, in the same category as the embedding model (part of scoping, not
    reasoning), so it is consistent with the "LLMs only enter at synth" invariant;
    an LLM-as-reranker is deliberately excluded. On the read path, a transient
    rerank failure degrades to the fused order, while a permanent one fails loudly.
    See docs/adr/0006-reranker-deterministic-scoping.md, the reranking cookbook in
    docs/providers.md, and the SciFact ablation in evals/BASELINES.md.
  • Community-health files: CONTRIBUTING.md, SECURITY.md, CODE_OF_CONDUCT.md,
    and GitHub issue-form templates under .github/ISSUE_TEMPLATE/.
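
The rerank stage in the first Added entry can be sketched as follows. This is a minimal illustration under stated assumptions, not HybridSearcher's actual code: Candidate, Scorer, TransientRerankError, and rerank_stage are hypothetical names, the scorer callable stands in for a /rerank HTTP call, and a permanent failure is modeled simply as any other exception propagating to the caller.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class TransientRerankError(Exception):
    """Hypothetical: a retryable /rerank failure such as a timeout or HTTP 5xx."""


@dataclass
class Candidate:
    chunk_id: str
    text: str
    fused_score: float  # score from the preceding RRF fusion step


# Hypothetical scorer: (query, documents) -> one relevance score per document.
Scorer = Callable[[str, Sequence[str]], list[float]]


def rerank_stage(
    query: str,
    fused: list[Candidate],  # sorted by fused_score, best first
    scorer: Scorer,
    limit: int,
    candidate_k: int = 40,
    batch_size: int = 25,
) -> list[Candidate]:
    """Rerank the top candidate_k fused hits, then truncate to limit."""
    window = fused[:candidate_k]
    scores: list[float] = []
    try:
        # Score in batches so no single call exceeds a vendor's document cap.
        for start in range(0, len(window), batch_size):
            batch = window[start:start + batch_size]
            scores.extend(scorer(query, [c.text for c in batch]))
    except TransientRerankError:
        # Transient failure: degrade to the fused order rather than fail the read.
        return fused[:limit]
    # Any other exception propagates, i.e. a permanent failure "fails loud".
    # sorted() is stable, so equal scores keep their fused order.
    order = sorted(range(len(window)), key=lambda i: scores[i], reverse=True)
    reranked = [window[i] for i in order]
    # Assumption: hits beyond the window follow the reranked ones in fused order.
    return (reranked + fused[candidate_k:])[:limit]
```

Because the stage only re-orders candidates already in the fused pool, it can move relevant chunks into the top-k without adding or dropping anything from recall.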

Fixed

  • Atomic on-disk page writes. Knowledge and wisdom page writes now go through a
    shared temp-file-then-os.replace helper (domains/_atomic.py), so a crash or a
    full disk mid-write can no longer leave a half-written page at the visible path —
    a reader sees the old bytes or the new bytes, never a truncated file. The
    trash/ collision loop claims its destination name via O_CREAT | O_EXCL instead
    of an exists() probe, closing the window where two concurrent trashings of the
    same path could clobber each other.
  • Task status is terminal-immutable. update_status (both the SQLite and
    Postgres task stores) now refuses to overwrite a row that is already
    succeeded / failed / cancelled: a late write is a silent no-op, so a cancel that
    lands first wins over a runner's trailing failure. An unknown task_id still
    raises TaskNotFound.
  • Atomic base-instance-id creation. <base>/.dikw/base_id is now created with an
    exclusive open(..., "x"), so two server processes cold-starting the same base
    converge on one id instead of each minting a different one (which silently split
    the shared-Postgres task-store scope). DIKW_BASE_INSTANCE_ID still overrides.
  • Orphan import-staging cleanup is gated like task reaping. Startup no longer
    wipes <base>/.dikw/staging/ unconditionally; it does so only when this process
    owns the task store exclusively (per-base SQLite, or DIKW_TASK_REAP_ON_START=1),
    so a replica sharing a Postgres task store can no longer delete a live peer's
    in-flight import staging.
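
A minimal sketch of the two filesystem patterns in the atomic-writes entry, assuming a POSIX filesystem: write to a temporary file in the same directory, fsync it, then os.replace it over the target (atomic on POSIX when both paths are on the same filesystem); and claim a trash/ name with O_CREAT | O_EXCL so only one caller can win it. atomic_write_text, claim_trash_name, and the "-N" collision suffix are hypothetical and are not the contents of domains/_atomic.py.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, text: str, encoding: str = "utf-8") -> None:
    """Replace path's contents so readers see either the old or the new bytes."""
    path = Path(path)
    # The temp file lives in the target's directory so os.replace stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())  # make the new bytes durable before the swap
        os.replace(tmp, path)  # atomic rename over the visible path
    except BaseException:
        # Never leave a stray temp file behind on failure.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise


def claim_trash_name(trash_dir: Path, name: str) -> Path:
    """Atomically reserve a free name in trash_dir (hypothetical "-N" suffix scheme)."""
    stem, suffix = os.path.splitext(name)
    n = 0
    while True:
        candidate = Path(trash_dir) / (name if n == 0 else f"{stem}-{n}{suffix}")
        try:
            # O_EXCL makes creation fail if the name exists: no check-then-act race.
            fd = os.open(candidate, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            n += 1
            continue
        os.close(fd)
        return candidate
```

After claim_trash_name returns, a caller would os.replace the page onto the claimed empty placeholder; since that name already belongs to this caller, a concurrent trashing of the same path moves on to the next name instead of clobbering it.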
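
The terminal-immutable rule for update_status can be expressed as a single conditional UPDATE, so the check and the write happen in one statement rather than as a read followed by a write. A minimal sqlite3 sketch, assuming a hypothetical tasks(task_id, status) table; the bool return value is also an assumption, since the entry only says a late write is a silent no-op.

```python
import sqlite3

# Statuses that may never be overwritten once reached.
TERMINAL_STATUSES = ("succeeded", "failed", "cancelled")


class TaskNotFound(Exception):
    """Raised when task_id does not exist."""


def update_status(conn: sqlite3.Connection, task_id: str, status: str) -> bool:
    """Set status unless the row is already terminal; return True if a row changed."""
    cur = conn.execute(
        "UPDATE tasks SET status = ? "
        "WHERE task_id = ? AND status NOT IN (?, ?, ?)",
        (status, task_id, *TERMINAL_STATUSES),
    )
    conn.commit()
    if cur.rowcount == 1:
        return True
    # Nothing updated: either the task is already terminal (silent no-op)
    # or it does not exist at all (raise).
    row = conn.execute("SELECT 1 FROM tasks WHERE task_id = ?", (task_id,)).fetchone()
    if row is None:
        raise TaskNotFound(task_id)
    return False
```

The same WHERE status NOT IN (...) guard carries over to Postgres; only the placeholder style changes (for example %s with psycopg).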
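
For the base-instance-id entry: mode "x" in Python's open() creates the file with O_CREAT | O_EXCL, so exactly one process can create it and every other process gets FileExistsError and reads the winner's id instead. A minimal sketch; load_or_create_base_id, the uuid4 hex id format, and the short poll covering a winner that has created but not yet written the file are assumptions, not dikw-core's implementation.

```python
import os
import time
import uuid
from pathlib import Path


def load_or_create_base_id(base: Path) -> str:
    """Return this base's instance id, creating it exactly once across processes."""
    override = os.environ.get("DIKW_BASE_INSTANCE_ID")
    if override:
        return override  # the environment variable still wins
    path = Path(base) / ".dikw" / "base_id"
    path.parent.mkdir(parents=True, exist_ok=True)
    try:
        # "x" fails with FileExistsError if any process created the file first.
        with open(path, "x", encoding="utf-8") as f:
            new_id = uuid.uuid4().hex  # hypothetical id format
            f.write(new_id)
            return new_id
    except FileExistsError:
        # Another process won the race; it may not have written the id yet,
        # so poll briefly until the file is non-empty (assumption).
        for _ in range(50):
            existing = path.read_text(encoding="utf-8").strip()
            if existing:
                return existing
            time.sleep(0.01)
        raise RuntimeError(f"{path} exists but is empty")
```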