0.6.2 — optional cross-encoder reranking; atomic writes + concurrency hardening; repositioned as a self-managed knowledge engine
Changed
Docs: repositioned as a self-managed knowledge base engine. README and the
user-facing docs now frame dikw-core as a client/server engine whose server
authoritatively manages the base's directory tree, index, and database — distinct
from Obsidian — with knowledge persisted as an open, portable Markdown format
(same lineage as Karpathy's LLM-Wiki pattern and Google's Open Knowledge Format). docs/** no longer frames the base as "an Obsidian vault"; the on-disk format is
described as open Markdown openable in any editor. No engine behavior, on-disk
layout, or invariant changed.
README professionalized. Restructured with a documentation index, a
client/server + DIKW mermaid diagram, container/Ruff/status badges, and the
missing dikw client delete / dikw client wisdom write verbs; the maintainer
release mechanics moved out of the README into docs/releasing.md.
The header is now centered with the OpenDIKW logo
(.github/assets/opendikw-avatar.png).
Added
Optional cross-encoder reranking in the retrieval pipeline. A new RerankProvider seam (providers/base.py) + build_reranker factory + OpenAICompatReranker (providers/rerank.py, the Jina/Cohere-compatible /rerank wire shape that Gitee AI / SiliconFlow / Jina / Cohere share) add a
rerank stage to HybridSearcher.search, between RRF fusion and the top-K
truncation: the top retrieval.rerank_candidate_k (default 40) fused
candidates are scored by (query, chunk) relevance, re-ordered, then cut to
the query limit. It recovers precision@k from the recall pool without
changing recall, and does not touch any storage adapter or retrieve's
response shape. Configure via provider.rerank / rerank_model / rerank_base_url / rerank_api_key_env (+ rerank_timeout_seconds / rerank_batch_size) and retrieval.rerank_enabled / rerank_candidate_k;
the candidate window is split into rerank_batch_size batches per /rerank
call so it respects per-vendor document caps (Gitee: ≤25). Reranking is on once a
reranker is configured (rerank_enabled defaults to true; a base that configures no
reranker runs no rerank leg). A reranker is a deterministic scoring model, in the
same category as the embedding model and part of scoping rather than reasoning, so
it is consistent with the "LLMs only enter at synth" invariant; an LLM-as-reranker
is deliberately excluded. On the read path, a transient rerank failure degrades
to the fused order, while a permanent one fails loudly. See docs/adr/0006-reranker-deterministic-scoping.md,
the docs/providers.md reranking cookbook, and the SciFact ablation in evals/BASELINES.md.
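The rerank stage described above can be sketched roughly as follows. This is an illustration of the flow (fuse, score the candidate window in batches, re-order, cut), not dikw-core's actual code: `rrf_fuse`, `rerank_stage`, `TransientRerankError`, and the `score_batch` callable are hypothetical names, and the RRF constant of 60 is a common default rather than something the release notes state.

```python
from typing import Callable, Sequence


class TransientRerankError(Exception):
    """Hypothetical marker for a retryable failure (timeout, 5xx)."""


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def rerank_stage(
    query: str,
    fused: list[str],
    texts: dict[str, str],
    score_batch: Callable[[str, list[str]], list[float]],
    limit: int,
    candidate_k: int = 40,
    batch_size: int = 25,
) -> list[str]:
    """Re-order the top candidate_k fused ids by relevance, then cut to limit.

    Assumes limit <= candidate_k; ids outside the window are not returned.
    """
    window = fused[:candidate_k]
    scored: list[tuple[str, float]] = []
    try:
        # One score_batch call per batch, so a per-vendor document cap holds.
        for start in range(0, len(window), batch_size):
            batch = window[start:start + batch_size]
            batch_scores = score_batch(query, [texts[d] for d in batch])
            scored.extend(zip(batch, batch_scores))
    except TransientRerankError:
        # Transient failure: degrade to the fused order.
        return fused[:limit]
    # Any other exception propagates, i.e. a permanent failure fails loudly.
    scored.sort(key=lambda pair: -pair[1])  # stable: ties keep fused order
    return [doc_id for doc_id, _ in scored][:limit]
```

In this sketch the reranker only permutes ids that fusion already selected, which is why the stage can improve precision within the window without changing what was recalled.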
Community-health files: CONTRIBUTING.md, SECURITY.md, CODE_OF_CONDUCT.md,
and GitHub issue-form templates under .github/ISSUE_TEMPLATE/.
Fixed
Atomic on-disk page writes. Knowledge and wisdom page writes now go through a
shared temp-file-then-os.replace helper (domains/_atomic.py), so a crash or a
full disk mid-write can no longer leave a half-written page at the visible path —
a reader sees the old bytes or the new bytes, never a truncated file. The trash/ collision loop claims its destination name via O_CREAT | O_EXCL instead
of an exists() probe, closing the window where two concurrent trashings of the
same path could clobber each other.
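As a minimal sketch of the two techniques named above, assuming nothing about the real contents of domains/_atomic.py: `atomic_write_bytes` and `claim_trash_name` are hypothetical helper names, and the `.N` collision suffix is an illustrative choice, not the project's naming scheme.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_bytes(path: Path, data: bytes) -> None:
    """Write to a temp file in the same directory, then swap it into place."""
    # Same directory keeps the temp file on the same filesystem as the target.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push bytes to disk before the swap
        os.replace(tmp_name, path)  # readers see old bytes or new bytes
    except BaseException:
        # A failed write (e.g. full disk) never touches the visible path.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


def claim_trash_name(trash_dir: Path, name: str) -> Path:
    """Reserve a unique destination by creating it exclusively."""
    candidate = trash_dir / name
    attempt = 0
    while True:
        try:
            # O_CREAT | O_EXCL fails if the name exists, so check-and-create
            # is a single step with no gap for a concurrent claimant.
            fd = os.open(candidate, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            attempt += 1
            candidate = trash_dir / f"{name}.{attempt}"
            continue
        os.close(fd)
        return candidate
```

An `exists()` probe followed by a separate create leaves a gap in which a second process can pass the same probe; the exclusive create removes that gap because the loser gets `FileExistsError` and moves on to the next name.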
Task status is terminal-immutable. update_status (both the SQLite and
Postgres task stores) now refuses to overwrite a row that is already
succeeded / failed / cancelled: a late write is a silent no-op, so a cancel that
lands first wins over a runner's trailing failure. An unknown task_id still
raises TaskNotFound.
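One way to get this behavior in SQLite is a guarded UPDATE, sketched below. The `tasks` table, its columns, and the boolean return value are assumptions made for illustration; the `TaskNotFound` class here is a stand-in for the project's own exception, and the real task stores may be implemented differently.

```python
import sqlite3

TERMINAL = ("succeeded", "failed", "cancelled")


class TaskNotFound(KeyError):
    """Stand-in for the project's exception of the same name."""


def update_status(conn: sqlite3.Connection, task_id: str, status: str) -> bool:
    """Set a task's status unless it is already terminal.

    Returns True if the row changed and False for a late write that was
    ignored. Raises TaskNotFound for an unknown task_id.
    """
    # The terminal check lives in the WHERE clause, so the test and the
    # write happen in one statement rather than as read-then-write.
    cur = conn.execute(
        "UPDATE tasks SET status = ? "
        "WHERE task_id = ? AND status NOT IN (?, ?, ?)",
        (status, task_id, *TERMINAL),
    )
    conn.commit()
    if cur.rowcount == 1:
        return True
    # Nothing changed: either the row is terminal or it does not exist.
    row = conn.execute(
        "SELECT 1 FROM tasks WHERE task_id = ?", (task_id,)
    ).fetchone()
    if row is None:
        raise TaskNotFound(task_id)
    return False
```

With this shape, a cancel that lands first leaves the row at `cancelled`, and the runner's trailing `failed` write matches zero rows.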
Atomic base-instance-id creation. <base>/.dikw/base_id is now created with an
exclusive open(..., "x"), so two server processes cold-starting the same base
converge on one id instead of each minting a different one (which silently split
the shared-Postgres task-store scope). DIKW_BASE_INSTANCE_ID still overrides.
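The exclusive-create pattern might look like the sketch below. `load_or_create_base_id` is a hypothetical name, and the UUID format of the id is an assumption; only the `open(..., "x")` call, the `<base>/.dikw/base_id` path, and the `DIKW_BASE_INSTANCE_ID` override come from the notes above.

```python
import os
import uuid
from pathlib import Path


def load_or_create_base_id(base: Path) -> str:
    """Return the base instance id, creating it at most once per base."""
    override = os.environ.get("DIKW_BASE_INSTANCE_ID")
    if override:
        return override  # the environment variable still wins

    id_path = base / ".dikw" / "base_id"
    id_path.parent.mkdir(parents=True, exist_ok=True)
    try:
        # Mode "x" fails if the file exists, so only one process mints an id.
        with open(id_path, "x", encoding="utf-8") as f:
            f.write(uuid.uuid4().hex)
    except FileExistsError:
        pass  # another process won the race; adopt its id below
    # Caveat of this sketch: a loser could read before the winner has
    # finished writing. A fuller version would retry on an empty file.
    return id_path.read_text(encoding="utf-8").strip()
```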
Orphan import-staging cleanup is gated like task reaping. Startup no longer
wipes <base>/.dikw/staging/ unconditionally; it does so only when this process
owns the task store exclusively (per-base SQLite, or DIKW_TASK_REAP_ON_START=1),
so a replica sharing a Postgres task store can no longer delete a live peer's
in-flight import staging.
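The gating rule can be pictured as a small predicate in front of the cleanup. The function names and the `task_store_kind` string are hypothetical; only the staging path, the per-base SQLite case, and the `DIKW_TASK_REAP_ON_START=1` opt-in are taken from the notes above.

```python
import os
import shutil
from pathlib import Path


def owns_task_store_exclusively(task_store_kind: str) -> bool:
    """True when no peer process can have in-flight work in this store."""
    if task_store_kind == "sqlite":  # per-base SQLite: a single owner
        return True
    # Shared store (e.g. Postgres): only when the operator opts in.
    return os.environ.get("DIKW_TASK_REAP_ON_START") == "1"


def clean_orphan_staging(base: Path, task_store_kind: str) -> bool:
    """Remove <base>/.dikw/staging/ at startup, but only when that is safe.

    Returns True if cleanup ran and False if it was skipped.
    """
    if not owns_task_store_exclusively(task_store_kind):
        return False  # a live peer may still be importing
    shutil.rmtree(base / ".dikw" / "staging", ignore_errors=True)
    return True
```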