Releases: OpenDIKW/dikw-core

dikw-core v0.6.5

28 Jun 15:10
f8c1045

0.6.5 — Default scaffold ships Gitee embed + rerank; eval cache keys retrieval config and surfaces absolute relevance scores

Added

  • Eval rows surface absolute relevance scores for OOD calibration.
    (#249) Each retrieval-eval per-query and negative row now carries
    top1_score (the top hit's score) and top1_vec_cosine — the
    reranker/fusion-independent raw top-1 vector cosine, captured by an
    eval-internal HybridSearcher.top_vector_cosine probe (the production
    search() path and its ranking are untouched). Fusion scores (RRF is
    rank-based; combsum/combmnz are per-leg min-max normalized) can't carry an
    absolute magnitude, so expect_none / out-of-distribution robustness was
    previously immeasurable from rank order alone; the absolute cosine makes it
    observable (covered query high, OOD query low). The vector probe is skipped
    for pure-bm25 ablations so they stay embedding-free. A score-based OOD
    metric is deferred — this release only surfaces the signals.
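A minimal sketch of why a rank-based fusion score cannot separate a covered query from an out-of-distribution one, while a raw top-1 cosine can. The function names, the toy vectors, and `k=60` are illustrative assumptions, not dikw-core's code or defaults.

```python
import math


def rrf_scores(ranked_ids, k=60):
    """One leg's RRF contribution; it depends on rank only (k=60 is an assumed value)."""
    return {doc_id: 1.0 / (k + rank) for rank, doc_id in enumerate(ranked_ids, start=1)}


def cosine(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top1_vec_cosine(query_vec, chunk_vecs):
    """Best raw cosine over the corpus: an absolute signal, independent of fusion."""
    return max((cosine(query_vec, v) for v in chunk_vecs), default=0.0)


# Toy corpus and queries (hypothetical values).
chunks = [[1.0, 0.0], [0.8, 0.6]]
covered_query = [1.0, 0.1]   # points roughly at the first chunk
ood_query = [0.0, -1.0]      # points away from every chunk
```

Whatever the query, the top hit of a leg always receives the same RRF score, so rank order alone says nothing about how relevant that hit is; the two cosines differ sharply.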

Changed

  • Default scaffold ships Gitee embed + rerank; unified rerank/embed
    degrade-logging.
    dikw init (config.default_config) now defaults the
    embedder to Gitee bge-m3 (dim 1024) and ships a Gitee
    bge-reranker-v2-m3 reranker, both keyed by one GITEE_API_KEY, so a
    fresh base reranks out of the box (OpenAI has no /rerank endpoint, so the
    prior OpenAI embedding default couldn't pair a matching reranker). The LLM
    default stays Anthropic. Read-path resilience is now uniformly observable: a
    transient query-embedding failure degrades the hybrid query to FTS-only
    (vec leg dropped) instead of 500-ing — hybrid mode only, single-leg
    vector/bm25 ablations re-raise for eval purity; transient rerank /
    embed-batch-skip degrades now log at ERROR (a configured leg that failed,
    was WARNING); an enabled-but-unconfigured reranker and a write path that
    defers embedding (no embedder wired / version drift) each log a WARNING so
    the silently-off leg is visible. Permanent provider errors (401/403/404,
    bad key/model) still fail fast on both the read path (→ 500) and the write
    path (ingest aborts) — the fail-fast-on-misconfig invariant is unchanged.
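The read-path behaviour described above can be sketched roughly as follows. The exception classes, the `hybrid_query` signature, the log level on the degrade, and the placeholder merge are all assumptions made for illustration; the real HybridSearcher is not shown in these notes.

```python
import logging

log = logging.getLogger("sketch.retrieval")


class TransientProviderError(Exception):
    """Hypothetical: timeouts, rate limits, 5xx responses."""


class PermanentProviderError(Exception):
    """Hypothetical: 401/403/404, bad key or model."""


def hybrid_query(query, embed, fts_search, vec_search, mode="hybrid"):
    if mode == "bm25":
        return fts_search(query)  # embedding-free ablation, no vector leg at all
    try:
        query_vec = embed(query)
    except TransientProviderError:
        if mode != "hybrid":
            raise  # single-leg vector ablation re-raises for eval purity
        log.error("query embedding failed transiently; degrading to FTS-only")
        return fts_search(query)
    # PermanentProviderError is deliberately not caught: misconfiguration fails fast.
    vec_hits = vec_search(query_vec)
    if mode == "vector":
        return vec_hits
    fts_hits = fts_search(query)
    # Placeholder for real fusion: ordered union of both legs.
    return list(dict.fromkeys(fts_hits + vec_hits))
```

Only the hybrid mode swallows the transient error; a permanent error propagates in every mode.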

Fixed

  • Eval snapshot cache keys the ingest-time tokenizer and reads query-time
    retrieval config live.
    (#250) The eval corpus-snapshot cache key omitted
    RetrievalConfig, so under the default --cache read_write changing any
    retrieval knob (rrf_k / weights / fusion / rerank_enabled / graph_*)
    and re-running silently hit the stale snapshot and reused the previous
    config — no error, wrong numbers, exactly on the retrieval-ablation workflow.
    The cache key now includes the only ingest-time retrieval field,
    cjk_tokenizer (a change forces re-ingest); every search-time knob is read
    from the live config on each _run_queries, so ablations sharing one
    cache_root are now both fast and correct. A defensive guard re-raises if a
    cache hit's baked tokenizer ever disagrees with the live one.
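A sketch of the cache-key idea, assuming the key is a hash over a corpus digest plus the ingest-time retrieval fields. `snapshot_cache_key`, `check_snapshot`, the dict-shaped config, and the tokenizer values are hypothetical; only the `cjk_tokenizer` field name and the search-time knob names come from the notes above.

```python
import hashlib
import json

# Assumption: cjk_tokenizer is the only retrieval field that affects ingest.
INGEST_TIME_FIELDS = ("cjk_tokenizer",)


def snapshot_cache_key(corpus_digest, retrieval_config):
    """Hash only what changes the ingested snapshot; search-time knobs stay out."""
    ingest_view = {name: retrieval_config[name] for name in INGEST_TIME_FIELDS}
    payload = json.dumps({"corpus": corpus_digest, "retrieval": ingest_view}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def check_snapshot(baked_tokenizer, live_tokenizer):
    """Defensive guard: a cache hit must agree with the live tokenizer."""
    if baked_tokenizer != live_tokenizer:
        raise RuntimeError(
            f"snapshot tokenizer {baked_tokenizer!r} != live tokenizer {live_tokenizer!r}"
        )
```

With this shape, changing `rrf_k` or `rerank_enabled` reuses the snapshot, while changing `cjk_tokenizer` produces a new key and therefore a re-ingest.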

dikw-core v0.6.4

28 Jun 03:06
8b4279a

0.6.4 — K-layer system prompts extracted to packaged .md; product self-reference unified to dikw

Changed

  • K-layer system prompts extracted to prompts/*.md; product self-reference
    unified to dikw.
    The three inline system-prompt constants under
    domains/knowledge (DEFAULT_SYNTH_SYSTEM, _MERGE_SYSTEM, _GROUNDED_SYSTEM)
    now load from packaged prompts/synthesize_system.md,
    prompts/lint_fix_orphan_merge_system.md, and
    prompts/lint_fix_broken_wikilink_grounded_system.md via prompts.load(...)
    like every other prompt (CLAUDE.md "Don't inline prompts in code"). Constant
    names, call sites, and the public surface are unchanged. Every packaged prompt
    (synth + lint + eval) now refers to the product as dikw rather than
    dikw-core, and the two lint system prompts gained the same self-intro the
    other authoring prompts already carry. No on-disk format, schema, CLI, or API
    change.

dikw-core v0.6.3

27 Jun 03:38
aab4ebd

0.6.3 — delivery-loop instrumentation (no engine changes)

Changed

  • Cadence release — no shipped-engine change. The published wheel is
    byte-identical to 0.6.2; src/ is untouched, so every runtime behavior, on-disk
    format, database schema, and CLI surface is unchanged. The delta lives entirely in
    the repo's own maintainer-facing delivery loop (neither shipped in the wheel nor
    user-facing): a resumable-STATE + PR-receipt artifact convention for the delivery
    workflow, and a new tools/loop_metrics.py that reads the accumulated PR-body
    receipts plus the GitHub API to report the loop's own effectiveness
    (first-pass-green rate, escape rate, codex rounds). The bump keeps the GHCR image,
    the pip pin, and the docs in lockstep with the tag.

dikw-core v0.6.2

25 Jun 13:27
7946fb3

0.6.2 — optional cross-encoder reranking; atomic writes + concurrency hardening; repositioned as a self-managed knowledge engine

Changed

  • Docs: repositioned as a self-managed knowledge base engine. README and the
    user-facing docs now frame dikw-core as a client/server engine whose server
    authoritatively manages the base's directory tree, index, and database — distinct
    from Obsidian — with knowledge persisted as an open, portable Markdown format
    (same lineage as Karpathy's LLM-Wiki pattern and Google's Open Knowledge Format).
    docs/** no longer frames the base as "an Obsidian vault"; the on-disk format is
    described as open Markdown openable in any editor. No engine behavior, on-disk
    layout, or invariant changed.
  • README professionalized. Restructured with a documentation index, a
    client/server + DIKW mermaid diagram, container/Ruff/status badges, and the
    missing dikw client delete / dikw client wisdom write verbs; the maintainer
    release mechanics moved out of the README into docs/releasing.md.
    The header is now centered with the OpenDIKW logo
    (.github/assets/opendikw-avatar.png).

Added

  • Optional cross-encoder reranking in the retrieval pipeline. A new
    RerankProvider seam (providers/base.py) + build_reranker factory +
    OpenAICompatReranker (providers/rerank.py, the Jina/Cohere-compatible
    /rerank wire shape that Gitee AI / SiliconFlow / Jina / Cohere share) add a
    rerank stage to HybridSearcher.search, between RRF fusion and the top-K
    truncation: the top retrieval.rerank_candidate_k (default 40) fused
    candidates are scored by (query, chunk) relevance, re-ordered, then cut to
    the query limit. It recovers precision@k from the recall pool without
    changing recall, and does not touch any storage adapter or retrieve's
    response shape. Configure via provider.rerank / rerank_model /
    rerank_base_url / rerank_api_key_env (+ rerank_timeout_seconds /
    rerank_batch_size) and retrieval.rerank_enabled / rerank_candidate_k;
    the candidate window is split into rerank_batch_size batches per /rerank
    call so it respects per-vendor document caps (Gitee: ≤25). Reranking is on once
    configured (rerank_enabled defaults to true; a base that configures no reranker runs no
    rerank leg). A reranker is a deterministic scoring model — the same category
    as the embedding model, part of scoping not reasoning — so it is
    consistent with the "LLMs only enter at synth" invariant; an LLM-as-reranker
    is deliberately excluded. On the read path a transient rerank failure degrades
    to the fused order, a permanent one fails loud. See
    docs/adr/0006-reranker-deterministic-scoping.md,
    the docs/providers.md reranking cookbook, and the SciFact ablation in
    evals/BASELINES.md.
  • Community-health files: CONTRIBUTING.md, SECURITY.md, CODE_OF_CONDUCT.md,
    and GitHub issue-form templates under .github/ISSUE_TEMPLATE/.
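The rerank stage described in the first item above (window the fused list, score it in batches, re-order, cut to the limit) might look roughly like this. `rerank_stage` and `score_batch` are hypothetical names; the defaults mirror the numbers quoted in the notes (candidate window 40, Gitee cap 25), and `limit=10` is an arbitrary illustration.

```python
def rerank_stage(query, fused, score_batch, candidate_k=40, batch_size=25, limit=10):
    """Re-order the top fused candidates by reranker score, then truncate.

    score_batch(query, docs) is assumed to return one relevance score per doc,
    in the same order as docs (one call per /rerank request).
    """
    window = list(fused[:candidate_k])
    scores = []
    for start in range(0, len(window), batch_size):
        batch = window[start:start + batch_size]
        scores.extend(score_batch(query, batch))
    # sorted() is stable, so ties keep their fused order.
    order = sorted(range(len(window)), key=lambda i: scores[i], reverse=True)
    return [window[i] for i in order][:limit]
```

Because only the order inside the window changes, the candidate pool (recall) is the same as before the stage; how the engine handles a limit larger than the window is not covered by this sketch.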

Fixed

  • Atomic on-disk page writes. Knowledge and wisdom page writes now go through a
    shared temp-file-then-os.replace helper (domains/_atomic.py), so a crash or a
    full disk mid-write can no longer leave a half-written page at the visible path —
    a reader sees the old bytes or the new bytes, never a truncated file. The
    trash/ collision loop claims its destination name via O_CREAT | O_EXCL instead
    of an exists() probe, closing the window where two concurrent trashings of the
    same path could clobber each other.
  • Task status is terminal-immutable. update_status (both the SQLite and
    Postgres task stores) now refuses to overwrite a row that is already
    succeeded / failed / cancelled: a late write is a silent no-op, so a cancel that
    lands first wins over a runner's trailing failure. An unknown task_id still
    raises TaskNotFound.
  • Atomic base-instance-id creation. <base>/.dikw/base_id is now created with an
    exclusive open(..., "x"), so two server processes cold-starting the same base
    converge on one id instead of each minting a different one (which silently split
    the shared-Postgres task-store scope). DIKW_BASE_INSTANCE_ID still overrides.
  • Orphan import-staging cleanup is gated like task reaping. Startup no longer
    wipes <base>/.dikw/staging/ unconditionally; it does so only when this process
    owns the task store exclusively (per-base SQLite, or DIKW_TASK_REAP_ON_START=1),
    so a replica sharing a Postgres task store can no longer delete a live peer's
    in-flight import staging.
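Two of the patterns named above, temp-file-then-`os.replace` and exclusive creation with `open(..., "x")`, can be sketched with the standard library alone. `atomic_write` and `get_or_create_id` are illustrative names, not the helpers in domains/_atomic.py, and the `fsync` call is an assumption about durability that the notes do not confirm.

```python
import os
import tempfile
import uuid


def atomic_write(path, data):
    """Write bytes so a reader sees the old file or the new one, never a partial."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)  # atomic rename over the visible path
    except BaseException:
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


def get_or_create_id(path):
    """Create the id file exclusively; a process that loses the race reads the winner's id."""
    try:
        with open(path, "x", encoding="utf-8") as handle:
            new_id = uuid.uuid4().hex
            handle.write(new_id)
            return new_id
    except FileExistsError:
        with open(path, encoding="utf-8") as handle:
            return handle.read().strip()
```

A production version would also need to handle a loser that reads the file before the winner has finished writing it; this sketch ignores that window.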

dikw-core v0.6.1

23 Jun 13:48
40bc459

0.6.1 — official GHCR image; client/server version handshake; SQLite + write-path concurrency fixes

Added

  • Official container image published to GHCR on every release. The release
    workflow's new publish-image job builds examples/docker/Dockerfile (the
    same image the Trivy PR-scan builds) and pushes a public, multi-arch
    (linux/amd64 + linux/arm64) image to
    ghcr.io/opendikw/dikw-core:X.Y.Z after the PyPI publish (it waits for the
    wheel to be installable, then pip installs it). There is intentionally no
    floating :latest; downstream pins an exact X.Y.Z so its debug
    environment stays reproducible. examples/docker/docker-compose.yml now pulls
    this image by default (pinned via the required DIKW_VERSION env var, with the
    build: block retained as a local-source fallback), giving downstream systems
    a stable, ready-to-run dikw-core to develop and debug their HTTP / dikw client
    integration against. See examples/docker/README.md and
    docs/deployment-docker.md.
  • Client/server version handshake. dikw client now probes GET /v1/info
    once per invocation and compares the server's engine_version to the client's
    own installed dikw-core version. A confirmed mismatch hard-fails the
    command (exit 1, a version skew: line naming both versions) so a downstream
    system catches silent wire drift immediately — dikw-core is alpha, so a skewed
    client/server pair can misbehave in subtle ways. Ambiguous cases (server
    unreachable, /v1/info non-200, engine_version missing, or the client run
    from an uninstalled source checkout) skip the check and let the real request
    surface its own error, so the handshake never raises a false skew. Set
    DIKW_ALLOW_VERSION_SKEW=1 to downgrade the hard-fail to a one-line stderr
    warning for deliberate mixed-version debugging. The probe is layering-clean
    (reads its own version via importlib.metadata, never imports the engine).
  • GitHub Release is now cut automatically on every tag. release.yml gains a
    github-release job (needs: publish) that creates the GitHub Release for the
    pushed tag once the PyPI publish succeeds: the body is the tag's CHANGELOG.md
    section (extracted from its ## X.Y.Z heading to the next ## ; falls back to
    GitHub's auto-generated notes with a ::warning:: when no section matches), and
    the built wheel + sdist ride along as downloadable assets. Previously the
    workflow only published to PyPI + GHCR and opened the Dockerfile-bump PR —
    Releases were created by hand, and several tags (v0.6.0 and earlier) shipped
    without one. A backfilled v0.6.0 GitHub Release was created out-of-band.
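The decision table behind the client/server handshake above can be sketched as a pure function. `installed_version` and `handshake_verdict` are hypothetical names; the only library call used is `importlib.metadata.version`, which the notes say the probe relies on.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed distribution's version, or None for a source checkout."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def handshake_verdict(client_version, server_version, allow_skew=False):
    """Map the two versions to 'skip', 'ok', 'warn', or 'fail'."""
    if client_version is None or server_version is None:
        return "skip"  # ambiguous: let the real request surface its own error
    if client_version == server_version:
        return "ok"
    return "warn" if allow_skew else "fail"
```

A caller would presumably derive `allow_skew` from `DIKW_ALLOW_VERSION_SKEW=1`, exit 1 on "fail", and print a one-line stderr warning on "warn".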

Fixed

  • Concurrency: serialize the SQLite adapter and the synth/lint-apply write
    path.
    Three race conditions could surface under concurrent task execution
    on a single base. (1) Each SQLiteStorage instance shares one
    sqlite3.Connection across the asyncio.to_thread workers a verb fans out
    (retrieval already runs its fts/vec/asset legs via asyncio.create_task),
    and that connection's Python-level state is not thread-safe — overlapping
    workers could trip sqlite3.InterfaceError / phantom rows (the same hazard
    already worked around three times in storage/base.py, eval/runner.py, and
    server/tasks/store_sqlite.py). Every adapter method body now runs under a
    per-instance threading.RLock (acquired inside the worker thread), closing
    the window for every call site rather than one ad-hoc gather at a time.
    (2) synth and lint apply previously took no lock, so they could
    interleave with ingest/delete/wisdom write/each other — racing the same
    deterministic doc_id rows + on-disk page with no enclosing transaction
    (silent anchor loss, embed-version drift, or a healthy page mistakenly
    deactivated). Both now acquire the server's existing base write lock
    (ServerRuntime.ingest_lock), which already covered ingest/import/wisdom/
    delete, so the whole D/K/W write surface is single-writer per base within a
    process. (3) The SQLite adapter now sets an explicit busy_timeout=30000
    (and opens with connect(timeout=30) so the budget also covers the
    connection-time pragmas) — parity with the task store — so a cross-connection
    WAL writer blocks-then-succeeds instead of immediately raising database is
    locked. No on-disk format, schema, Storage Protocol, or CLI change: these are
    internal serialization fixes. ingest_lock stays per-process, so
    multi-replica Postgres deployments are unchanged (cross-replica doc-level
    locking is tracked separately). See docs/server.md § Storage concurrency.
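Points (1) and (3) above can be illustrated with a small wrapper that shares one connection across threads behind a per-instance `threading.RLock` and sets the busy timeout. `LockedSQLite` is a hypothetical class, and `check_same_thread=False` is an assumption about how the shared connection is opened.

```python
import sqlite3
import threading


class LockedSQLite:
    """One shared connection; every method body runs under a per-instance lock."""

    def __init__(self, path):
        # timeout=30 also covers the connection-time pragmas.
        self._conn = sqlite3.connect(path, timeout=30, check_same_thread=False)
        self._conn.execute("PRAGMA busy_timeout=30000")
        self._lock = threading.RLock()

    def execute(self, sql, params=()):
        # Acquired inside the calling (worker) thread, so overlapping workers serialize here.
        with self._lock:
            cursor = self._conn.execute(sql, params)
            rows = cursor.fetchall()
            self._conn.commit()
            return rows
```

An RLock rather than a plain Lock lets one locked method call another on the same thread without deadlocking.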

Docs

  • Install-from-PyPI path for downstream consumers. docs/getting-started.md
    §1 now splits installation into Option A (uv pip install 'dikw-core[...]',
    the published wheel, for systems that use dikw-core) and Option B (git clone
    + uv sync, for contributors), and adds an optional-extras matrix
    documenting all three user-facing extras (postgres, cjk, otel): what each
    pulls in, when to install it, and how the feature degrades without it. The
    README install section gains a matching "Install from PyPI" block. Previously
    both entry points led only with the from-source checkout flow, leaving the
    pip-install consumer path (and the cjk extra entirely) undocumented.

dikw-core v0.6.0

21 Jun 11:27
8eb13b3

0.6.0 — config-driven provider API-key env vars (BREAKING); DeepSeek V4 Pro + Gitee bge-m3; horizontal model comparison

Changed

  • BREAKING — provider API-key env var is now config-driven, and DIKW_EMBEDDING_API_KEY
    is removed.
    ProviderConfig gains two required fields, llm_api_key_env and
    embedding_api_key_env, naming the environment variable that holds each leg's key.
    The engine no longer hardcodes any key var name: anthropic_compat/openai_compat
    read exactly the var named in dikw.yml, with no fallback. The dikw-invented
    DIKW_EMBEDDING_API_KEY magic name is gone — embedding keys now use vendor-canonical
    names (OPENAI_API_KEY, GITEE_API_KEY, …) chosen via embedding_api_key_env. The
    LLM/embedding "two separate keys" separation is now achieved by naming distinct vars
    (point both legs at one var to share a key, or at different vars to split vendors)
    rather than by a special name + no-fallback rule. Migration: add the two fields to
    every dikw.yml provider: block (a fresh dikw init scaffold writes them), and in
    .env rename DIKW_EMBEDDING_API_KEY → the vendor var your config names; a same-vendor
    Anthropic+MiniMax .env that reused ANTHROPIC_API_KEY for a MiniMax key should move
    the MiniMax key to MINIMAX_API_KEY and set llm_api_key_env: MINIMAX_API_KEY. Wipe
    the local evals/.cache/snapshots/ after upgrading (its snapshot dikw.ymls predate
    the fields). /v1/health's api_key_present and the dikw client check probe now key
    off the configured var; the tools/e2e_verify.py real-leg gate derives its required
    keys from the active profile's provider.{llm,embedding}_api_key_env.
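A sketch of the config-driven lookup with no fallback, assuming the provider block is available as a plain dict. `resolve_api_key` and `MissingApiKey` are hypothetical names; the field names `llm_api_key_env` and `embedding_api_key_env` are the ones introduced above.

```python
import os


class MissingApiKey(RuntimeError):
    """Raised when the configured environment variable is unset or empty."""


def resolve_api_key(provider_cfg, leg, environ=None):
    """Read exactly the env var named in the config for this leg ('llm' or 'embedding')."""
    env = os.environ if environ is None else environ
    var_name = provider_cfg[f"{leg}_api_key_env"]  # required field: KeyError if absent
    key = env.get(var_name)
    if not key:
        # No fallback to any other variable name, by design.
        raise MissingApiKey(f"{var_name} is not set (named by provider.{leg}_api_key_env)")
    return key
```

Pointing both fields at the same variable shares one key across legs; pointing them at different variables splits vendors.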

Added

  • DeepSeek V4 Pro (LLM) + Gitee AI bge-m3 (embeddings) support — config-only. DeepSeek
    runs via the existing anthropic_compat protocol against its Anthropic-compatible
    endpoint (llm_base_url: https://api.deepseek.com/anthropic, llm_model: deepseek-v4-pro,
    key in DEEPSEEK_API_KEY); DeepSeek ignores the cache_control field the provider
    sends (no error — only the Anthropic prompt-cache discount is absent, same cost note as
    openai_compat). bge-m3 runs via openai_compat embeddings against Gitee
    (embedding_base_url: https://ai.gitee.com/v1, embedding_model: bge-m3,
    embedding_dim: 1024, embedding_batch_size: 16, key in GITEE_API_KEY). No engine
    code; a committed reference config ships at tests/fixtures/live-deepseek-gitee-bgem3.dikw.yml.
    See docs/providers.md.
  • Horizontal model-comparison harness (evals/tools/compare_models.py). A dev tool
    (not shipped in the wheel) that runs the same eval dataset against N model arms and emits
    an arm-by-metric comparison matrix + per-arm JSON. compare compares embedding models
    via retrieval eval (deterministic, 1 run/arm: hit@k / mrr / nDCG@10 / recall@100);
    compare-synth compares LLM models via synth eval (N runs/arm + a Welch t-test of each
    arm vs the baseline arm: grounding / atomicity / duplicate / wikilink / language, plus judge
    dims with --judge). Each arm carries a full provider: block, so two same-protocol
    vendors (DeepSeek + MiniMax) resolve distinct keys via their *_api_key_env. Reuses the
    tested statistics from ab_experiment.py and the direction rule from client/baseline.py.
    See evals/README.md and docs/providers.md.
  • Real-environment end-to-end verification harness (tools/e2e_verify.py). A dev
    tool (not shipped in the wheel) that drives every dikw client verb against a
    live server in one of two throwaway environments, then destroys it: --mode local
    (temp-dir base + long-lived dikw serve on SQLite) and --mode docker (server +
    pgvector Postgres via a generated compose project, image built from the local
    working tree, not the released PyPI examples/docker/Dockerfile). CLI coverage is
    asserted against the live Typer tree, so adding a verb without a sequence step fails
    the run. Provider posture is tiered + skip-loud: structural legs (ingest --no-embed,
    pages/graph/lint/delete/tasks) run with no keys; real legs
    (check/embed/synth/vector-retrieve/eval) run when the keys named by the
    active profile's provider.{llm,embedding}_api_key_env are present (from .env)
    and SKIP loudly otherwise. Both modes use a free host port (never a fixed 8765)
    so concurrent runs don't collide; docker teardown is guaranteed (down -v --rmi local
    removes containers, volumes and the built image; --prune sweeps crashed-run
    leftovers by label/name). --observe wires the
    docs/observability OTel stack and surfaces a Jaeger trace link on failure. Registered
    as a cli/server/client leg in the dikw-core-verify skill; wrapped by
    tests/test_e2e_verify_{local,docker}.py (-m slow). Default provider profile is the
    committed MiniMax + Qwen3-Embedding-0.6B template; swap vendor/model via
    --provider-profile <dikw.yml>.
  • dangling_provenance drift lint kind — flag a K/W page citing a deleted source
    (read-only).
    A new deterministic lint kind that flags a knowledge/ (K) or
    wisdom/ (W) page whose sources: provenance edge points at a source file that
    no longer exists on disk. It is read-only — surfaced, never auto-repaired: there
    is no fixer (like duplicate_title, lint propose reports it for human triage and
    lands every issue in skipped), because the sources: frontmatter is the user's to
    edit (ADR-0001's non-cascade design — delete never rewrites another page's content).
    Disk is the source of truth (ADR-0005), so detection stats the file, not the
    documents projection: a source present on disk but not yet ingest-ed (no active D
    row) is not dangling — there the fix is ingest, not editing frontmatter. A
    provenance path that escapes the base is dangling and its external target is never
    stat-ed. Runs in the default lint scan, sharing the per-page provenance read with
    missing_provenance (zero extra storage round-trips); suppressible per page via
    lint: {skip: [dangling_provenance]}. Final slice of ADR-0005
    (filesystem-as-source-of-truth) — the arc (the delete verb + missing_file /
    untracked_file / stale_index / dangling_provenance drift kinds) is now complete,
    and docs/design.md gains a "Disk is the source of truth" invariant section.
  • stale_index + untracked_file drift lint kinds — re-project hand-edited /
    hand-written K/W pages (and unlock hand-authored knowledge pages as first-class).
    Two new deterministic lint kinds, both fixed by one ReindexPageFixer:
    stale_index flags an active knowledge/ (K) or wisdom/ (W) row whose on-disk
    body hash no longer matches the indexed hash (a hand-edit outside dikw);
    untracked_file flags a .md / .markdown file under knowledge/ or wisdom/
    with no active row (hand-written, or restored outside dikw). Both propose a single
    reindex_page op that re-projects the current on-disk bytes through
    persist_knowledge / persist_wisdom — re-chunk, re-link, re-provenance,
    inline-or-deferred re-embed — without rewriting the file (disk is the source of
    truth, ADR-0005) and without re-running synth (so a hand-edit is preserved, not
    regenerated from the D-source). Run in the default lint scan; fix with
    dikw client lint propose --rule stale_index (or untracked_file) →
    dikw client lint apply <task_id>. untracked_file closes the "hand-write a K page,
    the engine never indexes it" gap and makes hand-authored pages first-class;
    stale_index closes the "edit a K/W file on disk, the storage projection silently
    drifts" gap. Detection is near-free: stale_index reuses the per-page read the
    other lexical checks already do (no separate mtime-prefiltered hashing pass), and
    untracked_file is a cheap disk walk (stat + membership, no read) rooted at
    knowledge/ + wisdom/ so the sibling trash/ / .dikw/ / assets/ trees are
    naturally excluded and .gitkeep / non-markdown files never trip. Both are K/W-only
    (D-layer adds/edits stay ingest's job); a page failing its re-projection is
    deactivated and surfaced via ApplyReport.persist_errors, successes under
    ApplyReport.reindexed_documents. Third slice of ADR-0005 (dangling_provenance
    is the fourth, above). This supersedes the never-built dikw client reindex <path> — the
    reindex story is now dikw client lint propose --rule stale_index (or
    --rule untracked_file) followed by dikw client lint apply <task_id>.
  • missing_file drift lint kind — purge orphaned document rows (D/K/W). A new
    deterministic lint kind (with MissingFileFixer) that detects an active
    documents row whose backing file is gone from disk — a sources/ (D),
    knowledge/ (K), or wisdom/ (W) file deleted outside dikw — and proposes a
    single purge_document op that drops the orphaned row + its outgoing edges via
    Storage.delete_document. Runs in the default lint scan; fix it with
    dikw client lint propose --rule missing_file → dikw client lint apply <task_id>.
    Closes the original gap where deleting a source file left its row stuck at
    active=True forever (run_lint never scanned D rows). Inbound [[wikilink]]s
    from live pages are left to surface as broken_wikilink (delete_document clears
    only outgoing edges; the kind never rewrites a user's page); a truly dangling edge
    (both ends purged) clears itself. The op carries the resolved layer, re-checks
    at apply time that the file is still absent and the row still exists (propose→apply
    race / restored-file safety), and reports purged paths under
    ApplyReport.purged_documents. Second slice of ADR-0005
    (filesystem-as-source-of-truth); untracked_file / stale_index /
    dangling_provenance land in follow-ups.
  • dikw client delete <path> — first-class document deletion (D/K/W). A new
    immediat...
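The drift kinds in the 0.6.0 notes above (stale_index, untracked_file, missing_file) all compare the disk against the index. The sketch below assumes the index is a plain mapping from relative path to a stored hash and hashes the whole file; the engine's real body hash, row model, and D-layer handling are not shown in the notes, so `scan_drift` is illustrative only.

```python
import hashlib
from pathlib import Path

MARKDOWN_SUFFIXES = (".md", ".markdown")


def body_hash(path):
    """Assumption: hash of the full file bytes stands in for the engine's body hash."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def scan_drift(base, indexed):
    """indexed maps 'knowledge/x.md'-style paths to the hash stored at index time."""
    issues = []
    on_disk = set()
    # Walk only knowledge/ and wisdom/, so trash/, .dikw/ and assets/ are never visited.
    for layer in ("knowledge", "wisdom"):
        root = base / layer
        if not root.is_dir():
            continue
        for path in sorted(root.rglob("*")):
            if not path.is_file() or path.suffix not in MARKDOWN_SUFFIXES:
                continue  # .gitkeep and non-markdown files never trip
            rel = path.relative_to(base).as_posix()
            on_disk.add(rel)
            if rel not in indexed:
                issues.append(("untracked_file", rel))
            elif body_hash(path) != indexed[rel]:
                issues.append(("stale_index", rel))
    for rel in sorted(set(indexed) - on_disk):
        issues.append(("missing_file", rel))
    return issues
```

Nothing here rewrites a file: the scan only reports, which matches the "disk is the source of truth" stance of the notes.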

dikw-core v0.5.3

16 Jun 14:38
01d3f2c

OpenTelemetry observability — complete (PR1–PR5, #200–#207)

dikw-core now ships full traces + metrics + logs instrumentation behind an optional [otel] extra — off by default, zero-cost when unused — so its runtime data integrates with any OTLP backend (Jaeger/Tempo, Prometheus/Grafana, Datadog, …).

  • Traces — one trace spans client → HTTP server → background task → engine op → provider call. gen_ai.* spans carry model + token usage (incl. Anthropic prompt-cache tokens); each retrieval fusion leg (BM25/vector/asset/graph) gets its own span.
  • Metrics — GenAI token.usage + operation.duration histograms and dikw-domain counters/histograms (ingest / synth / embed / retrieve / task), exported over OTLP/HTTP.
  • Logs: DIKW_LOG_FORMAT=json emits structured log records carrying trace_id / span_id / service for log↔trace correlation (text default unchanged, byte-for-byte).
  • Docs + validation stack — new docs/observability.md operator cookbook + a one-command docker compose stack (OTel Collector → Jaeger / Prometheus / Grafana).

Enable server-side via a telemetry: section in dikw.yml; the dikw client CLI joins the same trace via the standard OTEL_* env vars.

Synth prompt restructure (#199)

The cached synth system prompt is rewritten into a six-invariant standing-policy spine with a correspondingly slimmed user prompt (single source of truth per rule); synth now forbids sources / lint in emitted front-matter — both are engine-owned.


Full changelog: CHANGELOG.md → 0.5.3

Install: pip install 'dikw-core==0.5.3' — add the [otel] extra for observability, [postgres] for the pgvector backend.

dikw-core v0.5.2

13 Jun 06:53
b178df1

synth-quality measurement + prompt tuning + post-synth self-check

This release is a synth-quality arc: it makes synth output measurable, then tunes the authoring prompt against those measurements, and adds a self-check the agent layer can gate on.

Highlights

Synth prompt-quality overhaul — Zettelkasten framing + worked English/Chinese examples (Phase 1), existing-page slug disambiguation + priority-create feedback to later fan-out groups (Phase 2), six targeted UP revisions (PR1), and an SP rewrite + cache-friendly prompt layout so the instruction prefix is byte-stable for OpenAI/codex prefix caching (PR2). The llm_max_tokens_synth default rises 2048 → 3072 (~768 tokens/page).

Synth-quality measurement foundation (Phase 0a/0b) — deterministic --eval synth diagnostics (source-chunk coverage, fallback/slug-merge ratios, category distribution) plus four opt-in, $0-by-default LLM judges with bootstrap 95% CIs: fact_entailment_ratio, category_correctness_ratio, wikilink_correctness_ratio, semantic_atomicity_ratio. Adds an A/B experiment harness (Welch t-test, no scipy) and a calibrated --judge-sample auto (n≥25 guarantees ≤±0.2 CI).
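For reference, Welch's t statistic and its Welch-Satterthwaite degrees of freedom need nothing beyond the standard library. This is a generic textbook sketch, not the harness's code; `welch_t` is a hypothetical name, and turning t into a p-value (which needs the t-distribution CDF) is left out.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for two independent samples with possibly unequal variances.

    Each sample needs at least two values, and at least one must have non-zero variance.
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

For example, `welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])` gives t = -1.0 with 8 degrees of freedom.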

dikw client synth --verify — post-synth self-check over just this run's pages: persist / lint / semantic-duplicate legs emit one PASS/FAIL verdict, plus a report-only --judge grounding leg.

title_slug_quality lint — deterministic, zero-false-positive K-page title/slug hygiene (also a synth --verify gated leg).

dikw client eval --against / --write-baseline — machine-readable, direction-aware single-run regression gate.

Provider robustness: anthropic_compat + openai_compat complete() now stream, so the read timeout applies per SSE event (fixes reasoning-model timeouts on long syntheses, e.g. MiniMax-M3) and SDK failures are classified transient vs permanent. dikw client check no longer false-fails reasoning-model LLMs/embeddings; openai_compat embeddings are re-ordered by response index.
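Re-ordering embeddings by response index can be sketched as below, assuming each response item is a dict with `index` and `embedding` keys, as in the OpenAI-style embeddings payload. `order_embeddings` is a hypothetical helper, and the density check is an extra safeguard the notes do not mention.

```python
def order_embeddings(items):
    """Return embeddings in input order, whatever order the server sent them in."""
    ordered = sorted(items, key=lambda item: item["index"])
    if [item["index"] for item in ordered] != list(range(len(ordered))):
        raise ValueError("embedding response indices are not a dense 0..n-1 range")
    return [item["embedding"] for item in ordered]
```

Without this step, a server that returns items out of order would silently attach vectors to the wrong chunks.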

See the CHANGELOG for the complete, itemized list.

Install: pip install dikw-core==0.5.2 (or dikw-core[postgres]==0.5.2).