fix(compiler): enable JSON mode + harden plan handling (closes #71) by KylinMountain · Pull Request #75 · VectifyAI/OpenKB

KylinMountain · 2026-05-28T10:39:41Z

Summary

Fixes #71. openkb compile could silently produce zero concept pages when the LLM emitted malformed JSON for the concepts plan — the failure was caught, logged at WARNING, and the command still printed [OK], leaving the user to discover the empty wiki/concepts/ on their own.

Two failure modes from the issue:

Mode 2 — _parse_json raised; the raw plan was only logged at DEBUG and the function returned silently.
Mode 1 — LLM returned a structurally-wrong shape (e.g. nested list [[{...}]]), the list fallback at compiler.py:946-947 passed list items through as if they were dicts, and each per-concept task crashed at concept.get("title").

Fix (two layers)

Primary — pass response_format={\"type\": \"json_object\"} on the four LLM calls whose prompt asks for a JSON object (summary, concepts-plan, concept create, concept update). Constrains the decoder so providers that support json mode (OpenAI, DeepSeek, Qwen, Kimi, GLM, MiniMax, Doubao) can no longer return prose. The existing "Return ONLY valid JSON" prompt language already satisfies the DeepSeek/Qwen "prompt must mention json" requirement.

Defense in depth (addresses the issue's 3 suggestions):

New _filter_concept_items drops non-dict entries from create / update before they reach _gen_create / _gen_update. Logs the dropped count and the offending types.
Plan parse-failure WARNING now includes the first 500 chars of the raw LLM output (was DEBUG-only).
Compile prints a stdout [WARN] line on both "plan unparseable" and "planned N, wrote K" — silent regressions no longer masquerade as [OK].

Test plan

Reproduced and verified end-to-end against the issue's failure surface: pandoc <epub> -t markdown | openkb add with a ~152K-char Chinese document.
- gpt-5.4-mini: clean compile, no regressions (3 concepts + summary + rewrite).
- deepseek/deepseek-v4-flash (issue author's provider family): clean compile, no parse warnings.
Read through the diff — defense paths (_filter_concept_items, parse-fail message, planned-vs-written stdout line) reviewed for behavior under each malformed-input shape discussed in the issue. They are not exercised by the happy-path runs above; if reviewers want explicit unit coverage I'm happy to add it.
(Reviewer) Sanity check on at least one of: Qwen / Kimi / GLM / MiniMax — response_format is widely supported but provider-by-provider edge cases (e.g. older Qwen variants) may still ignore or reject the kwarg.

Concept generation can fail in two ways when the LLM emits malformed JSON for the concepts plan: (a) the parser raises and the function returns with zero concepts written, (b) a structurally-wrong shape (nested list, bare strings) slips past the list fallback and crashes each per-concept task at `concept.get("title")`. Both paths previously exited via `[OK]`, leaving users to discover an empty `wiki/concepts/` on their own. - Pass `response_format={"type": "json_object"}` on the four LLM calls whose prompt requests a JSON object (summary, plan, concept create, concept update). Constrains the decoder so providers that support json mode — OpenAI, DeepSeek, Qwen, Kimi, GLM, MiniMax, Doubao — can no longer return prose. The existing "Return ONLY valid JSON" prompt language already satisfies the DeepSeek/Qwen "must mention json" requirement. - Add `_filter_concept_items` to drop non-dict entries from the plan before they reach `_gen_create` / `_gen_update`. Logs the dropped count and the offending types so the cause is diagnosable. - Include the first 500 chars of `plan_raw` in the parse-failure WARNING (was DEBUG-only). Print a stdout `[WARN]` line on both "plan unparseable" and "planned N, wrote K" so a silent regression no longer masquerades as `[OK]`.

Code review on PR #75 surfaced two function-aborting bugs that the original defenses did not cover. Both shift the same silent-loss class one step upstream — `_compile_concepts` raises before any concept task runs, the v1 summary is never written on the short-doc path, and the new `[WARN] planned vs written` line never fires. - `_filter_concept_items` also requires a non-empty string `name`. Without this, dicts that omit the `name` key (JSON mode constrains syntax, not schema) reach the `planned_slugs` set comprehension at line 1014 and raise `KeyError: 'name'`. - New `_filter_related_slugs` mirrors the same guard for the `related` list, dropping non-strings. The previous code passed `parsed.get("related", [])` straight into `_sanitize_concept_name`, which calls `unicodedata.normalize("NFKC", name)` and raises `TypeError` on any non-`str` entry. Verified end-to-end against the original screenwriter EPUB on deepseek-v4-flash (no regression) and via direct unit-style calls that feed every mishape into both helpers and observe the expected drops + WARN messages.

Three observable-failure-mode improvements from the PR #75 review. The fourth finding (%r expanding Chinese 3-6×) was refuted — Python 3 repr() preserves printable Unicode 1:1, only control chars are escaped. - Detect ``finish_reason == "length"`` in ``_llm_call`` / ``_llm_call_async`` and emit both a logger.warning and a stdout [WARN] line. Truncation was previously silent: ``json_repair`` would salvage the prefix and parsing succeeded with a smaller-than-intended plan, with no signal anything was cut off. - Bump the concepts-plan ``max_tokens`` from 1024 to 2048. ``json_object`` mode adds quoting/escaping overhead, and the cap was tight even for modest plans; the truncation detector above guards against the 2048 case too. - Re-add a full DEBUG log of the raw plan on parse failure. The earlier diff replaced ``logger.debug("Raw: %s", plan_raw)`` with a 500-char preview embedded in the WARNING, which strictly reduced what DEBUG users could recover when the actual bad JSON lived past char 500. Now both: 500-char preview at WARNING, full payload at DEBUG. - Change "check warnings above" to "see log (stderr)" and include the failing exception type names inline in the partial-failure [WARN] line. ``logger.warning`` lands on stderr (Python default); users capturing only stdout previously got a WARN that pointed at logs they couldn't see — now the stdout line is self-contained. Verified end-to-end against the screenwriter EPUB on deepseek-v4-flash (no regression). Unit-tested ``_warn_if_truncated`` across stop/length/malformed-response inputs.

…eview The previous round of #75 added a partial-failure [WARN] line but missed three paths where concept generation still silently produces zero or broken pages while [OK] prints normally: - ``_filter_concept_items`` could strip an LLM plan down to nothing (e.g. ``{"create":[{"foo":"bar"}, "x"]}`` — both rejected). The early-return at ``if not create_items and not update_items and not related_items`` then fired with no stdout signal, indistinguishable from "LLM legitimately had nothing to add". Now compares pre-filter vs post-filter totals and emits a ``[WARN]`` when filtering wiped a non-empty plan. - ``parsed.get("content", raw)`` only used the default when the ``"content"`` key was absent; under ``response_format=json_object`` the LLM can legally return ``{"content": null}`` (refusal / content-policy hit). ``None`` then propagated into ``pending_writes`` and crashed ``strip_ghost_wikilinks(None, ...)`` mid-batch. Switched to ``parsed.get("content") or raw`` so null/empty collapses to the raw fallback. - An empty or whitespace-only ``content`` sailed through ``pending_writes`` as a successful concept and got committed as a blank Markdown page with no signal. New ``_require_nonempty_content`` raises ``ValueError`` after parsing/fallback; the gather loop's existing ``failure_types`` collector catches it and the partial-failure ``[WARN]`` surfaces it. Verified ``_require_nonempty_content`` across normal / null / empty / whitespace / non-str inputs. Existing tests (70) pass with no regression.

Strip multi-paragraph docstrings down to one-line summaries, remove inline comments that re-stated what the code does, and drop all issue/PR cross-references — those belong in commit messages and PR descriptions, not in code that has to age. Kept the WHYs a cold reader actually needs: the DeepSeek/Qwen prompt requirement on _JSON_RESPONSE_FORMAT, and the .get("content") or raw note about JSON-null distinction.

KylinMountain added 5 commits May 28, 2026 18:38

KylinMountain merged commit 7289cfe into main May 29, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(compiler): enable JSON mode + harden plan handling (closes #71)#75

fix(compiler): enable JSON mode + harden plan handling (closes #71)#75
KylinMountain merged 5 commits into
mainfrom
fix/compile-json-response-format

KylinMountain commented May 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

KylinMountain commented May 28, 2026

Summary

Fix (two layers)

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant