fix(compiler): enable JSON mode + harden plan handling (closes #71)#75
Merged
Conversation
Concept generation can fail in two ways when the LLM emits malformed
JSON for the concepts plan: (a) the parser raises and the function
returns with zero concepts written, (b) a structurally-wrong shape
(nested list, bare strings) slips past the list fallback and crashes
each per-concept task at `concept.get("title")`. Both paths previously
exited via `[OK]`, leaving users to discover an empty `wiki/concepts/`
on their own.
- Pass `response_format={"type": "json_object"}` on the four LLM calls
whose prompt requests a JSON object (summary, plan, concept create,
concept update). Constrains the decoder so providers that support
json mode — OpenAI, DeepSeek, Qwen, Kimi, GLM, MiniMax, Doubao —
can no longer return prose. The existing "Return ONLY valid JSON"
prompt language already satisfies the DeepSeek/Qwen "must mention
json" requirement.
- Add `_filter_concept_items` to drop non-dict entries from the plan
before they reach `_gen_create` / `_gen_update`. Logs the dropped
count and the offending types so the cause is diagnosable.
- Include the first 500 chars of `plan_raw` in the parse-failure
WARNING (was DEBUG-only). Print a stdout `[WARN]` line on both
"plan unparseable" and "planned N, wrote K" so a silent regression
no longer masquerades as `[OK]`.
Code review on PR #75 surfaced two function-aborting bugs that the original defenses did not cover. Both shift the same silent-loss class one step upstream — `_compile_concepts` raises before any concept task runs, the v1 summary is never written on the short-doc path, and the new `[WARN] planned vs written` line never fires. - `_filter_concept_items` also requires a non-empty string `name`. Without this, dicts that omit the `name` key (JSON mode constrains syntax, not schema) reach the `planned_slugs` set comprehension at line 1014 and raise `KeyError: 'name'`. - New `_filter_related_slugs` mirrors the same guard for the `related` list, dropping non-strings. The previous code passed `parsed.get("related", [])` straight into `_sanitize_concept_name`, which calls `unicodedata.normalize("NFKC", name)` and raises `TypeError` on any non-`str` entry. Verified end-to-end against the original screenwriter EPUB on deepseek-v4-flash (no regression) and via direct unit-style calls that feed every mishape into both helpers and observe the expected drops + WARN messages.
Three observable-failure-mode improvements from the PR #75 review. The fourth finding (%r expanding Chinese 3-6×) was refuted — Python 3 repr() preserves printable Unicode 1:1, only control chars are escaped. - Detect ``finish_reason == "length"`` in ``_llm_call`` / ``_llm_call_async`` and emit both a logger.warning and a stdout [WARN] line. Truncation was previously silent: ``json_repair`` would salvage the prefix and parsing succeeded with a smaller-than-intended plan, with no signal anything was cut off. - Bump the concepts-plan ``max_tokens`` from 1024 to 2048. ``json_object`` mode adds quoting/escaping overhead, and the cap was tight even for modest plans; the truncation detector above guards against the 2048 case too. - Re-add a full DEBUG log of the raw plan on parse failure. The earlier diff replaced ``logger.debug("Raw: %s", plan_raw)`` with a 500-char preview embedded in the WARNING, which strictly reduced what DEBUG users could recover when the actual bad JSON lived past char 500. Now both: 500-char preview at WARNING, full payload at DEBUG. - Change "check warnings above" to "see log (stderr)" and include the failing exception type names inline in the partial-failure [WARN] line. ``logger.warning`` lands on stderr (Python default); users capturing only stdout previously got a WARN that pointed at logs they couldn't see — now the stdout line is self-contained. Verified end-to-end against the screenwriter EPUB on deepseek-v4-flash (no regression). Unit-tested ``_warn_if_truncated`` across stop/length/malformed-response inputs.
…eview The previous round of #75 added a partial-failure [WARN] line but missed three paths where concept generation still silently produces zero or broken pages while [OK] prints normally: - ``_filter_concept_items`` could strip an LLM plan down to nothing (e.g. ``{"create":[{"foo":"bar"}, "x"]}`` — both rejected). The early-return at ``if not create_items and not update_items and not related_items`` then fired with no stdout signal, indistinguishable from "LLM legitimately had nothing to add". Now compares pre-filter vs post-filter totals and emits a ``[WARN]`` when filtering wiped a non-empty plan. - ``parsed.get("content", raw)`` only used the default when the ``"content"`` key was absent; under ``response_format=json_object`` the LLM can legally return ``{"content": null}`` (refusal / content-policy hit). ``None`` then propagated into ``pending_writes`` and crashed ``strip_ghost_wikilinks(None, ...)`` mid-batch. Switched to ``parsed.get("content") or raw`` so null/empty collapses to the raw fallback. - An empty or whitespace-only ``content`` sailed through ``pending_writes`` as a successful concept and got committed as a blank Markdown page with no signal. New ``_require_nonempty_content`` raises ``ValueError`` after parsing/fallback; the gather loop's existing ``failure_types`` collector catches it and the partial-failure ``[WARN]`` surfaces it. Verified ``_require_nonempty_content`` across normal / null / empty / whitespace / non-str inputs. Existing tests (70) pass with no regression.
Strip multi-paragraph docstrings down to one-line summaries, remove
inline comments that re-stated what the code does, and drop all
issue/PR cross-references — those belong in commit messages and PR
descriptions, not in code that has to age.
Kept the WHYs a cold reader actually needs: the DeepSeek/Qwen prompt
requirement on _JSON_RESPONSE_FORMAT, and the .get("content") or raw
note about JSON-null distinction.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #71.
openkb compilecould silently produce zero concept pages when the LLM emitted malformed JSON for the concepts plan — the failure was caught, logged atWARNING, and the command still printed[OK], leaving the user to discover the emptywiki/concepts/on their own.Two failure modes from the issue:
_parse_jsonraised; the raw plan was only logged at DEBUG and the function returned silently.[[{...}]]), the list fallback atcompiler.py:946-947passed list items through as if they were dicts, and each per-concept task crashed atconcept.get("title").Fix (two layers)
Primary — pass
response_format={\"type\": \"json_object\"}on the four LLM calls whose prompt asks for a JSON object (summary, concepts-plan, concept create, concept update). Constrains the decoder so providers that support json mode (OpenAI, DeepSeek, Qwen, Kimi, GLM, MiniMax, Doubao) can no longer return prose. The existing "Return ONLY valid JSON" prompt language already satisfies the DeepSeek/Qwen "prompt must mention json" requirement.Defense in depth (addresses the issue's 3 suggestions):
_filter_concept_itemsdrops non-dict entries fromcreate/updatebefore they reach_gen_create/_gen_update. Logs the dropped count and the offending types.[WARN]line on both "plan unparseable" and "planned N, wrote K" — silent regressions no longer masquerade as[OK].Test plan
pandoc <epub> -t markdown | openkb addwith a ~152K-char Chinese document._filter_concept_items, parse-fail message, planned-vs-written stdout line) reviewed for behavior under each malformed-input shape discussed in the issue. They are not exercised by the happy-path runs above; if reviewers want explicit unit coverage I'm happy to add it.response_formatis widely supported but provider-by-provider edge cases (e.g. older Qwen variants) may still ignore or reject the kwarg.