Skip to content

fix(compiler): enable JSON mode + harden plan handling (closes #71)#75

Merged
KylinMountain merged 5 commits into
mainfrom
fix/compile-json-response-format
May 29, 2026
Merged

fix(compiler): enable JSON mode + harden plan handling (closes #71)#75
KylinMountain merged 5 commits into
mainfrom
fix/compile-json-response-format

Conversation

@KylinMountain
Copy link
Copy Markdown
Collaborator

Summary

Fixes #71. openkb compile could silently produce zero concept pages when the LLM emitted malformed JSON for the concepts plan — the failure was caught, logged at WARNING, and the command still printed [OK], leaving the user to discover the empty wiki/concepts/ on their own.

Two failure modes from the issue:

  • Mode 2_parse_json raised; the raw plan was only logged at DEBUG and the function returned silently.
  • Mode 1 — LLM returned a structurally-wrong shape (e.g. nested list [[{...}]]), the list fallback at compiler.py:946-947 passed list items through as if they were dicts, and each per-concept task crashed at concept.get("title").

Fix (two layers)

Primary — pass response_format={\"type\": \"json_object\"} on the four LLM calls whose prompt asks for a JSON object (summary, concepts-plan, concept create, concept update). Constrains the decoder so providers that support json mode (OpenAI, DeepSeek, Qwen, Kimi, GLM, MiniMax, Doubao) can no longer return prose. The existing "Return ONLY valid JSON" prompt language already satisfies the DeepSeek/Qwen "prompt must mention json" requirement.

Defense in depth (addresses the issue's 3 suggestions):

  • New _filter_concept_items drops non-dict entries from create / update before they reach _gen_create / _gen_update. Logs the dropped count and the offending types.
  • Plan parse-failure WARNING now includes the first 500 chars of the raw LLM output (was DEBUG-only).
  • Compile prints a stdout [WARN] line on both "plan unparseable" and "planned N, wrote K" — silent regressions no longer masquerade as [OK].

Test plan

  • Reproduced and verified end-to-end against the issue's failure surface: pandoc <epub> -t markdown | openkb add with a ~152K-char Chinese document.
    • gpt-5.4-mini: clean compile, no regressions (3 concepts + summary + rewrite).
    • deepseek/deepseek-v4-flash (issue author's provider family): clean compile, no parse warnings.
  • Read through the diff — defense paths (_filter_concept_items, parse-fail message, planned-vs-written stdout line) reviewed for behavior under each malformed-input shape discussed in the issue. They are not exercised by the happy-path runs above; if reviewers want explicit unit coverage I'm happy to add it.
  • (Reviewer) Sanity check on at least one of: Qwen / Kimi / GLM / MiniMax — response_format is widely supported but provider-by-provider edge cases (e.g. older Qwen variants) may still ignore or reject the kwarg.

Concept generation can fail in two ways when the LLM emits malformed
JSON for the concepts plan: (a) the parser raises and the function
returns with zero concepts written, (b) a structurally-wrong shape
(nested list, bare strings) slips past the list fallback and crashes
each per-concept task at `concept.get("title")`. Both paths previously
exited via `[OK]`, leaving users to discover an empty `wiki/concepts/`
on their own.

- Pass `response_format={"type": "json_object"}` on the four LLM calls
  whose prompt requests a JSON object (summary, plan, concept create,
  concept update). Constrains the decoder so providers that support
  json mode — OpenAI, DeepSeek, Qwen, Kimi, GLM, MiniMax, Doubao —
  can no longer return prose. The existing "Return ONLY valid JSON"
  prompt language already satisfies the DeepSeek/Qwen "must mention
  json" requirement.
- Add `_filter_concept_items` to drop non-dict entries from the plan
  before they reach `_gen_create` / `_gen_update`. Logs the dropped
  count and the offending types so the cause is diagnosable.
- Include the first 500 chars of `plan_raw` in the parse-failure
  WARNING (was DEBUG-only). Print a stdout `[WARN]` line on both
  "plan unparseable" and "planned N, wrote K" so a silent regression
  no longer masquerades as `[OK]`.
Code review on PR #75 surfaced two function-aborting bugs that the
original defenses did not cover. Both shift the same silent-loss
class one step upstream — `_compile_concepts` raises before any
concept task runs, the v1 summary is never written on the short-doc
path, and the new `[WARN] planned vs written` line never fires.

- `_filter_concept_items` also requires a non-empty string `name`.
  Without this, dicts that omit the `name` key (JSON mode constrains
  syntax, not schema) reach the `planned_slugs` set comprehension at
  line 1014 and raise `KeyError: 'name'`.
- New `_filter_related_slugs` mirrors the same guard for the
  `related` list, dropping non-strings. The previous code passed
  `parsed.get("related", [])` straight into `_sanitize_concept_name`,
  which calls `unicodedata.normalize("NFKC", name)` and raises
  `TypeError` on any non-`str` entry.

Verified end-to-end against the original screenwriter EPUB on
deepseek-v4-flash (no regression) and via direct unit-style calls
that feed every mishape into both helpers and observe the expected
drops + WARN messages.
Three observable-failure-mode improvements from the PR #75 review.
The fourth finding (%r expanding Chinese 3-6×) was refuted — Python 3
repr() preserves printable Unicode 1:1, only control chars are escaped.

- Detect ``finish_reason == "length"`` in ``_llm_call`` /
  ``_llm_call_async`` and emit both a logger.warning and a stdout
  [WARN] line. Truncation was previously silent: ``json_repair`` would
  salvage the prefix and parsing succeeded with a smaller-than-intended
  plan, with no signal anything was cut off.
- Bump the concepts-plan ``max_tokens`` from 1024 to 2048. ``json_object``
  mode adds quoting/escaping overhead, and the cap was tight even for
  modest plans; the truncation detector above guards against the 2048
  case too.
- Re-add a full DEBUG log of the raw plan on parse failure. The earlier
  diff replaced ``logger.debug("Raw: %s", plan_raw)`` with a 500-char
  preview embedded in the WARNING, which strictly reduced what DEBUG
  users could recover when the actual bad JSON lived past char 500.
  Now both: 500-char preview at WARNING, full payload at DEBUG.
- Change "check warnings above" to "see log (stderr)" and include the
  failing exception type names inline in the partial-failure [WARN]
  line. ``logger.warning`` lands on stderr (Python default); users
  capturing only stdout previously got a WARN that pointed at logs
  they couldn't see — now the stdout line is self-contained.

Verified end-to-end against the screenwriter EPUB on
deepseek-v4-flash (no regression). Unit-tested ``_warn_if_truncated``
across stop/length/malformed-response inputs.
…eview

The previous round of #75 added a partial-failure [WARN] line but missed
three paths where concept generation still silently produces zero or
broken pages while [OK] prints normally:

- ``_filter_concept_items`` could strip an LLM plan down to nothing
  (e.g. ``{"create":[{"foo":"bar"}, "x"]}`` — both rejected). The
  early-return at ``if not create_items and not update_items and not
  related_items`` then fired with no stdout signal, indistinguishable
  from "LLM legitimately had nothing to add". Now compares pre-filter
  vs post-filter totals and emits a ``[WARN]`` when filtering wiped a
  non-empty plan.
- ``parsed.get("content", raw)`` only used the default when the
  ``"content"`` key was absent; under ``response_format=json_object``
  the LLM can legally return ``{"content": null}`` (refusal /
  content-policy hit). ``None`` then propagated into ``pending_writes``
  and crashed ``strip_ghost_wikilinks(None, ...)`` mid-batch. Switched
  to ``parsed.get("content") or raw`` so null/empty collapses to the
  raw fallback.
- An empty or whitespace-only ``content`` sailed through ``pending_writes``
  as a successful concept and got committed as a blank Markdown page
  with no signal. New ``_require_nonempty_content`` raises ``ValueError``
  after parsing/fallback; the gather loop's existing ``failure_types``
  collector catches it and the partial-failure ``[WARN]`` surfaces it.

Verified ``_require_nonempty_content`` across normal / null / empty /
whitespace / non-str inputs. Existing tests (70) pass with no regression.
Strip multi-paragraph docstrings down to one-line summaries, remove
inline comments that re-stated what the code does, and drop all
issue/PR cross-references — those belong in commit messages and PR
descriptions, not in code that has to age.

Kept the WHYs a cold reader actually needs: the DeepSeek/Qwen prompt
requirement on _JSON_RESPONSE_FORMAT, and the .get("content") or raw
note about JSON-null distinction.
@KylinMountain KylinMountain merged commit 7289cfe into main May 29, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

compiler: concept generation fails with and JSON parse error on same document

1 participant