Skip to content

fix: stop silently dropping wiki updates when section/frontmatter drifts#46

Open
KylinMountain wants to merge 4 commits into
mainfrom
fix/index-update-silent-failure
Open

fix: stop silently dropping wiki updates when section/frontmatter drifts#46
KylinMountain wants to merge 4 commits into
mainfrom
fix/index-update-silent-failure

Conversation

@KylinMountain
Copy link
Copy Markdown
Collaborator

Summary

  • Fixes index.md file is not being updated when we call add folder #26: openkb add <folder> (and single-file add) appears to update summaries/, concepts/, and log.md but leaves index.md untouched when the user's index.md has drifted from the canonical layout.
  • Same anti-pattern existed in three more places (_backlink_summary, _backlink_concepts, _add_related_link) — fixed in one pass.

Root cause

Four helpers in compiler.py combined a loose substring check with a strict newline-anchored str.replace:

if "## Documents" in text:                         # loose
    text = text.replace("## Documents\n", ...)     # strict

If the user's wiki had any small deviation — trailing space on a heading, CRLF line endings, hand-edited renaming, missing section, sources:[a] without a space after the colon — the if-branch matched but the replace did nothing, silently dropping the entry. No warning, no error, no log. The user's only signal was "why is my index.md never growing?"

That matches the symptom in #26 exactly: summaries/concepts/log keep updating (they're direct write_text / append), only index.md (which goes through this dance) silently fails.

Changes

  • New _ensure_h2_section(lines, heading) helper: auto-appends a missing H2 section and logs a warning so the drift is observable.
  • New _prepend_source_to_frontmatter(text, source_file) helper: parses YAML frontmatter line-by-line and prepends to the inline sources: list, tolerating formatting variants.
  • _update_index, _backlink_summary, _backlink_concepts, _add_related_link rewritten to use line-based section parsing via the existing _get_section_bounds / _insert_section_entry helpers instead of brittle string replace.

Test plan

  • All 220 existing tests still pass.
  • 6 new regression tests covering:
    • _update_index with ## Documents section missing
    • _update_index with ## Concepts section missing
    • _backlink_summary / _backlink_concepts with trailing-whitespace heading
    • _add_related_link with sources:[a] (no space) and with no sources: line at all
  • Total: 224 passed.

🤖 Generated with Claude Code

`_update_index`, `_backlink_summary`, `_backlink_concepts`, and
`_add_related_link` all combined a loose `substring in text` check
with a strict newline-anchored `str.replace`. When the user's wiki
had a hand-edited heading (trailing space, missing section, etc.)
or non-canonical frontmatter (`sources:[a]` with no space), the
if-branch matched but the replace did nothing, silently dropping
the new entry/link.

Refactor all four to use line-based section parsing via a new
`_ensure_h2_section` helper (logs a warning + appends when the
section is missing), and extract `_prepend_source_to_frontmatter`
that walks frontmatter lines instead of string-matching. Adds
regression tests covering trailing-whitespace headings, missing
sections, and `sources:` without a space.

Fixes #26.
@KylinMountain KylinMountain force-pushed the fix/index-update-silent-failure branch from aa47087 to 3c5d4c1 Compare May 11, 2026 09:22
@VectifyAI VectifyAI deleted a comment from quanqigu May 11, 2026
@KylinMountain
Copy link
Copy Markdown
Collaborator Author

Code review

Found 1 issue:

  1. The PR's stated goal is to recover from heading drift, but _ensure_h2_section relies on _get_section_bounds's exact-equality match (if line == heading), so any drift like a trailing space on the heading is treated as "section missing" and a duplicate H2 is appended at the end of the file. Old entries stay stranded under the original (drifted) heading; new entries land under the freshly-appended one. Same family of silent corruption as the original bug, just shifted from "drop" to "duplicate."

The exact-equality check that defeats the recovery:

def _get_section_bounds(lines: list[str], heading: str) -> tuple[int, int] | None:
"""Return the [start, end) bounds for a Markdown H2 section."""
for i, line in enumerate(lines):
if line == heading:
start = i + 1
end = len(lines)
for j in range(start, len(lines)):
if lines[j].startswith("## "):
end = j
break
return start, end
return None

The appender that runs whenever the exact match fails:

def _ensure_h2_section(lines: list[str], heading: str) -> None:
"""Ensure an H2 section ``heading`` exists in ``lines``; append if missing.
Recovers from hand-edited or drifted index.md files where the expected
section was removed or renamed — without this, downstream inserts would
silently no-op and entries would be dropped.
"""
if _get_section_bounds(lines, heading) is not None:
return
logger.warning(
"index.md is missing %r section; appending it. "
"Check whether the file was hand-edited away from the canonical layout.",
heading,
)
while lines and lines[-1] == "":
lines.pop()
if lines:
lines.append("")
lines.append(heading)
lines.append("")

The two new regression tests test_section_with_trailing_whitespace_still_merges pass despite this bug because they only assert "[[concepts/transformer]]" in text — the new link is present (in the appended clean section) but the file now has two ## Related Concepts headings:

def test_section_with_trailing_whitespace_still_merges(self, tmp_path):
"""Heading with trailing space must not cause silent drop of new links."""
wiki = tmp_path / "wiki"
summaries = wiki / "summaries"
summaries.mkdir(parents=True)
(summaries / "paper.md").write_text(
"# Summary\n\nContent.\n\n## Related Concepts \n- [[concepts/attention]]\n",
encoding="utf-8",
)
_backlink_summary(wiki, "paper", ["attention", "transformer"])
text = (summaries / "paper.md").read_text()
assert "[[concepts/transformer]]" in text
class TestBacklinkConcepts:

Trace for input "## Related Concepts \n- [[concepts/attention]]\n" with slugs=["attention","transformer"]: missing=["transformer"] (attention already linked); _ensure_h2_section sees no exact ## Related Concepts line → appends a new one → _insert_section_entry inserts transformer into the appended section. Output has two ## Related Concepts H2s.

Suggested fix: change line 318 to if line.rstrip() == heading: (also tighten the new tests with assert text.count("## Related Concepts") == 1).

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

`_get_section_bounds` used exact `line == heading` equality, so a heading
like `## Documents ` (trailing space) was treated as a different section.
`_ensure_h2_section` would then append a clean `## Documents` at the end
of the file, stranding the original entries under the drifted heading —
silently converting the previous "drop" failure mode into a "duplicate
section" failure mode.

Match on `line.rstrip() == heading` so drifted whitespace is recovered
properly. Tighten the two regression tests with `text.count(...) == 1`
so the duplicate-H2 case fails them. Generalize the warning text since
`_ensure_h2_section` is also called for backlink sections.
Previously `_get_section_bounds` did exact-equality on the start line and
a separate `startswith("## ")` walk to find the end. Two heading-detection
rules in one function — easy to drift out of sync (the trailing-whitespace
fix in the prior commit only touched the start side).

Pull the H2 scan into `_iter_h2_headings`, which returns the normalized
heading + index for every H2 line. `_get_section_bounds` then just indexes
into that list: the target match and the "next H2" boundary use the same
rule by construction. No observable behavior change.
@KylinMountain
Copy link
Copy Markdown
Collaborator Author

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

`_write_concept` (called on every concept update) still used the same
`fm.replace("sources: [", ...)` anti-pattern this PR introduced
`_prepend_source_to_frontmatter` to eliminate — leaving the bug fixed in
`_add_related_link` but live on the higher-frequency path. Non-canonical
frontmatter like `sources:[paper.pdf]` (no space after colon) silently
dropped the new source on every subsequent compile.

Replace the inline frontmatter manipulation with a single call to the
helper, mirroring `_add_related_link`. Add a regression test exercising
`sources:[a]` (no space) through `_write_concept`'s update path.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

index.md file is not being updated when we call add folder

1 participant