
fix(wiki): seed-codebase emits modern kind tags the classifier reads (ADR-2244 Phase 6.2)#40

Merged
cdeust merged 1 commit into main from feat/wiki-seed-codebase-modern-kinds-phase6.2
May 13, 2026

@cdeust cdeust commented May 13, 2026

Summary

Producer audit follow-up — closes the second leak flagged out-of-scope in #38. wiki_seed_codebase was emitting kind hints in a tag shape (kind:<value>) that the classifier never read; the kind hint flowed nowhere.

What was wrong

wiki_seed_codebase ingests markdown files (README, ADR, spec, convention, lesson) from a repo into Cortex memory. The _kind_for(rel_path) helper inferred a kind from the path, then the call-site wrote it as a kind:<value> tag along with seed:codebase and file:<rel>.

Two problems:

  1. Legacy kind names: _kind_for returned spec / convention / lesson / note — none of these match any modern kind tag alias in wiki_axis_registry._DEFAULT_KINDS.
  2. Wrong tag shape: even if _kind_for had returned modern names, kind:adr is a different string than adr. The classifier's tag_aliases set intersection requires exact match.

Net: every seed-imported markdown page was routed through the legacy → modern fallback to kind=explanation, ignoring the path hint that said "this is an ADR" or "this is architecture".
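The tag-shape mismatch can be reproduced with a minimal sketch. The registry dict below is a hypothetical stand-in for `wiki_axis_registry._DEFAULT_KINDS` (its real structure is not shown in this PR), and `classify_kind` is an assumed name for the set-intersection lookup, not the actual classifier.

```python
# Hypothetical stand-in for the registry: kind name -> set of tag aliases.
# The real _DEFAULT_KINDS structure is an assumption here.
KIND_ALIASES = {
    "adr": {"adr"},
    "rfc": {"rfc"},
    "explanation": {"explanation"},
}


def classify_kind(tags, default="explanation"):
    """Return the first kind whose aliases exactly match one of the tags."""
    tag_set = set(tags)
    for kind, aliases in KIND_ALIASES.items():
        # Set intersection: "kind:adr" and "adr" are different strings,
        # so a prefixed tag never matches a bare alias.
        if aliases & tag_set:
            return kind
    return default


old_tags = ["seed:codebase", "kind:adr", "file:docs/adr/0001.md"]
new_tags = ["seed:codebase", "imported", "adr", "file:docs/adr/0001.md"]

print(classify_kind(old_tags))  # the hint is lost, falls back to the default
print(classify_kind(new_tags))  # the bare alias is picked up
```

With the old shape the lookup falls through to the default; with the bare tag it resolves to the hinted kind.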

What changed

_kind_for returns modern kinds

| Path pattern | Before | After (modern) |
| --- | --- | --- |
| ADR / decision | adr | adr |
| architecture | spec | rfc |
| convention / style | convention | explanation |
| lesson / postmortem | lesson | explanation |
| README / default | note | explanation |

The new values are themselves registered tag aliases (or kind names) in _DEFAULT_KINDS, so they flow through the classifier.
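As an illustration of the mapping in the table, here is a minimal sketch of a path-to-kind helper. It is not the PR's `_kind_for`: the name `kind_for`, the keyword sets, and the token-splitting approach are all assumptions chosen to mirror the table rows.

```python
import re

# Hypothetical keyword rules mirroring the table; the first match wins.
_RULES = (
    ({"adr", "adrs", "decision", "decisions"}, "adr"),
    ({"architecture"}, "rfc"),
    ({"convention", "conventions", "style"}, "explanation"),
    ({"lesson", "lessons", "postmortem", "postmortems"}, "explanation"),
)


def kind_for(rel_path: str) -> str:
    """Infer a modern kind from a repo-relative markdown path (illustrative)."""
    # Split on anything that is not a letter or digit so "docs/adr/0001.md"
    # yields the tokens {"docs", "adr", "0001", "md"}.
    tokens = set(re.split(r"[^a-z0-9]+", rel_path.lower()))
    for keywords, kind in _RULES:
        if keywords & tokens:
            return kind
    # README files and anything unrecognised default to explanation.
    return "explanation"
```

Matching on whole tokens rather than substrings avoids accidental hits such as `adr` inside an unrelated word; whether the real helper does this is not stated in the PR.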

Call-site tag shape

# Before
"tags": ["seed:codebase", f"kind:{kind}", f"file:{rel}"]

# After
"tags": ["seed:codebase", "imported", kind, f"file:{rel}"]

The bare kind tag (e.g. adr, rfc, explanation) is now a registered alias, so the classifier reads it. The imported tag flips provenance to imported, which is correct for bulk-imported markdown.
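The new tag shape and its effect on provenance can be sketched as follows. Both function names are hypothetical, and the `"authored"` default provenance is an assumption made for the example; the PR only states that the `imported` tag flips provenance to `imported`.

```python
def seed_tags(kind: str, rel: str) -> list[str]:
    """Build the post-fix tag list: a bare kind tag, no 'kind:' prefix."""
    return ["seed:codebase", "imported", kind, f"file:{rel}"]


def provenance_from_tags(tags: list[str], default: str = "authored") -> str:
    # Assumption: provenance is derived from the presence of the bare
    # "imported" tag; the default value is invented for this sketch.
    return "imported" if "imported" in tags else default


tags = seed_tags("adr", "docs/adr/0001-storage.md")
print(tags)
print(provenance_from_tags(tags))
```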

Tests

tests_py/handlers/test_wiki_seed_codebase.py (NEW) — 8 tests:

| Test | Surface |
| --- | --- |
| test_adr_path_routes_to_adr | path → kind |
| test_decision_path_routes_to_adr | path → kind |
| test_architecture_path_routes_to_rfc | path → kind |
| test_convention_path_routes_to_explanation | path → kind |
| test_lesson_path_routes_to_explanation | path → kind |
| test_readme_routes_to_explanation | path → kind |
| test_unknown_path_defaults_to_explanation | path → kind |
| test_all_returned_kinds_are_registered | contract pin: every returned value must be a registered tag alias |

The contract-pin test is the load-bearing one: if a future refactor returns a value that isn't a registered alias, the kind hint silently flows nowhere again. The test catches that class of bug.
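A contract-pin test of this kind might look roughly like the sketch below. The alias set, the sample paths, and the simplified `kind_for` stand-in are all assumptions so the example runs on its own; the real test presumably imports `_kind_for` and reads the aliases from the classifier registry.

```python
# Hypothetical set of registered tag aliases; the real test would derive
# this from the classifier registry instead of hard-coding it.
REGISTERED_ALIASES = {"adr", "rfc", "explanation"}

SAMPLE_PATHS = [
    "docs/adr/0001-storage.md",
    "docs/decisions/cache.md",
    "docs/architecture/overview.md",
    "docs/conventions/naming.md",
    "docs/lessons/outage.md",
    "README.md",
    "notes/misc.md",
]


def kind_for(rel_path: str) -> str:
    """Simplified stand-in for the helper under test."""
    lowered = rel_path.lower()
    if "/adr/" in lowered or "/decisions/" in lowered:
        return "adr"
    if "/architecture/" in lowered:
        return "rfc"
    return "explanation"


def test_all_returned_kinds_are_registered():
    returned = {kind_for(path) for path in SAMPLE_PATHS}
    unregistered = returned - REGISTERED_ALIASES
    # Any value the registry does not know about would silently drop the hint.
    assert not unregistered, f"unregistered kinds: {sorted(unregistered)}"
```

A legacy value such as `spec` would land in `unregistered` and fail the assertion, which is the regression the pin is meant to catch.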

Test plan

  • pytest tests_py/handlers/test_wiki_seed_codebase.py → 8 passed
  • ruff format --check and ruff check clean
  • CI on this PR

Out of scope

  • End-to-end test through remember() → wiki_sync → on-disk page. The seed handler delegates to remember(), which kicks off the broader pipeline; an integration test would need a full DB fixture. The unit test on the tag-shape invariant catches the producer bug; downstream behavior is covered by existing wiki_sync tests.

🤖 Generated with Claude Code

…(ADR-2244 Phase 6.2)

Producer audit follow-up. ``wiki_seed_codebase`` was the second
producer flagged out-of-scope in #38 (the codebase_analyze fix): its
``_kind_for`` mapped seed-file paths to *legacy* kind names
(``spec``, ``convention``, ``lesson``, ``note``) which the call-site
then wrote as ``kind:<value>`` tags — a shape the classifier never
read. The kind hint flowed nowhere.

This PR fixes both halves of the leak.

Changes
-------

  _kind_for now returns *modern* kind names whose values are
  themselves tag aliases registered in
  ``wiki_axis_registry._DEFAULT_KINDS``:

    Path pattern        Before        After (modern)
    ─────────────────────────────────────────────────
    ADR / decision      adr           adr
    architecture        spec          rfc
    convention / style  convention    explanation
    lesson / postmortem lesson        explanation
    README / default    note          explanation

  Call-site tag list now writes:
      [seed:codebase, imported, <modern-kind>, file:<rel>]

  Old shape was ``[seed:codebase, kind:<legacy>, file:<rel>]``. The
  bare modern-kind tag (``adr``, ``rfc``, ``explanation``) is a
  registered alias so the classifier picks it up. ``imported`` flips
  provenance to ``imported`` (these are bulk-imported markdown files,
  not human-authored fresh in the wiki).

Test coverage
-------------

``tests_py/handlers/test_wiki_seed_codebase.py`` (NEW) — 8 tests:

  Path → modern kind:
    - ADR / decision → adr
    - architecture → rfc
    - convention / style → explanation
    - lesson / postmortem → explanation
    - README → explanation
    - unknown → explanation

  Contract pin:
    test_all_returned_kinds_are_registered iterates the path samples
    and asserts every returned kind is a registered tag alias in the
    classifier registry. If a future refactor introduces a value the
    registry doesn't know about (the original bug class), this test
    fails loudly.

8 passed; ruff format and check clean.

Out of scope
------------

  * Verifying the end-to-end seed → memory → wiki-page chain produces
    the correct kind/provenance. The seed handler delegates to
    ``remember()`` which kicks off the broader pipeline; the integration
    test would need a full DB fixture. The unit test on the tag-shape
    invariant catches the producer bug; pipeline behaviour is covered
    by wiki_sync tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cdeust cdeust merged commit 90b3203 into main May 13, 2026
11 checks passed
@cdeust cdeust deleted the feat/wiki-seed-codebase-modern-kinds-phase6.2 branch May 13, 2026 10:31
cdeust added a commit that referenced this pull request May 13, 2026
…n complete) (#41)

Bundles 11 merged PRs (#30-#40) since v3.15.4 closing out the
ADR-2244 wiki classification cycle:

  Phase 2     #31 #32  pilot migration analyzer + 1000-page
                       verification (96.7% kind-kept, passes target)
  Phase 3     #33      stable page IDs (UUID4) + redirect data model
                       + backfill CLI
  Phase 3.2   #34      handler-layer redirect mechanics (wiki_read
                       follows transparently, wiki_list/wiki_reindex
                       exclude stubs, new wiki_rename tool)
  Phase 4.1   #35 #36  deterministic bulk migration for the 70
                       known pollution paths (.md.md, timestamp-slug,
                       path-leak)
  Phase 4.2   #37      file-doc re-bucket (8734 pages from notes/
                       to reference/ with modern frontmatter)
  Phase 5     #39      filter auto-generated pages from default
                       listings; INDEX.md splits human-authored
                       from auto-gen
  Phase 6     #38      producer audit — codebase_analyze output
                       routes to kind=reference (root-causes the
                       8734-page misroute)
  Phase 6.2   #40      producer audit — wiki_seed_codebase emits
                       modern kind tags the classifier reads
  Security    #30      authlib CVE-2026-44681 bump (dependabot #4)

Notes for users:
  - Wiki on disk not migrated yet. Apply scripts (in scripts/) are
    dry-run by default. Three commands to fully migrate; each is
    idempotent and leaves redirect stubs.
  - Phases 5/6/6.2 take effect on next MCP restart.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>