feat(wiki): stable page IDs + redirect stubs (ADR-2244 Phase 3 foundation) by cdeust · Pull Request #33 · cdeust/Cortex

cdeust · 2026-05-13T08:31:57Z

Summary

Phase 3 of ADR-2244 makes the path a view over a stable identifier, so that the upcoming Phase 4 bulk migration (renaming the .md.md / timestamp-slug / path-leak pages and re-bucketing the 7820 file-docs) doesn't rot inbound links.

What lands here

Module	Purpose
`mcp_server/core/wiki_identity.py`	UUID4 generation, parsing, validation, frontmatter helpers. Pure logic.
`mcp_server/core/wiki_redirect.py`	Redirect data model, frontmatter detection, path-based chain resolution (cycle + depth protection), stub authoring. Pure logic.
`mcp_server/core/wiki_sync.py`	New writes carry `id: <uuid>` in their frontmatter.
`scripts/wiki_backfill_ids.py`	One-shot CLI that walks the wiki and mints an id on every page that lacks one. Dry-run by default; `--apply` to write. Idempotent.

Design choices

UUID4, not UUID1 — UUID1 leaks host MAC into the identifier; the wiki may be exported, so we use the random variant.
Canonical 36-character form (32 hex digits plus 4 hyphens). No structured embedding in the path; paths stay human-readable.
IDs are independent of memory_id and draft_id. A page can be re-synthesised from a memory and still keep its identity across that operation.
Redirect stubs accept either redirect_to (path) or redirect_id (UUID). When both are present the ID wins, because paths are mutable but IDs are stable. Path-based chain resolution is implemented here; ID-based resolution requires an id→path index that lands with the read-handler changes in Phase 3.2.
Cycle + depth protection — resolve_chain returns None on a visited-node cycle or chains longer than MAX_REDIRECT_DEPTH=5, matching MediaWiki convention.
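
The cycle and depth protection above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `wiki_redirect.py`: only the name `resolve_chain`, the `None` result, and `MAX_REDIRECT_DEPTH=5` come from the description. The signature, the dict-based `redirects` mapping (a stand-in for reading stub frontmatter from disk), and the returned list of paths are hypothetical.

```python
from __future__ import annotations

from typing import Optional

MAX_REDIRECT_DEPTH = 5  # value taken from the PR description


def resolve_chain(start: str, redirects: dict[str, str]) -> Optional[list[str]]:
    """Follow path-based redirects beginning at ``start``.

    ``redirects`` maps a stub path to its ``redirect_to`` target (hypothetical
    stand-in for parsing stub frontmatter). Returns every path visited, ending
    at the terminal page, or None on a cycle or when the chain would need more
    than MAX_REDIRECT_DEPTH hops.
    """
    chain = [start]
    visited = {start}
    current = start
    while current in redirects:
        hops_so_far = len(chain) - 1
        if hops_so_far >= MAX_REDIRECT_DEPTH:
            return None  # chain is longer than the allowed depth
        nxt = redirects[current]
        if nxt in visited:
            return None  # visited-node cycle (a self-loop lands here too)
        visited.add(nxt)
        chain.append(nxt)
        current = nxt
    return chain
```

A page that is not a stub resolves to a one-element chain containing only itself, so callers can treat "no redirect" and "followed redirects" uniformly.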

Backfill behaviour

Idempotent. Pages with valid ids are skipped. Redirect stubs are skipped (they don't need their own identity, they reference another page's). Pages with no frontmatter at all are skipped (synthesising frontmatter would change page semantics). Pages with a malformed id (id: garbage) have the line replaced rather than duplicated.
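
The per-page rules above could look roughly like this. It is a sketch, not the contents of `scripts/wiki_backfill_ids.py`: the function names `is_valid_uuid4` and `backfill_id`, the action labels, the `---`-delimited frontmatter parsing, and the way redirect stubs are detected are all assumptions made for illustration.

```python
from __future__ import annotations

import re
import uuid

_ID_LINE = re.compile(r"^id:[ \t]*(.*)$", re.MULTILINE)
_REDIRECT_LINE = re.compile(r"^redirect_(?:to|id):", re.MULTILINE)


def is_valid_uuid4(value: str) -> bool:
    """True only for the canonical 36-character form of a version-4 UUID."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    return parsed.version == 4 and str(parsed) == value.lower()


def backfill_id(text: str) -> tuple[str, str]:
    """Return (new_text, action) for one page (hypothetical helper)."""
    if not text.startswith("---\n"):
        return text, "skipped-no-frontmatter"
    end = text.find("\n---", 3)
    if end == -1:
        return text, "skipped-no-frontmatter"
    front, rest = text[4:end], text[end:]
    if _REDIRECT_LINE.search(front):
        return text, "skipped-redirect"  # stubs reference another page's id
    match = _ID_LINE.search(front)
    new_id = str(uuid.uuid4())
    if match is None:
        new_front = f"id: {new_id}\n{front}" if front else f"id: {new_id}"
        action = "minted"
    elif is_valid_uuid4(match.group(1).strip()):
        return text, "skipped-has-id"  # this is what makes a re-run a no-op
    else:
        # Malformed id: replace the existing line instead of adding a second one.
        new_front = front[: match.start()] + f"id: {new_id}" + front[match.end():]
        action = "replaced"
    return "---\n" + new_front + rest, action
```

Running `backfill_id` on its own output returns the text unchanged with `skipped-has-id`, which is the idempotence property the script relies on.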

Dry-run against the live wiki (9608 pages):

Scanned:           9608
Would mint:        9607
Skipped (no fm):   1
Errored:           0

The remaining Phase 3 work — applying the backfill to your wiki, plus the handler-layer changes (wiki_read follows redirects, wiki_migrate writes stubs on rename, wiki_list optionally hides redirects) — lands in follow-ups.

Tests

File	Tests
`tests_py/core/test_wiki_identity.py`	22 — format validation, generation uniqueness, extraction, ensure-or-mint
`tests_py/core/test_wiki_redirect.py`	24 — parse, dataclass validation, chain resolution (single/multi-hop, cycles, self-loop, depth limit, id-only), stub authoring + roundtrip
`tests_py/core/test_wiki_sync_routing.py`	+2 — sync writes a valid id, distinct ids per page
`tests_py/scripts/test_wiki_backfill_ids.py`	12 — dry-run idempotence, redirect/no-fm skipping, malformed-id replacement, distinct ids across pages
Total new	60

Targeted suite: 66 passed
tests_py/core/ + tests_py/shared/ + tests_py/scripts/: 2049 passed
ruff format --check and ruff check clean
Dry-run against live 9608-page wiki: 0 errors, 9607 would-mint, 1 skipped

How to use after merge

# Dry-run (recommended first pass)
python scripts/wiki_backfill_ids.py

# Apply to the live wiki
python scripts/wiki_backfill_ids.py --apply

# Idempotent — second --apply is a no-op
python scripts/wiki_backfill_ids.py --apply

Out of scope for this PR (Phase 3.2 / Phase 4 follow-ups)

wiki_read handler resolves redirects transparently
wiki_migrate writes a redirect stub at the old path when moving a page
wiki_list / wiki_reindex learn to hide or annotate redirect stubs
Bulk migration script that uses redirect stubs (Phase 4)

🤖 Generated with Claude Code


…Phase 3.2) (#34)

Wires the Phase 3 data model (#33) into the read path and adds a new write handler that performs the rename + stub atomically.

With this change ``wiki_rename old.md new.md`` produces:

* ``new.md``: the original content moved verbatim (id preserved)
* ``old.md``: a redirect stub pointing at new.md (with redirect_id = source page id, for future id-based resolution)

And ``wiki_read old.md`` then returns the content of ``new.md`` along with ``redirect_chain: ["old.md", "new.md"]``. Inbound links to the old path keep working through the migration.

Handler changes
---------------

* ``wiki_read``: follow redirect stubs transparently up to 5 hops. ``follow_redirects: false`` opts out (admin/migration tooling that needs to inspect the stub itself). New response field: ``redirect_chain``.
* ``wiki_list``: exclude redirect stubs from the listing by default. ``include_redirects: true`` opts in. New response field: ``redirect_count``.
* ``wiki_reindex``: drop redirect stubs from .generated/INDEX.md and surface the count by kind in the response. The index now lists only live pages, which is what readers actually want.
* ``wiki_rename``: NEW. Move a page from one path to another and leave a stub at the old path. Refuses to operate on pages without a stable frontmatter id (run ``scripts/wiki_backfill_ids.py --apply`` first), refuses to chain stubs (rename the terminal page instead), refuses to overwrite an existing destination unless ``overwrite_dest=true``.

Tool registry: ``wiki_rename`` registered alongside the other 8 wiki tools. ``wiki_read`` and ``wiki_list`` MCP signatures extended with their new optional parameters.

Stub semantics
--------------

The stub carries ``redirect_id = <source page id>`` so future id-based resolution (which a follow-up will add for cross-rename resolution when the path itself is renamed twice) works. ``redirect_to`` is populated with the new path as the cheap path-based resolution target. Both forms are emitted; the id wins when an id-aware reader arrives.

Tests
-----

``tests_py/handlers/test_wiki_redirect_handlers.py`` (NEW): 20 tests covering every handler change:

read:
- returns content for a normal page (chain = [])
- follows single-hop redirect
- follows multi-hop chain (3 pages, 2 hops)
- ``follow_redirects: false`` returns the stub itself
- cycle returns error
- dangling redirect returns error
- missing source returns error

list:
- excludes stubs by default; redirect_count surfaced
- ``include_redirects: true`` returns both
- redirect_count is 0 when no stubs

reindex:
- stubs absent from INDEX.md; by_kind counts only live pages

rename:
- creates stub at old path with correct redirect_to, redirect_id, redirect_reason
- refuses missing source
- refuses source without id
- refuses existing destination
- ``overwrite_dest=true`` works
- refuses to chain stubs
- refuses same path
- end-to-end: rename then read resolves to the new content
- body preserved verbatim through the move

Targeted suite: 86 passed (Phase 3 + Phase 3.2 surface).
Broader: tests_py/core/ + tests_py/shared/ + tests_py/scripts/ + relevant tests_py/handlers/ → 2075 passed.
``ruff format --check`` and ``ruff check`` clean.

What still ships in a follow-up
-------------------------------

* ID→path index for ID-only redirect resolution (currently only path-based chain walking works; id-only stubs return None from resolve_chain so they error in wiki_read with a clear message).
* Phase 4 bulk migration script that loops wiki_rename over the 88 known pollution paths (.md.md slug bug, timestamp-slugs, path-leak titles), gated on this PR + #33 landing.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
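
The rename-plus-stub behaviour and its refusal rules can be sketched against an in-memory page store. This is a simplified illustration, not the ``wiki_rename`` handler: the function name ``rename_with_stub``, the ``pages`` dict, the exact stub layout, and the ``redirect_reason`` value are assumptions. The sketch also only checks that an ``id`` line exists, not that it is a valid UUID4.

```python
from __future__ import annotations

import re

_ID = re.compile(r"^id:[ \t]*(\S+)[ \t]*$", re.MULTILINE)
_REDIRECT = re.compile(r"^redirect_(?:to|id):", re.MULTILINE)


def _frontmatter(text: str) -> str:
    """Return the frontmatter block, or '' when the page has none."""
    if not text.startswith("---\n"):
        return ""
    end = text.find("\n---", 3)
    return text[4:end] if end != -1 else ""


def rename_with_stub(
    pages: dict[str, str], old: str, new: str, overwrite_dest: bool = False
) -> None:
    """Move pages[old] to pages[new] and leave a redirect stub at ``old``."""
    if old == new:
        raise ValueError("source and destination are the same path")
    if old not in pages:
        raise ValueError(f"missing source: {old}")
    front = _frontmatter(pages[old])
    if _REDIRECT.search(front):
        raise ValueError("source is a redirect stub; rename the terminal page")
    match = _ID.search(front)
    if match is None:
        raise ValueError("source has no stable id; run the backfill first")
    if new in pages and not overwrite_dest:
        raise ValueError(f"destination exists: {new}")
    pages[new] = pages[old]  # content moves verbatim, id included
    pages[old] = (
        "---\n"
        f"redirect_to: {new}\n"  # path-based target, resolvable today
        f"redirect_id: {match.group(1)}\n"  # id of the moved page
        "redirect_reason: renamed\n"  # hypothetical reason string
        "---\n"
    )
```

All checks run before any write, so a refused rename leaves the store untouched; a real handler working on disk would need its own strategy for making the two writes atomic.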

…(ADR-2244 Phase 4.1) (#35)

Phase 4 of ADR-2244: the bulk migration. This is the deterministic half: three pollution classes with mechanically computable target paths. The LLM-assisted re-classification half (the 7820 file-doc re-bucket) is a separate scope and lands in a follow-up.

Targets
-------

Audit 2026-05-12 found three deterministic-rename pollution classes:

Pattern                                      Audit count
────────────────────────────────────────────────────
``*.md.md``                                  58
``*-decision-created-YYYY-MM-DDt...z.md``    10
``*users-cdeust-... .md``                    11+ (path-leak in slug)

Live dry-run after this commit:

Pollution paths detected: 70 (all currently skipped because the backfill from #33 hasn't been applied yet; the script correctly refuses to rename pages without a stable id)

Script flow
-----------

scripts/wiki_bulk_migrate.py

1. Walk wiki, classify each .md page by pollution pattern.
2. For each match:
   a. Skip redirect stubs (already moved).
   b. Skip pages without a frontmatter ``id`` (Phase 3 invariant). Caller is told to run ``wiki_backfill_ids.py --apply`` first.
   c. Compute clean target path:
      - .md.md → strip duplicate extension
      - timestamp-slug → derive slug from frontmatter title or first body heading
      - path-leak → same, plus reject path-shaped titles
   d. Record the Pollution record.
3. On --apply: call the ``wiki_rename`` handler for each item, which writes content at the new path and a redirect stub at the old one. Inbound links keep resolving.

Idempotency: a second --apply finds zero pollution paths (the renames landed; their stubs are detected and skipped).

Slug derivation
---------------

``_derive_clean_slug`` picks from three sources in order:

1. Frontmatter ``title`` (if non-empty and not path-shaped / timestamp-shaped / too short / synthetic ``memory-XXX``)
2. First body H1/H2 heading (same cleanness check)
3. Deterministic 6-hex-character hash of the body content prefixed with the kind (``decision-abc123`` / ``page-def456``)

The hash fallback is rare: most pollution pages already have a proper ``title`` field; it's the *slug* that's broken, not the metadata.

Tests
-----

``tests_py/scripts/test_wiki_bulk_migrate.py`` (NEW): 22 tests:

Detection (6): .md.md positive + negative; timestamp-slug positive + negative; path-leak positive + negative.
Slug derivation (5): accepts real titles; rejects path / timestamp / too-short titles; falls back to body heading; falls back to hash.
plan() (5): finds all three classes in one pass; skips pages without id; skips existing redirect stubs; proposes the correct target for timestamp-slug and path-leak (preserving numeric and date prefixes).
apply() / end-to-end (4): renames + creates stubs with correct redirect_to and redirect_id; idempotent (second run is a no-op); handles three classes in one pass; doesn't crash on id-less skipped pages.
Plus 2 sanity tests for boundary slug shapes.

Targeted: 22 passed. ruff format and check clean.

Operational order
-----------------

1. Merge #33 (Phase 3: UUID + redirect modules + backfill script)
2. Merge #34 (Phase 3.2: wiki_read / wiki_rename handlers)
3. Merge this PR (Phase 4.1: bulk-migrate script)
4. Run:
   python scripts/wiki_backfill_ids.py --apply
   python scripts/wiki_bulk_migrate.py            # dry-run review
   python scripts/wiki_bulk_migrate.py --apply    # commit moves

Out of scope (follow-ups)
-------------------------

* ID→path index for ID-only redirect resolution (path-based works today; id-only stubs error in wiki_read).
* Phase 4.2: file-doc re-bucket (7820 ``notes/<domain>/<id>-file-*`` pages → ``reference/<domain>/<file-slug>.md`` with provenance rewrite). Different operation (changes kind directory, rewrites frontmatter); separate script.
* Phase 5: classifier-driven cleanup for ai-generated stubs (filter not delete).
* Phase 6: producer audit (codebase_analyze emits correct provenance / lifecycle on its outputs).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
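
The three-source slug derivation described in this commit might be sketched as below. The real ``_derive_clean_slug`` is not shown in the source, so everything here is an assumption for illustration: the cleanness heuristics, the ``MIN_LEN`` threshold, the slugify rule, and the use of SHA-256 for the 6-hex-character fallback.

```python
from __future__ import annotations

import hashlib
import re

_TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}t", re.IGNORECASE)
_SYNTHETIC = re.compile(r"^memory-\w+$", re.IGNORECASE)
_HEADING = re.compile(r"^#{1,2}[ \t]+(.+)$", re.MULTILINE)
MIN_LEN = 4  # hypothetical threshold for "too short"


def _slugify(text: str) -> str:
    """Lowercase and collapse every non-alphanumeric run to one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def _is_clean(candidate: str) -> bool:
    c = candidate.strip()
    if len(c) < MIN_LEN:
        return False  # empty or too short
    if "/" in c or "\\" in c:
        return False  # path-shaped
    if _TIMESTAMP.search(c) or _SYNTHETIC.match(c):
        return False  # timestamp-shaped or synthetic memory-XXX
    return bool(_slugify(c))


def derive_clean_slug(title: str, body: str, kind: str = "page") -> str:
    """Try the title, then the first H1/H2 heading, then a content hash."""
    if _is_clean(title):
        return _slugify(title)
    heading = _HEADING.search(body)
    if heading and _is_clean(heading.group(1)):
        return _slugify(heading.group(1))
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()[:6]
    return f"{kind}-{digest}"
```

Because the fallback hashes the body, the same page always maps to the same slug, which keeps a dry-run and the later ``--apply`` in agreement.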



…se 4.2)

Producer-side fix #27 routed new file-doc pages to ``reference/<domain>/`` with ``provenance: auto-generated``. The existing population (8,734 pages written under ``notes/<domain>/<id>-file-*.md``) never got moved. This script handles that one-time migration.

Operation per page
------------------

1. Walk ``notes/<domain>/``; match the file-doc shape ``\d+-file-...``.
2. Skip redirect stubs (already migrated).
3. Require a frontmatter ``id`` (Phase 3 invariant; run ``wiki_backfill_ids.py --apply`` first).
4. Extract the original source path from the ``file:<path>`` tag (canonical even when the on-disk filename was truncated to ``98817-file-....md``).
5. Compute target ``reference/<domain>/<file-slug>.md``.
6. Rewrite frontmatter to the modern schema:

     kind: reference
     lifecycle: seedling
     audience: [developer]
     provenance: auto-generated
     generator: {model: cortex-codebase-analyze, version: v1, prompt_template: file-doc-v1, generated_at: <original-created>}

   Plus migration trace fields (``source_file_path``, ``rebucketed_from``). The original id, title, tags, and body are preserved verbatim.
7. Write the rewritten page at the new path.
8. Replace the source with a redirect stub that carries ``redirect_to`` (path) + ``redirect_id`` (source id) so ``wiki_read`` resolves the old path through the stub transparently.

The script is intentionally NOT a thin wrapper around ``wiki_rename``: that handler preserves content verbatim, whereas the file-doc re-bucket must REWRITE the frontmatter as part of the move. The stub-creation half does use ``mcp_server.core.wiki_redirect.build_redirect_stub`` for consistency with Phase 3.2.

Live dry-run
------------

Detected file-doc pages: 8734
Plan: re-bucket          0
Skipped (no id):         8734

Same correct refusal as Phase 4.1: the backfill from #33 hasn't been applied to the live wiki yet. Once ``wiki_backfill_ids.py --apply`` runs, the plan will flip to ``8734 to re-bucket``.

Idempotency
-----------

* Second --apply finds zero: source pages are now redirect stubs (skipped by plan()), new producers write to reference/ directly (skipped by the pattern match).
* Collision handling: two notes documenting the same source file get distinct targets via a ``-<memory_id>`` suffix on the second one (rare in practice; observed 0 times on the live wiki).

Tests
-----

``tests_py/scripts/test_wiki_rebucket_file_docs.py`` (NEW): 19 tests:

detection (6):
- canonical file-doc shape matches; non-file-doc notes don't
- file tag extracted from block-list and inline-list frontmatter
- missing/empty file tag handled

slug derivation (3):
- separators flattened to hyphens
- empty source returns empty target
- empty domain falls back to ``_general``

plan (5):
- finds file-doc notes, skips other notes
- skips pages without id (refusal message)
- skips pages without file tag
- disambiguates colliding targets via memory-id suffix
- skips existing redirect stubs (idempotent re-runs)

apply (5):
- modern frontmatter at target (kind/lifecycle/audience/provenance/generator/source_file_path)
- body preserved verbatim
- redirect stub at source with correct target_path + target_id
- refuses when destination already exists
- idempotent (second pass = no-op)

end-to-end (1):
- 25 pages across 3 domains move correctly; spot-check each domain

19 passed; ruff format and check clean.

Post-merge operations
---------------------

After PR #36 + this PR land on main:

   python scripts/wiki_backfill_ids.py --apply
   python scripts/wiki_bulk_migrate.py --apply          # Phase 4.1: 70 paths
   python scripts/wiki_rebucket_file_docs.py            # dry-run review
   python scripts/wiki_rebucket_file_docs.py --apply    # Phase 4.2: 8734 pages

After all three apply runs:

* notes/ drops from 92% of the wiki to ~5% (real catch-all content only)
* reference/ grows to host the 8734 file docs with proper provenance
* 70 + 8734 redirect stubs preserve all inbound links

Out of scope (Phase 5+)
-----------------------

* Phase 5: classifier-driven cleanup for ai-generated seedlings (filter from search, do not delete; preserves the auto-gen reference pages but hides empty stubs from default views).
* Phase 6: producer audit (codebase_analyze emits the modern 4-tuple directly on new writes; would also write provenance = auto-generated + generator block on every output).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
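
Target computation and collision handling for the re-bucket could look roughly like the sketch below. The function names, the tuple layout passed to ``plan_targets``, and the exact slug rule (every non-alphanumeric run, including the dot before the extension, becomes a hyphen) are assumptions. Only the ``reference/<domain>/<file-slug>.md`` shape, the ``_general`` fallback, the empty-source behaviour, and the ``-<memory_id>`` suffix come from the commit message.

```python
from __future__ import annotations

import re


def file_slug(source_path: str) -> str:
    """Flatten path separators and other non-alphanumerics to hyphens."""
    return re.sub(r"[^A-Za-z0-9]+", "-", source_path).strip("-").lower()


def target_path(domain: str, source_path: str) -> str:
    """Return reference/<domain>/<file-slug>.md, or '' for an empty source."""
    slug = file_slug(source_path)
    if not slug:
        return ""
    return f"reference/{domain or '_general'}/{slug}.md"


def plan_targets(pages: list[tuple[str, str, str, str]]) -> dict[str, str]:
    """Map each note path to a distinct target path.

    Each entry is (note_path, memory_id, domain, source_path); this layout is
    hypothetical. Notes without a usable source path are left out of the plan.
    """
    taken: set[str] = set()
    plan: dict[str, str] = {}
    for note_path, memory_id, domain, source_path in pages:
        target = target_path(domain, source_path)
        if not target:
            continue  # no file tag: nothing to compute
        if target in taken:
            # A second note for the same source file gets a memory-id suffix.
            target = f"{target[:-3]}-{memory_id}.md"
        taken.add(target)
        plan[note_path] = target
    return plan
```

Processing the notes in a stable order matters here: the first note to claim a target keeps the plain name, so a dry-run and the later apply only agree if both walk the wiki in the same order.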

…on onto main (#36) * feat(wiki): handler-layer redirect mechanics + wiki_rename (ADR-2244 Phase 3.2) Wires the Phase 3 data model (#33) into the read path and adds a new write handler that performs the rename + stub atomically. With this change ``wiki_rename old.md new.md`` produces: * ``new.md`` — the original content moved verbatim (id preserved) * ``old.md`` — a redirect stub pointing at new.md (with redirect_id = source page id, for future id-based resolution) And ``wiki_read old.md`` then returns the content of ``new.md`` along with ``redirect_chain: ["old.md", "new.md"]``. Inbound links to the old path keep working through the migration. Handler changes --------------- * ``wiki_read`` — follow redirect stubs transparently up to 5 hops. ``follow_redirects: false`` opts out (admin/migration tooling that needs to inspect the stub itself). New response field: ``redirect_chain``. * ``wiki_list`` — exclude redirect stubs from the listing by default. ``include_redirects: true`` opts in. New response field: ``redirect_count``. * ``wiki_reindex`` — drop redirect stubs from .generated/INDEX.md and surface the count by kind in the response. The index now lists only live pages, which is what readers actually want. * ``wiki_rename`` — NEW. Move a page from one path to another and leave a stub at the old path. Refuses to operate on pages without a stable frontmatter id (run ``scripts/wiki_backfill_ids.py --apply`` first), refuses to chain stubs (rename the terminal page instead), refuses to overwrite an existing destination unless ``overwrite_dest=true``. Tool registry: ``wiki_rename`` registered alongside the other 8 wiki tools. ``wiki_read`` and ``wiki_list`` MCP signatures extended with their new optional parameters. Stub semantics -------------- The stub carries ``redirect_id = <source page id>`` so future id-based resolution (which a follow-up will add for cross-rename resolution when the path itself is renamed twice) works. 
``redirect_to`` is populated with the new path as the cheap path-based resolution target. Both forms are emitted; the id wins when an id-aware reader arrives.

Tests
-----

``tests_py/handlers/test_wiki_redirect_handlers.py`` (NEW): 20 tests covering every handler change:

read:
- returns content for a normal page (chain = [])
- follows single-hop redirect
- follows multi-hop chain (3 pages, 2 hops)
- ``follow_redirects: false`` returns the stub itself
- cycle returns error
- dangling redirect returns error
- missing source returns error

list:
- excludes stubs by default; redirect_count surfaced
- ``include_redirects: true`` returns both
- redirect_count is 0 when no stubs

reindex:
- stubs absent from INDEX.md; by_kind counts only live pages

rename:
- creates stub at old path with correct redirect_to, redirect_id, redirect_reason
- refuses missing source
- refuses source without id
- refuses existing destination
- ``overwrite_dest=true`` works
- refuses to chain stubs
- refuses same path
- end-to-end: rename then read resolves to the new content
- body preserved verbatim through the move

Targeted suite: 86 passed (Phase 3 + Phase 3.2 surface).
Broader: tests_py/core/ + tests_py/shared/ + tests_py/scripts/ + relevant tests_py/handlers/ → 2075 passed.
``ruff format --check`` and ``ruff check`` clean.

What still ships in a follow-up
-------------------------------

* ID→path index for ID-only redirect resolution (currently only path-based chain walking works; id-only stubs return None from resolve_chain so they error in wiki_read with a clear message).
* Phase 4 bulk migration script that loops wiki_rename over the 88 known pollution paths (.md.md slug bug, timestamp-slugs, path-leak titles); gated on this PR + #33 landing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(wiki): deterministic bulk migration for the ~88 pollution paths (ADR-2244 Phase 4.1)

Phase 4 of ADR-2244: the bulk migration. This is the deterministic half: three pollution classes with mechanically computable target paths. The LLM-assisted re-classification half (the 7820 file-doc re-bucket) is a separate scope and lands in a follow-up.

Targets
-------

Audit 2026-05-12 found three deterministic-rename pollution classes:

    Pattern                                      Audit count
    ────────────────────────────────────────────────────
    ``*.md.md``                                  58
    ``*-decision-created-YYYY-MM-DDt...z.md``    10
    ``*users-cdeust-... .md``                    11+ (path-leak in slug)

Live dry-run after this commit:

    Pollution paths detected: 70

(all currently skipped because the backfill from #33 hasn't been applied yet; the script correctly refuses to rename pages without a stable id)

Script flow
-----------

scripts/wiki_bulk_migrate.py

1. Walk wiki, classify each .md page by pollution pattern.
2. For each match:
   a. Skip redirect stubs (already moved).
   b. Skip pages without a frontmatter ``id`` (Phase 3 invariant). Caller is told to run ``wiki_backfill_ids.py --apply`` first.
   c. Compute clean target path:
      - .md.md → strip duplicate extension
      - timestamp-slug → derive slug from frontmatter title or first body heading
      - path-leak → same, plus reject path-shaped titles
   d. Record the Pollution record.
3. On --apply: call the ``wiki_rename`` handler for each item, which writes content at the new path and a redirect stub at the old one. Inbound links keep resolving.

Idempotency: a second --apply finds zero pollution paths (the renames landed; their stubs are detected and skipped).

Slug derivation
---------------

``_derive_clean_slug`` picks from three sources in order:

1. Frontmatter ``title`` (if non-empty and not path-shaped / timestamp-shaped / too short / synthetic ``memory-XXX``)
2. First body H1/H2 heading (same cleanness check)
3. Deterministic 6-hex-character hash of the body content prefixed with the kind (``decision-abc123`` / ``page-def456``)

The hash fallback is rare: most pollution pages already have a proper ``title`` field; it's the *slug* that's broken, not the metadata.

Tests
-----

``tests_py/scripts/test_wiki_bulk_migrate.py`` (NEW): 22 tests:

Detection (6): .md.md positive + negative; timestamp-slug positive + negative; path-leak positive + negative.
Slug derivation (5): accepts real titles; rejects path / timestamp / too-short titles; falls back to body heading; falls back to hash.
plan() (5): finds all three classes in one pass; skips pages without id; skips existing redirect stubs; proposes the correct target for timestamp-slug and path-leak (preserving numeric and date prefixes).
apply() / end-to-end (4): renames + creates stubs with correct redirect_to and redirect_id; idempotent (second run is a no-op); handles three classes in one pass; doesn't crash on id-less skipped pages.
Plus 2 sanity tests for boundary slug shapes.

Targeted: 22 passed. ruff format and check clean.

Operational order
-----------------

1. Merge #33 (Phase 3: UUID + redirect modules + backfill script)
2. Merge #34 (Phase 3.2: wiki_read / wiki_rename handlers)
3. Merge this PR (Phase 4.1: bulk-migrate script)
4. Run:

       python scripts/wiki_backfill_ids.py --apply
       python scripts/wiki_bulk_migrate.py            # dry-run review
       python scripts/wiki_bulk_migrate.py --apply    # commit moves

Out of scope (follow-ups)
-------------------------

* ID→path index for ID-only redirect resolution (path-based works today; id-only stubs error in wiki_read).
* Phase 4.2: file-doc re-bucket (7820 ``notes/<domain>/<id>-file-*`` pages → ``reference/<domain>/<file-slug>.md`` with provenance rewrite). Different operation (changes kind directory, rewrites frontmatter); separate script.
* Phase 5: classifier-driven cleanup for ai-generated stubs (filter not delete).
* Phase 6: producer audit (codebase_analyze emits correct provenance / lifecycle on its outputs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: bump tool count assertion 47 → 48 for new wiki_rename (ADR-2244 Phase 3.2)

CI on PR #36 fails on tests_py/test_main.py:70; the mcp_server tool count is now 48 because Phase 3.2 (#34's content, now flowing into main via this PR) registers ``wiki_rename`` as a new tool. The assertion is a hard count + membership check; both updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
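The Phase 4.1 detection and slug-derivation rules described above can be illustrated with a small sketch. The script's source is not shown in this thread, so everything here is an assumption: the regexes, the `MIN_TITLE_LEN` threshold, the choice of SHA-256 for the hash fallback, and the function names `classify` and `derive_clean_slug_sketch` are all hypothetical. Only the order of the three slug sources and the 6-hex-character fallback come from the commit message.

```python
import hashlib
import re
from typing import Optional

# Hypothetical patterns approximating the three pollution classes.
DOUBLE_EXT = re.compile(r"\.md\.md$")
TIMESTAMP_SLUG = re.compile(r"-created-\d{4}-\d{2}-\d{2}t.*z\.md$")
PATH_LEAK = re.compile(r"users-[a-z0-9]+-")

MIN_TITLE_LEN = 4  # assumed cut-off for "too short"


def classify(filename: str) -> Optional[str]:
    """Return the pollution class of a filename, or None if it looks clean."""
    name = filename.lower()
    if DOUBLE_EXT.search(name):
        return "double-ext"
    if TIMESTAMP_SLUG.search(name):
        return "timestamp-slug"
    if PATH_LEAK.search(name):
        return "path-leak"
    return None


def _is_clean(text: str) -> bool:
    """Reject empty, too-short, path-shaped, timestamp-shaped, or synthetic titles."""
    t = text.strip().lower()
    if len(t) < MIN_TITLE_LEN:
        return False
    if "/" in t or "\\" in t:
        return False
    if re.search(r"\d{4}-\d{2}-\d{2}t", t):
        return False
    if re.fullmatch(r"memory-\w+", t):
        return False
    return True


def _slugify(text: str) -> str:
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def derive_clean_slug_sketch(title: str, body: str, kind: str) -> str:
    """Try the title, then the first H1/H2 heading, then a 6-hex body hash."""
    if _is_clean(title):
        return _slugify(title)
    for line in body.splitlines():
        match = re.match(r"#{1,2}\s+(.+)", line)
        if match:
            if _is_clean(match.group(1)):
                return _slugify(match.group(1))
            break  # only the first H1/H2 heading is considered
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()[:6]
    return f"{kind}-{digest}"
```

For example, `classify("notes.md.md")` yields `"double-ext"`, and a page whose title is a leaked path falls through to its first heading. The hash fallback depends only on the body, so repeated dry runs propose the same target.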

…se 4.2)

Producer-side fix #27 routed new file-doc pages to ``reference/<domain>/`` with ``provenance: auto-generated``. The existing population (8,734 pages written under ``notes/<domain>/<id>-file-*.md``) never got moved. This script handles that one-time migration.

Operation per page
------------------

1. Walk ``notes/<domain>/``; match the file-doc shape ``\\d+-file-...``.
2. Skip redirect stubs (already migrated).
3. Require a frontmatter ``id`` (Phase 3 invariant; run ``wiki_backfill_ids.py --apply`` first).
4. Extract the original source path from the ``file:<path>`` tag (canonical even when the on-disk filename was truncated to ``98817-file-....md``).
5. Compute target ``reference/<domain>/<file-slug>.md``.
6. Rewrite frontmatter to the modern schema:

       kind: reference
       lifecycle: seedling
       audience: [developer]
       provenance: auto-generated
       generator: {model: cortex-codebase-analyze, version: v1,
                   prompt_template: file-doc-v1,
                   generated_at: <original-created>}

   Plus migration trace fields (``source_file_path``, ``rebucketed_from``). The original id, title, tags, and body are preserved verbatim.
7. Write the rewritten page at the new path.
8. Replace the source with a redirect stub that carries ``redirect_to`` (path) + ``redirect_id`` (source id) so ``wiki_read`` resolves the old path through the stub transparently.

The script is intentionally NOT a thin wrapper around ``wiki_rename``: that handler preserves content verbatim, whereas the file-doc re-bucket must REWRITE the frontmatter as part of the move. The stub-creation half does use ``mcp_server.core.wiki_redirect.build_redirect_stub`` for consistency with Phase 3.2.

Live dry-run
------------

    Detected file-doc pages: 8734
    Plan: re-bucket 0
    Skipped (no id): 8734

Same correct refusal as Phase 4.1: the backfill from #33 hasn't been applied to the live wiki yet. Once ``wiki_backfill_ids.py --apply`` runs, the plan will flip to ``8734 to re-bucket``.
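Steps 4 and 6 of the per-page operation (pulling the source path out of the ``file:<path>`` tag, then rewriting the frontmatter) might look roughly like the sketch below. It works on already-parsed frontmatter held in a plain dict, which is an assumption; the script's actual parsing and serialisation are not shown in this thread. The helper names and the ``created`` key used for ``generated_at`` are hypothetical; the schema values are the ones listed in step 6.

```python
from typing import Any, Dict, List, Optional


def source_path_from_tags(tags: List[str]) -> Optional[str]:
    """Return the path carried by the first non-empty ``file:<path>`` tag."""
    for tag in tags:
        if tag.startswith("file:"):
            path = tag[len("file:"):].strip()
            return path or None  # an empty file tag counts as missing
    return None


def rewrite_frontmatter(
    old: Dict[str, Any], source_file_path: str, old_rel_path: str
) -> Dict[str, Any]:
    """Build the modern frontmatter while keeping id, title and tags as-is."""
    return {
        # Preserved verbatim from the original page.
        "id": old["id"],
        "title": old.get("title", ""),
        "tags": list(old.get("tags", [])),
        # Modern schema values quoted in step 6.
        "kind": "reference",
        "lifecycle": "seedling",
        "audience": ["developer"],
        "provenance": "auto-generated",
        "generator": {
            "model": "cortex-codebase-analyze",
            "version": "v1",
            "prompt_template": "file-doc-v1",
            # Assumption: the original timestamp lives under "created".
            "generated_at": old.get("created"),
        },
        # Migration trace fields.
        "source_file_path": source_file_path,
        "rebucketed_from": old_rel_path,
    }
```

Returning a fresh dict rather than mutating the old one keeps the original frontmatter available for the redirect stub written in step 8.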
Idempotency
-----------

* Second --apply finds zero: source pages are now redirect stubs (skipped by plan()), new producers write to reference/ directly (skipped by the pattern match).
* Collision handling: two notes documenting the same source file get distinct targets via a ``-<memory_id>`` suffix on the second one (rare in practice; observed 0 times on the live wiki).

Tests
-----

``tests_py/scripts/test_wiki_rebucket_file_docs.py`` (NEW): 19 tests:

detection (6):
- canonical file-doc shape matches; non-file-doc notes don't
- file tag extracted from block-list and inline-list frontmatter
- missing/empty file tag handled

slug derivation (3):
- separators flattened to hyphens
- empty source returns empty target
- empty domain falls back to ``_general``

plan (5):
- finds file-doc notes, skips other notes
- skips pages without id (refusal message)
- skips pages without file tag
- disambiguates colliding targets via memory-id suffix
- skips existing redirect stubs (idempotent re-runs)

apply (5):
- modern frontmatter at target (kind/lifecycle/audience/provenance/generator/source_file_path)
- body preserved verbatim
- redirect stub at source with correct target_path + target_id
- refuses when destination already exists
- idempotent (second pass = no-op)

end-to-end (1):
- 25 pages across 3 domains move correctly; spot-check each domain

19 passed; ruff format and check clean.
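The target-path rules exercised by the slug-derivation and plan tests above (separators flattened to hyphens, empty source gives an empty target, empty domain falls back to ``_general``, collisions get a ``-<memory_id>`` suffix) can be sketched as follows. Which characters count as separators is an assumption, as is treating the numeric filename prefix as the memory id. The names `file_slug`, `target_path` and `plan_targets` are hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple


def file_slug(source_path: str) -> str:
    """Flatten path separators and dots into single hyphens (assumed rule)."""
    return re.sub(r"[\\/.]+", "-", source_path.strip()).strip("-")


def target_path(domain: str, source_path: str) -> str:
    """Return ``reference/<domain>/<file-slug>.md``, or "" for an empty source."""
    slug = file_slug(source_path)
    if not slug:
        return ""
    return f"reference/{domain or '_general'}/{slug}.md"


def plan_targets(pages: List[Tuple[str, str, str]]) -> Dict[str, str]:
    """Map memory_id -> target for (memory_id, domain, source_path) triples.

    The first page to claim a target keeps it; later pages that collide
    get a ``-<memory_id>`` suffix before the extension.
    """
    taken: Set[str] = set()
    plan: Dict[str, str] = {}
    for memory_id, domain, source_path in pages:
        target = target_path(domain, source_path)
        if not target:
            continue  # nothing to plan without a source path
        if target in taken:
            target = f"{target[:-len('.md')]}-{memory_id}.md"
        taken.add(target)
        plan[memory_id] = target
    return plan
```

Because the suffix is derived from the page's own identifier rather than a running counter, the same input order produces the same plan on every dry run.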
Post-merge operations
---------------------

After PR #36 + this PR land on main:

    python scripts/wiki_backfill_ids.py --apply
    python scripts/wiki_bulk_migrate.py --apply          # Phase 4.1: 70 paths
    python scripts/wiki_rebucket_file_docs.py            # dry-run review
    python scripts/wiki_rebucket_file_docs.py --apply    # Phase 4.2: 8734 pages

After all three apply runs:

* notes/ drops from 92% of the wiki to ~5% (real catch-all content only)
* reference/ grows to host the 8734 file docs with proper provenance
* 70 + 8734 redirect stubs preserve all inbound links

Out of scope (Phase 5+)
-----------------------

* Phase 5: classifier-driven cleanup for ai-generated seedlings (filter from search, do not delete; preserves the auto-gen reference pages but hides empty stubs from default views).
* Phase 6: producer audit (codebase_analyze emits the modern 4-tuple directly on new writes; would also write provenance = auto-generated + generator block on every output).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
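The claim that redirect stubs preserve inbound links rests on path-based chain walking with cycle and depth protection. A minimal sketch of that idea follows, assuming an in-memory map from page path to its ``redirect_to`` target (``None`` for a live page). The real ``resolve_chain`` signature is not shown in this thread, so the function name and data shape below are hypothetical; only the limit of 5 and the return-None-on-failure behaviour come from the PR description.

```python
from typing import Dict, List, Optional, Set

MAX_REDIRECT_DEPTH = 5  # limit quoted in the Phase 3 design notes


def resolve_chain_sketch(
    start: str, redirects: Dict[str, Optional[str]]
) -> Optional[List[str]]:
    """Follow path-based redirects starting at ``start``.

    Returns the list of stub paths traversed (empty for a live page),
    or None for a missing page, a dangling redirect, a cycle, or a
    chain with more than MAX_REDIRECT_DEPTH hops.
    """
    chain: List[str] = []
    visited: Set[str] = set()
    current = start
    while True:
        if current not in redirects:
            return None  # missing source or dangling redirect
        target = redirects[current]
        if target is None:
            return chain  # reached a live page
        if current in visited or len(chain) >= MAX_REDIRECT_DEPTH:
            return None  # cycle, or chain too long
        visited.add(current)
        chain.append(current)
        current = target
```

With ``{"a.md": "b.md", "b.md": "c.md", "c.md": None}``, reading ``a.md`` traverses two stubs before landing on ``c.md``, which mirrors the "3 pages, 2 hops" read test. Returning ``None`` for every failure mode keeps the caller's error handling in one place, at the cost of not saying which failure occurred.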

…n complete) (#41)

Bundles 11 merged PRs (#30-#40) since v3.15.4 closing out the ADR-2244 wiki classification cycle:

    Phase 2    #31 #32  pilot migration analyzer + 1000-page verification (96.7% kind-kept, passes target)
    Phase 3    #33      stable page IDs (UUID4) + redirect data model + backfill CLI
    Phase 3.2  #34      handler-layer redirect mechanics (wiki_read follows transparently, wiki_list/wiki_reindex exclude stubs, new wiki_rename tool)
    Phase 4.1  #35 #36  deterministic bulk migration for the 70 known pollution paths (.md.md, timestamp-slug, path-leak)
    Phase 4.2  #37      file-doc re-bucket (8734 pages from notes/ to reference/ with modern frontmatter)
    Phase 5    #39      filter auto-generated pages from default listings; INDEX.md splits human-authored from auto-gen
    Phase 6    #38      producer audit: codebase_analyze output routes to kind=reference (root-causes the 8734-page misroute)
    Phase 6.2  #40      producer audit: wiki_seed_codebase emits modern kind tags the classifier reads
    Security   #30      authlib CVE-2026-44681 bump (dependabot #4)

Notes for users:

- Wiki on disk not migrated yet. Apply scripts (in scripts/) are dry-run by default. Three commands to fully migrate; each is idempotent and leaves redirect stubs.
- Phases 5/6/6.2 take effect on next MCP restart.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cdeust merged commit 4590a14 into main May 13, 2026
11 checks passed

This was referenced May 13, 2026

feat(wiki): handler-layer redirect mechanics + wiki_rename (ADR-2244 Phase 3.2) #34

Merged

feat(wiki): deterministic bulk migration for the ~88 pollution paths (ADR-2244 Phase 4.1) #35

Merged

cdeust mentioned this pull request May 13, 2026

feat(wiki): Phases 3.2 + 4.1 — handler-layer redirects + bulk migration onto main #36

Merged

4 tasks

cdeust mentioned this pull request May 13, 2026

release: v3.16.0 — ADR-2244 Phases 2-6.2 (wiki classification redesign complete) #41

Merged

3 tasks

cdeust merged 1 commit into main from feat/wiki-stable-ids-phase3

cdeust commented May 13, 2026


