feat(wiki): handler-layer redirect mechanics + wiki_rename (ADR-2244 Phase 3.2)#34

Merged: cdeust merged 1 commit into feat/wiki-stable-ids-phase3 from feat/wiki-redirect-handlers-phase3.2 on May 13, 2026

@cdeust cdeust commented May 13, 2026

Summary

Wires the Phase 3 data model from #33 into the read path and adds a new write handler that performs rename + redirect stub atomically. This is the working machinery that the Phase 4 bulk-rename script will sit on top of.

Depends on #33 (the wiki_identity + wiki_redirect modules + UUID backfill).

End-to-end behaviour after this PR:

wiki_rename old.md new.md
  ⇒ new.md  carries the original content (id preserved)
  ⇒ old.md  becomes a redirect stub pointing at new.md

wiki_read old.md
  ⇒ returns the content of new.md
  ⇒ redirect_chain: ["old.md", "new.md"]

Inbound links to old paths keep working through the migration.

Handler changes

wiki_read

Follows redirect stubs transparently up to 5 hops. New optional follow_redirects: false returns the stub itself (for admin / migration tooling). Response gains redirect_chain field.
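The redirect-following behaviour described above can be sketched as a bounded chain walk. This is a minimal illustration, not the handler's actual code: `walk_redirects`, `RedirectError`, and the two callback parameters are hypothetical names introduced here, and only the 5-hop limit, the `redirect_chain` shape, and the cycle/dangling/missing error cases come from this PR.

```python
from typing import Callable, Optional

MAX_HOPS = 5  # hop limit stated in this PR


class RedirectError(Exception):
    """Missing source, dangling target, cycle, or too many hops."""


def walk_redirects(
    path: str,
    redirect_target: Callable[[str], Optional[str]],
    exists: Callable[[str], bool],
    max_hops: int = MAX_HOPS,
) -> tuple[str, list[str]]:
    """Return (final_path, redirect_chain); the chain is [] for a live page.

    redirect_target(p) is assumed to return the stub's redirect_to,
    or None when p is a live page.
    """
    if not exists(path):
        raise RedirectError(f"missing page: {path}")
    visited = [path]
    while (target := redirect_target(visited[-1])) is not None:
        if target in visited:
            raise RedirectError(f"redirect cycle at {target}")
        if not exists(target):
            raise RedirectError(f"dangling redirect: {visited[-1]} -> {target}")
        if len(visited) > max_hops:  # appending would exceed the hop limit
            raise RedirectError(f"more than {max_hops} hops from {path}")
        visited.append(target)
    return visited[-1], (visited if len(visited) > 1 else [])


# Usage: an in-memory stand-in for the wiki, mapping path -> redirect_to.
pages = {"old.md": "new.md", "new.md": None}
final, chain = walk_redirects("old.md", pages.get, pages.__contains__)
print(final, chain)
```

Tracking the visited paths in a list serves both purposes at once: it is the `redirect_chain` returned to the caller and the set used for cycle detection.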

wiki_list

Excludes stubs from the listing by default. New include_redirects: true opts in. Response gains redirect_count.
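As a rough sketch of the listing behaviour, assuming a stub can be recognised by a non-empty `redirect_to` field: the `Page` type and `list_pages` function below are hypothetical stand-ins for illustration, and reporting `redirect_count` in both modes is an assumption based on the test descriptions in this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    path: str
    redirect_to: Optional[str] = None  # set only on redirect stubs


def list_pages(pages: list[Page], include_redirects: bool = False) -> dict:
    """List live pages; stubs are included only when explicitly requested."""
    stubs = [p for p in pages if p.redirect_to is not None]
    visible = pages if include_redirects else [p for p in pages if p.redirect_to is None]
    return {
        "pages": [p.path for p in visible],
        "redirect_count": len(stubs),  # counted whether or not stubs are listed
    }


# Usage
wiki = [Page("new.md"), Page("old.md", redirect_to="new.md")]
print(list_pages(wiki))
print(list_pages(wiki, include_redirects=True))
```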

wiki_reindex

Drops stubs from .generated/INDEX.md and surfaces the redirect count by kind in the response. The index now lists only live pages.

wiki_rename (NEW)

Moves a page from one path to another and leaves a redirect stub at the old path. Refuses to:

  • Operate on pages without a stable frontmatter id (run scripts/wiki_backfill_ids.py --apply first)
  • Chain stubs (rename the terminal page instead)
  • Overwrite an existing destination unless overwrite_dest=true
  • Rename to the same path

The stub carries both redirect_to (path) and redirect_id (source page id) so future ID-based resolution will keep working even when the destination is itself renamed.
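The refusal rules and stub contents above can be sketched against an in-memory wiki. This is an illustration under stated assumptions, not the handler itself: `rename_page`, `RenameRefused`, and the dict-based page layout are hypothetical, and the sketch does not model how the real handler makes the on-disk move atomic.

```python
class RenameRefused(Exception):
    """Raised when a rename violates one of the documented refusal rules."""


def rename_page(wiki: dict[str, dict], src: str, dest: str,
                overwrite_dest: bool = False) -> None:
    """Move wiki[src] to dest and leave a redirect stub at src."""
    if src == dest:
        raise RenameRefused("source and destination are the same path")
    if src not in wiki:
        raise RenameRefused(f"missing source: {src}")
    page = wiki[src]
    if page.get("redirect_to"):
        raise RenameRefused("source is a redirect stub; rename the terminal page")
    if not page.get("id"):
        raise RenameRefused("source has no stable id; run the backfill first")
    if dest in wiki and not overwrite_dest:
        raise RenameRefused(f"destination exists: {dest}")
    wiki[dest] = page  # content and id move unchanged
    wiki[src] = {"redirect_to": dest, "redirect_id": page["id"], "body": ""}


# Usage
wiki = {"old.md": {"id": "uuid-1", "body": "hello"}}
rename_page(wiki, "old.md", "new.md")
print(wiki["old.md"])
```

Checking for a stub before checking for an id matters in this sketch, because a stub has no `id` of its own and would otherwise be reported with the less helpful "no stable id" message.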

Tests

tests_py/handlers/test_wiki_redirect_handlers.py (NEW) — 20 tests, all green:

  Handler        Tests  Coverage
  ──────────────────────────────────────────────────────────────────
  wiki_read      7      transparent follow + opt-out + cycle/dangling/missing
  wiki_list      3      default exclude, opt-in include, zero-count case
  wiki_reindex   1      INDEX.md skips stubs, redirect_count surfaced
  wiki_rename    9      happy path, all refusal cases, end-to-end with wiki_read, body preserved verbatim

Test plan

  • pytest tests_py/handlers/test_wiki_redirect_handlers.py — 20 passed
  • pytest tests_py/core/ tests_py/shared/ tests_py/scripts/ tests_py/handlers/test_wiki_sync_errors.py tests_py/handlers/test_wiki_redirect_handlers.py → 2075 passed
  • ruff format --check and ruff check clean
  • CI on this PR
  • After #33 and this PR merge: run wiki_backfill_ids.py --apply, then wiki_rename against the live wiki

Out of scope (follow-ups)

  • ID→path index — currently only path-based chain resolution works. ID-only stubs (redirect_id without redirect_to) return None from resolve_chain, which surfaces as a clear error in wiki_read. The index is small additional work and lands when bulk migration needs it.
  • Phase 4 bulk migration script: loops wiki_rename over the ~88 known pollution paths (.md.md slug bug, timestamp-slug ADRs, path-leak slugs). Gated on this PR + #33 landing.

🤖 Generated with Claude Code

…Phase 3.2)

Wires the Phase 3 data model (#33) into the read path and adds a new
write handler that performs the rename + stub atomically. With this
change ``wiki_rename old.md new.md`` produces:

  * ``new.md``  — the original content moved verbatim (id preserved)
  * ``old.md``  — a redirect stub pointing at new.md (with redirect_id
                  = source page id, for future id-based resolution)

And ``wiki_read old.md`` then returns the content of ``new.md`` along
with ``redirect_chain: ["old.md", "new.md"]``. Inbound links to the
old path keep working through the migration.

Handler changes
---------------

* ``wiki_read``  — follow redirect stubs transparently up to 5 hops.
                   ``follow_redirects: false`` opts out (admin/migration
                   tooling that needs to inspect the stub itself).
                   New response field: ``redirect_chain``.

* ``wiki_list``  — exclude redirect stubs from the listing by default.
                   ``include_redirects: true`` opts in. New response
                   field: ``redirect_count``.

* ``wiki_reindex`` — drop redirect stubs from .generated/INDEX.md and
                     surface the count by kind in the response. The
                     index now lists only live pages, which is what
                     readers actually want.

* ``wiki_rename``  — NEW. Move a page from one path to another and
                     leave a stub at the old path. Refuses to operate
                     on pages without a stable frontmatter id (run
                     ``scripts/wiki_backfill_ids.py --apply`` first),
                     refuses to chain stubs (rename the terminal page
                     instead), refuses to overwrite an existing
                     destination unless ``overwrite_dest=true``.

Tool registry: ``wiki_rename`` registered alongside the other 8 wiki
tools. ``wiki_read`` and ``wiki_list`` MCP signatures extended with
their new optional parameters.

Stub semantics
--------------

The stub carries ``redirect_id = <source page id>`` so that id-based
resolution, which a follow-up will add, can still find the page after
its destination path has itself been renamed. ``redirect_to`` holds
the new path as the cheap path-based resolution target. Both fields
are emitted; the id takes precedence once an id-aware reader exists.

Tests
-----

``tests_py/handlers/test_wiki_redirect_handlers.py`` (NEW) — 20 tests
covering every handler change:

  read:
    - returns content for a normal page (chain = [])
    - follows single-hop redirect
    - follows multi-hop chain (3 pages, 2 hops)
    - ``follow_redirects: false`` returns the stub itself
    - cycle returns error
    - dangling redirect returns error
    - missing source returns error

  list:
    - excludes stubs by default; redirect_count surfaced
    - ``include_redirects: true`` returns both
    - redirect_count is 0 when no stubs

  reindex:
    - stubs absent from INDEX.md; by_kind counts only live pages

  rename:
    - creates stub at old path with correct redirect_to, redirect_id,
      redirect_reason
    - refuses missing source
    - refuses source without id
    - refuses existing destination
    - ``overwrite_dest=true`` works
    - refuses to chain stubs
    - refuses same path
    - end-to-end: rename then read resolves to the new content
    - body preserved verbatim through the move

Targeted suite: 86 passed (Phase 3 + Phase 3.2 surface).
Broader: tests_py/core/ + tests_py/shared/ + tests_py/scripts/ +
relevant tests_py/handlers/ → 2075 passed.
``ruff format --check`` and ``ruff check`` clean.

What still ships in a follow-up
-------------------------------

  * ID→path index for ID-only redirect resolution (currently only
    path-based chain walking works; id-only stubs return None from
    resolve_chain so they error in wiki_read with a clear message).
  * Phase 4 bulk migration script that loops wiki_rename over the 88
    known pollution paths (.md.md slug bug, timestamp-slugs, path-leak
    titles) — gated on this PR + #33 landing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cdeust cdeust merged commit c8205cf into feat/wiki-stable-ids-phase3 May 13, 2026
cdeust added a commit that referenced this pull request May 13, 2026
…(ADR-2244 Phase 4.1) (#35)

Phase 4 of ADR-2244 — the bulk migration. This is the deterministic
half: three pollution classes with mechanically computable target
paths. The LLM-assisted re-classification half (the 7820 file-doc
re-bucket) is a separate scope and lands in a follow-up.

Targets
-------

Audit 2026-05-12 found three deterministic-rename pollution classes:

  Pattern                                    Audit count
  ──────────────────────────────────────────────────────
  ``*.md.md``                                58
  ``*-decision-created-YYYY-MM-DDt...z.md``  10
  ``*users-cdeust-... .md``                  11+ (path-leak in slug)

Live dry-run after this commit:
  Pollution paths detected: 70  (all currently skipped because the
                                 backfill from #33 hasn't been applied
                                 yet — the script correctly refuses
                                 to rename pages without a stable id)

Script flow
-----------

  scripts/wiki_bulk_migrate.py

  1. Walk wiki, classify each .md page by pollution pattern.
  2. For each match:
     a. Skip redirect stubs (already moved).
     b. Skip pages without a frontmatter ``id`` (Phase 3 invariant).
        Caller is told to run ``wiki_backfill_ids.py --apply`` first.
     c. Compute clean target path:
          - .md.md             → strip duplicate extension
          - timestamp-slug     → derive slug from frontmatter title
                                 or first body heading
          - path-leak          → same, plus reject path-shaped titles
     d. Record the Pollution record.
  3. On --apply: call the ``wiki_rename`` handler for each item, which
     writes content at the new path and a redirect stub at the old
     one. Inbound links keep resolving.

Idempotency: a second --apply finds zero pollution paths (the
renames landed; their stubs are detected and skipped).
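
A minimal sketch of the classification step (1) and the ``.md.md`` target
computation (2c). The regular expressions are assumptions inferred from the
glob-style patterns in the audit table, and ``classify`` /
``strip_duplicate_extension`` are hypothetical names; the script's real
patterns may differ.

```python
import re
from typing import Optional

# Assumed regex equivalents of the audit-table patterns.
DOUBLE_EXT = re.compile(r"\.md\.md$")
TIMESTAMP_SLUG = re.compile(r"-decision-created-\d{4}-\d{2}-\d{2}t.*z\.md$")
PATH_LEAK = re.compile(r"users-cdeust-")


def classify(path: str) -> Optional[str]:
    """Return the pollution class of a wiki path, or None if it looks clean."""
    name = path.rsplit("/", 1)[-1]  # classify on the file name only
    if DOUBLE_EXT.search(name):
        return ".md.md"
    if TIMESTAMP_SLUG.search(name):
        return "timestamp-slug"
    if PATH_LEAK.search(name):
        return "path-leak"
    return None


def strip_duplicate_extension(path: str) -> str:
    """Target path for the .md.md class: drop one trailing '.md'."""
    return path[: -len(".md")] if path.endswith(".md.md") else path


# Usage
for p in ["adr/0001-foo.md.md", "notes/clean-page.md"]:
    print(p, classify(p), strip_duplicate_extension(p))
```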

Slug derivation
---------------

``_derive_clean_slug`` picks from three sources in order:

  1. Frontmatter ``title`` (if non-empty and not path-shaped /
     timestamp-shaped / too short / synthetic ``memory-XXX``)
  2. First body H1/H2 heading (same cleanness check)
  3. Deterministic 6-hex-character hash of the body content
     prefixed with the kind (``decision-abc123`` / ``page-def456``)

The hash fallback is rare — most pollution pages already have a
proper ``title`` field; it's the *slug* that's broken, not the
metadata.
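
The three-source fallback can be sketched as follows. This is not the
script's ``_derive_clean_slug``: the function names, the minimum title
length, the cleanness regexes, and the choice of SHA-256 for the hash are
all assumptions made for illustration; only the source order and the
6-hex-character, kind-prefixed fallback come from the description above.

```python
import hashlib
import re
from typing import Optional

MIN_TITLE_LEN = 4  # assumed threshold for "too short"


def is_clean_title(title: str) -> bool:
    """Reject titles that are too short, path-shaped, timestamp-shaped, or synthetic."""
    t = title.strip()
    if len(t) < MIN_TITLE_LEN:
        return False
    if "/" in t or "\\" in t:  # path-shaped
        return False
    if re.search(r"\d{4}-\d{2}-\d{2}t", t, re.IGNORECASE):  # timestamp-shaped
        return False
    if re.fullmatch(r"memory-\w+", t, re.IGNORECASE):  # synthetic memory-XXX
        return False
    return True


def slugify(text: str) -> str:
    """Lowercase and collapse every non-alphanumeric run into a single hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def derive_slug(title: Optional[str], body: str, kind: str) -> str:
    # 1. Frontmatter title, if it passes the cleanness check.
    if title and is_clean_title(title):
        return slugify(title)
    # 2. First H1/H2 heading in the body, same check.
    heading = re.search(r"^#{1,2}[ \t]+(.+)$", body, re.MULTILINE)
    if heading and is_clean_title(heading.group(1)):
        return slugify(heading.group(1))
    # 3. Deterministic fallback: kind plus 6 hex characters of a body hash.
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()[:6]
    return f"{kind}-{digest}"


# Usage
print(derive_slug("/Users/cdeust/x", "# Stable Page IDs\ntext", "decision"))
```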

Tests
-----

``tests_py/scripts/test_wiki_bulk_migrate.py`` (NEW) — 22 tests:

  Detection (6):
    .md.md positive + negative; timestamp-slug positive + negative;
    path-leak positive + negative.

  Slug derivation (5):
    accepts real titles; rejects path / timestamp / too-short titles;
    falls back to body heading; falls back to hash.

  plan() (5):
    finds all three classes in one pass; skips pages without id;
    skips existing redirect stubs; proposes the correct target for
    timestamp-slug and path-leak (preserving numeric and date prefixes).

  apply() / end-to-end (4):
    renames + creates stubs with correct redirect_to and redirect_id;
    idempotent (second run is a no-op); handles three classes in one
    pass; doesn't crash on id-less skipped pages.

  Plus 2 sanity tests for boundary slug shapes.

Targeted: 22 passed. ruff format and check clean.

Operational order
-----------------

  1. Merge #33 (Phase 3 — UUID + redirect modules + backfill script)
  2. Merge #34 (Phase 3.2 — wiki_read / wiki_rename handlers)
  3. Merge this PR (Phase 4.1 — bulk-migrate script)
  4. Run:
       python scripts/wiki_backfill_ids.py --apply
       python scripts/wiki_bulk_migrate.py                # dry-run review
       python scripts/wiki_bulk_migrate.py --apply        # commit moves

Out of scope (follow-ups)
-------------------------

  * ID→path index for ID-only redirect resolution (path-based works
    today; id-only stubs error in wiki_read).
  * Phase 4.2 — file-doc re-bucket (7820 ``notes/<domain>/<id>-file-*``
    pages → ``reference/<domain>/<file-slug>.md`` with provenance
    rewrite). Different operation (changes kind directory, rewrites
    frontmatter); separate script.
  * Phase 5 — classifier-driven cleanup for ai-generated stubs
    (filter not delete).
  * Phase 6 — producer audit (codebase_analyze emits correct
    provenance / lifecycle on its outputs).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cdeust added a commit that referenced this pull request May 13, 2026
… Phase 3.2)

CI on PR #36 fails on tests_py/test_main.py:70 — the mcp_server tool
count is now 48 because Phase 3.2 (#34's content, now flowing into
main via this PR) registers ``wiki_rename`` as a new tool. The
assertion is a hard count + membership check; both updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cdeust added a commit that referenced this pull request May 13, 2026
…on onto main (#36)

* feat(wiki): handler-layer redirect mechanics + wiki_rename (ADR-2244 Phase 3.2)

* feat(wiki): deterministic bulk migration for the ~88 pollution paths (ADR-2244 Phase 4.1)

* test: bump tool count assertion 47 → 48 for new wiki_rename (ADR-2244 Phase 3.2)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cdeust added a commit that referenced this pull request May 13, 2026
…n complete) (#41)

Bundles 11 merged PRs (#30-#40) since v3.15.4 closing out the
ADR-2244 wiki classification cycle:

  Phase 2     #31 #32  pilot migration analyzer + 1000-page
                       verification (96.7% kind-kept, passes target)
  Phase 3     #33      stable page IDs (UUID4) + redirect data model
                       + backfill CLI
  Phase 3.2   #34      handler-layer redirect mechanics (wiki_read
                       follows transparently, wiki_list/wiki_reindex
                       exclude stubs, new wiki_rename tool)
  Phase 4.1   #35 #36  deterministic bulk migration for the 70
                       known pollution paths (.md.md, timestamp-slug,
                       path-leak)
  Phase 4.2   #37      file-doc re-bucket (8734 pages from notes/
                       to reference/ with modern frontmatter)
  Phase 5     #39      filter auto-generated pages from default
                       listings; INDEX.md splits human-authored
                       from auto-gen
  Phase 6     #38      producer audit — codebase_analyze output
                       routes to kind=reference (root-causes the
                       8734-page misroute)
  Phase 6.2   #40      producer audit — wiki_seed_codebase emits
                       modern kind tags the classifier reads
  Security    #30      authlib CVE-2026-44681 bump (dependabot #4)

Notes for users:
  - Wiki on disk not migrated yet. Apply scripts (in scripts/) are
    dry-run by default. Three commands to fully migrate; each is
    idempotent and leaves redirect stubs.
  - Phases 5/6/6.2 take effect on next MCP restart.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cdeust added a commit that referenced this pull request May 15, 2026
Documented baseline for the conversational theme-recall architecture
at this commit point. Run details:
  * machine: M2 Pro, background QoS, no caffeinate
  * eval set: tests_py/eval/theme_eval_set.jsonl (4144 queries)
  * runtime: 20h 45min (avg 18 s/query under BG throttle)

Headline:
  R@1   = 0.49
  R@5   = 0.71
  R@10  = 0.75
  MRR   = 0.60
  subgraph_recall = 0.49

Per-source breakdown (key takeaway — architecture works where designed):
  symbol-cluster  n=1527  R@5=0.85  R@10=0.88  MRR=0.72   ✅ above 0.8
  file-basename   n=1775  R@5=0.66  R@10=0.72  MRR=0.58   needs single-anchor channel
  question        n= 712  R@5=0.55  R@10=0.59  MRR=0.42   needs anchor-free path
  kebab           n=  76  R@5=0.47  R@10=0.54  MRR=0.40
  title           n=  54  R@5=0.69  R@10=0.70  MRR=0.52

The strict-containment + lexicographic kind-tier ranking lifts
symbol-cluster R@5 above target (0.85 ≥ 0.8). The gap to aggregate
target is the OTHER query shapes: file-basename queries with one file
anchor, plus natural-language question/kebab/title queries with no
anchors at all. Both have known next steps logged as tasks #34 and
#35 — a single-anchor strict channel and an anchor-free semantic
path with title-overlap rerank.

Use this baseline as the diff target on the M4 once tasks #34/#35
land. Expected post-fix headline: R@5 ≈ 0.80, R@10 ≈ 0.85, MRR ≈ 0.75
on the same 4144-query eval.
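
For reference, a minimal sketch of how metrics of this kind are commonly
computed, assuming each query has a single expected target and ``ranks``
holds its 1-based position in the results (``None`` when absent). The
function names are hypothetical and the eval harness may define the
metrics differently. The last lines check that the per-source R@5 figures
above, weighted by n, agree with the headline R@5.

```python
from typing import Optional


def recall_at_k(ranks: list[Optional[int]], k: int) -> float:
    """Fraction of queries whose target appears in the top k results."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


def mean_reciprocal_rank(ranks: list[Optional[int]]) -> float:
    """Mean of 1/rank over all queries; a missing target contributes 0."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


# Per-source (n, R@5) pairs copied from the breakdown above.
per_source = [(1527, 0.85), (1775, 0.66), (712, 0.55), (76, 0.47), (54, 0.69)]
total = sum(n for n, _ in per_source)
weighted_r5 = sum(n * r for n, r in per_source) / total
print(total, round(weighted_r5, 2))
```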

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>