feat(wiki): filter auto-generated pages from default views (ADR-2244 Phase 5) #39
Merged
…Phase 5)
The 8734 file-doc pages produced by codebase_analyze are valuable lookup
tables but they'd dominate any default listing — that's exactly the
catch-all problem Phase 1 was supposed to solve, just shifted from
``notes/`` to ``reference/``. Phase 5 says: keep them indexed, hide
them from the default view, surface them in their own section.
Changes
-------
wiki_list (handler + tool registry)
New ``include_auto_generated: bool = False`` parameter.
Default behaviour: pages with frontmatter ``provenance:
auto-generated`` are excluded; the returned ``auto_generated_count``
surfaces the population size so callers know they exist.
Pass ``include_auto_generated=True`` to opt in.
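
The default-exclusion behaviour described above could be sketched roughly as follows. This is a minimal illustration, not the handler's actual code: the function name ``list_pages``, the dict-shaped page records, and the response shape are assumptions; only the ``include_auto_generated`` parameter, the ``provenance: auto-generated`` marker, and the ``auto_generated_count`` field come from this change. The sketch also assumes the count is reported whether or not the pages are included.

```python
def list_pages(pages, include_auto_generated=False):
    """Hypothetical sketch: hide auto-generated pages unless the caller opts in.

    `pages` is assumed to be an iterable of dicts holding already-parsed
    frontmatter fields plus a "path" key.
    """
    visible = []
    auto_generated_count = 0
    for page in pages:
        is_auto = page.get("provenance") == "auto-generated"
        if is_auto:
            # Always count the hidden population so callers know it exists.
            auto_generated_count += 1
            if not include_auto_generated:
                continue
        visible.append(page["path"])
    return {"pages": visible, "auto_generated_count": auto_generated_count}
```

With one human-authored page and one auto-generated page, the default call returns only the first path alongside ``auto_generated_count`` of 1, and ``include_auto_generated=True`` returns both paths.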
wiki_reindex (handler)
INDEX.md gets two top-level sections now: ``Human-authored``
(the prior layout, by kind) followed by ``Auto-generated reference``
(only present when auto-gen pages exist). Both sections are still
grouped by kind underneath. Deterministic output preserved.
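
One way to picture the two-section INDEX.md layout is the sketch below. It is an assumption-laden illustration: ``render_index``, the tuple-shaped input, and the Markdown heading levels are hypothetical, and sorting kinds and titles is just one plausible way to keep the output deterministic. Only the section names, the grouping by kind, and the rule that the second section appears only when auto-gen pages exist come from the description above.

```python
from collections import defaultdict


def render_index(pages):
    """Hypothetical sketch: pages is an iterable of (title, kind, is_auto_generated)."""
    human = defaultdict(list)
    auto = defaultdict(list)
    for title, kind, is_auto in pages:
        (auto if is_auto else human)[kind].append(title)

    sections = [("Human-authored", human)]
    if auto:
        # The second section is emitted only when auto-generated pages exist.
        sections.append(("Auto-generated reference", auto))

    lines = ["# Index", ""]
    for heading, by_kind in sections:
        lines.append(f"## {heading}")
        # Sorting kinds and titles makes the output independent of input order.
        for kind in sorted(by_kind):
            lines.append(f"### {kind}")
            lines.extend(f"- {title}" for title in sorted(by_kind[kind]))
        lines.append("")
    return "\n".join(lines)
```

Because both levels are sorted, feeding the same pages in a different order yields byte-identical output, which is the property "deterministic output" refers to in this sketch.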
Shared work: ``_classify_page`` in both handlers reads frontmatter
once and returns ``(is_redirect_stub, is_auto_generated)``, so the two
filters share the same disk read on the ~9000-page wiki. Worst-case
latency observed locally: <500ms on the live wiki.
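
The single-read classification could look something like the sketch below. It is not the real ``_classify_page``: the naive ``key: value`` frontmatter parsing stands in for whatever parser the handlers use, and the ``redirect_to`` key used to detect redirect stubs is a hypothetical name chosen for illustration. The point it shows is that one read of the page yields both flags.

```python
from pathlib import Path


def classify_text(text):
    """Return (is_redirect_stub, is_auto_generated) from a page's raw text."""
    frontmatter = {}
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break  # end of the frontmatter block
            key, sep, value = line.partition(":")
            if sep:
                frontmatter[key.strip()] = value.strip()
    # "redirect_to" is an assumed key name, not taken from the source.
    is_redirect_stub = "redirect_to" in frontmatter
    is_auto_generated = frontmatter.get("provenance") == "auto-generated"
    return is_redirect_stub, is_auto_generated


def classify_page(path):
    """Read the file once and derive both flags from that single read."""
    return classify_text(Path(path).read_text(encoding="utf-8"))
```

Returning both flags from one call means a listing that applies both filters touches each file once rather than twice, which matters at roughly 9000 pages.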
Tests
-----
``tests_py/handlers/test_wiki_redirect_handlers.py`` — 5 new tests:
list:
- auto-gen excluded by default (count + pages list correct)
- ``include_auto_generated=True`` returns both
- both filters compose correctly (redirect + auto-gen counted
separately, not double-counted, hidden by default)
- fast-path when both filters are disabled (no per-page reads)
reindex:
- INDEX.md has ``Human-authored`` section first, ``Auto-generated
reference`` section second; counts in the response payload
reflect the split
25 passed in the targeted suite; 2060 passed in tests_py/core/ +
tests_py/shared/ + relevant tests_py/handlers/. ``ruff format`` and
``ruff check`` clean.
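
As an illustration of what the fast-path test checks, here is a hedged sketch rather than the actual test code. ``list_paths``, the injected ``classify`` callable, and the ``include_redirects`` flag name are all hypothetical; only ``include_auto_generated`` and the "no per-page reads when both filters are disabled" behaviour come from the list above.

```python
def list_paths(paths, classify, include_redirects=False, include_auto_generated=False):
    """Hypothetical listing that calls classify(path) only when a filter is active."""
    if include_redirects and include_auto_generated:
        # Fast path: nothing is filtered, so no page needs to be read.
        return list(paths)
    kept = []
    for path in paths:
        is_redirect_stub, is_auto_generated = classify(path)
        if is_redirect_stub and not include_redirects:
            continue
        if is_auto_generated and not include_auto_generated:
            continue
        kept.append(path)
    return kept


def test_fast_path_skips_per_page_reads():
    calls = []

    def counting_classify(path):
        # Record every classification so the test can count per-page reads.
        calls.append(path)
        return (False, path.startswith("reference/"))

    paths = ["guides/setup.md", "reference/a.py.md"]

    # Default filters: every page is classified, auto-gen page is hidden.
    assert list_paths(paths, counting_classify) == ["guides/setup.md"]
    assert len(calls) == 2

    # Both filters disabled: all pages returned without any classification.
    calls.clear()
    assert list_paths(paths, counting_classify, True, True) == paths
    assert calls == []
```

Injecting the classifier keeps the test free of filesystem setup while still proving that the fast path performs zero per-page reads.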
Out of scope (Phase 6+ follow-ups)
----------------------------------
* wiki_search / wiki_view / wiki_export learning the same filter
(lower urgency: the dominance problem is in listings and the
reader's first impression).
* Search-time relevance shaping: even with the filter off, auto-gen
pages might warrant a lower default boost in any future relevance
scorer. Out of scope here.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cdeust added a commit that referenced this pull request on May 13, 2026
…n complete) (#41)

Bundles 11 merged PRs (#30-#40) since v3.15.4 closing out the ADR-2244 wiki classification cycle:

Phase 2    #31 #32  pilot migration analyzer + 1000-page verification (96.7% kind-kept, passes target)
Phase 3    #33      stable page IDs (UUID4) + redirect data model + backfill CLI
Phase 3.2  #34      handler-layer redirect mechanics (wiki_read follows transparently, wiki_list/wiki_reindex exclude stubs, new wiki_rename tool)
Phase 4.1  #35 #36  deterministic bulk migration for the 70 known pollution paths (.md.md, timestamp-slug, path-leak)
Phase 4.2  #37      file-doc re-bucket (8734 pages from notes/ to reference/ with modern frontmatter)
Phase 5    #39      filter auto-generated pages from default listings; INDEX.md splits human-authored from auto-gen
Phase 6    #38      producer audit: codebase_analyze output routes to kind=reference (root-causes the 8734-page misroute)
Phase 6.2  #40      producer audit: wiki_seed_codebase emits modern kind tags the classifier reads
Security   #30      authlib CVE-2026-44681 bump (dependabot #4)

Notes for users:
- Wiki on disk not migrated yet. Apply scripts (in scripts/) are dry-run by default. Three commands to fully migrate; each is idempotent and leaves redirect stubs.
- Phases 5/6/6.2 take effect on next MCP restart.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Phase 5 of ADR-2244: filter, don't delete, for auto-generated pages. The 8,734 file-doc pages produced by ``codebase_analyze`` are valuable lookup tables, but they would dominate any default listing: exactly the catch-all problem Phase 1 was supposed to solve, just shifted from ``notes/`` to ``reference/``.

Phase 5 says: keep them indexed, hide them from the default view, surface them in their own section.

Changes

``wiki_list`` (handler + tool registry)
New ``include_auto_generated: bool = False`` parameter. Default behaviour: pages with frontmatter ``provenance: auto-generated`` are excluded; the response carries ``auto_generated_count`` so callers know they exist. Pass ``include_auto_generated=True`` to opt in.

``wiki_reindex`` (handler)
INDEX.md gets two top-level sections:
- ``Human-authored`` (the prior layout, by kind)
- ``Auto-generated reference`` (only present when auto-gen pages exist)
Both sections are still grouped by kind. Deterministic output preserved.

Shared work

``_classify_page`` (in both handlers) reads frontmatter once per page and returns ``(is_redirect_stub, is_auto_generated)``. The two filters share the same disk read on the ~9,000-page wiki. Worst-case latency observed locally: <500ms on the live wiki.

Tests

``tests_py/handlers/test_wiki_redirect_handlers.py``: 5 new tests added (25 total in file now):
- ``wiki_list``: auto-gen excluded by default (count + pages list correct)
- ``wiki_list``: ``include_auto_generated=True`` returns both
- ``wiki_list``: both filters compose correctly (redirect + auto-gen counted separately, not double-counted, hidden by default)
- ``wiki_list``: fast-path when both filters are disabled (no per-page reads)
- ``wiki_reindex``: INDEX.md has the ``Human-authored`` section first and the ``Auto-generated reference`` section second; counts in the response payload reflect the split

- ``pytest tests_py/handlers/test_wiki_redirect_handlers.py``: 25 passed
- ``pytest tests_py/core/ tests_py/shared/ tests_py/handlers/test_wiki_redirect_handlers.py tests_py/handlers/test_wiki_sync_errors.py``: 2060 passed
- ``ruff format --check`` and ``ruff check`` clean

Out of scope (follow-ups)

- ``wiki_search`` / ``wiki_view`` / ``wiki_export`` learning the same filter (lower urgency: listing dominance was the primary user complaint).

🤖 Generated with Claude Code