feat(wiki): ADR-2244 Phase 2 — pilot migration analyzer + first 100-page report#31
Merged
…4 Phase 2)

Adds ``scripts/wiki_pilot_migration.py``, a read-only analyzer that walks the methodology wiki, runs each page's body through the new data-driven classifier (post-#27/#28), and produces a Markdown report showing the proposed (kind, lifecycle, audience, provenance) tuple for each page alongside its current legacy kind.

Goal per ADR-2244 §5 (Migration plan, Phase 2): a human-reviewable accuracy check on ~100 representative pages before any bulk re-bucketing. Acceptance criterion is ≥ 90% kind agreement.

First report
------------

``scripts/wiki-pilot-report.md``: deterministic 100-page stratified sample (seed 20260512). Headline numbers:

  Sample:    100 pages across 12 legacy kind directories
  Admitted:  88 (88.0%)
  Rejected:  12 (12.0%, all admission-gate rejections of audit artefacts and template skeletons; looks correct)
  Kept:      58 (65.9% of admitted, including the expected legacy → modern map: notes → explanation, etc.)
  Changed:   30 (34.1%)

Calibration findings (the value of the pilot)
---------------------------------------------

The kind-detection registry is below the 90% acceptance target. Surfaced gaps:

1. ``adr → explanation`` (3 of 8 sampled ADRs misclassified). Body uses ``## Decision`` heading-style; ADR patterns require ``Decision:`` (colon) or ``decided to``. Fix: register a kind pattern for the Nygard heading skeleton (``## Status\\n.*## Context\\n.*## Decision``).

2. ``rfc → adr`` (8 of 8 sampled rfc/ pages misrouted to adr). Pages tagged ``architecture`` match the adr tag-alias. ``architecture`` is too broad to be an adr-only signal. Fix: move ``architecture`` to a kind-agnostic tag, OR require a content-pattern match in addition to the tag for the adr route.

3. Audience inference is noisy. The word ``crypto`` in a list of Node built-in modules (ADR-001) flagged the page as ``security`` audience. Fix: tighten the security pattern to require either security-domain context words or a security tag.

4. ``adrs/`` (plural typo of ``adr``) holds 8 pages with template skeletons. The classifier admits them as ``explanation`` rather than ``adr``. Likely the right outcome: these are draft pages with ``_To be written._`` placeholders, not finished ADRs.

Net: the registry's default seed needs pattern-tuning before bulk migration (Phase 4). Calibration is a registry-data edit, not a Python change, which is exactly what the data-driven design from #28 enables.

Script details
--------------

Read-only. Default stratified sample of 100 pages with seed 20260512. Pass ``--all`` to evaluate every page (~9600). Output is a deterministic Markdown report with five tables (sample distribution, proposed distribution, transition matrix, per-facet distributions, rejection reasons) plus a per-page proposal list.

The frontmatter parser handles three observed patterns: scalar values, inline lists, and block lists. The first attempt at the parser missed block lists and silently dropped all tags, which made the early classifier accuracy look worse than it actually was. This was caught and fixed during the report-review pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
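The three frontmatter shapes named above (scalar values, inline lists, block lists) can be illustrated with a small parser. This is a minimal sketch, not the parser in ``scripts/wiki_pilot_migration.py``: the function name ``parse_frontmatter`` and the ``---`` delimiter handling are assumptions made for illustration.

```python
def parse_frontmatter(text: str) -> dict:
    """Parse a leading '---' delimited frontmatter block.

    Hypothetical helper: handles scalars, inline lists ([a, b]),
    and block lists ("- a" lines under a key with no inline value).
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}

    meta: dict = {}
    current_key = None  # key currently collecting block-list items
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the block
            break
        if not stripped:
            continue
        # Block-list item belonging to the most recent value-less key.
        if stripped.startswith("- ") and current_key is not None:
            meta[current_key].append(stripped[2:].strip().strip("'\""))
            continue
        if ":" not in stripped:
            continue
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        if not value:
            # "tags:" with nothing after it opens a block list.
            meta[key] = []
            current_key = key
        elif value.startswith("[") and value.endswith("]"):
            # Inline list: "tags: [adr, decision]".
            items = [v.strip().strip("'\"") for v in value[1:-1].split(",")]
            meta[key] = [v for v in items if v]
            current_key = None
        else:
            # Plain scalar: "title: Zero dependencies".
            meta[key] = value.strip("'\"")
            current_key = None
    return meta
```

Dropping the block-list branch reproduces the failure mode described above: a key such as ``aliases:`` would parse to an empty list and every tag under it would be lost without an error.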
The 2026-05-13 pilot (scripts/wiki-pilot-report.md) revealed four calibration gaps in the registry's default seed. This commit fixes the three that mattered (the fourth was already correct behavior).

Pilot accuracy:
  before → 65.9% kind-kept on the 100-page stratified sample
  after  → 87.5% kind-kept (+21.6 pp)

The fixes: all data edits to the registry seed, exactly the kind of calibration the data-driven design from #28 was meant to enable.

1. ADR detection now matches the Nygard heading skeleton
--------------------------------------------------------

The pilot found 3 of 8 sampled ADRs misclassified as ``explanation`` because their body used ``## Decision`` heading-style without a ``Decision:`` colon. The legacy classifier's prose-only ADR pattern (``decided to`` / ``Decision:``) missed them.

Two new patterns added to ``adr.patterns`` using a new ``_re_ml`` helper (re.IGNORECASE | re.MULTILINE):

  - ``^##+\\s*Decision\\s*$``
  - ``^##+\\s*Consequences\\s*$``

After fix: 8 of 8 sampled ADRs route to ``adr`` correctly. The ``adrs/`` typo directory (8 pages) also routes to ``adr`` via the new heading detection.

2. ``architecture`` removed from ADR tag-aliases
------------------------------------------------

The pilot found 8 of 8 sampled ``rfc/`` pages misrouted to ``adr`` because they carried the ``architecture`` tag, which was registered as an adr tag-alias. ``architecture`` is too broad to be an ADR-only signal: architecture-tagged content is more often spec/rfc/explanation than a single decision.

Removed from ``adr.tag_aliases``; kept ``decision`` and ``adr`` as the high-confidence signals.

After fix: 8 of 8 sampled ``rfc/`` pages route to ``rfc`` correctly.

3. Security audience tightened: bare ``crypto`` no longer fires
---------------------------------------------------------------

The pilot found ADR-001 (zero dependencies) tagged ``security`` audience because its body listed ``crypto`` among Node built-in modules: ``fs, path, os, http, crypto``.

The pattern ``crypto(graphy)?`` matched the bare module name. Tightened to require the full ``cryptograph(y|ic)`` word. Same applied to ``auth`` → ``authentication``/``authorization``. The ``security`` tag remains the strongest signal: pages tagged ``security`` are unambiguously security audience regardless of content pattern hits.

After fix: ``security`` audience occurrences drop from 21 to 10 on the same sample, matching the pages that actually warrant the audience tag.

4. ``adrs`` (typo dir variant) added to LEGACY_KIND_TO_MODERN
-------------------------------------------------------------

The user's wiki has 8 pages under ``adrs/`` (plural typo of ``adr/``). These are real ADRs the user has been writing under a slightly wrong directory name. The legacy → modern map now treats ``adrs`` as synonymous with ``adr``, so the pilot's ``kept`` metric correctly counts these as preserved rather than misclassified.

Tests
-----

Three new regression tests in ``tests_py/core/test_wiki_classifier.py``:

  - ``test_adr_detected_from_nygard_heading_skeleton``
  - ``test_architecture_tag_alone_does_not_route_to_adr``
  - ``test_crypto_module_name_does_not_flag_security_audience``

Each test documents the pilot finding it guards against. 85 tests pass in the targeted suite; ``ruff format --check`` and ``ruff check`` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
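The heading and security-audience fixes above can be sketched with the standard ``re`` module. Only the ``_re_ml`` name, its flags, and the two heading patterns come from the commit message. The helper's body, the ``\b`` word boundaries, and the names ``ADR_HEADING_PATTERNS``, ``SECURITY_PATTERNS``, ``looks_like_adr`` and ``security_audience`` are assumptions for illustration, not the registry's actual layout.

```python
import re


def _re_ml(pattern: str) -> "re.Pattern[str]":
    # Assumed body: the commit only states the flags the helper applies.
    return re.compile(pattern, re.IGNORECASE | re.MULTILINE)


# Heading patterns quoted in the commit message (single-backslash form).
ADR_HEADING_PATTERNS = [
    _re_ml(r"^##+\s*Decision\s*$"),
    _re_ml(r"^##+\s*Consequences\s*$"),
]

# Tightened security patterns; the \b anchors are an assumption.
SECURITY_PATTERNS = [
    _re_ml(r"\bcryptograph(y|ic)\b"),
    _re_ml(r"\bauthentication\b"),
    _re_ml(r"\bauthorization\b"),
]


def looks_like_adr(body: str) -> bool:
    """True when the body contains a Nygard-style heading line."""
    return any(p.search(body) for p in ADR_HEADING_PATTERNS)


def security_audience(body: str, tags: list) -> bool:
    """The explicit tag wins; otherwise require a full security word."""
    return "security" in tags or any(p.search(body) for p in SECURITY_PATTERNS)
```

``re.MULTILINE`` is what lets ``^`` and ``$`` anchor to individual heading lines instead of the start and end of the whole page body, so a ``## Decision`` heading in the middle of a page still matches.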
cdeust added a commit that referenced this pull request on May 13, 2026
Re-runs the pilot from #31 with the calibrated registry at a 10× larger sample to verify the calibration generalises.

Headline
--------

  Sample size:   1000 (vs 100 in #31)
  Admitted:      942 (94.2%)
  Rejected:      58 (5.8%), all admission-gate (template skeletons, audit artefacts), no false negatives observed
  Kind kept:     911 (96.7% of admitted)
  Kind changed:  31 (3.3% of admitted)

96.7% kept exceeds the ≥ 90% ADR-2244 §5 Phase 2 acceptance target. The classifier registry is ready for Phase 4 bulk migration.

Of the 31 "changed" pages
-------------------------

  11  specs → explanation: specs that aren't pre-decision RFCs; defensible re-bucketing
  11  guides → explanation: guides without "how to" prose; content-dependent, defensible
   5  upgrades to specific kinds where content beat the legacy dir:
        conventions → how-to (2)
        notes → runbook (1)
        notes → how-to (1)
        lessons → how-to (1)
   2  adr → explanation: last remaining ADR-detection gap. Pages have neither the prose pattern nor the heading skeleton. Investigation deferred to a follow-up; well under the 10% acceptance margin.
   2  README.md, architecture → explanation: unknown legacy dirs

Facet distributions on admitted pages
-------------------------------------

  Lifecycle:   899 seedling · 43 proposed (ADR default)
  Audience:    935 developer · 90 ops · 48 security
  Provenance:  809 auto-generated · 133 human

The auto-generated count is dominated by file-doc pages tagged ``codebase`` / ``code-reference``. Human-authored = ADRs, RFCs, specs, lessons.

Run reproducibly
----------------

  python scripts/wiki_pilot_migration.py --sample-size 1000

Seed and sampling logic unchanged from #31.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
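A seeded stratified sample, as used for both the 100-page and 1000-page runs, can be sketched as follows. The proportional allocation rule, the function name ``stratified_sample`` and the ``(kind_dir, path)`` input shape are assumptions; only the seed value 20260512 comes from the commit messages, and the real script's allocation logic may differ.

```python
import random
from collections import defaultdict


def stratified_sample(pages, sample_size, seed=20260512):
    """Pick pages per legacy kind directory, proportionally to its size.

    pages: iterable of (kind_dir, path) tuples (hypothetical shape).
    Returns a sorted list of (kind_dir, path); the size is approximate
    because per-stratum quotas are rounded and floored at one page.
    """
    rng = random.Random(seed)  # private RNG: no global state involved

    by_kind = defaultdict(list)
    for kind, path in sorted(pages):  # sort so input order is irrelevant
        by_kind[kind].append(path)

    total = sum(len(paths) for paths in by_kind.values())
    if sample_size >= total:
        return sorted((k, p) for k, ps in by_kind.items() for p in ps)

    picked = []
    for kind in sorted(by_kind):  # fixed stratum order keeps draws stable
        paths = by_kind[kind]
        quota = max(1, round(sample_size * len(paths) / total))
        quota = min(quota, len(paths))
        picked.extend((kind, p) for p in rng.sample(paths, quota))
    return sorted(picked)
```

Sorting the input and iterating strata in a fixed order matters as much as the seed: with either left to filesystem or dict ordering, the same seed could draw different pages on different machines.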
cdeust added a commit that referenced this pull request on May 13, 2026
…n complete) (#41)

Bundles 11 merged PRs (#30-#40) since v3.15.4 closing out the ADR-2244 wiki classification cycle:

  Phase 2    #31 #32  pilot migration analyzer + 1000-page verification (96.7% kind-kept, passes target)
  Phase 3    #33      stable page IDs (UUID4) + redirect data model + backfill CLI
  Phase 3.2  #34      handler-layer redirect mechanics (wiki_read follows transparently, wiki_list/wiki_reindex exclude stubs, new wiki_rename tool)
  Phase 4.1  #35 #36  deterministic bulk migration for the 70 known pollution paths (.md.md, timestamp-slug, path-leak)
  Phase 4.2  #37      file-doc re-bucket (8734 pages from notes/ to reference/ with modern frontmatter)
  Phase 5    #39      filter auto-generated pages from default listings; INDEX.md splits human-authored from auto-gen
  Phase 6    #38      producer audit: codebase_analyze output routes to kind=reference (root-causes the 8734-page misroute)
  Phase 6.2  #40      producer audit: wiki_seed_codebase emits modern kind tags the classifier reads
  Security   #30      authlib CVE-2026-44681 bump (dependabot #4)

Notes for users:

  - Wiki on disk not migrated yet. Apply scripts (in scripts/) are dry-run by default. Three commands to fully migrate; each is idempotent and leaves redirect stubs.
  - Phases 5/6/6.2 take effect on next MCP restart.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
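The redirect-stub behaviour mentioned for Phase 3.2 (reads follow stubs transparently) can be pictured with a small resolver. Everything here is hypothetical: the ``Page`` dataclass, its ``redirect_to`` field, the ``resolve`` function and the hop limit are illustrative assumptions, not the data model shipped in #33/#34.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Page:
    page_id: str                       # stable ID, e.g. a UUID4 string
    body: str = ""
    redirect_to: Optional[str] = None  # set only on redirect stubs


def resolve(pages: Dict[str, Page], path: str, max_hops: int = 8) -> Page:
    """Follow redirect stubs from `path` until a real page is reached.

    Raises KeyError for an unknown path and ValueError on a redirect
    cycle or when more than `max_hops` stubs are chained.
    """
    seen = set()
    current = path
    for _ in range(max_hops + 1):
        if current in seen:
            raise ValueError(f"redirect cycle at {current!r}")
        seen.add(current)
        page = pages[current]
        if page.redirect_to is None:
            return page  # a real page, not a stub
        current = page.redirect_to
    raise ValueError(f"more than {max_hops} redirects from {path!r}")
```

Guarding against cycles and long chains is a design choice that keeps a read from hanging if a migration is applied twice in conflicting directions, which is relevant when every apply script leaves stubs behind.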
Summary

Adds `scripts/wiki_pilot_migration.py`, a read-only analyzer that walks the methodology wiki, runs each page through the new data-driven classifier (post-#27/#28), and produces a Markdown report showing the proposed (kind, lifecycle, audience, provenance) tuple for each page alongside its current legacy kind.

This is Phase 2 of ADR-2244 per its migration plan: a human-reviewable accuracy check on ~100 representative pages before any bulk re-bucketing. The ADR acceptance criterion is ≥ 90% kind agreement.

First report

`scripts/wiki-pilot-report.md`: deterministic 100-page stratified sample (seed `20260512`). Headline numbers:

Calibration findings: the value of the pilot

The kind-detection registry is below the 90% acceptance target on this sample. Surfaced gaps:

1. `adr → explanation` (3 of 8 sampled ADRs misclassified). Body uses `## Decision` heading-style; the current ADR patterns require `Decision:` (with colon) or `decided to`. Fix: register a kind pattern for the Nygard heading skeleton (`## Status` + `## Context` + `## Decision`).
2. `rfc → adr` (8 of 8 sampled rfc/ pages misrouted to adr). Pages tagged `architecture` match the adr tag-alias. `architecture` is too broad to be an adr-only signal. Fix: move `architecture` to a kind-agnostic tag, OR require a content-pattern match in addition to the tag for the adr route.
3. Audience inference is noisy. The word `crypto` in a list of Node built-in modules (ADR-001) flagged the page as `security` audience. Fix: tighten the security pattern to require either security-domain context words or an explicit security tag.
4. `adrs/` (plural typo of `adr`) holds 8 pages with template skeletons. The classifier admits them as `explanation` rather than `adr`. Likely the right outcome: these are draft pages with `_To be written._` placeholders, not finished ADRs.

Net: the registry's default seed needs pattern-tuning before bulk migration (Phase 4). Calibration is a registry-data edit, not a Python change, which is exactly what the data-driven design from #28 enables.

Files changed

- `scripts/wiki_pilot_migration.py`
- `scripts/wiki-pilot-report.md`

Test plan

- `python scripts/wiki_pilot_migration.py --sample-size 100` runs end-to-end against the live wiki (~9600 pages) in <2s
- `ruff format` and `ruff check` clean

What's NOT in this PR

- `architecture` from adr.tag_aliases; tighten security audience pattern)

How to re-run
🤖 Generated with Claude Code