feat(wiki): ADR-2244 Phase 2 — pilot migration analyzer + first 100-page report by cdeust · Pull Request #31 · cdeust/Cortex

cdeust · 2026-05-13T07:46:12Z

Summary

Adds scripts/wiki_pilot_migration.py — a read-only analyzer that walks the methodology wiki, runs each page through the new data-driven classifier (post-#27/#28), and produces a Markdown report showing the proposed (kind, lifecycle, audience, provenance) tuple for each page alongside its current legacy kind.

This is Phase 2 of ADR-2244 per its migration plan: a human-reviewable accuracy check on ~100 representative pages before any bulk re-bucketing. The ADR acceptance criterion is ≥ 90% kind agreement.

First report

scripts/wiki-pilot-report.md — deterministic 100-page stratified sample (seed 20260512). Headline numbers:

Metric	Value
Sample size	100 pages across 12 legacy kind directories
Admitted	88 (88.0%)
Rejected (admission gate)	12 (12.0%) — all admission-gate rejections of audit artefacts and template skeletons; looks correct
Kind kept after legacy → modern normalization	58 (65.9% of admitted)
Kind changed	30 (34.1%)

Calibration findings — the value of the pilot

The kind-detection registry is below the 90% acceptance target on this sample. Surfaced gaps:

adr → explanation (3 of 8 sampled ADRs misclassified). Body uses ## Decision heading-style; the current ADR patterns require Decision: (with colon) or decided to. Fix: register a kind pattern for the Nygard heading skeleton (## Status + ## Context + ## Decision).
rfc → adr (8 of 8 sampled rfc/ pages misrouted to adr). Pages tagged architecture match the adr tag-alias. architecture is too broad to be an adr-only signal. Fix: move architecture to a kind-agnostic tag, OR require a content-pattern match in addition to the tag for the adr route.
Audience inference is noisy. The word crypto in a list of Node built-in modules (ADR-001) flagged the page as security audience. Fix: tighten the security pattern to require either security-domain context words or an explicit security tag.
adrs/ (plural typo of adr) holds 8 pages with template skeletons. The classifier admits them as explanation rather than adr. Likely the right outcome — these are draft pages with _To be written._ placeholders, not finished ADRs.

Net: the registry's default seed needs pattern-tuning before bulk migration (Phase 4). Calibration is a registry-data edit, not a Python change — exactly what the data-driven design from #28 enables.

Files changed

File	Change
`scripts/wiki_pilot_migration.py`	NEW — 350-line analyzer with frontmatter parser (scalar/inline-list/block-list), stratified sampling, classification dispatch, Markdown report writer
`scripts/wiki-pilot-report.md`	NEW — committed reference run (100 pages, seed 20260512)

Test plan

python scripts/wiki_pilot_migration.py --sample-size 100 runs end-to-end against the live wiki (~9600 pages) in <2s
Frontmatter parser handles all three observed shapes (the first attempt missed block lists and silently dropped tags — caught during report review and fixed)
ruff format and ruff check clean

What's NOT in this PR

The calibration tuning to hit the 90% target (separate follow-up — add registry default patterns for ADR Nygard heading structure; loosen architecture from adr.tag_aliases; tighten security audience pattern)
The bulk migration (Phase 4) — gated on the pilot accuracy hitting target
Stable IDs (Phase 3) — separate work

How to re-run

# Default: 100-page stratified sample, deterministic
python scripts/wiki_pilot_migration.py

# Full wiki (~9600 pages)
python scripts/wiki_pilot_migration.py --all --out scripts/wiki-pilot-full.md

# Custom seed
python scripts/wiki_pilot_migration.py --seed 42

🤖 Generated with Claude Code

…4 Phase 2) Adds ``scripts/wiki_pilot_migration.py`` — a read-only analyzer that walks the methodology wiki, runs each page's body through the new data-driven classifier (post-#27/#28), and produces a Markdown report showing the proposed (kind, lifecycle, audience, provenance) tuple for each page alongside its current legacy kind. Goal per ADR-2244 §5 (Migration plan, Phase 2): a human-reviewable accuracy check on ~100 representative pages before any bulk re-bucketing. Acceptance criterion is ≥ 90% kind agreement. First report ------------ ``scripts/wiki-pilot-report.md`` — deterministic 100-page stratified sample (seed 20260512). Headline numbers: Sample: 100 pages across 12 legacy kind directories Admitted: 88 (88.0%) Rejected: 12 (12.0%, all admission-gate rejections of audit artefacts and template skeletons — looks correct) Kept: 58 (65.9% of admitted, including the expected legacy → modern map: notes → explanation, etc.) Changed: 30 (34.1%) Calibration findings (the value of the pilot) --------------------------------------------- The kind-detection registry is below the 90% acceptance target. Surfaced gaps: 1. ``adr → explanation`` (3 of 8 sampled ADRs misclassified). Body uses ``## Decision`` heading-style; ADR patterns require ``Decision:`` (colon) or ``decided to``. Fix: register a kind pattern for the Nygard heading skeleton (``## Status\\n.*## Context\\n.*## Decision``). 2. ``rfc → adr`` (8 of 8 sampled rfc/ pages misrouted to adr). Pages tagged ``architecture`` match the adr tag-alias. ``architecture`` is too broad to be an adr-only signal. Fix: move ``architecture`` to a kind-agnostic tag, OR require a content-pattern match in addition to the tag for the adr route. 3. Audience inference is noisy. The word ``crypto`` in a list of Node built-in modules (ADR-001) flagged the page as ``security`` audience. Fix: tighten the security pattern to require either security-domain context words or a security tag. 4. ``adrs/`` (plural typo of ``adr``) holds 8 pages with template skeletons. The classifier admits them as ``explanation`` rather than ``adr``. Likely the right outcome — these are draft pages with ``_To be written._`` placeholders, not finished ADRs. Net: the registry's default seed needs pattern-tuning before bulk migration (Phase 4). Calibration is a registry-data edit, not a Python change — exactly what the data-driven design from #28 enables. Script details -------------- Read-only. Default stratified sample of 100 pages with seed 20260512. Pass ``--all`` to evaluate every page (~9600). Output is a deterministic Markdown report with five tables (sample distribution, proposed distribution, transition matrix, per-facet distributions, rejection reasons) plus a per-page proposal list. The frontmatter parser handles three observed patterns: scalar values, inline lists, and block lists. The first attempt at the parser missed block lists and silently dropped all tags, which made the early classifier accuracy look worse than it actually was — caught and fixed during the report-review pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The 2026-05-13 pilot (scripts/wiki-pilot-report.md) revealed four calibration gaps in the registry's default seed. This commit fixes the three that mattered (the fourth was already correct behavior). Pilot accuracy: before → 65.9% kind-kept on the 100-page stratified sample after → 87.5% kind-kept (+21.6 pp) The fixes — all data edits to the registry seed, exactly the kind of calibration the data-driven design from #28 was meant to enable. 1. ADR detection now matches the Nygard heading skeleton ------------------------------------------------------- The pilot found 3 of 8 sampled ADRs misclassified as ``explanation`` because their body used ``## Decision`` heading-style without a ``Decision:`` colon. The legacy classifier's prose-only ADR pattern (``decided to`` / ``Decision:``) missed them. Two new patterns added to ``adr.patterns`` using a new ``_re_ml`` helper (re.IGNORECASE | re.MULTILINE): - ``^##+\\s*Decision\\s*$`` - ``^##+\\s*Consequences\\s*$`` After fix: 8 of 8 sampled ADRs route to ``adr`` correctly. The ``adrs/`` typo directory (8 pages) also routes to ``adr`` via the new heading detection. 2. ``architecture`` removed from ADR tag-aliases ------------------------------------------------ The pilot found 8 of 8 sampled ``rfc/`` pages misrouted to ``adr`` because they carried the ``architecture`` tag, which was registered as an adr tag-alias. ``architecture`` is too broad to be an ADR-only signal — architecture- tagged content is more often spec/rfc/explanation than a single decision. Removed from ``adr.tag_aliases``; kept ``decision`` and ``adr`` as the high-confidence signals. After fix: 8 of 8 sampled ``rfc/`` pages route to ``rfc`` correctly. 3. Security audience tightened — bare ``crypto`` no longer fires ---------------------------------------------------------------- The pilot found ADR-001 (zero dependencies) tagged ``security`` audience because its body listed ``crypto`` among Node built-in modules: ``fs, path, os, http, crypto``. The pattern ``crypto(graphy)?`` matched the bare module name. Tightened to require the full ``cryptograph(y|ic)`` word. Same applied to ``auth`` → ``authentication``/``authorization``. The ``security`` tag remains the strongest signal — pages tagged ``security`` are unambiguously security audience regardless of content pattern hits. After fix: ``security`` audience occurrences drop from 21 to 10 on the same sample, matching the pages that actually warrant the audience tag. 4. ``adrs`` (typo dir variant) added to LEGACY_KIND_TO_MODERN ------------------------------------------------------------- The user's wiki has 8 pages under ``adrs/`` (plural typo of ``adr/``). These are real ADRs the user has been writing under a slightly wrong directory name. The legacy → modern map now treats ``adrs`` as synonymous with ``adr``, so the pilot's ``kept`` metric correctly counts these as preserved rather than misclassified. Tests ----- Three new regression tests in ``tests_py/core/test_wiki_classifier.py``: - ``test_adr_detected_from_nygard_heading_skeleton`` - ``test_architecture_tag_alone_does_not_route_to_adr`` - ``test_crypto_module_name_does_not_flag_security_audience`` Each test documents the pilot finding it guards against. 85 tests pass in the targeted suite; ``ruff format --check`` and ``ruff check`` clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Re-runs the pilot from #31 with the calibrated registry at a 10× larger sample to verify the calibration generalises. Headline -------- Sample size: 1000 (vs 100 in #31) Admitted: 942 (94.2%) Rejected: 58 (5.8%) — all admission-gate (template skeletons, audit artefacts), no false negatives observed Kind kept: 911 (96.7% of admitted) Kind changed: 31 (3.3% of admitted) 96.7% kept exceeds the ≥ 90% ADR-2244 §5 Phase 2 acceptance target. The classifier registry is ready for Phase 4 bulk migration. Of the 31 "changed" pages ------------------------- 11 specs → explanation — specs that aren't pre-decision RFCs; defensible re-bucketing 11 guides → explanation — guides without "how to" prose; content-dependent, defensible 5 upgrades to specific kinds where content beat the legacy dir: conventions → how-to (2) notes → runbook (1) notes → how-to (1) lessons → how-to (1) 2 adr → explanation — last remaining ADR-detection gap. Pages have neither the prose pattern nor the heading skeleton. Investigation deferred to a follow-up; well under the 10% acceptance margin. 2 README.md, architecture → explanation — unknown legacy dirs Facet distributions on admitted pages ------------------------------------- Lifecycle: 899 seedling · 43 proposed (ADR default) Audience: 935 developer · 90 ops · 48 security Provenance: 809 auto-generated · 133 human The auto-generated count is dominated by file-doc pages tagged ``codebase`` / ``code-reference``. Human-authored = ADRs, RFCs, specs, lessons. Run reproducibly ---------------- python scripts/wiki_pilot_migration.py --sample-size 1000 Seed and sampling logic unchanged from #31. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…n complete) (#41) Bundles 11 merged PRs (#30-#40) since v3.15.4 closing out the ADR-2244 wiki classification cycle: Phase 2 #31 #32 pilot migration analyzer + 1000-page verification (96.7% kind-kept, passes target) Phase 3 #33 stable page IDs (UUID4) + redirect data model + backfill CLI Phase 3.2 #34 handler-layer redirect mechanics (wiki_read follows transparently, wiki_list/wiki_reindex exclude stubs, new wiki_rename tool) Phase 4.1 #35 #36 deterministic bulk migration for the 70 known pollution paths (.md.md, timestamp-slug, path-leak) Phase 4.2 #37 file-doc re-bucket (8734 pages from notes/ to reference/ with modern frontmatter) Phase 5 #39 filter auto-generated pages from default listings; INDEX.md splits human-authored from auto-gen Phase 6 #38 producer audit — codebase_analyze output routes to kind=reference (root-causes the 8734-page misroute) Phase 6.2 #40 producer audit — wiki_seed_codebase emits modern kind tags the classifier reads Security #30 authlib CVE-2026-44681 bump (dependabot #4) Notes for users: - Wiki on disk not migrated yet. Apply scripts (in scripts/) are dry-run by default. Three commands to fully migrate; each is idempotent and leaves redirect stubs. - Phases 5/6/6.2 take effect on next MCP restart. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cdeust and others added 2 commits May 13, 2026 09:45

cdeust merged commit 0587eb0 into main May 13, 2026
11 checks passed

cdeust mentioned this pull request May 13, 2026

docs(wiki): verify Phase 2 pilot on 1000-page sample (96.7% kept — passes ADR-2244 target) #32

Merged

cdeust deleted the feat/wiki-pilot-migration-script branch May 13, 2026 09:25

cdeust mentioned this pull request May 13, 2026

release: v3.16.0 — ADR-2244 Phases 2-6.2 (wiki classification redesign complete) #41

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(wiki): ADR-2244 Phase 2 — pilot migration analyzer + first 100-page report#31

feat(wiki): ADR-2244 Phase 2 — pilot migration analyzer + first 100-page report#31
cdeust merged 2 commits into
mainfrom
feat/wiki-pilot-migration-script

cdeust commented May 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

cdeust commented May 13, 2026

Summary

First report

Calibration findings — the value of the pilot

Files changed

Test plan

What's NOT in this PR

How to re-run

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant