
feat(corpus): #65 PR 3 of 4. wiring + delegate threading + doctor + IRON RULE regression suite #304

Merged
dep0we merged 10 commits into main from feat/corpus-pr3-wiring on Jun 1, 2026
dep0we (Owner) commented Jun 1, 2026

Summary

PR 3 of 4 of the CorpusBackend arc (#65). Wires the Protocol shipped in PR 1 and the SQLite reference impl shipped in PR 2 into the framework so single-host operators can pin SQLite via one env var. 10 logical commits (6 wiring + 2 adversarial-review fix commits + 1 R3 documenting comment + 1 doc-release commit).

Operator-facing surface lit up:

  • ATOMIC_AGENTS_CORPUS_BACKEND=sqlite resolves to SQLiteCorpusBackend with default db at <agent_root>/.corpus.db and URL-encoded agent_scope=<agent_root.name>
  • ATOMIC_AGENTS_CORPUS_BACKEND_URL overrides the default; both filesystem://... and sqlite:///...?agent_scope=... URLs route through their factories
  • AtomicAgent(corpus_backend=...) constructor kwarg + per-runner kwargs on OutcomeRunner, EvalRunner, DreamRunner
  • delegate.py explicit-only threading via _corpus_backend_was_explicit flag (mirrors PersonaBackend D-ER-2)
  • doctor.check_corpus_backend PASS/WARN/FAIL ladder with capability snapshot + URL redaction + page-count cliff WARN at ~1000 pages for filesystem
  • cli.py:_cmd_corpus honors env var (no more silent CLI-vs-runtime drift)
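
The resolution rules in the list above can be sketched roughly as follows. `resolve_corpus_backend_spec` is a hypothetical helper written for this illustration, not the framework's `get_default_corpus_backend`. It assumes only the behavior described in this PR: an empty or whitespace-only backend id is treated as filesystem, the default SQLite db sits at `<agent_root>/.corpus.db`, and `agent_scope` is URL-encoded. The exact default URL string is an assumption.

```python
from __future__ import annotations

from pathlib import Path
from urllib.parse import quote_plus


def resolve_corpus_backend_spec(env: dict[str, str], agent_root: Path) -> tuple[str, str | None]:
    """Hypothetical helper: map the two env vars to (backend_id, url)."""
    # Empty or whitespace-only values are treated as unset, i.e. filesystem.
    backend_id = env.get("ATOMIC_AGENTS_CORPUS_BACKEND", "").strip() or "filesystem"
    url = env.get("ATOMIC_AGENTS_CORPUS_BACKEND_URL", "").strip() or None
    if url is None and backend_id == "sqlite":
        # Default db lives under the agent root; the scope is URL-encoded so
        # names with metacharacters survive query-string parsing.
        db_path = (agent_root / ".corpus.db").as_posix()
        url = f"sqlite:///{db_path}?agent_scope={quote_plus(agent_root.name)}"
    return backend_id, url
```

An explicit `ATOMIC_AGENTS_CORPUS_BACKEND_URL` is returned untouched in this sketch; routing it to the matching factory is left out.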

Call-site migration at agent.py:2937-2939 and bundle.py:_render_memory_breakpoint routes through corpus_backend.render_index_summary(corpus="wiki") when configured. New shared _render_wiki_index_section(label, path, content) helper produces byte-identical output between Protocol path and legacy fallback (IRON RULE assertion 4). bundle.py:_source_paths migration deferred to v1.1.

Behavior Changes

Two documented behavior changes, both deliberate. See CHANGELOG ### Changed.

  1. AtomicAgent._load_indexes no longer crashes agent construction when wiki/INDEX.md is unreadable: the Protocol path catches Exception and the legacy direct-read path catches OSError, and both soft-degrade to an empty wiki section with a logged wiki_index_unreadable warning. UnicodeDecodeError is handled inside FilesystemCorpusBackend.render_index_summary, which matches bundle.py:_safe_read_text byte-for-byte (partial content with a prepended warning comment).
  2. CLI atomic-agents corpus subcommands honor ATOMIC_AGENTS_CORPUS_BACKEND (was hardcoded filesystem pre-PR-3).

Test Coverage

36 net new tests + 2 augmented integration tests. Coverage audit: 95% (PASS, target 80%).

| File | Tests | Coverage |
| --- | --- | --- |
| test_corpus_composition.py (new) | 4 | flag tracking + delegate threading |
| test_corpus_migration_regression.py (new) | 6 | IRON RULE 1-4 + OSError catch + Protocol-path exception boundary (R2-F6) |
| test_corpus_wiring.py (new) | 13 | env var + per-runner + CLI activation |
| test_corpus_doctor.py (new) | 11 | PASS/WARN/FAIL + page-count cliff + URL redaction |
| test_agent_cascade_integration.py (augmented) | 2 | wiki INDEX content + section ordering |
| test_cascade_bundle.py (augmented) | 2 | render_bundle 3-level threading + _source_paths v1.1 guard |

Full suite: 2853 → 2889 + 48 skipped, zero regressions. IRON RULE assertion 5 (full pre-#65 suite passes unchanged) is the CI gate.

Coverage gaps (minor, non-blocking):

  • doctor.py:553-573 stats()-FAIL branch not directly tested (synthetic failure mode)
  • eval.py EvalRunner→AtomicAgent threading lacks integration test (storage tested; threading verified by code-path inspection)

Plan Completion

16/17 DONE, 1 UNVERIFIABLE (IRON RULE assertion 5 = full pre-#65 suite passes unchanged, CI criterion verified by this PR's CI run).

Pre-Landing Review

2 INFORMATIONAL findings, both applied in Round 1 fix commit. PR Quality Score: 9/10.

Adversarial Review (3 rounds, converged at Round 3 LOW only)

Matches PR 2 of #65 precedent: Round 3 LOW-only = zero CRITICAL/HIGH/MEDIUM.

Round 1 (Claude adversarial subagent + pre-landing review): 10 adversarial findings + 2 INFORMATIONAL pre-landing. 8 high-confidence FIXABLE applied: URL-encode agent_root.name, broad SQLite construction catch, UnicodeDecodeError catch in render_index_summary, Protocol-path exception boundary in _load_indexes, doctor WARN message rewrite, defensive conditional replacing bare asserts. 2 deferred to PR 4 backlog (cascade-layout corpus_backend._agent_root divergence; CLI env-var documented in CHANGELOG).

Round 2 (hunting Round 1's fix surfaces): 3 MEDIUM + 4 LOW. F3 (FIXABLE, MEDIUM) load-bearing: Round 1 fix returned "" on UnicodeDecodeError but the pre-PR-3 bundle.py:_safe_read_text re-read with errors="replace" and prepended a warning comment. Asymmetric soft-degrade silently lost wiki body content. Rewrote to match _safe_read_text byte-for-byte. F4 (LOW): stale comment about "URL silently ignored". F5 (LOW): defensive-FAIL detail dict expanded with capability snapshot. F6 (closed via new test): test_agent_load_indexes_protocol_path_exception_soft_degrades. F1/F2/F7 defended as trade-off calls.

Round 3 (hunting Round 2's fix surfaces): 3 LOW + zero higher tiers. R3-F1: legacy else-branch doesn't catch UnicodeDecodeError; unreachable in production so documented in comment. R3-F2: UnicodeDecodeError branch lacks direct test; acknowledged coverage gap (follow-up backlog). R3-F3: defensive-FAIL detail dict path remains untested; accepted trade-off (logically unreachable).

Convergence verdict: Round 3 LOW only. PR 3 ready for merge.

Documentation

  • docs/spec/34-corpus-backend.md: inline status notes flipping the PR 3 wiring contract from "to be implemented" to "implemented in PR 3" (spec/34 LOCK + N-MUST finalization stays at PR 4).
  • docs/spec/27-doctor.md: new ### corpus-backend entry (12th check_*_backend doctor catalogue entry) documenting check_corpus_backend's PASS/WARN/FAIL ladder, capability snapshot, and page-count cliff WARN.
  • CHANGELOG.md: 4 entries under [Unreleased] covering PR 3 wiring, Round 1 fixes, Round 2 fixes, and 2 documented behavior changes (legacy OSError catch + CLI env-var honoring).

Test plan

  • Full pytest suite green (2889 + 48 skipped, zero regressions)
  • Ruff format clean on PR 3 files
  • Ruff lint introduces zero new findings (4 pre-existing lints on main remain untouched)
  • Round 1 + Round 2 + Round 3 adversarial: convergence at Round 3 with LOW only
  • Initial CI run green on commit 06cf8b3 (Analyze actions, pytest 3.11, pytest 3.12, Analyze python)
  • CI green on commit 6857402 (re-running after doc-release push) — pending

🤖 Generated with Claude Code

Dan Powers and others added 10 commits June 1, 2026 09:46
corpus/__init__.py: add sqlite branch to get_default_corpus_backend so
ATOMIC_AGENTS_CORPUS_BACKEND=sqlite resolves to SQLiteCorpusBackend with
default db at <agent_root>/.corpus.db, agent_scope=<agent_root.name>.
Mirrors profile/__init__.py:227-235 precedent. Empty-string env var
normalizes to filesystem. Wraps sqlite construction in (OSError,
PermissionError) for clean operator-facing error.

cli.py: replace hardcoded FilesystemCorpusBackend(agent_root) in
_cmd_corpus with get_default_corpus_backend(agent_root) so CLI honors
ATOMIC_AGENTS_CORPUS_BACKEND env var (closes silent CLI-vs-runtime drift).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…or catch)

Constructor adds corpus_backend kwarg + class-level annotation. Resolves
via get_default_corpus_backend(self.agent_root) when not supplied. Mirrors
PersonaBackend D-ER-2 explicit-only threading: _corpus_backend_was_explicit
flag saved to self, consumed at delegate() for conditional kwarg insertion.

_load_indexes() at agent.py:2933-2985 routes wiki/INDEX.md read through
CorpusBackend Protocol when configured. Legacy direct-read fallback now
catches OSError and returns empty string with a logged warning marker
wiki_index_unreadable, matching FilesystemCorpusBackend.render_index_summary
behavior at corpus/filesystem.py:701-702. Brings both code paths into
behavioral agreement so the IRON RULE byte-identity assertion holds.

NOTE: legacy direct-read previously propagated OSError. This is an
intentional behavior change. Operators with a wiki/INDEX.md that becomes
briefly unreadable now see a logged warning and an empty wiki section
rather than a hard crash at agent construction.

delegate() at agent.py:4628-4629 threads corpus_backend ONLY when the
operator supplied it explicitly. Default-resolved backends do not leak
the coordinator's content_root to delegates (corpus is per-agent
semantic context, not fleet-scoped).
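
A minimal sketch of the explicit-only threading described above, assuming nothing beyond what this commit message states. `SketchAgent` and its string placeholder for a default-resolved backend are hypothetical stand-ins, not the real AtomicAgent.

```python
from __future__ import annotations


class SketchAgent:
    """Hypothetical miniature of the explicit-only threading pattern."""

    def __init__(self, name: str, corpus_backend: object | None = None) -> None:
        self.name = name
        # Record whether the caller supplied a backend before defaulting.
        self._corpus_backend_was_explicit = corpus_backend is not None
        self.corpus_backend = (
            corpus_backend if corpus_backend is not None else f"default-backend:{name}"
        )

    def delegate(self, child_name: str) -> SketchAgent:
        kwargs: dict[str, object] = {}
        if self._corpus_backend_was_explicit:
            # Only an operator-supplied backend is handed to the delegate.
            kwargs["corpus_backend"] = self.corpus_backend
        # Otherwise the delegate resolves its own default.
        return SketchAgent(child_name, **kwargs)
```

With this shape, a coordinator built without a backend never passes its default-resolved one to a delegate; each delegate resolves its own.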

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ity helper

render_bundle, _render_sections, and _render_memory_breakpoint all gain
corpus_backend: CorpusBackend | None = None parameter, threaded through
all three call levels. When corpus_backend is None at any caller, the
fallback path uses the legacy direct file read.

New private helper _render_wiki_index_section(label, path, content)
produces the bundle section in the canonical
## {label}\n\`{path}\`\n\n{content} format used by _render_file_section.
Both the corpus_backend Protocol path AND the legacy fallback path call
this helper with the same logical wiki path so byte-identical output
is guaranteed regardless of which path produced the content. Closes
the IRON RULE assertion 4 risk.

Both branches apply .strip() to match _render_file_section's
_safe_read_text(...).strip() behavior. Skip the section when content
is empty (no file or empty file).
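
The helper described above might look roughly like the sketch below. It is a re-creation from the format string quoted in this commit message, not the code in bundle.py, and the `Wiki INDEX` label used in the usage lines is an assumption.

```python
def render_wiki_index_section(label: str, path: str, content: str) -> str:
    """Hypothetical re-creation of the shared section renderer."""
    body = content.strip()
    if not body:
        # No file or an empty file: skip the section entirely.
        return ""
    return f"## {label}\n`{path}`\n\n{body}"


# Both call paths feed the same helper, so surrounding-whitespace differences
# in what each path read cannot change the rendered bytes.
from_protocol = render_wiki_index_section("Wiki INDEX", "wiki/INDEX.md", "# Index\n- a\n")
from_legacy = render_wiki_index_section("Wiki INDEX", "wiki/INDEX.md", "\n# Index\n- a")
```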

_source_paths at bundle.py:266 gets a TODO(v1.1) comment noting the
deferred Protocol routing (filesystem-only function; SQLite has no
equivalent path to track). Follow-up issue filed at PR 4.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
OutcomeRunner (outcome.py:255) and EvalRunner (eval.py:363) each accept
corpus_backend: CorpusBackend | None = None and thread it to their
internal AtomicAgent construction site. Mirrors the per-runner kwarg
shape locked at #63 PR 2 (AgentProfileBackend) and #62 PR 2
(PersonaBackend).

DreamRunner accepts the kwarg for API parity but does NOT thread it
to any internal AtomicAgent construction site (none exists in v1).
Stored as self._corpus_backend with a comment matching the existing
DreamRunner pattern for the other 4 backend kwargs (e.g. _policy_backend,
_persona_backend), documenting the future-state threading site.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…IL ladder

12th check_*_backend implementation in doctor.py. Mirrors check_mandate_backend
shape (the most recently merged precedent from #296).

PASS: backend constructs successfully + stats("wiki") and stats("raw")
both return without raising. Capability snapshot in detail dict
(backend_id, supports_full_text_search, supports_semantic_search,
supports_versioning, embedding_provider, wiki_page_count, raw_page_count).

WARN: 1. supports_full_text_search=False AND (wiki_page_count > 1000 OR
raw_page_count > 1000), the page-count cliff WARN per
/plan-eng-review 2026-05-29 finding P1. Hint names ATOMIC_AGENTS_CORPUS_BACKEND=sqlite
as the remedy.

WARN: 2. ATOMIC_AGENTS_CORPUS_BACKEND_URL set but
ATOMIC_AGENTS_CORPUS_BACKEND not. URL silently ignored otherwise; surface
this misconfiguration explicitly.

FAIL: backend cannot be constructed, OR stats() raises. URL credential
redaction via existing _redact_for_error_message helper used by the
other doctor checks.

Probes BOTH wiki and raw corpora (the page-count cliff WARN fires if
either exceeds the threshold). Registered in run_all_checks dispatch.
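
The ladder above can be read as a small decision function. The sketch below is an illustration under assumptions: `Caps` and `classify` are hypothetical names, a `None` snapshot stands in for "backend could not be constructed or stats() raised", and the threshold constant mirrors the 1000-page figure quoted in this message.

```python
from __future__ import annotations

from dataclasses import dataclass

PAGE_COUNT_CLIFF = 1000  # threshold quoted in this commit message


@dataclass
class Caps:
    """Hypothetical subset of the capability snapshot."""

    supports_full_text_search: bool
    wiki_page_count: int
    raw_page_count: int


def classify(caps: Caps | None, backend_env_set: bool, url_env_set: bool) -> str:
    if caps is None:
        # Construction failed or stats() raised.
        return "FAIL"
    over_cliff = (
        caps.wiki_page_count > PAGE_COUNT_CLIFF
        or caps.raw_page_count > PAGE_COUNT_CLIFF
    )
    if not caps.supports_full_text_search and over_cliff:
        # Either corpus past the cliff without full-text search.
        return "WARN"
    if url_env_set and not backend_env_set:
        # URL configured without an explicit backend id.
        return "WARN"
    return "PASS"
```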

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ctor

35 net new tests + 2 augmented existing integration tests.

NEW tests/test_corpus_composition.py (4 tests): _corpus_backend_was_explicit
flag tracking + delegate explicit-only threading.

NEW tests/test_corpus_migration_regression.py (5 tests): the IRON RULE
suite. Assertions 1-4 verify byte-identity between corpus_backend=None
fallback path and corpus_backend=FilesystemCorpusBackend Protocol path
at both agent.py:_load_indexes and bundle.py:_render_memory_breakpoint
call sites. Assertion 5 (full pre-#65 suite passes unchanged) is a CI
criterion documented in the PR body. Plus 1 OSError catch test
exercising the new legacy-path soft-degrade behavior.

NEW tests/test_corpus_wiring.py (13 tests): env var resolution
(filesystem default, sqlite default, URL override, empty-URL fallback,
empty-backend-treated-as-unset, whitespace padding, filesystem URL,
agent_root empty name guard) + per-runner kwarg storage (Outcome,
Eval, Dream) + OutcomeRunner threading + CLI env-var activation.

NEW tests/test_corpus_doctor.py (11 tests): PASS/WARN/FAIL ladder
discrimination across capability conditions + page-count cliff WARN on
both wiki and raw corpora + URL-without-backend WARN + construction-fail
FAIL + unwritable-path FAIL + capability snapshot completeness + URL
credential redaction + run_all_checks integration.

AUGMENTED tests/test_agent_cascade_integration.py: _build_full_cascade_layout
fixture writes real wiki/INDEX.md content + assert wiki section header +
body content + section ordering after memory INDEX. Closes silent-corruption
risk class flagged by /plan-subagent S4 (9 wiki-touching tests created
empty wiki dirs with ZERO INDEX content assertions).

AUGMENTED tests/test_cascade_bundle.py: end-to-end render_bundle threading
test asserting byte-identity between Protocol path and fallback path +
_source_paths v1.1 deferral guard test (pins the deferral decision
mechanically; a future premature Protocol-routing of _source_paths would
fail this test).

Total suite: 2853 -> 2888 passing + 48 skipped. Zero regressions to the
pre-#65 surface.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Round 1 adversarial review (Claude subagent + pre-landing review) caught
8 high-confidence findings + 2 INFORMATIONAL pre-landing findings. All
addressed in this commit. Round 2 + Round 3 to follow under /ship.

corpus/__init__.py SQLite branch:
- URL-encode agent_root.name via quote_plus so names containing URL
  metacharacters (spaces, +, &, ?, =) do not silently corrupt agent_scope
  or raise ValueError. A name like "my+agent" decoded as "my agent" via
  parse_qsl, causing cross-scope contamination with a real agent named
  "my agent".
- Widen the construction try/except from (OSError, PermissionError) to
  Exception. Re-raise as CorpusBackendNotRegistered with the URL remedy.
  Covers ValueError (malformed URL, invalid charset) and
  sqlite3.OperationalError (db locked at cold start, WAL transition
  failure on NFS) that previously escaped as raw library exceptions.
  PermissionError is redundant (subclass of OSError) so it drops.
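
The two standard-library facts behind these bullets can be checked directly. This snippet uses only `urllib.parse` and built-in exception classes; the `agent_scope` query key is taken from the URL shape in the PR body.

```python
from urllib.parse import parse_qsl, quote_plus

raw_name = "my+agent"

# Unencoded: "+" is read back as a space by query-string parsing.
unsafe = dict(parse_qsl(f"agent_scope={raw_name}"))

# Encoded with quote_plus: the original name survives the round trip.
safe = dict(parse_qsl(f"agent_scope={quote_plus(raw_name)}"))

print(unsafe["agent_scope"])  # my agent
print(safe["agent_scope"])    # my+agent

# PermissionError is already covered by an OSError handler.
print(issubclass(PermissionError, OSError))  # True
```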

corpus/filesystem.py render_index_summary:
- Add UnicodeDecodeError to the except clause. Pre-PR-3 bundle.py used
  _safe_read_text which catches UnicodeDecodeError; the Protocol path
  did not. A wiki/INDEX.md with non-UTF-8 bytes (Latin-1, BOM, mixed
  encodings) would crash agent construction via the Protocol path
  where the legacy bundle path gracefully degraded. Now at parity.

agent.py _load_indexes:
- Add broad try/except around the Protocol path call to
  corpus_backend.render_index_summary. Soft-degrade to empty string
  with a logged wiki_index_unreadable warning so any custom-backend
  exception (sqlite3.OperationalError, CorpusError, KeyError) does
  not crash agent construction. Matches the legacy direct-read soft
  degrade behavior.
- Update comment on the legacy direct-read else-branch, noting it is
  unreachable in production after Stream B's default-resolution at
  __init__ (self.corpus_backend is always non-None). Retained as a
  safety net for future refactors that remove the auto-resolve.
  Exercised by tests in test_corpus_migration_regression.py that
  force corpus_backend=None post-construction.

doctor.py check_corpus_backend:
- Rewrite the URL-without-backend WARN message. Pre-fix message said
  "the URL is being ignored" which was factually wrong: when backend
  unset and URL set, get_default_corpus_backend normalizes to
  filesystem and routes the URL through
  make_filesystem_corpus_backend_from_url. The URL is USED. New message
  describes the implicit-default state and recommends explicit binding.
- Update the docstring at lines 2438-2447 to match (drop the misleading
  "or resolves to filesystem" qualifier).
- Replace bare assert wiki_stats is not None / assert raw_stats is not
  None at lines 2583-2584 with a defensive conditional that returns
  CheckResult(status=FAIL) on the logically unreachable None state.
  Preserves the always-returns-CheckResult contract under python -O
  optimized builds.
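
To illustrate the assert replacement: `assert` statements are stripped when Python runs with `-O`, so a contract that must hold in optimized builds needs an explicit branch. The `CheckResult` dataclass, the `finish_check` function, and the `pages` key below are hypothetical stand-ins for the doctor's real types.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CheckResult:
    """Hypothetical stand-in for the doctor's result type."""

    status: str
    detail: dict = field(default_factory=dict)


def finish_check(wiki_stats: dict | None, raw_stats: dict | None, backend_id: str) -> CheckResult:
    # A bare `assert wiki_stats is not None` would disappear under `python -O`;
    # an explicit branch keeps the always-returns-CheckResult contract.
    if wiki_stats is None or raw_stats is None:
        return CheckResult("FAIL", {"backend_id": backend_id, "reason": "stats unavailable"})
    return CheckResult(
        "PASS",
        {
            "backend_id": backend_id,
            "wiki_page_count": wiki_stats["pages"],
            "raw_page_count": raw_stats["pages"],
        },
    )
```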

tests/test_corpus_doctor.py:
- Update test_check_corpus_backend_warns_url_without_backend_id to
  assert on stable substrings rather than the verbatim previous message
  text. Matches the new WARN wording.

CHANGELOG.md:
- Add two PR 3 entries to [Unreleased] section: main PR 3 wiring entry
  + Round 1 adversarial fix entry, plus two Changed entries documenting
  the agent.py legacy-path OSError catch behavior change and the
  cli.py CLI env-var honoring behavior change.

Full pytest suite: 2853 -> 2888 passing, 48 skipped, zero regressions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Round 2 hunted in the Round 1 fix commit (ad13220) per CLAUDE.md rule 11
("each fix changes the diff and exposes new edges"). Caught 3 MEDIUM +
4 LOW findings introduced by Round 1. F3, F4, F5 applied as FIXABLE +
F6 closed as a coverage gap. F1, F2, F7 defended as trade-off calls.

R2-F3 (FIXABLE, was MEDIUM):
- corpus/filesystem.py render_index_summary returned "" on
  UnicodeDecodeError, silently losing wiki body content where the
  pre-PR-3 bundle.py _safe_read_text preserved partial content. Round 1
  CHANGELOG claimed "matches the pre-#65 behavior" but the code did NOT.
- Rewrite to match _safe_read_text exactly: re-read with
  errors="replace" + prepend the same warning comment shape used by
  _safe_read_text. The Protocol and legacy paths now produce truly
  symmetric output for the Unicode case.
- Split the except clause into separate UnicodeDecodeError and OSError
  branches because they have different soft-degrade behavior
  (UnicodeDecodeError has partial content available, OSError does not).
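
A minimal sketch of the split described above, assuming a plain UTF-8 read. The function name and the wording of the warning comment are hypothetical; the real comment shape is whatever `_safe_read_text` emits.

```python
from pathlib import Path


def read_index_with_partial_content(path: Path) -> str:
    """Hypothetical illustration of the two soft-degrade branches."""
    try:
        return path.read_text(encoding="utf-8")
    except UnicodeDecodeError:
        # Bytes exist but are not valid UTF-8: keep what can be decoded
        # and flag the substitution with a leading comment.
        partial = path.read_text(encoding="utf-8", errors="replace")
        return f"<!-- warning: {path.name} contained undecodable bytes -->\n{partial}"
    except OSError:
        # Nothing readable at all: there is no partial content to keep.
        return ""
```

`UnicodeDecodeError` derives from `ValueError`, not `OSError`, which is why a handler for `OSError` alone would let it escape.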

R2-F4 (FIXABLE, was LOW):
- doctor.py:2476-2481 comment still said the URL was "silently ignored"
  after Round 1 fixed that. Updated to describe the post-fix behavior
  accurately: URL is honored via the filesystem factory; binding is
  implicit; the WARN surfaces the implicit-default state.

R2-F5 (FIXABLE, was LOW):
- doctor.py:2588-2598 defensive-conditional FAIL detail dict carried
  only backend_id, dropping the capability snapshot fields already
  available in caps. Operators debugging the (logically-unreachable)
  None state had no context.
- Expand the dict to include supports_full_text_search,
  supports_semantic_search, supports_versioning, embedding_provider.

R2-F6 (INVESTIGATE -> closed via new test):
- No test exercised the Protocol-path except Exception branch added
  in Round 1; only the legacy-path OSError catch had a test.
- Add test_agent_load_indexes_protocol_path_exception_soft_degrades to
  tests/test_corpus_migration_regression.py. Uses a _RaisingCorpusBackend
  stub whose render_index_summary raises sqlite3.OperationalError.
  Verifies _wiki_index_text == "" + log marker + backend-class-name
  in the warning message.
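
The shape of such a test can be sketched with a stub and a stand-in loader. `RaisingCorpusBackend` and `load_wiki_index` are hypothetical miniatures, not the project's `_RaisingCorpusBackend` or `_load_indexes`; only the `render_index_summary(corpus="wiki")` call shape and the `wiki_index_unreadable` marker come from this PR.

```python
import logging
import sqlite3

logger = logging.getLogger("corpus_sketch")


class RaisingCorpusBackend:
    """Stub whose index render always fails like a locked database."""

    def render_index_summary(self, corpus: str) -> str:
        raise sqlite3.OperationalError("database is locked")


def load_wiki_index(backend) -> str:
    """Hypothetical stand-in for the Protocol path of the index loader."""
    try:
        return backend.render_index_summary(corpus="wiki")
    except Exception as exc:
        # Any backend failure degrades to an empty wiki section plus a warning
        # that names the backend class.
        logger.warning(
            "wiki_index_unreadable backend=%s cause=%s",
            type(backend).__name__,
            type(exc).__name__,
        )
        return ""
```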

Defended:
- F1 (broad except in corpus/__init__.py misdirects with sqlite-URL hint
  on non-storage errors): cause type is included in error message so
  developers can debug; the production stability trade-off is the right
  default.
- F2 (Protocol-path broad except silently degrades on programmer errors
  like AttributeError): same trade-off; the logged wiki_index_unreadable
  warning is observable; strict-fail behavior is a follow-up env var
  (ATOMIC_AGENTS_CORPUS_STRICT) for a future PR.
- F7 (sqlite-specific URL remedy in error message): scoped correctly
  inside the sqlite branch only.

CHANGELOG updated with the Round 2 fix bullet under [Unreleased].

Test suite: 2888 -> 2889 passing + 48 skipped, zero regressions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Round 3 R3-F1 (LOW, 7/10): the legacy direct-read else-branch in
_load_indexes catches OSError but not UnicodeDecodeError. The branch is
unreachable in production after PR 3 default-resolution, so adding a
catch would be dead code. Document the gap in the comment instead so
a future contributor reactivating the branch knows to mirror the
Protocol path's partial-content soft-degrade.

Round 3 R3-F2 (LOW, 8/10) and R3-F3 (LOW, 6/10) are coverage gaps
accepted as follow-up backlog candidates (UnicodeDecodeError branch in
render_index_summary lacks a direct test; defensive-FAIL detail dict
path remains logically unreachable). Neither blocks PR 3 merge.

Round 3 convergence shape: 3 LOW + zero CRITICAL/HIGH/MEDIUM. Matches
PR 2 of #65 precedent (PR 2 converged at Round 3 with 5 LOW = zero
higher tiers). PR 3 ready for merge per CLAUDE.md rule 11 ("2-3 rounds
is sufficient for most diffs").

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…kend stub

spec/34 inline note: PR 3 wiring IMPLEMENTED. Does NOT drop the RFC
banner or finalize the N-MUST Implementer Contract -- both PR 4 work.

spec/27 corpus-backend entry: 12th check_*_backend doctor entry. PASS/
WARN/FAIL ladder, capability snapshot, page-count cliff WARN at ~1000
pages for filesystem backends without supports_full_text_search.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
dep0we merged commit 3d82c84 into main Jun 1, 2026
5 checks passed
dep0we deleted the feat/corpus-pr3-wiring branch June 1, 2026 15:56
dep0we added a commit that referenced this pull request Jun 1, 2026
…tus flip (#316)

* docs(corpus): #65 PR 4 of 4. spec/34 LOCKED + spec/24 Decision 7 addendum

spec/34 flips from RFC to LOCKED. RFC banner and 4-PR shipping-plan provenance
block replaced with a single locked-status line. Per-PR temporal markers
throughout the body consolidated to present-tense lock prose (capability
declarations, "Per-runner kwargs (PR 3 -- implemented)" subheaders, "SQLite
hybrid layout (PR 2)" header, "Call-site migration reference (PR 3 --
implemented in #65 PR 3 of 4)" section title, "Follow-up issue filed at PR 4"
deferral markers). The "PR 4 documentation-update checklist" section (lines
847-864) deleted; self-referential scaffolding has no place in a locked spec.

Implementer Contract finalized at 9 normative MUSTs, mirroring PersonaBackend
spec/33's shape exactly with one extra MUST for the query() capability
precedence rule that CorpusBackend has via the FTS5 / semantic / substring
fallback ladder: (1) name and corpus charset validation at API boundary,
(2) side-effect-free construction, (3) capability honesty including
embedding_provider=None invariant, (4) query() capability precedence rule,
(5) write_page() 4-case behavior table, (6) URL credential redaction across
all operator-facing error paths, (7) cross-corpus isolation at storage layer,
(8) snapshot id determinism + cross-page isolation, (9) backend_id property
stability + close() idempotency. The merge in MUST 9 is honest: backend_id
is name-identity and close() is lifecycle-identity, both backend-identity
contracts.

spec/24 Decision 7 receives the CorpusBackend ownership addendum. The
existing "Why" paragraph previously said MemoryBackend owned wiki/, memory/,
and journal/. With CorpusBackend locked, the addendum clarifies: MemoryBackend
retains exclusive ownership of memory/ and journal/; CorpusBackend, when
registered, owns wiki/ and raw/. The two backends compose at prompt assembly
(agent.py:_load_indexes() reads from both).

18 distinct edits across 11 line ranges in spec/34. File 881 to 855 lines.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(corpus): #65 PR 4 of 4. refresh CLAUDE.md + README.md + ROADMAP to "eleven shipped"

CLAUDE.md adds the canonical CorpusBackend lock-paragraph (the 11th, mirroring
the 10 prior shipped-protocol bullets), flips the ASCII architecture diagram
from "Corpus 🟡" to "Corpus ✅ (locked at #65 PR 4)", bumps the spec-doc count
from "30 locked + 2 drafts" to "31 locked + 2 drafts", refreshes the live test
count to 2,937 collected (2,889 passing + 48 skipped) at 2026-06-01, and flips
the Status block from "Ten backend protocols shipped" to "Eleven backend
protocols shipped". Status tail flips from "remaining two protocols (Corpus /
MCPServerRegistry)" to "remaining protocol (MCPServerRegistry)".

README.md adds CorpusBackend to the shipped list in the Current limits
paragraph (replacing "filesystem-default-only today" with the locked
CorpusBackend summary including FTS5 + page-count cliff WARN + CLI + env-var
override), bumps the comparison-matrix locked-docs count from 30 to 31, adds
spec/34 to the spec list, flips the backend-protocols table row for
CorpusBackend from "Planned" to "✅ Shipped" with the locked summary cell,
flips the v1 direction sentence from "those two land" to "MCPServerRegistry
lands", bumps the repo-structure test count to 2937 collected (2889 passing),
and flips the Status block from "Ten of twelve" to "Eleven of twelve".

ROADMAP.md (repo root, public strategic narrative) flips line 11 from "Ten of
twelve" to "Eleven of twelve" with CorpusBackend appended to the shipped list
and "Two remain" to "One remains", removes the now-shipped #65 row from the
remaining-protocols table, and flips the ship-when sentence from "both
remaining backends" to "the remaining backend".

7 + 10 + 3 = 20 edits across 3 files. The vault ROADMAP at ~/ObsidianVault/
Atomic Agents/ROADMAP.md is refreshed out-of-band (not in the git repo).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(corpus): #65 PR 4 of 4. cross-spec CorpusBackend refs + reference impl docstring scrub

Cross-spec cross-references propagate the CorpusBackend locked status to
adjacent spec docs:
- spec/01 (anatomy): adds a CorpusBackend cross-reference paragraph after the
  wiki/raw section explaining the Protocol seam and the SQLiteCorpusBackend
  GB-scale benefit.
- spec/02 (atomic memory): adds a CorpusBackend cross-reference paragraph
  after the "Why two layers" section naming the wiki/ + raw/ vs memory/ +
  journal/ ownership split.
- spec/04 (runtime assembly): adds an integration note after the canonical
  load order describing how step [7] routes through corpus_backend.
  render_index_summary("wiki") when CorpusBackend is registered.
- spec/26 (cascade bundle DRAFT): flips two future-tense references ("when
  CorpusBackend ships") to present-tense ("now that CorpusBackend has shipped,
  locked at #65 PR 4 of 4") and updates the composition table row to cite
  the specific render_index_summary("wiki") method.
- spec/31 (LLMBackend): appends "(spec/34)" link to the Corpus entry in the
  protocol-pattern list.

spec/27 (doctor catalogue) already has the corpus-backend entry from PR 3
inline status flip; no edit needed (verified).

Reference impl docstring scrub completes the per-PR-marker consolidation
sweep across shipped Python code:
- corpus/__init__.py: drops "scaffolding PR -- no behavior change today" and
  "in PR 3" temporals; rewrites the PRE-PR-3 wiring contract block to a
  present-tense locked-status block (the SQLiteCorpusBackend "DEFERRED"
  bullet is now FALSE since SQLite shipped in PR 2; only semantic search
  remains deferred to v1.1); drops "(wired in PR 3)" from
  get_default_corpus_backend docstring.
- corpus/types.py: drops "PR 1 of 4" + "PR 1, File 2 of 3" parentheticals;
  deletes the "Scaffolding PR (#65 PR 1 of 4)" paragraph entirely; drops
  "in PR 1 / PR 2 respectively" temporal.
- corpus/backend.py: replaces the 4-bullet per-PR shipping plan with a
  single locked-status line.
- test_corpus_sqlite_backend.py: drops "PR 2 of 4" from the module docstring.

Mirrors PersonaBackend PR 4 commit 93dad48's stale-marker scrub pattern.
All edits are docstring/comment only; no executable Python changed. 158
tests on the affected corpus modules continue to pass; full suite still
2889 passing + 48 skipped (zero regressions).

7 + 10 = 17 edits across 9 files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(corpus): #65 PR 4 of 4. CHANGELOG arc-closer + non-spec doc test + protocol counts

CHANGELOG.md adds 3 bullets under [Unreleased] mirroring PersonaBackend
PR 4's arc-summary shape:
- ### Changed: framework status flip from "ten of twelve" to "eleven of
  twelve backend protocols shipped" with operator-user outcome lead
  (pin SQLite via one env var for indexed FTS5 query at GB scale; doctor
  surfaces page-count cliff WARN; CLI honors env var; legacy paths
  soft-degrade gracefully on UnicodeDecodeError + OSError; IRON RULE
  byte-identity preserved). Cites all 4 PRs (#297, #298, #304, this PR)
  and all 10 follow-up issues (#305-#314).
- ### Changed: spec/24 Decision 7 addendum naming CorpusBackend as the
  source of truth for wiki/ and raw/ (cross-spec ownership propagation).
- ### Documentation: spec/34 LOCKED + doc-release sweep landed. Names
  the 9-MUST Implementer Contract finalization, the per-PR marker scrub
  across spec body + reference impls + tests + strategic docs, and the
  cross-spec cross-references.

docs/deployment/programmatic.md: protocol-pattern paragraph flipped from
"Ten backend protocols have shipped" to "Eleven backend protocols have
shipped"; CorpusBackend added to the enumerated list; spec/34 added to
the spec doc list; "two remain" flipped to "one remains" with
MCPServerRegistryBackend the only remaining protocol.

docs/methodology.md: "today ten are shipped" flipped to "today eleven
are shipped" with CorpusBackend appended; test count bumped from 2686+
to 2937+.

CONTRIBUTING.md: stale "2401 tests today" (drifted across multiple arcs)
refreshed to "2937 tests today".

3 + 1 + 1 + 1 = 6 edits across 4 files.

Test suite stable at 2889 passing + 48 skipped (Python 3.11/3.12).
This is the PR 4 of 4 arc closer. After merge, the CorpusBackend arc
CLOSES. 11 of 12 backend protocols shipped for v1.0; only
MCPServerRegistryBackend (#201) remains.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(corpus): #65 PR 4 of 4 R2. fold test_corpus_registry.py into the locked record

Adversarial Round 1 caught an inverse-phantom in the CLAUDE.md 11th
CorpusBackend lock-paragraph: the canonical "locked at PR 4 with..."
test file list omitted `tests/test_corpus_registry.py`, which is a real
file with 4 tests shipped in PR 1 of the arc and cited in CHANGELOG's
PR 1 bullet. spec/34's §"Test coverage" PR 1 section also did not
enumerate it. The omission is small but it creates inconsistency
between the canonical PR 4 record (CLAUDE.md lock-paragraph) and the
actual locked test surface.

This is the PersonaBackend PR 4 Round 1 phantom-file failure shape in
the opposite direction: a real-but-uncited file rather than a
cited-but-nonexistent file. Same risk surface; same fix discipline.

Fixes:
- CLAUDE.md line 15 lock-paragraph: insert `tests/test_corpus_registry.py`
  between `test_corpus_sqlite_backend.py` and `test_corpus_composition.py`
  in the locked-at-PR-4 test file list.
- spec/34 §"Test coverage" PR 1 section: add a 4-bullet sub-list under
  the `tests/test_corpus_registry.py` heading naming the registry
  primitives the tests cover (register / unregister round-trip and
  collision-replace; get_corpus_backend raises on unknown id;
  list_corpus_backends ordering; get_default_corpus_backend env var).

Round 1 finding count: 0 P0, 1 P1, 0 P2. This commit lands the Round 2
convergence; full pytest still 2889 passing + 48 skipped.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(corpus): #65 PR 4 of 4 R3. doc-release sweep convergence

Step 18 doc-release subagent caught 5 stale per-PR temporal markers that
the Stream A spec/34 LOCK sweep and the Stream F reference-impl docstring
sweep both missed. PersonaBackend PR 4's doc-release subagent caught the
exact same shape (2 finds folded into commits ad6723b + e1d05cf); for
PR 4 of #65 the equivalent finds are landed here BEFORE PR creation
rather than fix-forward post-creation.

Three spec/34 body edits:
- Line 54 module-layout code block: drop "# SQLite ships in PR 2:"
  comment header (stale future-tense; SQLite shipped in PR 2 of this arc).
- Line 233 Protocol surface docstring: drop "(PR 3)" parenthetical from
  the render_index_summary migration-target comment.
- Line 818 implementation notes: rewrite "PR 3's call-site migration scope
  is writes of render_index_summary only" to present-tense "The call-site
  migration scope is reads through render_index_summary only"; rewrite
  "The PR 3 IRON RULE regression suite" to "The IRON RULE regression
  suite".

Two corpus reference-impl docstring edits:
- corpus/types.py lines 190-195: replace "(Subagent 2 HIGH H4 ... is a
  design assumption until real raw sample data is added at PR 1 prep or
  contributed by operators). Accept as provisional for v1.0." with
  with "the raw-side field shape is locked at v1.0 against issue #65's
  stated schema. Operator-contributed raw sample data could surface
  refinements for v1.1." The word "provisional" in a locked spec's
  reference impl contradicts the spec/34 LOCKED status; the v1.1
  refinement framing matches the corpus_backend bundle.py:_source_paths
  v1.1 migration pattern at #314.
- corpus/backend.py lines 145-156: drop "the PR 3 call-site migration"
  and "(PR 3)" temporals from the render_index_summary Protocol method
  docstring. The migration is historical; the docstring describes the
  Protocol contract today.

Sixth finding (doc-release Check 3, TENSIONS.md T9): classified as
FOLLOW_UP, not FIX_NOW. T9 carries pre-landing predictive language
("expect ~26 spec docs," "Around spec doc #25 (~CorpusBackend land)").
Worth a follow-up issue to update the count and tense; not blocking PR 4.
(Tension itself, "spec surface grows with code surface," is still active.)

Full pytest re-run on the 8 corpus test modules after these edits: 196
tests pass, zero regressions. Full suite expected to remain at 2889 + 48
skipped (no executable Python changed; only docstrings + a comment line).

This completes the per-PR-marker consolidation sweep across the full
locked surface: spec body, reference impls, tests, strategic docs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Dan Powers <dep0we@gmail.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>