Wave: #17 auto-snapshot + #16 configurable output language#25

Merged
VGonPa merged 26 commits into main from develop on May 22, 2026

Conversation

VGonPa (Owner) commented May 22, 2026

Promotes two completed issues from develop to main.

Issues included

Validated locally

  • ✅ Quality gate all-green on develop tip.
  • ✅ End-to-end smoke test on Víctor's real corpus (1884 items): regenerate switches between English and Spanish headers correctly; worksheet rubric ships with {language} already substituted; snapshot list/restore/prune work as expected.

Follow-ups (already filed)

Test plan

🤖 Generated with Claude Code

VGonPa and others added 26 commits May 20, 2026 15:25
Closes #15.

A reference-level document covering how XBrain is shaped (the README is
onboarding; this is the "why is the pipeline like this" answer). Includes:

- Pipeline diagram of extract → fetch → vocab → enrich → topics → generate
  and the data flow between them.
- Per-stage breakdown: what each command does, what it reads, what it
  writes, and why it is shaped that way.
- The data layer: items.json (source of truth), state.json, vocab.yaml,
  topics.json — and why JSON+YAML beat a database here.
- The rubric layer: what each of the four rubrics instructs, why they
  live in declarative markdown, and the LLM-emits-only-judgment rule.
- The validator + guardrails.yaml — the structural gate before the store.
- The executor model — api vs claude-code (worksheet) vs manual.
- Seven invariants the rest of the architecture rests on.
- A where-things-live map of the full repo layout.

README.md gets a pointer to ARCHITECTURE.md in the "How it works" section
and a row in the Documentation table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ARCHITECTURE.md:
- Add a temporal "step by step" Mermaid diagram at the top of the pipeline
  section. The existing diagram is architectural (what reads what); this one
  is sequential (what runs, in order, with what it reads/writes and the key
  invariant of each step). It is the answer to "what happens in each phase".
- Document the off-the-main-loop ops: import-archive, sync, status.

README.md:
- Replace the pipeline diagram. The previous one self-looped items.json on
  stage ②, which Mermaid renders by hiding the edge label — so ② "fetch"
  was invisible. The new diagram is a clean linear flow; the artifact-hub
  detail moves to a sentence below.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous iteration removed the data artifacts (items.json, vocab.yaml,
topics.json, wiki) from the pipeline diagram in the name of fixing the
invisible ② label. Wrong trade — the artifact layout was the most useful
part of the diagram.

This restores the hub-and-spoke design with items.json at the center, but
converts every command into a labelled node (instead of relying on edge
labels for a D → D self-loop). All six stage numbers now render, and the
data flow into vocab.yaml / topics.json / the wiki is visible again.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The hub-and-spoke Mermaid attempt was a mess: arrows looping back to
items.json crossed over each other, ② fetch ended up rendered above the
main flow instead of after extract, and the wiki node landed under
GitHub's zoom controls. Trying to show six stages AND four artifacts
AND every read/write edge in one LR diagram was the wrong trade.

New shape:
- Diagram is a clean linear arrow chain ① → ② → ③ → ④ → ⑤ → ⑥ → wiki.
  Single visual job: show the order. No crossings, all six numbers visible.
- The table gets a new "Writes to" column listing the artifact each stage
  writes (items.json, state.json, vocab.yaml, topics.json, the vault).
  That carries the where-things-go information the diagram used to try
  to show.
- Closing paragraph keeps the items.json-is-the-hub framing for the
  data-flow narrative without overloading the diagram.

For the full read/write graph and per-stage detail, ARCHITECTURE.md
remains the reference.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous LR linear diagram lost the where-things-go information that
the original (broken) hub-and-spoke tried to show. New shape, modelled
on the hand sketch:

README pipeline diagram:
- Top-to-bottom chain on the left: X → ① extract → ② fetch → ③ vocab →
  ④ enrich → ⑤ topics → ⑥ generate → wiki. Strict execution order.
- Artifacts on the right, with dashed "writes"/"mutates" edges from each
  stage to the file it persists into. items.json appears once with three
  edges (extract writes, fetch + enrich mutate) — visualises the hub.
- classDef styling: stages purple, artifacts amber, sources blue, wiki
  green. GitHub renders this with no extra setup.

README three-layer diagram (in 'What you get'):
- Replaces the minimal three-box stack with subgraphs that actually
  convey the cardinality: many Items → fewer Topics → one Index.
- Same colour scheme as the pipeline (items purple, topics amber,
  index green) so the families are visually consistent.

ARCHITECTURE.md:
- Architectural diagram (Sources/Pipeline/Data/Wiki subgraphs) gets the
  same classDef palette — visually consistent with the README diagrams.
- Step-by-step diagram colours: mechanical stages purple, LLM stages
  amber. You can tell at a glance which steps cost tokens.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…labels

Pipeline diagram:
- 🧠 icon + subroutine shape on the knowledge-base node so it visually
  stands out from the data artifacts (it is the destination, not a
  store).
- Add the read edges that were missing: vocab.yaml → enrich, topics,
  generate; topics.json → generate. Answers "who uses the vocab?" and
  "who uses topics.json?" directly from the diagram.
- Replace the single-paragraph "items.json is the hub" with a bullet
  list per artifact (items.json / vocab.yaml / topics.json) — what it
  is, who writes, who reads. The "items.json isn't part of the
  knowledge base?" confusion gets a direct answer: items.json is the
  database; generate is what turns it into items/*.md inside Obsidian.

Three-layer wiki diagram:
- Wrap the three layers in a 🧠 Obsidian knowledge base subgraph so it
  is visually obvious where they live.
- Use Mermaid's asymmetric flag shape (>...]) for every item / topic /
  index node — it reads as a document/page, not a box.
- Layer subgraphs carry the counts in the title ("Items · ~1k+ notes",
  "Topics · ~30-45 notes") so the cardinality is visible.
- Thicker arrows (==>) for "grouped under" / "mapped from" to make the
  layer hierarchy stand out.

Layer 1 / 2 / 3 prose: each example block is now preceded by an
italic "Example:" so the reader knows the markdown is illustrative,
not literal current output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pipeline:
- The 'reads' edges added to the pipeline diagram made the auto-layout
  fall apart (vocab.yaml and items.json drifted apart, the chain
  zig-zagged). Drop them — the read graph goes in the explanatory text
  below the diagram, not in the diagram itself.
- Diagram is back to clean linear stages on the left + writes/mutates
  dashed edges to the artifact cylinders on the right.

Three-layer wiki:
- Mermaid's nested subgraphs with asymmetric shapes produced something
  unreadable: tiny boxes, layered yellow rectangles, the index node
  clipped under GitHub's zoom UI. Wrong tool for this job.
- Replace with a native HTML table — three columns, one per layer
  (Items / Topics / Index), each with: an emoji header, a one-line
  cardinality subtitle, an ASCII mock-up of the note that layer
  produces, and a one-line description below.
- Result: looks like three documents side by side in a vault. GitHub
  renders it natively, no Mermaid rendering surprises.
- Add a leading sentence that anchors the layers to their literal
  location: 'All three layers are markdown notes inside a single
  Obsidian vault, under learnings/x-knowledge/.'

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pipeline diagram:
- Replace the single 'Obsidian knowledge base' node with a subgraph
  containing the three note kinds that actually live in the vault:
  items/*.md, topics/*.md, _index.md. Answers 'do items / topics live as
  markdown in Obsidian?' directly from the diagram — yes, you can see it.
- ⑥ generate fans out to all three with thick arrows (==>), so the
  one-stage-writes-three-files relationship is obvious.

Three-layer wiki table:
- Add a visible heading 'Example layout — three notes side by side, as
  they appear in the vault' so the table is unambiguously an example.
- Translate the ASCII mock-ups to English (Topics, Summary instead of
  Temas, Resumen).

Layer 1 / 2 / 3 markdown examples:
- Translate the prose, headers and frontmatter to English (Topics,
  Content, Key notes, Primary posts, Also relevant, Summary).
- The id, slugs and tweet text stay as-is — they are literal data, not
  prose to translate.

Closing note about language:
- Replace the 'examples are in Spanish' note with: 'examples above are
  shown in English for clarity; live output is Spanish today; config
  parameter is on the roadmap #16'. Sets expectations correctly:
  examples are illustrative, not the literal current Spanish output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
'~1k+ notes' was the size of Víctor's corpus on the live run, not a
property of XBrain. Replaced in both places it appeared as if it were a
feature:

- Pipeline diagram subgraph: items/*.md now says 'one note per post',
  topics/*.md says 'one note per topic'. Cardinality, not absolute count.
- Three-layer wiki table: column subtitles now describe the cardinality
  rule ('one per saved post · scales with your X corpus' / 'one per
  topic · ~30 by default, configurable' / 'one note · the map'). The
  user reading the README sees a behaviour, not a promise of corpus size.

The ASCII mock-ups still show concrete numbers (1884 items, etc.) — they
are explicitly labelled as 'Example layout', so concrete is honest there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Apply consistent visual theming to every Mermaid diagram in the repo:

Common to all three diagrams:
- %%{init}%% block with theme:'base' + themeVariables setting
  system-ui font, transparent background, slate-500 arrows. Inherits
  GitHub's dark/light mode framing instead of fighting it.
- Tighter strokes (1.5px instead of 2px) for a less heavy look.
- Saturated source/stage colours (slate-blue + violet) with white text
  and font-weight 500 — more punch than the pastel default.

Per-diagram:
- README pipeline: stages now use stadium shape (...) for a pill look
  instead of rectangle [...]. Wiki output notes also stadium. Cylinders
  stay for data artifacts (the shape carries meaning: store).
- ARCHITECTURE architectural diagram: same paint job; subgraphs keep
  their emoji headers (🌐 ⚙️ 💾 📚) — the icon-as-section-header
  pattern reads cleanly.
- ARCHITECTURE step-by-step: keeps rectangles (each node carries 4-6
  lines of left-aligned text — pill shapes would stretch). Adopts
  the new palette + saturated start/done nodes for stronger anchoring.

No structural changes — every node, edge and label is unchanged. Pure
styling pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…r steps

Architectural diagram:
- The previous version tried to encode every read AND every write. With
  items.json read by five later stages plus mutated by three of them,
  Mermaid's auto-layout gave up and the edges crossed everything.
- Drop the read arrows. The pipeline is now a clean LR chain inside
  the Pipeline subgraph (extract → fetch → vocab → enrich → topics →
  generate), with dashed writes/mutates into data/ and thick arrows
  into the vault.
- Reads are documented in the Step-by-step below (and the prose under
  the diagram explicitly says so), so the diagram no longer carries
  that responsibility.

Step-by-step diagram:
- Mermaid with 7 boxes of 5-6 lines of <b>/<i>/<br> markup each was a
  bad shape for that much text — every box was a wall of styled
  string that Mermaid laid out vertically anyway.
- Replace with an HTML table, one row per step (card pattern). Each
  card uses native markdown inside the cell: H4 title, lead paragraph,
  bullet list with Reads / Writes / behavioural notes. Renders cleanly
  on GitHub, far more legible than the Mermaid block.
- Start / Done are now plain blockquotes wrapping the cards.

No content change — every read, write and behavioural note from the
old diagram is in the cards. Format is the only thing that moved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous flowchart with three branches and long-text boxes was the
wrong tool: it tried to encode three alternative *sequences* using a
static fork. The actual model is 'same starting point, three different
paths over time' — which is exactly what sequenceDiagram is for.

New shape: one sequenceDiagram with alt/else for the three modes:

- claude-code (default): CLI writes worksheet → you fill it in a Claude
  Code session → --apply imports it → validate → write items.json.
- api (unattended): a loop of prompt → judgment → validate per item,
  then a single store write at the end.
- manual: same worksheet path as claude-code, but you fill it by hand.

Validation is now visible as a discrete step on every path — the
mechanical guardrail layer is part of the picture.

system-ui font + transparent background via %%{init}%%, matching the
other diagrams.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mermaid's flowchart parser accepts XML-style self-closing <br/>, but
the sequence-diagram parser does not — it only accepts <br>. The
multi-line Note in the executor sequence diagram was unrenderable on
GitHub because of this.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GitHub's Mermaid build rejects every HTML tag inside sequenceDiagram
Note over (both <br/> and <br> parse to INVALID). The reliable form is
a single-line note — Mermaid wraps it automatically at the actor span.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`xbrain enrich --executor <mode>` had `<` and `>` in the body of a
sequenceDiagram message. Mermaid's tokenizer for sequenceDiagram
treats `<`, `>` and `/` as part of its arrow-character set; even though
they appear after the message colon, they can still trip strict builds.
Replace the placeholder with MODE: same semantics, parser-safe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GitHub's sequenceDiagram parser is stricter than the standalone
Mermaid build. It interprets [, ], {, } as message/activation tokens
even mid-string, and semicolons/apostrophes can prematurely terminate
notes. Strip every special char from messages and notes:

- judgments[] → judgments
- { summary, primary_topic, topics[] } → summary, primary topic, topics
- prompt(rubric + item + vocab) → prompt with rubric, item and vocab
- (rubrics + guardrails) → against rubrics and guardrails
- worksheet's → worksheet
- "; " → comma or split into separate phrases

Same semantics, parser-clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…gram

The single sequenceDiagram with alt/else for all three modes was hard
to scan: every reader had to mentally separate which step belonged to
which branch. Restructure as:

1. Compact intro table up top — Mode / Cost / When you reach for it.
   Lets the reader pick a mode in five seconds.

2. One H3 subsection per mode (claude-code, api, manual). Each has:
   - What it does (mechanism, one paragraph)
   - Why this mode exists (the design reason, the trade-off)
   - When to use it (concrete trigger)
   - A small sequenceDiagram showing only that mode's flow

3. The closing 'common to all three' paragraph stays implicit in the
   intro ('All three modes end the same way: validate then persist').

The smaller per-mode diagrams are also simpler for Mermaid's
sequenceDiagram parser — no alt/else, fewer actors per diagram
(claude-code has CC as participant, api has API, manual has neither),
and a clear linear flow each time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…amily

Apply the same palette + typography pass to the three execution-mode
diagrams that the flowcharts already use:

- actorBkg / actorBorder / actorTextColor in the saturated violet
  (#7c3aed / #5b21b6 / #fff) that the stage nodes use in the pipeline
  diagram. The participants now read as 'system pieces' instantly.
- noteBkgColor / noteBorderColor / noteTextColor in the amber that
  carries 'data / artifact' in the rest of the README.
- signalColor / signalTextColor in slate — fine, neutral arrows that
  let the actors pop.
- labelBox in the saturated cyan that 'External' and 'start/done' use
  elsewhere — keeps alt/loop tags consistent.
- autonumber on every diagram. Steps now read 1, 2, 3..., a standard
  UML sequence-diagram convention that also makes the prose under each
  diagram referenceable ('step 4 validates...').

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Tailwind violet (#7c3aed / #5b21b6) reads as 'AI default landing
page' — the same hue every AI tool uses. Replace with slate-800
(#1e293b) + slate-600 (#475569) borders across every diagram:

- README pipeline (flowchart): classDef stage
- README execution-modes (3 sequenceDiagrams): actorBkg + actorBorder
- ARCHITECTURE architectural diagram (flowchart): classDef stage

Notes (amber), sources (cyan) and wiki outputs (emerald) stay — those
weren't the AI-slop hue. The slate gives the actors / stages a serious
'Linear / Vercel / Stripe' weight while the amber+cyan+emerald carry
the semantic colour.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
docs: add ARCHITECTURE.md — pipeline stages, artifacts, invariants
…bcommand

Adds an automatic safety net: every destructive command (`vocab --regenerate`,
`topics --resynth`, `fetch --force`) snapshots the full `data/` directory to
`data/snapshots/<UTC-ts>-pre-<command>/` *before* it writes anything. If the
re-run produces worse results, `xbrain snapshot restore <name>` brings the
previous good state back. Foundation for #18 (`xbrain diff`).

Implementation:
- New `xbrain.snapshot` module with a pure, fully-typed lifecycle API:
  `snapshot_create`, `snapshot_pre`, `snapshot_list`, `snapshot_show`,
  `snapshot_restore`, `snapshot_prune`. Each artifact is copied via
  `shutil.copy2`, the manifest is written through the existing atomic-write
  helper. Counts in the manifest come from parsing the live files (yaml for
  vocab, json for the rest).
- Five new CLI verbs under `xbrain snapshot {create,list,show,restore,prune}`.
- Auto-snapshot wired into the three destructive code paths in `cli.py`:
  `_run_fetch` (when `--force`), `_vocab_apply` / `_vocab_run` (when
  `--regenerate`), `_topics_run` (when `--resynth`). A failed snapshot
  propagates and aborts the destructive op — never silently skipped.
- The vault is intentionally NOT snapshotted (it is fully derivable from
  `data/` via `xbrain generate`).
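
A minimal sketch of the snapshot-before-write flow described above, not the actual `xbrain.snapshot` code. The function names, the `ARTIFACTS` tuple and the manifest shape are assumptions made for illustration; only the directory layout and the use of `shutil.copy2` come from this commit.

```python
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path

# Assumed artifact list; the commit only says "each artifact" under data/.
ARTIFACTS = ("items.json", "state.json", "vocab.yaml", "topics.json")


def snapshot_before(data_dir: Path, command: str) -> Path:
    """Copy the known artifacts into data/snapshots/<UTC-ts>-pre-<command>/."""
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ")
    target = data_dir / "snapshots" / f"{ts}-pre-{command}"
    target.mkdir(parents=True, exist_ok=False)  # a name clash raises, never overwrites
    copied = []
    for name in ARTIFACTS:
        src = data_dir / name
        if src.exists():
            shutil.copy2(src, target / name)  # copy2 also preserves file metadata
            copied.append(name)
    # The real module writes its manifest through an atomic-write helper;
    # a plain write keeps this sketch short.
    manifest = {"command": command, "artifacts": copied}
    (target / "manifest.json").write_text(json.dumps(manifest))
    return target


def run_destructive(data_dir: Path, command: str, operation) -> None:
    """Snapshot first; any exception from the snapshot aborts the operation."""
    snapshot_before(data_dir, command)
    operation()
```

Because `snapshot_before` runs ahead of `operation()` and is not wrapped in a `try`, a failed snapshot stops the destructive step before it touches anything, which is the "never silently skipped" rule stated above.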

Spec deviation worth flagging: the PRD mentioned `enrich --regenerate` as a
fourth destructive site, but the CLI has no such flag — re-enrichment is
triggered via `vocab --regenerate`, which already calls `_mark_for_regenerate`.
Three destructive sites in total (vocab/topics/fetch), not four.

Tests:
- 14 unit tests on the snapshot module (empty data, full data, naming rules,
  list ordering, restore round-trip, restore deletes a live file missing from
  the snapshot, restore does not touch unrelated files, prune keep-last,
  prune=0 clears everything, prune rejects negative, show on unknown raises).
- 10 integration tests via `CliRunner`: every destructive flag creates the
  expected `pre-<op>` snapshot; the same flag absent creates none; every new
  `xbrain snapshot` CLI verb is exercised end-to-end.
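
The prune rules exercised by the unit tests (keep-last, `0` clears everything, negative rejected) could look roughly like this. It is a sketch under assumptions: `prune_snapshots` and its return value are hypothetical, and it relies on snapshot directory names starting with a sortable UTC timestamp.

```python
import shutil
from pathlib import Path


def prune_snapshots(snapshots_dir: Path, keep_last: int) -> list[str]:
    """Delete all but the newest keep_last snapshots; return the removed names."""
    if keep_last < 0:
        raise ValueError("keep_last must be >= 0")
    # Names begin with a UTC timestamp, so lexicographic order is chronological.
    names = sorted(p.name for p in snapshots_dir.iterdir() if p.is_dir())
    doomed = names[: max(len(names) - keep_last, 0)]
    for name in doomed:
        shutil.rmtree(snapshots_dir / name)
    return doomed
```

With fewer snapshots than `keep_last` the slice is empty and nothing is removed.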

Total: 245 tests (up from 235), 87% coverage, `uv run poe check` all-green.

Docs:
- README: new `snapshot` row in the Commands table + new "Snapshots & safety"
  section linked from the TOC.
- ARCHITECTURE: new invariant #8 ("destructive ops are reversible"); each of
  the three destructive cards gains a "Snapshots `data/` before <flag>" note;
  `snapshot.py` listed in "Where things live".

Closes #17.

PRD:  vault/zz-support-files/docs/prds/2026-05-21-xbrain-17-auto-snapshot.md
Plan: vault/zz-support-files/docs/implementation-plans/2026-05-21-xbrain-17-auto-snapshot.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ening

Addresses every HIGH/CRITICAL finding from the review pipeline on PR #22
(code-reviewer, silent-failure-hunter, pr-test-analyzer, python-code-reviewer,
code-simplifier, spec-compliance):

snapshot.py:
- snapshot_create returns (Path, SnapshotManifest) — callers (incl. _auto_snapshot)
  now print the item count from the manifest just written, matching PRD §5
  observability ("Snapshot created: <path> (N items)").
- New `dir_label` parameter separates directory naming from manifest.command:
  manifest now records "vocab-regenerate" (the op name) while the directory
  carries the `pre-` prefix. Fixes the dual-purpose smell flagged by code-reviewer.
- snapshot_pre removed — inlined in _auto_snapshot (code-simplifier).
- Timestamp gains millisecond precision (`%Y-%m-%dT%H-%M-%S-NNNZ`). Eliminates
  the same-second collision bug flagged by pr-test-analyzer #9 and
  python-code-reviewer #2. As a side effect, the suite no longer needs
  `time.sleep(1.1)` between snapshots — total test runtime dropped from 7s to <1s.
- snapshot_restore now uses `shutil.copy2` symmetrically with snapshot_create
  instead of the text round-trip via `_atomic_write`. Binary-safe, metadata-
  preserving, and no longer asymmetric (code-reviewer/python-code-reviewer
  both flagged this as the must-fix-before-merge issue).
- snapshot_restore returns a list of (artifact, action) tuples — RESTORE_COPIED,
  RESTORE_DELETED, RESTORE_SKIPPED. The CLI prints every action, so a deletion
  from a "missing in snapshot" artifact is never silent (silent-failure-hunter
  #1 HIGH).
- snapshot_list now returns rows with `manifest=None` for corrupt directories
  instead of silently dropping them; the CLI marks those as CORRUPT on stderr
  (silent-failure-hunter #2 HIGH).
- _count_* helpers now propagate exceptions instead of swallowing them: a
  corrupt items.json aborts the snapshot rather than recording a misleading
  count=0 (silent-failure-hunter #3, code-reviewer #3). The trivial one-line
  _count_items/_count_topics wrappers are inlined and removed (code-simplifier #1).
- All imports (json, yaml, importlib.metadata) at the module top (code-reviewer #3).
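
For the millisecond timestamp in the list above: `strftime` has no millisecond directive, so one way to get the `-NNNZ` suffix is to derive it from the microsecond field. A sketch, assuming a hypothetical helper name `snapshot_timestamp`:

```python
from datetime import datetime, timezone
from typing import Optional


def snapshot_timestamp(now: Optional[datetime] = None) -> str:
    """Format a UTC instant as YYYY-MM-DDTHH-MM-SS-NNNZ (NNN = milliseconds)."""
    moment = now if now is not None else datetime.now(timezone.utc)
    millis = moment.microsecond // 1000  # truncate microseconds to milliseconds
    return f"{moment.strftime('%Y-%m-%dT%H-%M-%S')}-{millis:03d}Z"
```

Two snapshots taken within the same second now get different names as long as they are at least a millisecond apart, and the zero-padded suffix keeps the names sortable.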

cli.py:
- _OPERATOR_ERRORS now includes OSError (covers PermissionError, FileExistsError,
  IsADirectoryError) so snapshot I/O failures surface as clean exit-1 instead
  of raw tracebacks (silent-failure-hunter #4).
- _auto_snapshot now reads the count from the manifest and emits the spec-
  mandated English message: `Snapshot created: <dir> (N items)` (pr-test-analyzer
  #10, python-code-reviewer #3).
- snapshot_restore_cmd echoes every per-artifact action.
- snapshot_list_cmd handles `manifest=None` rows as CORRUPT (to stderr).
- snapshot_create_cmd uses the new (path, manifest) return shape and
  passes `command="manual"` + `dir_label=name`.
- Strings translated to English (the whole new subcommand group; the rest of
  the CLI stays Spanish — out of scope here).
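
Why adding `OSError` is enough: `PermissionError`, `FileExistsError` and `IsADirectoryError` are all built-in subclasses of `OSError`, so one entry in the tuple catches them all. A sketch of the pattern, where the other members of `_OPERATOR_ERRORS` and the `run_cli` wrapper are assumptions rather than the real `cli.py`:

```python
import sys

# Assumed contents; the commit only states that OSError was added to the tuple.
_OPERATOR_ERRORS = (ValueError, OSError)


def run_cli(action) -> int:
    """Run a command body; map operator-level errors to a clean exit code 1."""
    try:
        action()
    except _OPERATOR_ERRORS as exc:
        print(f"error: {exc}", file=sys.stderr)  # short message, no traceback
        return 1
    return 0
```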

Tests:
- test_snapshot.py: 21 unit tests (up from 14). New: round-trip across ALL FOUR
  artifacts (pr-test-analyzer #1 CRITICAL), millisecond-collision regression,
  corrupt-JSON-aborts-snapshot, dir_label separation from command,
  shutil.copy2-preserves-bytes (binary-safety smoke test), per-artifact action
  codes, xbrain_version assertion (pr-test-analyzer #5), prune-with-fewer-than-
  keep_last (pr-test-analyzer #6).
- test_snapshot_auto.py: 16 integration tests (up from 10). New: snapshot-
  taken-before-mutation-when-op-fails (pr-test-analyzer #2 CRITICAL — uses
  monkeypatch to force `_mark_for_regenerate` to raise, asserts snapshot
  already on disk + items.json unchanged), snapshot-failure-aborts-destructive-op
  (pr-test-analyzer #3 CRITICAL — monkeypatch snapshot_create to raise OSError,
  assert fetch --force aborts and nothing is mutated), snapshot show CLI
  (pr-test-analyzer #7), restore-via-CLI-with-missing-artifact
  (pr-test-analyzer #8), corrupt-dirs-marked-via-CLI, stdout-includes-item-count
  (pr-test-analyzer #10).
- All 258 tests pass; coverage 87%.
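
The restore behaviour these tests pin down (copy what the snapshot has, delete live files the snapshot lacks, leave unrelated files alone, report every action) can be sketched as follows. The constant values, the `ARTIFACTS` tuple and the meaning given to `RESTORE_SKIPPED` (absent on both sides) are assumptions for illustration.

```python
import shutil
from pathlib import Path

# Assumed values; the commit names the constants but not what they hold.
RESTORE_COPIED = "copied"
RESTORE_DELETED = "deleted"
RESTORE_SKIPPED = "skipped"

# Assumed artifact list, matching the four data files named in ARCHITECTURE.md.
ARTIFACTS = ("items.json", "state.json", "vocab.yaml", "topics.json")


def restore(snapshot_dir: Path, data_dir: Path) -> list[tuple[str, str]]:
    """Bring data_dir back to the snapshot state; return one action per artifact."""
    actions = []
    for name in ARTIFACTS:
        saved, live = snapshot_dir / name, data_dir / name
        if saved.exists():
            shutil.copy2(saved, live)  # byte-for-byte copy, no text round-trip
            actions.append((name, RESTORE_COPIED))
        elif live.exists():
            live.unlink()  # the snapshot predates this file, so remove it
            actions.append((name, RESTORE_DELETED))
        else:
            actions.append((name, RESTORE_SKIPPED))
    return actions
```

Only names in `ARTIFACTS` are ever touched, so anything else under `data/` survives a restore, and the returned list gives the CLI something to print for each deletion.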

CONTRIBUTING.md:
- Added a "Safety: destructive operations auto-snapshot" section
  (spec-compliance #FAIL — closes the doc gap).

Deviation log unchanged: 3 destructive sites (`vocab --regenerate`,
`topics --resynth`, `fetch --force`) — `enrich --regenerate` does not exist
as a CLI flag, re-enrichment happens via `vocab --regenerate` which is
already covered. Spec-compliance reviewer confirmed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[#17] Auto-snapshot data/ before destructive ops + xbrain snapshot subcommand
Adds [output].language to config.toml. Default is "English"; "Spanish" also
supported. The setting drives:

- LLM output language (rubric-summary, rubric-topic-page, rubric-vocab all
  carry a {language} placeholder, substituted at prompt-assembly time via
  the new `load_rubric(name, language=...)` kwarg).
- Wiki UI strings (Topics:, Content:, Summary, Primary posts, Also relevant)
  via a new `xbrain.i18n` module — a frozen Strings dataclass + dict keyed
  by language, with `strings_for(language)` accessor and a derived
  SUPPORTED_LANGUAGES tuple.
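
A reduced sketch of the `xbrain.i18n` shape described above, limited to two fields. `topics_label` and the Spanish values "Temas" / "Resumen" appear elsewhere in this PR; the field name `summary_label`, the private `_STRINGS` dict and the error wording are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strings:
    """UI strings for one output language (the real dataclass has more fields)."""
    topics_label: str
    summary_label: str  # assumed field name


_STRINGS = {
    "English": Strings(topics_label="Topics", summary_label="Summary"),
    "Spanish": Strings(topics_label="Temas", summary_label="Resumen"),
}

# Derived from the dict, so adding a language is a single new entry.
SUPPORTED_LANGUAGES = tuple(_STRINGS)


def strings_for(language: str) -> Strings:
    """Return the strings for a language, or raise listing the supported set."""
    try:
        return _STRINGS[language]
    except KeyError:
        supported = ", ".join(SUPPORTED_LANGUAGES)
        raise ValueError(
            f"unsupported language {language!r}; supported: {supported}"
        ) from None
```

Deriving `SUPPORTED_LANGUAGES` from the dict keeps the supported set and the translations from drifting apart.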

Plumbing: Config gains `output_language`; cli.py threads it into every
LLM call-site (ApiExecutor, induce_vocab, export_*_worksheet,
synthesize_overviews_api) and into the generators (generate, write_topic_pages,
render_topic_page). The four rubrics in src/xbrain/rubrics/ stay in English
for the LLM instructions; only the {language} OUTPUT is parameterised.

Default behaviour change: a repo with no [output] section now generates
English output (was Spanish before). Víctor confirmed this trade — he reads
both languages — in exchange for a sane default for new users. To convert
an existing Spanish corpus to English, run `xbrain enrich --regenerate` +
`xbrain topics --resynth` (both auto-snapshotted thanks to #17). Documented
in README.

Tests (10 new + 21 updated to pass output_language):
- tests/test_i18n.py: 6 unit tests (Spanish/English entries, unknown raises,
  every field populated, dataclass frozen).
- tests/test_rubrics.py: 4 new tests for {language} substitution behaviour.
- tests/test_config.py: 3 new (default English, Spanish round-trip, unknown
  rejected).
- tests/test_generate.py: 3 new integration tests (English by default,
  Spanish on demand, bogus language rejected).
- 21 existing tests updated to pass output_language to the changed signatures.

Total: 274 tests (up from 264), coverage 88% (up from 87%), `uv run poe
check` all-green.

Docs:
- README Configuration table + new [output] section in the example.
- README note: switching language after the corpus is enriched requires
  enrich --regenerate + topics --resynth (auto-snapshotted).
- ARCHITECTURE Rubrics section now describes the {language} placeholder
  mechanism + i18n.py.
- CONTRIBUTING new "Adding an output language" mini-subsection.

PRD:  vault/zz-support-files/docs/prds/2026-05-22-xbrain-16-output-language.md
Plan: vault/zz-support-files/docs/implementation-plans/2026-05-22-xbrain-16-output-language.md

Closes #16.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…fensive check

Addresses every actionable finding from the 6-reviewer panel on PR #23:

simplifier:
- Strings dataclass: 7 fields → 5. Dropped `tweet_header` (identical in both
  languages — inlined "## Tweet" in generate.py). Collapsed
  `topics_label` + `index_topics_header` into a single `topics_label` since
  both rendered to the same value within each language. Adding a second
  variant suggested variance that did not exist.
- Config validation: replaced the 6-line `not in SUPPORTED_LANGUAGES`
  branch with a single `strings_for(output_language)` call — that function
  already raises ValueError listing the supported set, so this collapses
  to one line and removes the duplicated error message.

silent-failure-hunter M1 (defensive check for {language} placeholder leak):
- `load_rubric(name, *, language=...)` now runs a case-insensitive regex
  sniff for any leftover `{...language...}` placeholder after substitution.
  A typo like `{Language}` (capital L) would silently survive `str.replace`
  and ship the literal placeholder to the LLM, producing wrong-language
  output. The check raises ValueError naming the unresolved placeholder.
- New test asserts the sniff catches `{Language}`.
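
The placeholder-leak check can be sketched like this. It is an illustration of the idea, not the real `load_rubric`: the function name `substitute_language`, the exact regex and the error wording are assumptions.

```python
import re

# Any {...} group that still mentions "language", in any letter case.
_LEFTOVER = re.compile(r"\{[^{}]*language[^{}]*\}", re.IGNORECASE)


def substitute_language(rubric_text: str, language: str) -> str:
    """Fill in {language}, then refuse to return text with a leftover variant."""
    rendered = rubric_text.replace("{language}", language)  # exact match only
    leftover = _LEFTOVER.search(rendered)
    if leftover:
        raise ValueError(f"unresolved placeholder {leftover.group(0)!r} in rubric")
    return rendered
```

`str.replace` is case-sensitive, so `{Language}` passes through it untouched; the regex runs after substitution precisely to catch that case before the prompt reaches the LLM.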

pr-test-analyzer CRITICAL gaps 1, 2, 3, 6 (LLM-prompt-substitution invariant):
- test_executors_api: new test asserts the system prompt sent to the
  Anthropic client has `{language}` substituted (no leftover, `**Language:**
  Spanish` present for Spanish).
- test_topic_synth: new test asserts both `{language}` placeholders in
  rubric-topic-page.md are substituted in the system prompt (`in Spanish`
  appears twice).
- test_worksheet: extended the existing rubric-shipping test to also
  assert the rubric inside the worksheet has the placeholder substituted
  — the Claude Code session reads it as-is.
- test_vocab: extended `induce_vocab` test to assert the system prompt
  contains no `{language}` literal and includes the language name.

Out of scope (documented for follow-up): two pre-existing bare
`except Exception` blocks (`ApiExecutor.enrich_items`,
`synthesize_overviews_api`) silently swallow API/validation errors with
only a warn-line. Not introduced here; surface as a separate issue.

Total: 278 tests (up from 274), coverage 88% unchanged, `uv run poe check`
all-green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[#16] Configurable output language — English default, Spanish supported
@VGonPa VGonPa merged commit 9771e2d into main May 22, 2026
1 check passed