docs: comprehensive README #6

Merged: VGonPa merged 1 commit into develop from readme-overhaul on May 19, 2026

VGonPa commented on May 19, 2026

Full rewrite of the README — the previous one predated WS2 (no vocab/topics, no execution modes, no fetch_x/Firecrawl).

Now covers, for a reader who knows nothing about the project:

  • The 3-layer wiki (items → topics → index) with a diagram
  • The 6-stage pipeline (extract → fetch → vocab → enrich → topics → generate) with a data/items.json-hub flow diagram
  • Execution modes — the pluggable claude-code / api / manual executor tracks, with a diagram and the end-to-end worksheet flow
  • Commands table, full configuration table, prerequisites, authentication (Chrome + Safari cookie import)
  • "How it works" — the LLM-emits-only-judgment rule, the data stores, broken-link evidence
  • Project structure tree, development / quality-gate, responsible use

Badges: CI status, Python version, license. Validated against the writing-readmes checklist; internal links resolve. Docs-only — no code change.

🤖 Generated with Claude Code

VGonPa merged commit 7e33a0e into develop on May 19, 2026 (1 check passed)
VGonPa deleted the readme-overhaul branch on May 19, 2026 16:13
VGonPa added a commit that referenced this pull request May 21, 2026
…ening

Addresses every HIGH/CRITICAL finding from the review pipeline on PR #22
(code-reviewer, silent-failure-hunter, pr-test-analyzer, python-code-reviewer,
code-simplifier, spec-compliance):

snapshot.py:
- snapshot_create returns (Path, SnapshotManifest) — callers (incl. _auto_snapshot)
  now print the item count from the manifest just written, matching PRD §5
  observability ("Snapshot created: <path> (N items)").
- New `dir_label` parameter separates directory naming from manifest.command:
  manifest now records "vocab-regenerate" (the op name) while the directory
  carries the `pre-` prefix. Fixes the dual-purpose smell flagged by code-reviewer.
- snapshot_pre removed — inlined in _auto_snapshot (code-simplifier).
- Timestamp gains millisecond precision (`%Y-%m-%dT%H-%M-%S-NNNZ`). Eliminates
  the same-second collision bug flagged by pr-test-analyzer #9 and
  python-code-reviewer #2. As a side effect, the suite no longer needs
  `time.sleep(1.1)` between snapshots — total test runtime dropped from 7s to <1s.
- snapshot_restore now uses `shutil.copy2` symmetrically with snapshot_create
  instead of the text round-trip via `_atomic_write`. Binary-safe, metadata-
  preserving, and no longer asymmetric (code-reviewer/python-code-reviewer
  both flagged this as the must-fix-before-merge issue).
- snapshot_restore returns a list of (artifact, action) tuples — RESTORE_COPIED,
  RESTORE_DELETED, RESTORE_SKIPPED. The CLI prints every action, so a deletion
  from a "missing in snapshot" artifact is never silent (silent-failure-hunter
  #1 HIGH).
- snapshot_list now returns rows with `manifest=None` for corrupt directories
  instead of silently dropping them; the CLI marks those as CORRUPT on stderr
  (silent-failure-hunter #2 HIGH).
- _count_* helpers now propagate exceptions instead of swallowing them, so a
  corrupt items.json aborts the snapshot instead of recording a misleading
  count=0 (silent-failure-hunter #3, code-reviewer #3).
- All imports (json, yaml, importlib.metadata) at the module top (code-reviewer #3).
- _count_items/_count_topics removed and inlined (trivial one-line wrappers,
  code-simplifier #1).
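
The millisecond-precision timestamp mentioned in the list above can be sketched roughly as follows. This is an illustration only, not the project's code: the helper name `snapshot_timestamp` and the way the `NNN` field is derived from microseconds are assumptions. Only the `%Y-%m-%dT%H-%M-%S-NNNZ` shape comes from the commit message.

```python
from __future__ import annotations

from datetime import datetime, timezone


def snapshot_timestamp(now: datetime | None = None) -> str:
    """Return a UTC timestamp shaped like 2026-05-21T10-15-30-042Z.

    Hypothetical helper: strftime has no millisecond directive, so the
    NNN field is computed from the microsecond component.
    """
    # Normalise to UTC so the trailing "Z" is truthful.
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    millis = now.microsecond // 1000
    return f"{now.strftime('%Y-%m-%dT%H-%M-%S')}-{millis:03d}Z"
```

Two snapshots taken within the same second but at least one millisecond apart get distinct names, which is what makes a `time.sleep(1.1)` between snapshots unnecessary in tests.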

cli.py:
- _OPERATOR_ERRORS now includes OSError (covers PermissionError, FileExistsError,
  IsADirectoryError) so snapshot I/O failures surface as clean exit-1 instead
  of raw tracebacks (silent-failure-hunter #4).
- _auto_snapshot now reads the count from the manifest and emits the spec-
  mandated English message: `Snapshot created: <dir> (N items)` (pr-test-analyzer
  #10, python-code-reviewer #3).
- snapshot_restore_cmd echoes every per-artifact action.
- snapshot_list_cmd handles `manifest=None` rows as CORRUPT (to stderr).
- snapshot_create_cmd uses the new (path, manifest) return shape and
  passes `command="manual"` + `dir_label=name`.
- Strings translated to English (the whole new subcommand group; the rest of
  the CLI stays Spanish — out of scope here).
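
The per-artifact actions that `snapshot_restore` returns and the CLI echoes could look roughly like the sketch below. The function name `restore_artifacts`, the string values of the three constants, and the exact meaning of "skipped" (artifact absent on both sides) are assumptions made for illustration. The commit message only states that the three action codes exist, that `shutil.copy2` is used, and that deletions are reported.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# Assumed string values; the commit only names the constants.
RESTORE_COPIED = "copied"
RESTORE_DELETED = "deleted"
RESTORE_SKIPPED = "skipped"


def restore_artifacts(
    snapshot_dir: Path, data_dir: Path, artifacts: list[str]
) -> list[tuple[str, str]]:
    """Restore each artifact and report what was done to it."""
    actions: list[tuple[str, str]] = []
    for name in artifacts:
        src, dst = snapshot_dir / name, data_dir / name
        if src.exists():
            # copy2 copies bytes verbatim and preserves file metadata.
            shutil.copy2(src, dst)
            actions.append((name, RESTORE_COPIED))
        elif dst.exists():
            # Missing in the snapshot: remove the live file, but say so.
            dst.unlink()
            actions.append((name, RESTORE_DELETED))
        else:
            actions.append((name, RESTORE_SKIPPED))
    return actions
```

A caller can then print one line per tuple, so no deletion happens without a visible record.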

Tests:
- test_snapshot.py: 21 unit tests (up from 14). New: round-trip across ALL FOUR
  artifacts (pr-test-analyzer #1 CRITICAL), millisecond-collision regression,
  corrupt-JSON-aborts-snapshot, dir_label separation from command,
  shutil.copy2-preserves-bytes (binary-safety smoke test), per-artifact action
  codes, xbrain_version assertion (pr-test-analyzer #5), prune-with-fewer-than-
  keep_last (pr-test-analyzer #6).
- test_snapshot_auto.py: 16 integration tests (up from 10). New: snapshot-
  taken-before-mutation-when-op-fails (pr-test-analyzer #2 CRITICAL — uses
  monkeypatch to force `_mark_for_regenerate` to raise, asserts snapshot
  already on disk + items.json unchanged), snapshot-failure-aborts-destructive-op
  (pr-test-analyzer #3 CRITICAL — monkeypatch snapshot_create to raise OSError,
  assert fetch --force aborts and nothing is mutated), snapshot show CLI
  (pr-test-analyzer #7), restore-via-CLI-with-missing-artifact
  (pr-test-analyzer #8), corrupt-dirs-marked-via-CLI, stdout-includes-item-count
  (pr-test-analyzer #10).
- All 258 tests pass; coverage 87%.

CONTRIBUTING.md:
- Added a "Safety: destructive operations auto-snapshot" section
  (spec-compliance #FAIL — closes the doc gap).

Deviation log unchanged: 3 destructive sites (`vocab --regenerate`,
`topics --resynth`, `fetch --force`). `enrich --regenerate` does not exist
as a CLI flag; re-enrichment happens via `vocab --regenerate`, which is
already covered. Spec-compliance reviewer confirmed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
VGonPa added a commit that referenced this pull request May 22, 2026
Addresses every HIGH/MEDIUM finding from the 6-reviewer panel on PR #28
(code-reviewer + python-code-reviewer + spec-compliance + test-analyzer
APPROVED; silent-failure-hunter and simplifier flagged actionable items):

silent-failure-hunter MEDIUM #1: silent empty-diff on missing dirs.
- `diff_snapshots` now validates both directories exist on disk; raises
  FileNotFoundError with the missing path if not.
- Validates that at least one artifact exists on either side; if both are
  fully empty, raises FileNotFoundError naming both dirs (guards against
  the "data/ deleted out-of-band" scenario where diff would otherwise
  silently report 'everything was removed').
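
The two validation steps above might be sketched as follows. This is a minimal illustration under stated assumptions: the helper name `validate_diff_inputs` and the `ARTIFACTS` tuple are hypothetical (the three file names are simply the ones this commit message mentions), and the real `diff_snapshots` may structure the checks differently.

```python
from pathlib import Path

# Assumed artifact list, taken from the file names named in this message.
ARTIFACTS = ("items.json", "vocab.yaml", "topics.json")


def validate_diff_inputs(dir_a: Path, dir_b: Path) -> None:
    """Raise FileNotFoundError instead of producing a misleading diff."""
    # Step 1: both directories must exist on disk.
    for directory in (dir_a, dir_b):
        if not directory.is_dir():
            raise FileNotFoundError(f"snapshot directory not found: {directory}")
    # Step 2: at least one artifact must exist on either side.
    found = any(
        (directory / name).exists()
        for directory in (dir_a, dir_b)
        for name in ARTIFACTS
    )
    if not found:
        raise FileNotFoundError(f"no artifacts found in {dir_a} or {dir_b}")
```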

silent-failure-hunter MEDIUM #2: corrupt-file errors lacked context.
- Each loader call inside `diff_snapshots` is wrapped to add the path to
  the ValueError message ("failed to load <path>: <orig msg>"), so a
  malformed items.json / vocab.yaml / topics.json surfaces with the
  specific file rather than a bare pydantic / json traceback.
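
One way to add the path to a loader error is shown in the sketch below. The names `load_with_context` and `load_json` are hypothetical and stand in for whatever loaders the project uses. The example relies only on the fact that `json.JSONDecodeError` is a subclass of `ValueError`.

```python
import json
from pathlib import Path
from typing import Any, Callable


def load_json(path: Path) -> Any:
    """Stand-in loader; raises json.JSONDecodeError on malformed input."""
    return json.loads(path.read_text(encoding="utf-8"))


def load_with_context(loader: Callable[[Path], Any], path: Path) -> Any:
    """Call a loader and prefix any ValueError with the offending path."""
    try:
        return loader(path)
    except ValueError as exc:
        # Chain the original exception so the root cause stays visible.
        raise ValueError(f"failed to load {path}: {exc}") from exc
```

Chaining with `from exc` keeps the original traceback available for debugging while the top-level message names the file.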

simplifier #1: `_tfidf_cosine` renamed to `_tf_cosine`.
- The function uses plain TF cosine, not TF-IDF (with only 2 documents,
  IDF degenerates). Docstring already explained this; renaming the
  symbol stops the name from lying. Module docstring + every call site
  + import updated.
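
For readers unfamiliar with the distinction, a plain term-frequency cosine can be sketched as below. The tokenizer and the function names here are assumptions for illustration and are not the project's `_tf_cosine`. With only two documents, every term that contributes to the dot product appears in both, so an IDF factor would weight all of those terms identically and adds no discrimination.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokenizer (an assumed, deliberately simple choice)."""
    return re.findall(r"\w+", text.lower())


def tf_cosine(a: str, b: str) -> float:
    """Cosine similarity between raw term-frequency vectors of two texts."""
    tf_a, tf_b = Counter(tokenize(a)), Counter(tokenize(b))
    if not tf_a or not tf_b:
        return 0.0  # an empty document has no direction to compare
    # Only terms present in both documents contribute to the dot product.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    return dot / (norm_a * norm_b)
```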

simplifier #2: `VocabDiff.unchanged: list[str]` was only ever consumed
as `len(...)`. Replaced with `unchanged_count: int`. JSON output is
slightly smaller on large vocabs and the data shape stops promising
information the consumers don't read.

spec-compliance follow-up: `diff.py` added to the "Where things live"
tree in ARCHITECTURE.md.

Tests:
- Three new tests for the validation: missing dir → FileNotFoundError,
  both-empty → FileNotFoundError, corrupt items.json → ValueError with
  path in message.
- Existing tests updated for the `unchanged_count` rename.

Skipped (out of scope for this round, documented in task #88):
- test-analyzer polish (French tokenizer test, JSON schema-stability
  deeper assertions, secondary-topic-no-reassign test) — improvements
  not blockers.
- simplifier #3, #4 (drop TopicChange.unchanged, drop DiffSummary) —
  borderline derivability vs. JSON consumer ergonomics.
- simplifier #5, #6 (Literal["text","json"] dispatch, drop
  diff_snapshots threshold kwargs) — internal-only style.
- code-reviewer/python-code-reviewer naming nit (`reassigned_pct`
  carrying a fraction) — purely cosmetic, internal consistency intact.

Total: 329 tests (up from 326), coverage 89%, `uv run poe check`
all-green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>