Add migration runner — atomic_agents.migrate (closes #4)#15

Closed
dep0we wants to merge 1 commit into feat/goal-manager from feat/migration-runner

dep0we (Owner) commented May 7, 2026

Summary

Implements issue #4: the schema migration runner per spec/03-file-formats. This is the highest-stakes module in the stack, because a bad migration corrupts vaults. The runner takes a snapshot before migrating, offers a dry-run preview, validates afterwards, and rolls back atomically on failure.

Design

The defensive architecture is what's load-bearing here:

  1. Snapshot ALWAYS taken before any non-dry-run migration. tar+gzip of the entire vault, excluding regenerable artifacts (logs, caches, dashboard). Lives at _migrations/snapshots/YYYY-MM-DD_pre_vN_migration.tar.gz.
  2. Dry-run mandatory for first pass. Migration scripts respect dry_run and don't write when set.
  3. Post-migration validation runs after every script application. Every touched file must pass the helper's current frontmatter validator.
  4. Automatic rollback on:
    • Validation failure for any file
    • Catastrophic exception during application
    • Migration script crash
  5. Rollback failure (when both migration and rollback fail) leaves the vault in an inconsistent state with explicit error guidance — operator manually recovers from the snapshot.
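
The control flow in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in `atomic_agents/migrate.py`: `run_with_rollback`, `MigrationError`, and the `apply`/`validate` callables are hypothetical names, and a plain directory copy stands in for the tar+gzip snapshot.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


class MigrationError(Exception):
    """Hypothetical error type: the migration was rolled back, or rollback failed."""


def run_with_rollback(
    vault: Path,
    apply: Callable[[Path], list[Path]],
    validate: Callable[[Path], bool],
) -> list[Path]:
    """Snapshot the vault, apply a migration, validate, roll back on any failure.

    `apply` mutates files and returns the paths it touched; `validate` returns
    True when a touched file passes the current schema. Both are assumptions.
    """
    # Step 1: always snapshot first (a directory copy stands in for tar+gzip).
    snapshot = Path(tempfile.mkdtemp(prefix="pre_migration_")) / "vault"
    shutil.copytree(vault, snapshot)
    try:
        # Steps 2-3: apply the script, then validate every touched file.
        touched = apply(vault)
        invalid = [p for p in touched if not validate(p)]
        if invalid:
            raise MigrationError(f"validation failed for {len(invalid)} file(s)")
        return touched
    except Exception as exc:
        # Step 4: validation failure or a crash triggers a full restore.
        try:
            shutil.rmtree(vault)
            shutil.copytree(snapshot, vault)
        except Exception as restore_exc:
            # Step 5: both migration and rollback failed; point at the snapshot.
            raise MigrationError(
                f"rollback failed; recover manually from {snapshot}"
            ) from restore_exc
        raise MigrationError(f"migration rolled back: {exc}") from exc
```

Restoring the whole vault rather than only the touched files keeps the rollback simple and matches the all-or-nothing behaviour described above.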

Forward-only by design (refuses target ≤ current). Gaps in the migration chain are forbidden: v1→v3 is refused unless it can be reached step by step through v1→v2 and then v2→v3.
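
A small sketch of these two rules, assuming integer versions and scripts that each advance exactly one version. `build_plan` and the `available` mapping are illustrative names, not the module's actual `build_migration_plan()` signature.

```python
def build_plan(
    current: int, target: int, available: dict[int, int]
) -> list[tuple[int, int]]:
    """Chain single-step migrations from `current` up to `target`.

    `available` maps FROM_VERSION -> TO_VERSION for each discovered script
    (a hypothetical representation of the discovery result).
    """
    if target <= current:
        raise ValueError(
            f"forward-only: target v{target} must be above current v{current}"
        )
    plan: list[tuple[int, int]] = []
    version = current
    while version < target:
        nxt = available.get(version)
        # A missing script, or one that skips a version, counts as a gap.
        if nxt != version + 1:
            raise ValueError(f"gap in chain: no v{version}_to_v{version + 1} script")
        plan.append((version, nxt))
        version = nxt
    return plan
```

With scripts for v1→v2 and v2→v3 available, `build_plan(1, 3, {1: 2, 2: 3})` yields the two-step chain; removing either script makes the same call raise.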

Important safety property

The helper's CURRENT_SCHEMA_VERSION stays at the version it can handle. Until a migration script AND a helper bump land together, post-migration validation correctly rejects the migrated files and rolls back. This is intentional safety — you can't migrate to a version the helper doesn't yet support.

The test suite proves this: test_run_migration_real_applies_and_validates runs a v1→v2 migration with the helper at v1, observes the rollback fires, and verifies the original files are restored intact.
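
To make the property concrete, here is a simplified stand-in for the post-migration check. It assumes the validator rejects any `schema_version` above `CURRENT_SCHEMA_VERSION`; the regex-based parsing and the function name `frontmatter_is_supported` are illustrative only, since the real helper uses a frontmatter library.

```python
import re

# Assumed value: the helper stays at v1 until it is bumped alongside a script.
CURRENT_SCHEMA_VERSION = 1

_VERSION_LINE = re.compile(r"^schema_version:[ \t]*v?(\d+)[ \t]*$", re.MULTILINE)


def frontmatter_is_supported(text: str) -> bool:
    """Return True if the file declares a schema version the helper can handle."""
    match = _VERSION_LINE.search(text)
    if match is None:
        return False  # no declared version: treat as invalid in this sketch
    return int(match.group(1)) <= CURRENT_SCHEMA_VERSION


original = "---\nschema_version: 1\ntitle: example\n---\nbody\n"
migrated = "---\nschema_version: 2\ntitle: example\n---\nbody\n"

print(frontmatter_is_supported(original))
print(frontmatter_is_supported(migrated))
```

The migrated file fails the check while the helper is still at v1, which is the condition that triggers the rollback described above.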

What's in this PR

New module: atomic_agents/migrate.py (~600 LOC)

  • MigrationScript Protocol — defines the script interface
  • Script discovery + chain validation (rejects gaps, version mismatches, missing attributes, underscore-prefixed templates)
  • Content file walker (excludes _dashboard, _migrations, _cache, .git, INDEX.md)
  • Version detection (returns the lowest version when files are mixed, so the vault is treated as needing migration)
  • Plan builder (chains scripts current → target)
  • Snapshot create/restore with proper exclusions
  • Migration runner with mandatory snapshot, per-script application, post-validation, auto-rollback
  • Vault status, snapshot listing
  • CLI: --to vN, --dry-run, --status, --rollback, --list-snapshots
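
For orientation, the script interface might look roughly like the sketch below. The attribute and method names come from this PR; the integer version type, the class-based example (real scripts are `vN_to_vM.py` modules), and the keys of the returned dict are assumptions for illustration.

```python
from pathlib import Path
from typing import Protocol, runtime_checkable


@runtime_checkable
class MigrationScript(Protocol):
    """Interface each migration script is expected to satisfy."""

    FROM_VERSION: int
    TO_VERSION: int

    def applies_to(self, path: Path) -> bool: ...

    def migrate(self, path: Path, dry_run: bool) -> dict: ...


class V1ToV2:
    """Hypothetical script that only bumps the schema_version field."""

    FROM_VERSION = 1
    TO_VERSION = 2

    def applies_to(self, path: Path) -> bool:
        return path.suffix == ".md" and "schema_version: 1" in path.read_text()

    def migrate(self, path: Path, dry_run: bool) -> dict:
        new_text = path.read_text().replace(
            "schema_version: 1", "schema_version: 2", 1
        )
        # Respect dry_run: compute the change but never write when it is set.
        if not dry_run:
            path.write_text(new_text)
        return {"path": str(path), "changed": True, "dry_run": dry_run}
```

Because the protocol is structural, a script only needs to expose these four members; it does not have to inherit from anything.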

Tests: tests/test_migrate.py — 32 tests, all passing

  • Version parsing (incl. v prefix optional)
  • Content file walker (positive + excluded-dirs + empty)
  • Version detection (single + mixed + empty vault)
  • Script discovery: positive case + 4 rejection cases (version skip, version mismatch, missing attribute, underscore prefix)
  • Plan building: chain discovery + below-current-rejection + missing-script-rejection
  • Snapshot creation: writes tarball, includes content, excludes meta dirs
  • Restore round-trip: mutate file → restore → original recovered
  • Snapshot listing: newest first
  • End-to-end migration: dry-run preserves files; real migration with validation rollback proves the safety mechanism works
  • Migration target ≤ current rejected
  • File-touched recording
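
The snapshot and restore round-trip exercised by these tests can be approximated with the standard-library `tarfile` module. This is a sketch under assumptions: the function signatures are simplified, only the meta directories are excluded (the log JSONL exclusion is omitted), and restore leaves the excluded directories in place so the snapshot archive itself survives.

```python
import shutil
import tarfile
from pathlib import Path

EXCLUDED_DIRS = {"_dashboard", "_migrations", "_cache", ".git"}


def _keep(info: tarfile.TarInfo):
    """Drop any archive member that sits under an excluded directory."""
    return None if EXCLUDED_DIRS.intersection(Path(info.name).parts) else info


def create_snapshot(vault: Path, archive: Path) -> Path:
    """Write a gzip-compressed tarball of the vault, skipping meta dirs."""
    archive.parent.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "w:gz") as tar:
        for child in sorted(vault.iterdir()):
            tar.add(child, arcname=child.name, filter=_keep)
    return archive


def restore_snapshot(vault: Path, archive: Path) -> None:
    """Clear current vault content, then extract the snapshot over it."""
    for child in vault.iterdir():
        if child.name in EXCLUDED_DIRS:
            continue  # keep snapshots and other regenerable meta dirs
        if child.is_dir():
            shutil.rmtree(child)
        else:
            child.unlink()
    with tarfile.open(archive, "r:gz") as tar:
        # Use the safer "data" extraction filter where this Python provides it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(vault, **kwargs)
```

Excluding `_migrations` also prevents the archive from trying to include itself when it is written under `_migrations/snapshots/`.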

Total suite: 190/190 passing.

Spec coverage

Per spec/03 acceptance criteria (issue #4):

  • ✅ Migration directory at <agents_root>/_migrations/
  • ✅ Migration script protocol with applies_to() + migrate(path, dry_run)
  • ✅ Backup before migrate (tar+gzip snapshot)
  • ✅ Dry-run mandatory and respected
  • ✅ Post-migration validation
  • ✅ Rollback on validation failure
  • ✅ Helper read-time adaptation (was already in _schema.py from v0.1)
  • ✅ Multi-agent atomicity (snapshot covers entire <agents_root>)
  • ✅ CLI: --to, --dry-run, --status, --rollback, --list-snapshots
  • Lazy migration (write-time, per-file migration) is documented in the helper, but the runtime lazy-write path still needs the helper's _schema to expose old-schema read adaptation per spec/03. That is a follow-up, to be wired into agent.py's capture write path.
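
As a rough picture of the CLI surface, the flags could be declared with `argparse` as below. Only the flag names come from this PR; the `prog` name, the `target` destination, treating `--rollback` as a plain flag, and the `parse_version` helper are assumptions.

```python
import argparse


def parse_version(text: str) -> int:
    """Accept '2' or 'v2' (the v prefix is optional) and return the integer."""
    digits = text[1:] if text[:1] in ("v", "V") else text
    if not (digits.isascii() and digits.isdigit()):
        raise argparse.ArgumentTypeError(f"invalid schema version: {text!r}")
    return int(digits)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="atomic_agents.migrate")
    parser.add_argument("--to", dest="target", type=parse_version, metavar="vN",
                        help="target schema version")
    parser.add_argument("--dry-run", action="store_true",
                        help="preview changes without writing")
    parser.add_argument("--status", action="store_true",
                        help="show current version, scripts, and snapshots")
    parser.add_argument("--rollback", action="store_true",
                        help="restore from a snapshot (assumed to be a flag)")
    parser.add_argument("--list-snapshots", action="store_true",
                        help="list snapshots, newest first")
    return parser


args = build_parser().parse_args(["--to", "v2", "--dry-run"])
print(args.target, args.dry_run)
```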

Test plan

  • uv run pytest tests/test_migrate.py — 32 tests pass
  • uv run pytest — full suite (190 tests) passes
  • Real-world test: when the spec bumps to v2, write the actual v1_to_v2.py script and run it against a vault with real data

🤖 Generated with Claude Code

Implements the schema migration runner per spec/03-file-formats.

The highest-stakes module: bad migration corrupts vaults. Snapshot before,
dry-run preview, validate after, atomic rollback on failure.

Key pieces:

- MigrationScript Protocol — each vN_to_vM.py script in
  <agents_root>/_migrations/ implements:
    FROM_VERSION, TO_VERSION  (validated against filename pattern)
    applies_to(path) -> bool
    migrate(path, dry_run) -> dict
- discover_scripts() — finds scripts, validates filename ↔ module version
  consistency, rejects gaps (v1→v3 skipping v2), sorts by version chain
- find_content_files() — walks <agents_root>/<agent>/{memory,wiki}/*.md;
  excludes _dashboard, _migrations, _cache, .git, INDEX.md
- get_current_vault_version() — reads schema_version from a sample of
  files; returns lowest if mixed (treat as needing migration up)
- build_migration_plan() — chains scripts from current → target;
  rejects gaps in the chain
- create_snapshot() — tar+gzip of vault to
  _migrations/snapshots/YYYY-MM-DD_pre_vN_migration.tar.gz; excludes
  meta dirs and log JSONLs (regenerable)
- restore_snapshot() — clears current vault content, extracts snapshot
- run_migration() — orchestrator with mandatory snapshot, per-script
  application, post-migration validation, automatic rollback on:
    - Validation failure (any file's frontmatter doesn't pass new schema)
    - Catastrophic exception during application
- vault_status() — current version, available scripts, snapshots
- list_snapshots() — newest-first
- CLI: --to vN, --dry-run, --status, --rollback, --list-snapshots

HARD RULES enforced (per spec/03):
- Forward-only (rejects target ≤ current)
- Multi-agent atomicity (any file failure rolls back all changes)
- Snapshot ALWAYS taken before non-dry-run migration
- Validation runs after every migration; rollback on failure
- Rollback failure (when both migration AND rollback fail) leaves vault
  in inconsistent state; explicit error guides operator to manual recovery
- Custom user-added frontmatter fields preserved (frontmatter library
  round-trips unknown fields automatically)

Important safety design: the helper's CURRENT_SCHEMA_VERSION stays at
the version it can handle; until a migration AND a helper bump land
together, post-migration validation correctly rejects the migrated
files and rolls back. This is intended — you can't migrate to a
version the helper doesn't support yet.

Tests: 32 new tests covering script discovery (positive + 4 error cases:
version skip, version mismatch, missing required attribute, underscore-
prefix exclusion), content file walking (incl. excluded dirs), version
detection (with mixed versions returning lowest), plan building (chain
discovery + 2 error cases), snapshot creation/restore (round-trip with
mutation in between), snapshot listing (newest first), end-to-end
migration (dry-run preserves files; real migration with validation
rollback proves the safety mechanism works), file-touched recording.

Total suite: 190/190 passing.

dep0we (Owner, Author) commented May 7, 2026

Content landed on main as part of #16's squash merge (PR #16's base was retargeted to main but its head branch contained the migration-runner commits as ancestors, so the squash absorbed both).
