v1.x polish: phase snapshots, generic domain-evaluator, denylist-first #48

Merged
SamPlvs merged 1 commit into main from claude/interesting-dubinsky-90747c on Apr 20, 2026

@SamPlvs SamPlvs commented Apr 20, 2026

Summary

Three independent v1.x items landed in one session, all following the portable .zo/ memory work (PRs #44-47):

  • Phase completion snapshots (C1) — new src/zo/snapshots.py module. PhaseSnapshot pydantic model with schema_version: 1, MD+YAML frontmatter format (same pattern as STATE.md), write/list/load helpers. Orchestrator writes a snapshot at every gate PROCEED (both automated and human paths). Uses MemoryManager.memory_root so portable .zo/memory/snapshots/ works automatically. Non-blocking on failure (snapshots are reporting artifacts, not correctness gates).
  • Domain-evaluator → generic shell — stripped all domain-specific content. Agent identity comes exclusively from the plan's **Agent adaptations:** block at build time (PR-020 mechanism). Hard stop-rule: if no adaptation is present, the agent asks Lead Orchestrator rather than produce a generic report. Plan Architect already proposes these during zo draft.
  • Denylist-first DL prior codified — PR-026 (Phase 1 learning: curated allowlist silently limited feature space to <1% of available signals) propagated into specs/workflow.md Subtask 1.3 callout + new "Pipeline Principles" section in .claude/agents/data-engineer.md. Rule: include all signals, exclude only leakage; feature selection is Phase 2 model-dependent; flag any reduction >10× from inherited lists.
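
A minimal sketch of the snapshot flow described in the first bullet: a model with `schema_version: 1`, a Markdown body under a YAML frontmatter header, and a write helper that never blocks the gate. This is not the code in src/zo/snapshots.py. It uses a stdlib dataclass in place of the pydantic model to stay dependency-free, and every field name other than `schema_version`, the filename format, and the helper names `write_snapshot` / `write_snapshot_non_blocking` are assumptions made for illustration.

```python
import logging
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


@dataclass
class PhaseSnapshot:
    """Illustrative stand-in for the PR's pydantic model (fields are assumed)."""

    phase: str
    gate_decision: str = "PROCEED"
    schema_version: int = 1
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    )
    notes: str = ""

    def render(self) -> str:
        """Return Markdown text preceded by a YAML frontmatter header."""
        frontmatter = "\n".join(
            [
                "---",
                f"schema_version: {self.schema_version}",
                f"phase: {self.phase}",
                f"gate_decision: {self.gate_decision}",
                f"created_at: {self.created_at}",
                "---",
            ]
        )
        body = f"# Phase snapshot: {self.phase}\n\n{self.notes or '_No notes._'}\n"
        return f"{frontmatter}\n\n{body}"


def write_snapshot(memory_root: Path, snapshot: PhaseSnapshot) -> Path:
    """Write the snapshot under <memory_root>/snapshots/, creating the directory."""
    snapshots_dir = memory_root / "snapshots"
    snapshots_dir.mkdir(parents=True, exist_ok=True)
    path = snapshots_dir / f"{snapshot.phase}-{snapshot.created_at}.md"
    path.write_text(snapshot.render(), encoding="utf-8")
    return path


def write_snapshot_non_blocking(
    memory_root: Path, snapshot: PhaseSnapshot
) -> Optional[Path]:
    """Snapshots are reporting artifacts: log a failed write and carry on."""
    try:
        return write_snapshot(memory_root, snapshot)
    except OSError as exc:
        logger.warning("snapshot write failed (ignored): %s", exc)
        return None
```

Passing the memory root in as a parameter is what would let the same helper serve a portable .zo/memory/ location without special-casing it.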

Deferred (by design, not scope cut)

  • Experiment capture-layer: schema sketched in this session (design only). Implementation is deferred per PR-005 until prod-001 Phase 4 generates real iteration data. Building the autonomous loop (plateau detector, proposer, budget-aware selection) from speculation would tune it for demo-scale failure modes and require a retrofit for production.
  • Remote-data manifest for zo draft: skipped. Portable .zo/ memory (PR #44) already solved the cross-machine case; zo draft can now run directly on the host that has the data.

Memory updates

  • STATE.md: current position rewritten; three completed items added; What's Next reorganised (three items ticked off, capture-layer sketch added as #7); session metadata updated (session-019, 557 tests).
  • DECISION_LOG.md: four new entries covering denylist codification, domain-evaluator refactor (Option A: pure generic shell), phase snapshots (C1), and the capture-layer design sketch.
  • PRIORS.md: no update (no failures this session).

Test plan

  • uv run pytest -q — 557 passed, 7 skipped (+28 new tests)
  • uv run ruff check src/zo/ — clean
  • ./scripts/validate-docs.sh — 10/10 passed, 1 pre-existing warning (stale test-count badge)
  • 23 unit tests (tests/unit/test_snapshots.py): model defaults, render (frontmatter/body/subtasks/artifacts/duration/placeholders/notes), write (dir creation, filename format, content), list (empty/newest-first/filter-by-phase), load (none/roundtrip)
  • 5 integration tests (tests/integration/test_phase_snapshots.py): human gate PROCEED writes snapshot, snapshot records gate decision, automated gate PROCEED writes snapshot, snapshot captures subtask state, file has valid YAML frontmatter
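
As a rough illustration of what a "file has valid YAML frontmatter" check might look like, the sketch below splits a snapshot file into its frontmatter and body using only the standard library. It handles flat `key: value` lines only and is not a YAML parser; the function name `split_frontmatter` is hypothetical, and the actual tests may well use a real YAML library instead.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split '---'-delimited frontmatter from the Markdown body.

    Only flat 'key: value' lines are supported (an assumption for this sketch).
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening frontmatter delimiter")
    try:
        end = lines.index("---", 1)  # first closing delimiter after line 0
    except ValueError:
        raise ValueError("missing closing frontmatter delimiter") from None

    meta: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        meta[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body
```

A round-trip style assertion would then compare the parsed `schema_version` and phase against the values used to render the file.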

🤖 Generated with Claude Code

…nylist-first

Three independent v1.x items landed:

1. Phase completion snapshots (C1): new src/zo/snapshots.py module with
   PhaseSnapshot pydantic model (schema_version: 1), MD+YAML frontmatter
   render, write, list, load. Orchestrator writes a snapshot at every
   gate PROCEED (automated + human). Uses memory_root so portable
   .zo/memory/snapshots/ works automatically. 23 unit + 5 integration
   tests. Non-blocking on failure (snapshots are reporting, not gates).

2. Domain-evaluator refactored to generic shell: stripped domain-specific
   content (physics/chemistry/process engineering). Agent identity comes
   exclusively from the plan's **Agent adaptations:** block at build
   time (PR-020 mechanism). Hard stop-rule: if no adaptation present,
   the agent asks Lead Orchestrator rather than produce a generic
   report. Plan Architect already proposes these during draft.

3. Denylist-first DL prior codified: PR-026 (Phase 1 learning — curated
   allowlist silently limited feature space to <1% of available signals)
   propagated into specs/workflow.md Subtask 1.3 callout + new Pipeline
   Principles section in .claude/agents/data-engineer.md. Rule: include
   all signals, exclude only leakage; feature selection is Phase 2
   model-dependent; flag any reduction >10x from inherited lists.
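
The denylist-first rule in item 3 can be sketched as follows: start from every available signal, drop only known leakage columns, and flag any inherited list that shrinks the feature space by more than 10x. The function names, signal names, and the way the factor is computed are assumptions for illustration; only the "exclude only leakage" rule and the 10x threshold come from the text above.

```python
REDUCTION_FLAG_FACTOR = 10  # from the rule: flag any reduction >10x


def select_features(all_signals: list[str], leakage_denylist: set[str]) -> list[str]:
    """Denylist-first: keep every signal except those known to leak the target."""
    return [s for s in all_signals if s not in leakage_denylist]


def reduction_factor(available: int, selected: int) -> float:
    """How many times smaller the selected set is than the available set."""
    if selected == 0:
        return float("inf")
    return available / selected


def should_flag(available: int, selected: int) -> bool:
    """True when a list cuts the feature space by more than the threshold."""
    return reduction_factor(available, selected) > REDUCTION_FLAG_FACTOR


# Hypothetical data: 1000 signals, two of which leak the label.
signals = [f"sensor_{i}" for i in range(1000)]
leakage = {"sensor_0", "sensor_1"}
kept = select_features(signals, leakage)

# An inherited allowlist of 8 signals is a 125x reduction and gets flagged;
# the denylist-first selection keeps 998 signals and does not.
inherited_allowlist = signals[:8]
flag_inherited = should_flag(len(signals), len(inherited_allowlist))
flag_denylist = should_flag(len(signals), len(kept))
```

Narrowing the feature set further is left to Phase 2, where it depends on the model being trained.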

Memory: STATE.md updated (current position, completed items, what's
next, session metadata session-019); DECISION_LOG.md four new entries
covering the three features plus the experiment-capture-layer design
sketch (implementation deferred per PR-005 until real iteration data
grounds the loop heuristics).

Tests: 557 passed (+28 from 529), ruff clean, validate-docs 10/10.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cloudflare-workers-and-pages

Deploying zero-operators with Cloudflare Pages

Latest commit: 2cb4e49
Status: ✅  Deploy successful!
Preview URL: https://27378639.zero-operators.pages.dev
Branch Preview URL: https://claude-interesting-dubinsky.zero-operators.pages.dev

@SamPlvs SamPlvs merged commit b5b4778 into main Apr 20, 2026
1 check passed
@SamPlvs SamPlvs deleted the claude/interesting-dubinsky-90747c branch April 20, 2026 11:18
SamPlvs added a commit that referenced this pull request Apr 30, 2026
v1.x polish: phase snapshots, generic domain-evaluator, denylist-first