
feat(loop): agent-suggested parallel batches + end-of-run verify + loop graph DAG (hew-lf40) #59

Merged
droidnoob merged 8 commits into main from feat/batch-plan-module on May 30, 2026

@droidnoob
Owner

Closes the hew-lf40 epic — agent-suggested parallel batches for hew loop run --jobs N, plus end-of-run test verification and a DAG renderer for loop iteration history.

Why this exists

DECISION:loop-parallel-overlap-policy ("trust the graph") shipped in v1 as a deliberate punt: bd dep edges encode safety, and conflicts get caught at merge-back time. That's correct for sparse graphs but bites when two independent tasks touch the same file. The 2026-05-29 autonomous run made the cost real — loop_log.rs overlap between hew-2cq and hew-6nxs required a manual rebase even though both branches were green in isolation.

This epic layers informed batching on top of trust-the-graph without contradicting it:

  1. Iter agent's own suggestion — when the agent that just closed iter N emits a next_iteration: [task_ids] block, the dispatcher honors it for the next tick. Cheapest signal: the agent already has full context.
  2. Dedicated planner runtime — between iters, when (1) is silent, a small claude -p / codex exec call (capped by loop.planner.budget_tokens, default 10k) reads the bd-ready set + recent symbol-touch sets and returns the next batch. Never truncates context to fit budget — skips cleanly instead.
  3. Trust-the-graph floor — dispatch_tick intersects the batch with bd ready. The batch can only narrow the candidate set, never expand it. Floor is locked.
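
A minimal std-only sketch of the step-3 floor, assuming simplified stand-ins: `BatchSource`, `BatchPlan`, and `filter_candidates` here are hypothetical and not hew_core's actual types or signatures. The batch can only filter the bd-ready list (keeping its order). A missing, skipped, or empty plan falls through to bd ready unchanged.

```rust
/// Simplified stand-in for the batch source tag (hypothetical, not hew_core's type).
#[derive(Debug, Clone, PartialEq)]
pub enum BatchSource {
    Agent,
    Planner,
    Skipped { reason: String },
}

/// Simplified stand-in for a batch plan: only the fields the filter needs.
#[derive(Debug, Clone)]
pub struct BatchPlan {
    pub task_ids: Vec<String>,
    pub source: BatchSource,
}

/// Narrow the bd-ready candidates to the batch, preserving bd-ready order.
/// Batch IDs that are not ready are dropped, so the result is never larger
/// than `ready`. No plan, a Skipped plan, or an empty batch returns `ready`
/// unchanged (trust-the-graph).
pub fn filter_candidates(ready: &[String], plan: Option<&BatchPlan>) -> Vec<String> {
    match plan {
        Some(p) if !matches!(p.source, BatchSource::Skipped { .. }) && !p.task_ids.is_empty() => {
            // Linear `contains`: batches are small, so no HashSet is built per tick.
            ready
                .iter()
                .filter(|id| p.task_ids.contains(*id))
                .cloned()
                .collect()
        }
        _ => ready.to_vec(),
    }
}
```

Because the output is always a subset of `ready`, a batch that names a task that is not ready (for example, one that is still blocked by a bd dep edge) cannot get that task dispatched.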

What lands (8 atomic commits)

| commit | task | what |
|---|---|---|
| 108d148 | hew-58ac | `hew_core::batch_plan` module: `BatchPlan` + `BatchSource` enum + atomic file I/O at `.hew/loop/<run-id>/batch-NNN.json`, `schema_version=1` |
| e33abb0 | hew-7klt | `batch_plan_parse::extract_next_iteration`: tolerant parser for the agent's close-output block (fenced `next_iteration` form + `<next_iteration>` XML form); filters malformed task IDs |
| f58ff12 | hew-pxw9 | `spawn_planner`: subprocess with a pre-spawn budget check; every failure mode returns a `BatchPlan` with `source: Skipped` and a `reason` rather than propagating an error |
| 48506d9 | hew-rplg | Dispatcher threading: `Dispatcher::new` accepts `Option<BatchPlan>`; `dispatch_tick` filters by batch ∩ bd_ready; `ready_seen` reflects the post-filter set |
| 31ef9ff | hew-7k1m | CLI + config: `--no-planner`, `--planner-budget`, `loop.planner.*` schema; iter-end hook chooses agent → planner → skipped |
| 5e595fa | hew-z7rz | `hew loop summary` adds `planner: agent=N, runtime=M, fallback=K`; docs/LOOP.md "Batch planner" section; CHANGELOG entry; new `DECISION:loop-batch-planner-floor` memory |
| dbe56b4 | hew-bon7 | End-of-run verify step: stack-detected test command (Rust → Node → Python → Go → Make/Just), `loop.end_of_run.verify_tests` config + `--verify-tests` flag, opt-in (default false), budget-capped; writes `verify.log` and a `STATUS:loop-verify-failed:` memory on failure |
| 42014ba | hew-m7lq | `hew loop graph`: DAG renderer over iter + batch + run logs. Outputs mermaid (default), dot, or ASCII. Handles incomplete iters, cancelled runs, runtime-error-with-empty-stderr, backpressure rollback, verify outcomes, and parallel worker swimlanes |

Bonus: the hew loop graph unhappy paths the user flagged

Per the chore body, the graph must render the cases where things didn't go cleanly:

| case | node treatment |
|---|---|
| incomplete iter (`started_at`, no `ended_at`) | glyph, dashed border, partial label |
| cancelled mid-run (`stop_reason: Cancelled`) | ⊘ glyph, gray, `[Cancelled at <ts>]` annotation |
| `runtime_error` with empty stderr (the 2h hang case) | glyph, `(no stderr — possibly hung)` annotation |
| `backpressure_fail` | rollback edge back to the previous iter's HEAD sha |
| verify failed | red verify node + top 3 failed test names |
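
One way the node treatment above could be chosen from an iter record, as a hypothetical sketch: `IterRecord`, `NodeStyle`, the field names, and the precedence (cancelled first, then runtime error, then incomplete) are assumptions for illustration, not the loop_graph IR.

```rust
/// How an iter node is drawn (hypothetical names).
#[derive(Debug, PartialEq)]
pub enum NodeStyle {
    Normal,
    Incomplete,   // dashed border, partial label
    Cancelled,    // gray, "[Cancelled at <ts>]" annotation
    PossiblyHung, // runtime error with empty stderr
    RuntimeError, // runtime error with stderr to show
}

/// Minimal view of an iter record; field names are assumptions.
#[derive(Debug, Clone, Copy)]
pub struct IterRecord<'a> {
    pub started_at: Option<&'a str>,
    pub ended_at: Option<&'a str>,
    pub stop_reason: Option<&'a str>,
    pub runtime_error: bool,
    pub stderr: &'a str,
}

pub fn classify(rec: &IterRecord<'_>) -> NodeStyle {
    // Assumed precedence: an explicit cancel wins over everything else.
    if rec.stop_reason == Some("Cancelled") {
        return NodeStyle::Cancelled;
    }
    if rec.runtime_error {
        // Whitespace-only stderr counts as empty: the "possibly hung" case.
        return if rec.stderr.trim().is_empty() {
            NodeStyle::PossiblyHung
        } else {
            NodeStyle::RuntimeError
        };
    }
    if rec.started_at.is_some() && rec.ended_at.is_none() {
        return NodeStyle::Incomplete;
    }
    NodeStyle::Normal
}
```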

CI parity

Backward compat

  • Legacy runs (no batch-*.json files in the run dir): read() returns None, so the dispatcher falls through to bd ready. Byte-identical to today.
  • Legacy run.json (no verify_outcome field): #[serde(default)] yields None, so the verify summary line is omitted.
  • --jobs=1 (the default): batch-plan layer skipped entirely. Single-worker loops don't write batch files.

Non-goals (v1)

  • Static touches-overlap analysis (would parse description prose; brittle).
  • Cross-run batch memory.
  • Auto-fix on verify failure.
  • Live-updating graph (websocket/fswatch).

🤖 Generated with Claude Code

droidnoob and others added 8 commits May 30, 2026 17:39
- BatchPlan { schema_version, iter_number, task_ids, source, reason,
  created_at, planner_tokens } + tagged BatchSource (Agent/Planner/Skipped,
  snake_case on the wire)
- path/read/write API; atomic write via loop_log::write_json_atomic;
  read returns Ok(None) on missing file and rejects mismatched
  SCHEMA_VERSION with a clear miette diagnostic
- 9 unit tests covering zero-pad path, missing-file, all three source
  roundtrips, atomic temp-cleanup, wire form, pinned version, unknown-
  version rejection

First-class artifact for the parent epic hew-lf40's batch-planner
pipeline; downstream parser/planner/dispatcher consume this type.

Closes hew-58ac.
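
A std-only sketch of the path and I/O contract this commit describes. `batch_path`, `write_atomic`, and `read_optional` are hypothetical names. The real module writes through `loop_log::write_json_atomic` and checks `schema_version` with serde and miette, neither of which is shown here. Temp-file-then-rename is an assumption about how the atomic write works.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// `<run-dir>/batch-NNN.json`, zero-padded to three digits (hypothetical helper).
pub fn batch_path(run_dir: &Path, iter_number: u32) -> PathBuf {
    run_dir.join(format!("batch-{:03}.json", iter_number))
}

/// Write a sibling temp file, then rename it over the target. Where rename is
/// atomic, readers never see a half-written file. The temp file is removed if
/// the rename fails.
pub fn write_atomic(path: &Path, contents: &str) -> io::Result<()> {
    let tmp = path.with_extension("json.tmp");
    fs::write(&tmp, contents)?;
    if let Err(e) = fs::rename(&tmp, path) {
        let _ = fs::remove_file(&tmp);
        return Err(e);
    }
    Ok(())
}

/// A missing file is not an error: legacy runs simply have no batch files.
pub fn read_optional(path: &Path) -> io::Result<Option<String>> {
    match fs::read_to_string(path) {
        Ok(text) => Ok(Some(text)),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}
```
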
…ew-7klt)

- New hew_core::batch_plan_parse module
- Parses fenced ```next_iteration JSON-array and <next_iteration> XML-tag CSV forms
- Hand-rolled hew-id validator (no new regex dep)
- Distinct None / Some(vec![]) / Some(ids) return states
- 13 tests including 1000-iter adversarial fuzz

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
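
A simplified sketch of the two accepted forms and the three return states. The ID rule (`hew-` followed by lowercase ASCII letters or digits) is an assumption inferred from the task IDs in this PR. `extract_next_iteration_sketch` is a hypothetical name, not the shipped parser, which is more tolerant and fuzz-tested.

```rust
const FENCE_OPEN: &str = "```next_iteration";
const FENCE_CLOSE: &str = "```";
const TAG_OPEN: &str = "<next_iteration>";
const TAG_CLOSE: &str = "</next_iteration>";

/// Assumed ID rule: "hew-" followed by one or more lowercase ASCII letters or digits.
fn is_hew_id(s: &str) -> bool {
    match s.strip_prefix("hew-") {
        Some(rest) => !rest.is_empty() && rest.chars().all(|c| c.is_ascii_lowercase() || c.is_ascii_digit()),
        None => false,
    }
}

/// Split a list body on commas, strip whitespace, brackets, and quotes, and drop malformed IDs.
fn parse_ids(body: &str) -> Vec<String> {
    body.trim()
        .trim_start_matches('[')
        .trim_end_matches(']')
        .split(',')
        .map(|tok| tok.trim().trim_matches('"'))
        .filter(|tok| is_hew_id(tok))
        .map(str::to_string)
        .collect()
}

/// None: no block found. Some(vec![]): block present but empty or all malformed.
/// Some(ids): valid IDs in the order they appeared.
pub fn extract_next_iteration_sketch(text: &str) -> Option<Vec<String>> {
    for (open, close) in [(FENCE_OPEN, FENCE_CLOSE), (TAG_OPEN, TAG_CLOSE)] {
        if let Some(start) = text.find(open) {
            let rest = &text[start + open.len()..];
            if let Some(end) = rest.find(close) {
                return Some(parse_ids(&rest[..end]));
            }
            // Unterminated block: fall through and try the other form.
        }
    }
    None
}
```

Keeping `None` distinct from `Some(vec![])` lets a caller tell "the agent said nothing" (try the planner) apart from "the agent emitted an empty batch".
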
- spawn_planner in hew/src/commands/loop_cmd.rs: assembles a small
  prompt over bd_ready + recent_touches, runs a pre-spawn token
  budget check, drives the runtime, parses extract_next_iteration
  from the response.
- Every failure path returns BatchPlan { source: Skipped, reason }:
  budget_exceeded (no spawn), runtime_error, parse_error. Planner
  must never kill the loop.
- skills/data/planner-prompt.md holds the system body; embedded via
  include_str! and treated as a data file (not a registered skill).
- skills drift test now skips skills/data/ since it ships embedded
  resources (.toml + .md), not skill bodies.
- 6 inline unit tests cover all branches via MockSpawner + a custom
  Err-returning spawner.

Closes hew-pxw9.
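
A sketch of the budget-then-spawn-then-parse ordering. The runtime is injected as a closure, in the spirit of MockSpawner. The 4-bytes-per-token estimate, `plan_next_batch`, and `parse_ids` are assumptions for illustration; the real `spawn_planner` signature and token counting may differ. An over-budget prompt never reaches the runtime and is never truncated.

```rust
/// Outcome of one planner attempt (hypothetical type; failures never propagate).
#[derive(Debug, PartialEq)]
pub enum PlannerOutcome {
    Batch(Vec<String>),
    Skipped { reason: &'static str },
}

/// Crude token estimate, ~4 bytes per token (assumed heuristic, rounded up).
fn estimate_tokens(prompt: &str) -> usize {
    (prompt.len() + 3) / 4
}

/// Pull a comma-separated ID list out of `<next_iteration>...</next_iteration>`.
fn parse_ids(response: &str) -> Option<Vec<String>> {
    let start = response.find("<next_iteration>")? + "<next_iteration>".len();
    let end = start + response[start..].find("</next_iteration>")?;
    Some(
        response[start..end]
            .split(',')
            .map(|s| s.trim().to_string())
            .filter(|s| !s.is_empty())
            .collect(),
    )
}

/// Budget check before spawning; every failure becomes Skipped with a reason.
pub fn plan_next_batch<F>(prompt: &str, budget_tokens: usize, spawn: F) -> PlannerOutcome
where
    F: FnOnce(&str) -> Result<String, String>,
{
    if estimate_tokens(prompt) > budget_tokens {
        // Skip cleanly: never spawn, never truncate the prompt to fit.
        return PlannerOutcome::Skipped { reason: "budget_exceeded" };
    }
    let response = match spawn(prompt) {
        Ok(text) => text,
        Err(_) => return PlannerOutcome::Skipped { reason: "runtime_error" },
    };
    match parse_ids(&response) {
        Some(ids) => PlannerOutcome::Batch(ids),
        None => PlannerOutcome::Skipped { reason: "parse_error" },
    }
}
```
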
…bd-ready

- Dispatcher::new gains Option<BatchPlan>; field cached on the struct.
- dispatch_tick narrows post-scope candidates by linear contains
  against plan.task_ids (typical batch <10 — avoids per-tick HashSet
  alloc). Filter is non-expansive: bd dep graph stays the safety
  floor per DECISION:loop-parallel-overlap-policy.
- Source::Skipped and empty task_ids fall through to trust-the-graph
  with no batch_source signaled.
- New DispatchTick.batch_source + Dispatcher::current_batch_source()
  for downstream summary aggregation.
- 8 new tests cover the matrix; existing 13 dispatcher tests pass
  unchanged with batch_plan: None.

Closes hew-rplg.
- LoopPlannerConfig {enabled, budget_tokens, runtime}; default
  enabled=true / 10_000 tokens / runtime=None.
- hew config get/set for loop.planner.{enabled,budget_tokens,runtime}.
- hew loop run --no-planner / --planner-budget / --planner-runtime,
  resolved via resolve_planner_config (CLI > config > default).
- Iter-end hook in run_worker_loop_with_scope writes
  <run-dir>/batch-NNN+1.json under --jobs >= 2 covering all four
  branches: Agent (raw_text named the block) → Planner (spawned) →
  Skipped (planner_disabled / budget_exceeded / parse_error /
  runtime_error) → bypass entirely when jobs == 1.
- Pure resolve_iter_completion_plan helper keeps the branch
  arithmetic test-friendly.
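
The branch arithmetic can be sketched as a pure function, alongside the CLI > config > default precedence for the budget. `IterPlan`, `resolve_iter_plan`, and `resolve_budget` are hypothetical stand-ins for `resolve_iter_completion_plan` and `resolve_planner_config`. Only the 10_000-token default comes from this commit.

```rust
/// What the iter-end hook should do next (hypothetical stand-in).
#[derive(Debug, PartialEq)]
pub enum IterPlan {
    /// --jobs 1: skip the batch layer entirely; no batch file is written.
    Bypass,
    /// The agent's own next_iteration block wins.
    Agent(Vec<String>),
    /// The agent was silent and the planner is enabled: spawn it.
    Planner,
    /// Nothing to honor; the dispatcher falls back to trust-the-graph.
    Skipped(&'static str),
}

pub fn resolve_iter_plan(jobs: u32, agent_block: Option<Vec<String>>, planner_enabled: bool) -> IterPlan {
    if jobs < 2 {
        return IterPlan::Bypass;
    }
    match agent_block {
        Some(ids) => IterPlan::Agent(ids),
        None if planner_enabled => IterPlan::Planner,
        None => IterPlan::Skipped("planner_disabled"),
    }
}

/// CLI flag beats config, config beats the 10_000-token default.
pub fn resolve_budget(cli: Option<u64>, config: Option<u64>) -> u64 {
    cli.or(config).unwrap_or(10_000)
}
```
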
…w-z7rz)

- Summary gains PlannerCounts{agent,planner,skipped} + scan_planner_counts(run_dir)
  helper that walks batch-NNN.json artifacts; render emits
  'planner: agent=N, runtime=M, fallback=K' between scope and tokens, omits when zero
- loop_cmd::print_summary populates from run_dir so live, replay, and parallel-aggregate
  paths all carry it
- docs/LOOP.md '## Batch planner' section: agent→planner→trust-the-graph cascade,
  batch-NNN.json schema, summary line, --no-planner / loop.planner.* surface
- CHANGELOG [Unreleased] entry; DECISION:loop-batch-planner-floor memory persisted
- 5 new lib tests; fmt+clippy clean; 712 lib tests green
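
A sketch of the counting and rendering rule. The snake_case tag strings and reading "omits when zero" as "all three counts are zero" are assumptions. Planner-sourced batches are reported as `runtime=` and skipped ones as `fallback=`.

```rust
/// Hypothetical mirror of PlannerCounts.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct PlannerCounts {
    pub agent: u32,
    pub planner: u32,
    pub skipped: u32,
}

impl PlannerCounts {
    /// Tally one batch file's source tag (assumed snake_case wire names).
    pub fn record(&mut self, source: &str) {
        match source {
            "agent" => self.agent += 1,
            "planner" => self.planner += 1,
            "skipped" => self.skipped += 1,
            _ => {} // unknown tags are ignored in this sketch
        }
    }

    /// "planner: agent=N, runtime=M, fallback=K", or None when nothing was recorded.
    pub fn render(&self) -> Option<String> {
        if self.agent + self.planner + self.skipped == 0 {
            return None;
        }
        Some(format!(
            "planner: agent={}, runtime={}, fallback={}",
            self.agent, self.planner, self.skipped
        ))
    }
}
```
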
Adds an opt-in mandatory verify step that runs after the last iter
(and after merge-back on --jobs >= 2) to prove the final stacked
state is green. Conditional on both a resolvable test command
(CLI > config > gate::detect) and an explicit opt-in.

- new hew_core::verify (VerifyOutcome + resolve_command + run_verify)
- new [loop.end_of_run] config block (verify_tests, verify_command,
  verify_budget_wall) with three settable keys
- Run + RunLog gain verify_outcome with backward-compat parse
- summary renderer adds a coloured "verify:" line below planner
- --verify-tests / --no-verify-tests / --verify-command CLI flags
- failure writes STATUS:loop-verify-failed:<run-id> + non-zero exit;
  closed tasks are NOT rolled back
- defaults byte-identical to today (verify_tests = false)
- 18 new tests; docs/LOOP.md + CHANGELOG updated

Closes bd-hew-bon7.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
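
A sketch of command resolution with the stated precedence (CLI > config > detection) and the explicit opt-in. The marker files and default commands in `STACKS` are assumptions that follow the Rust → Node → Python → Go → Make/Just order; they are not hew's `gate::detect` table. File existence is passed in as a closure so the sketch stays pure.

```rust
/// Detection order from this commit: Rust → Node → Python → Go → Make/Just.
/// Marker files and commands are assumptions, not hew's gate::detect table.
const STACKS: &[(&str, &str)] = &[
    ("Cargo.toml", "cargo test"),
    ("package.json", "npm test"),
    ("pyproject.toml", "pytest"),
    ("go.mod", "go test ./..."),
    ("Makefile", "make test"),
    ("justfile", "just test"),
];

/// The first stack whose marker file exists wins.
pub fn detect<F: Fn(&str) -> bool>(exists: F) -> Option<String> {
    STACKS
        .iter()
        .find(|&&(marker, _)| exists(marker))
        .map(|&(_, cmd)| cmd.to_string())
}

/// No opt-in, no verify. Otherwise CLI > config > detection; None means nothing to run.
pub fn resolve_verify_command<F: Fn(&str) -> bool>(
    opt_in: bool,
    cli: Option<&str>,
    config: Option<&str>,
    exists: F,
) -> Option<String> {
    if !opt_in {
        return None;
    }
    cli.or(config).map(str::to_string).or_else(|| detect(exists))
}
```
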
- hew_core::loop_graph IR + mermaid/dot/ascii renderers (pure, no I/O)
- builders read iter*.json, batch*.json, run.json, manifest.json
- unhappy paths render distinctly: incomplete (dashed), cancelled (⊘),
  runtime-error+empty-stderr ("possibly hung"), backpressure rollback
  (↺ self-edge), verify outcomes (passed/failed/skipped)
- parallel runs lay out per-worker swimlanes from manifest.json
- pre-batch-plan legacy runs render with sequential edges only
- CLI: hew loop graph [--run-id ID] [--format ...] [--out PATH] [--all]
- 13 unit tests covering each acceptance criterion + 5 e2e CLI tests
- docs/LOOP.md § Loop graph section + CHANGELOG entry

Closes epic hew-lf40 (8/8 children).
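
A minimal mermaid renderer in the same pure, no-I/O spirit. It draws sequential edges between iter nodes and gives incomplete iters a dashed border through mermaid's `style <id> stroke-dasharray` directive. `Node` and `to_mermaid` are hypothetical; the real IR also carries batches, verify outcomes, and worker swimlanes.

```rust
use std::fmt::Write;

/// Hypothetical graph node: one iter.
pub struct Node {
    pub id: String,
    pub label: String,
    pub incomplete: bool,
}

/// Render nodes as a sequential mermaid flowchart; incomplete iters get a dashed border.
pub fn to_mermaid(nodes: &[Node]) -> String {
    let mut out = String::from("flowchart TD\n");
    for n in nodes {
        // Swap double quotes so they cannot terminate the quoted mermaid label.
        writeln!(out, "    {}[\"{}\"]", n.id, n.label.replace('"', "'")).unwrap();
    }
    for pair in nodes.windows(2) {
        writeln!(out, "    {} --> {}", pair[0].id, pair[1].id).unwrap();
    }
    for n in nodes.iter().filter(|n| n.incomplete) {
        writeln!(out, "    style {} stroke-dasharray: 5 5", n.id).unwrap();
    }
    out
}
```
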
droidnoob force-pushed the feat/batch-plan-module branch from 42014ba to e9c23c0 May 30, 2026 12:09
droidnoob merged commit 0c07687 into main May 30, 2026
14 checks passed
droidnoob mentioned this pull request May 30, 2026
droidnoob added a commit that referenced this pull request May 30, 2026
* chore(release): 0.11.0

- workspace Cargo.toml: 0.10.0 -> 0.11.0
- 23 skill body `hew:version=` markers bumped to match
- .claude/ install snapshot refreshed via `hew init --runtime=claude`
- CHANGELOG.md: move [Unreleased] content into [0.11.0] — 2026-05-30

Release contents since 0.10.0:

#53 parallel hew loop via per-worker git worktrees (hew-6az)
#54 per-task model selection + per-model token spend (hew-1tq)
#55 init re-run UX — refresh/reconfigure/cancel (hew-0wa)
#56 split /hew:auto from /hew:loop semantics (hew-6n0v)
#57 cut local cargo test from ~2 min to ~22s (hew-v2ib)
#58 hew loop run --scope={ready|epics} (hew-b3yl)
#59 batch planner + end-of-run verify + loop graph (hew-lf40)
#60 retry_etxtbsy stub flake fix (hew-0rky)

Breaking surface: hew loop run in non-interactive mode now requires
--scope. Justifies the minor bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(readme): reflect 0.11.0 surface changes

- /hew:auto description updated to in-conversation epic walk (was the
  legacy plan→decompose→execute→verify; rewritten in hew-6n0v / #56)
- slash count 40 → 41 (new /hew:auto + various)
- loop snippets show --scope (required in non-interactive mode per
  hew-b3yl / #58), --jobs N, --verify-tests, hew loop summary,
  hew loop graph
- autonomous-loop bullets gain parallel-workers, scoped-runs +
  per-task-model, end-of-run-verification entries
- Selected knobs table adds loop.model.*, loop.planner.*,
  loop.end_of_run.verify_tests, loop.fallback_runtime

No changes to brand, hero copy, or repo description.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>