feat(loop): parallel hew loop via per-worker git worktrees (hew-6az complete)#53

Merged
droidnoob merged 14 commits into main from feat/parallel-loop-worktrees
May 29, 2026
Conversation

@droidnoob
Owner

Summary

Lands the implementation slice of epic hew-6az — parallel hew loop driven by per-worker git worktrees. 9 of 12 children closed; 3 follow-ups (worktree GC, per-worker summary, docs) remain open and will land on top of this once merged.

  • New module hew_core::worktree at ~/.hew/wt/<run-id>/<n>/ — create/prune/list_orphans, GitClient-injected for tests (hew-8da).
  • hew_core::dispatcher slot-fill state machine + thread-per-worker spawn driven from bd ready (hew-9m5).
  • run_loop_with split into per-worker run_worker_loop parameterised on worktree_dir (hew-ddi).
  • --jobs N CLI flag (default 1, range 1..=16), with the N=1 path bypassing the dispatcher entirely so single-worker latency is unchanged (hew-wee).
  • hew_core::merge_back module — auto-files a [merge-conflict] bug task when a worker's branch can't be merged cleanly (git merge --no-ff) back onto the loop branch (hew-ki4).
  • hew_core::gate takes working_dir so backpressure runs inside the worker's worktree, not the project root (hew-j4x).
  • Per-worker iter log layout: .hew/loop/<run-id>/worker-<n>/iter-NNN.json plus a top-level manifest.json enumerating workers (hew-bmq).
  • Per-worker branch creation + scoped git reset --hard — collision guard for shared run-id branch names (hew-ptb).
  • e2e_parallel_jobs_2_with_mock_spawner + e2e_parallel_merge_conflict_files_bug_task integration tests (hew-d5gd).

Default loop stays sequential per DECISION:loop-parallel-overlap-policy (trust the dependency graph in v1); --jobs >1 is opt-in.

Test plan

  • cargo fmt --check
  • cargo clippy --all-targets -- -D warnings
  • cargo test (all green; library suite includes 11 worktree unit tests + the two new e2e parallel tests)
  • Smoke hew loop run --jobs 2 --max-iter 2 --dry-run against this repo's own .beads/ once merged

Follow-ups (remaining hew-6az children)

  • hew-kt5q — worktree GC: graceful teardown + hew loop prune-worktrees subcommand
  • hew-h0tu: hew loop summary per-worker breakdown table reading manifest.json
  • hew-9nol: docs/LOOP.md Parallel runs + Recovery sections, CHANGELOG, CLAUDE.md, ARCHITECTURE.md

Each will land as its own PR off the merged main; epic hew-6az closes when all three ship.

droidnoob and others added 14 commits May 29, 2026 22:47
Adds a new "Agent contracts — always honor" block to `hew prime
resume`, rendered between "Project config" and "Latest CHECKPOINT".
Every hew user's SessionStart hook now sees these contracts in
context every session, independent of project config.

The seven contracts codify command shapes where wrong invocation has
silently caused real bugs:

- Checkpoints: ALWAYS `hew checkpoint "<body>"`, never roll the
  CHECKPOINT: shape by hand (GH #40).
- External-state gates: `hew gate new --gh-pr=N`, not the
  nonexistent `bd create --type=gate --await-*`.
- Batch task creation: `bd create --graph plan.json` for >3 tasks
  or any multi-line description.
- Multi-line bodies: pass `--description-file` / `--from-file` /
  `bd update --body-file`; never heredoc through zsh.
- Inspection: text-default surfaces; never pipe `--json` through
  `jq` / `python` / `sed` (FEEDBACK:no-json-piping).
- Loop runs: always cap (`--max-iter`, `--budget-wall`,
  `--budget-tokens`).
- Prefer hew wrappers over raw bd; explicit hold-out list.

Each contract carries its own rationale comment in the
`agent_contracts()` function so the next maintainer knows why a
line is there before deleting it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First building block for hew-6az parallel loop. Per
DECISION:loop-worktree-location, per-worker git worktrees live at
~/.hew/wt/<run-id>/<n>/ — out of tree to keep .hew/ tracked in git
without gitignore drift.

Public API:
- root() -> ~/.hew/wt/ via etcetera home strategy
- branch_name(run_id, n) -> "loop/<run-id>/w<n>" (dispatcher convention)
- worker_path(root, run_id, n) -> <root>/<run-id>/<n>
- create(git, project_root, root, ...) shells `git -C <p> worktree add -b ...`
- prune(git, project_root, root, ...) — tolerant of partial state
- list_all(root) / list_orphans(root, &active_run_ids)

Everything that talks to git goes through hew_core::git::GitClient so
unit tests inject a RecordingGit fake — no real git binary spawns from
the lib tests, matching the acceptance criterion.
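As an illustration of the naming conventions listed above, here is a minimal sketch of the two pure helpers plus the argv that a `create` call might hand to git. The function names `branch_name` and `worker_path` come from the commit message; their exact signatures, and the `worktree_add_argv` helper, are assumptions made for this example rather than the module's real API.

```rust
use std::path::{Path, PathBuf};

/// Branch for worker `n` of a run, following the "loop/<run-id>/w<n>"
/// convention quoted in the commit message.
pub fn branch_name(run_id: &str, n: u32) -> String {
    format!("loop/{run_id}/w{n}")
}

/// Worktree directory for worker `n`: <root>/<run-id>/<n>.
pub fn worker_path(root: &Path, run_id: &str, n: u32) -> PathBuf {
    root.join(run_id).join(n.to_string())
}

/// Hypothetical helper (not from the source): the argv for
/// `git -C <project_root> worktree add -b <branch> <worktree>`.
/// Building argv as data is what lets a recording fake assert on it
/// without spawning a real git binary.
pub fn worktree_add_argv(project_root: &Path, worktree: &Path, branch: &str) -> Vec<String> {
    vec![
        "-C".to_string(),
        project_root.display().to_string(),
        "worktree".to_string(),
        "add".to_string(),
        "-b".to_string(),
        branch.to_string(),
        worktree.display().to_string(),
    ]
}
```

Keeping these helpers free of I/O means the path and branch conventions can be pinned by plain unit tests, which is in the spirit of the RecordingGit approach described above.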

11 new tests cover: handle shape, exact argv to `git worktree add`,
branch-name pattern, prune cleanup, prune-tolerates-missing,
list_orphans empty + filtering, defensive scan of non-numeric dirs,
and the etcetera root smoke.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…hew-ddi)

- Introduce Worker { id, worktree_dir, branch, log_dir } and
  WorkerOutcome { run, iter_logs } so the dispatcher can fill
  parallel slots without re-plumbing the iter loop.
- run_loop_with stays as the dispatcher: builds run-id, log dir,
  primer text and allowed-tools, constructs a single Worker for
  the --jobs=1 fast path, delegates the iter body to
  run_worker_loop, then renders the end-of-run summary.
- run_worker_loop targets worker.worktree_dir and worker.log_dir
  for every git / gate / iter-log call so future per-slot workers
  read/write under their own worktree.
- Behavior byte-identical for --jobs=1; new test
  run_worker_loop_uses_worker_worktree_for_git_calls pins the
  per-worker contract against a tempdir worktree.
- Dispatcher tracks N worker slots (jobs clamped to ≥ 1) over the
  lifetime of a parallel `hew loop` run.
- dispatch_tick(&dyn BdClient) pulls from bd.ready(), filters tasks
  already in_flight (defense against stale snapshots), and claims
  each picked task atomically via hew_core::tasks::claim. A claim
  race surfaces as a ClaimFailure and the slot stays idle so the
  next tick can retry.
- complete(slot_id) releases a slot back to Idle and returns the
  task id that was running there.
- Per DECISION:loop-parallel-overlap-policy ("trust-the-graph"),
  the dispatcher does NOT do overlap detection — any bd-ready task
  is assumed parallelizable; merge conflicts surface later.
- Pure state machine: bd / git / spawner / worktree side-effects
  are all injected, so the 9 unit tests use a MockBd and exercise
  the slot logic without threads, real git, or real claude.
- N=1 regression test pins the serial-loop equivalence acceptance.
- Not yet wired into loop_cmd — that integration lands with hew-wee
  (--jobs N CLI fast-path).
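The slot-fill behaviour described in the bullets above can be sketched as a small pure state machine. This is an illustrative reduction, not the real `hew_core::dispatcher`: the `Slot` enum, the `ready` slice standing in for `bd.ready()`, and the `claim` closure standing in for `hew_core::tasks::claim` are all assumptions made for the example.

```rust
use std::collections::HashSet;

/// One worker slot: idle, or running the task with this id.
#[derive(Debug, Clone, PartialEq)]
pub enum Slot {
    Idle,
    Running(String),
}

pub struct Dispatcher {
    slots: Vec<Slot>,
}

impl Dispatcher {
    /// `jobs` is clamped to at least one slot.
    pub fn new(jobs: usize) -> Self {
        Dispatcher { slots: vec![Slot::Idle; jobs.max(1)] }
    }

    /// Fill idle slots from `ready`, skipping tasks already in flight
    /// (defense against a stale ready snapshot). `claim` models the
    /// atomic claim; `false` means the claim race was lost, and the slot
    /// stays idle so a later tick can retry.
    pub fn dispatch_tick(
        &mut self,
        ready: &[String],
        mut claim: impl FnMut(&str) -> bool,
    ) -> Vec<(usize, String)> {
        let in_flight: HashSet<String> = self
            .slots
            .iter()
            .filter_map(|s| match s {
                Slot::Running(id) => Some(id.clone()),
                Slot::Idle => None,
            })
            .collect();
        let mut candidates = ready.iter().filter(|t| !in_flight.contains(t.as_str()));
        let mut started = Vec::new();
        for (slot_id, slot) in self.slots.iter_mut().enumerate() {
            if *slot != Slot::Idle {
                continue;
            }
            let Some(task) = candidates.next() else { break };
            if claim(task.as_str()) {
                *slot = Slot::Running(task.clone());
                started.push((slot_id, task.clone()));
            }
        }
        started
    }

    /// Release a slot back to Idle, returning the task that ran there.
    pub fn complete(&mut self, slot_id: usize) -> Option<String> {
        match std::mem::replace(self.slots.get_mut(slot_id)?, Slot::Idle) {
            Slot::Running(id) => Some(id),
            Slot::Idle => None,
        }
    }
}
```

Because every side effect is passed in, the slot logic can be exercised without threads, git, or a spawner. Note that the sketch performs no overlap detection, matching the trust-the-graph policy.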
- rename GateRunner::run_gate(project_root) -> (working_dir) across
  trait, AutoGateRunner, StaticGateRunner, and run_gate_step; doc the
  parallel-loop intent on gate::detect
- run_worker_loop already passes worker.worktree_dir since hew-ddi;
  add two integration tests pinning that contract:
  - gate_is_called_with_worker_worktree_dir: RecordingGateRunner
    asserts the gate received worker.worktree_dir (not the
    dispatcher's ambient project_root)
  - gate_falls_back_to_project_root_when_unspecified: single-worker
    fast path keeps gating at project_root byte-for-byte

The future parallel dispatcher (hew-9m5 + hew-6az) now has a tested
guarantee that per-worker target/ and node_modules/ stay isolated by
virtue of the gate following worker.worktree_dir.
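To make the `working_dir` contract concrete, the following sketch shows a gate trait with a recording fake, roughly in the shape the tests above describe. The trait and fake names mirror the commit message, but the boolean return type and the `gate_dir` helper are simplifications assumed for this example.

```rust
use std::cell::RefCell;
use std::path::{Path, PathBuf};

pub trait GateRunner {
    /// Run the gate with `working_dir` as the directory to build and test in.
    /// Returns true when the gate passes (simplified for the sketch).
    fn run_gate(&self, working_dir: &Path) -> bool;
}

/// Test fake that records every directory it was asked to gate.
#[derive(Default)]
pub struct RecordingGateRunner {
    pub seen: RefCell<Vec<PathBuf>>,
}

impl GateRunner for RecordingGateRunner {
    fn run_gate(&self, working_dir: &Path) -> bool {
        self.seen.borrow_mut().push(working_dir.to_path_buf());
        true
    }
}

/// Hypothetical helper: gate inside the worker's worktree when one is
/// given, otherwise fall back to the project root (single-worker path).
pub fn gate_dir<'a>(project_root: &'a Path, worktree_dir: Option<&'a Path>) -> &'a Path {
    worktree_dir.unwrap_or(project_root)
}
```

A test can then assert on `seen` to confirm that the gate ran in the worktree rather than in the ambient project root.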
- loop_log: iter_log_path / run_log_path take Option<u32> worker_n;
  None preserves the --jobs=1 layout, Some(n) routes under worker-<n>/.
- Add worker_dir / ensure_worker_dir helpers + Manifest types and
  write_manifest at the run-dir root.
- Worker gains worker_n; run_loop_with writes a single-worker manifest
  at dispatcher shutdown — parallel dispatcher folds the same shape.
- Per-worker iter counter is inherent: each worker owns its own Run
  via run_worker_loop, so next_iter_number() is worker-scoped.
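The `Option<u32>` routing described above might look like the sketch below. The `worker-<n>/iter-NNN.json` layout comes from the PR summary; the three-digit zero padding, the flat layout for `None`, and the exact signatures are assumptions for illustration.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Path of one iter log. `None` keeps the flat single-worker layout;
/// `Some(n)` routes the file under `worker-<n>/`.
pub fn iter_log_path(run_dir: &Path, worker_n: Option<u32>, iter: u32) -> PathBuf {
    let file = format!("iter-{iter:03}.json");
    match worker_n {
        Some(n) => run_dir.join(format!("worker-{n}")).join(file),
        None => run_dir.join(file),
    }
}

/// Create the `worker-<n>/` directory if it is missing. Composing a path
/// does not create it, so this has to run before the first log write.
pub fn ensure_worker_dir(run_dir: &Path, n: u32) -> io::Result<PathBuf> {
    let dir = run_dir.join(format!("worker-{n}"));
    std::fs::create_dir_all(&dir)?;
    Ok(dir)
}
```

Keeping path composition separate from directory creation keeps `iter_log_path` pure, at the cost of requiring callers to remember `ensure_worker_dir`.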

Closes hew-bmq.
…rd (hew-ptb)

- hew_core::git::reset_hard_in(git, worktree, sha): shared helper that
  runs git -C <wt> reset --hard <sha> so the parallel loop's gate-fail
  revert is scoped to one worker's worktree, never siblings. Replaces
  the local git_reset_hard in loop_cmd.rs with a one-line delegation.
- hew_core::worktree::branch_exists + create() collision pre-check:
  refuses to land git worktree add -b on an existing branch (e.g. when
  a run_id is reused after a crashed run). Surfaces a clear GitNonZero
  with branch + run_id + remediation hint instead of letting git fail
  mid-way or silently land on a stale branch.
- Integration test worker_rollback_only_resets_own_worktree against
  real git: two sibling worktrees, both with iter commits, reset one,
  assert the other's HEAD + tree are untouched. Scrubs inherited
  GIT_DIR/GIT_INDEX_FILE/etc. so the test survives running under the
  pre-commit hook's own git context.
- Unit tests for reset_hard_in (-C argv shape, nonzero propagation),
  branch_exists (Ok(false) on exit-1, Ok(true) on success), and create
  collision (rev-parse only, no worktree add invoked on collision).
- New hew_core::merge_back: sequential git merge --no-ff --no-edit per
  worker branch; conflicts trigger --abort + ConflictReport, clean
  merges land in MergeReport::merged. file_conflict_bug_tasks files one
  [merge-conflict] bug task per conflict via `bd q` and attaches a
  description listing the conflicting files.
- Dispatcher::shutdown_merge_back wires merge_back + bug-task filing as
  the run-end consolidation seam (consumed once --jobs N>=2 ships in
  hew-wee).
- Worktrees intentionally retained on conflict per
  DECISION:loop-parallel-overlap-policy so the human resolving the
  conflict can `cd ~/.hew/wt/<run-id>/<n>/`.
- Tests: three named acceptance cases (clean / single conflict + bug
  task / continues after conflict) plus parse_worker_n + non-standard
  branch fallback; injected GitClient + BdClient fakes, no real git.

Closes hew-ki4.
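The sequential merge-back flow can be sketched with an injected git client as follows. `MergeReport` and the merged/conflict split follow the commit message; the `Git` trait, `MergeResult`, `FakeGit`, and the tuple used for conflicts are hypothetical stand-ins for `GitClient` and `ConflictReport`, whose real shapes are not shown in this PR text.

```rust
/// Outcome of one merge attempt, as reported by the injected client.
pub enum MergeResult {
    Clean,
    /// Paths of the conflicting files.
    Conflict(Vec<String>),
}

/// Hypothetical stand-in for the injected git client.
pub trait Git {
    /// Models `git merge --no-ff --no-edit <branch>`.
    fn merge_no_ff(&mut self, branch: &str) -> MergeResult;
    /// Models `git merge --abort`.
    fn merge_abort(&mut self);
}

#[derive(Debug, Default, PartialEq)]
pub struct MergeReport {
    pub merged: Vec<String>,
    /// (branch, conflicting files) per failed merge.
    pub conflicts: Vec<(String, Vec<String>)>,
}

/// Merge each worker branch in order. A conflict aborts that merge,
/// is recorded, and the loop continues with the next branch.
pub fn merge_back(git: &mut dyn Git, branches: &[&str]) -> MergeReport {
    let mut report = MergeReport::default();
    for &branch in branches {
        match git.merge_no_ff(branch) {
            MergeResult::Clean => report.merged.push(branch.to_string()),
            MergeResult::Conflict(files) => {
                git.merge_abort();
                report.conflicts.push((branch.to_string(), files));
            }
        }
    }
    report
}

/// Fake for tests: conflicts on any branch listed in `conflicting`.
pub struct FakeGit {
    pub conflicting: Vec<String>,
    pub aborts: usize,
}

impl Git for FakeGit {
    fn merge_no_ff(&mut self, branch: &str) -> MergeResult {
        if self.conflicting.iter().any(|b| b.as_str() == branch) {
            MergeResult::Conflict(vec!["src/lib.rs".to_string()])
        } else {
            MergeResult::Clean
        }
    }
    fn merge_abort(&mut self) {
        self.aborts += 1;
    }
}
```

Each entry in `conflicts` would then drive one `[merge-conflict]` bug task; that filing step is left out of the sketch.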
…ch (hew-wee)

- Add --jobs (u32, default 1, range 1..=16) to `hew loop run` Args.
- Split run_loop_with: jobs<=1 keeps the existing serial body
  byte-identical; jobs>=2 calls new run_loop_parallel which builds
  hew_core::dispatcher::Dispatcher, lays per-worker git worktrees,
  drives run_worker_loop per slot, and shutdown_merge_back's branches
  back onto launch HEAD. Workers sequential in v1 (Send+Sync bounds
  for thread::scope ship with hew-d5gd e2e fixtures).
- Tests: clap range rejects 0 and 17; help documents the flag;
  default is 1; jobs=1 dry-run writes no worker-N subdir and
  Manifest.jobs==1; jobs=2 dry-run writes Manifest.jobs==2.
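For readers without the diff at hand, the flag's validation and the fast-path split can be approximated with the standard library alone. The real code uses clap's range validation; `parse_jobs` and `use_serial_path` below are hypothetical helpers written only to show the intended behaviour (default 1, accepted range 1..=16, serial path for `jobs <= 1`).

```rust
use std::ops::RangeInclusive;

/// Accepted values for --jobs, per the commit message.
const JOBS_RANGE: RangeInclusive<u32> = 1..=16;

/// Parse the raw flag value. `None` means the flag was not given,
/// which yields the default of 1.
pub fn parse_jobs(raw: Option<&str>) -> Result<u32, String> {
    let Some(raw) = raw else { return Ok(1) };
    let n: u32 = raw
        .parse()
        .map_err(|_| format!("--jobs: `{raw}` is not a number"))?;
    if JOBS_RANGE.contains(&n) {
        Ok(n)
    } else {
        Err(format!("--jobs: {n} is not in 1..=16"))
    }
}

/// jobs <= 1 keeps the existing serial body; anything larger goes
/// through the parallel dispatcher.
pub fn use_serial_path(jobs: u32) -> bool {
    jobs <= 1
}
```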

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add hew/tests/loop_parallel_e2e.rs with two hermetic integration tests:
  e2e_parallel_jobs_2_with_mock_spawner drives --jobs=2 against 4 ready
  tasks and asserts the queue drains, worktrees materialize, manifest
  lists 2 workers; e2e_parallel_merge_conflict_files_bug_task forces
  colliding commits in both worker worktrees and asserts merge_back
  files exactly one [merge-conflict] bug task while leaving both
  worktrees on disk for human resolution.
- Tests isolate ~/.hew/wt/ via a HOME override gated on a static Mutex
  so the two tests serialize their env mutation within one binary;
  scrub GIT_DIR / GIT_INDEX_FILE / etc. that the host shell or
  pre-commit hook may have leaked in.
- Fix (Rule 2): run_loop_parallel now ensure_worker_dir's the
  worker-<n>/ log subdir before each worker iter — without this the
  iter log write ENOENTs on first iter since iter_log_path composes
  the path but never mkdirs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ubuntu CI runners have no system-wide git identity. `HomeGuard` scrubs
`HOME` to isolate `~/.hew/wt/`, which also strips access to any
`~/.gitconfig`. Production `RealGit` invocations inside `merge_back`
create commits (`git merge --no-ff`), which fail with "Please tell me
who you are" before any conflict is detected — so the conflict-report
path never fires and `e2e_parallel_merge_conflict_files_bug_task`
sees a fatal merge error instead of a filed `[merge-conflict]` bug.

Write a minimal `.gitconfig` into the tempdir HOME so production git
calls during the test have an identity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Graceful teardown: run_loop_parallel prunes worktrees of cleanly-
  merged worker branches after merge_back. Conflicted worktrees stay
  on disk so the [merge-conflict] bug-task hint can point at them.
- hew_core::loop_log::active_run_ids walks .hew/loop/loop-*/run.json
  and returns ids whose stop_reason is still None (or whose run.json
  is missing/unparseable) — conservative so crashes aren't auto-
  cleaned.
- New `hew loop prune-worktrees` subcommand; default dry-run lists
  orphan worktrees under ~/.hew/wt/, `--apply` removes them.
- Updated existing parallel e2e tests: clean-merge worktrees are now
  gone after the run; only conflicted ones survive.
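The conservative "still active" rule and the resulting orphan filter can be sketched as below. The `RunJson` enum is a hypothetical model of what reading `run.json` might yield, and `orphans` takes plain id lists instead of walking `~/.hew/wt/`; both are assumptions made to keep the example free of I/O.

```rust
use std::collections::HashSet;

/// Hypothetical model of the result of reading a run's run.json.
pub enum RunJson {
    Missing,
    Unparseable,
    Parsed { stop_reason: Option<String> },
}

/// A run counts as active unless run.json parsed and recorded a stop
/// reason. Missing or unparseable files are treated as active so that
/// crashed runs are never cleaned up automatically.
pub fn is_active(state: &RunJson) -> bool {
    match state {
        RunJson::Parsed { stop_reason } => stop_reason.is_none(),
        RunJson::Missing | RunJson::Unparseable => true,
    }
}

/// Run ids that have worktree directories but are not active.
pub fn orphans(worktree_run_ids: &[String], active: &HashSet<String>) -> Vec<String> {
    worktree_run_ids
        .iter()
        .filter(|id| !active.contains(id.as_str()))
        .cloned()
        .collect()
}
```

Under this model a dry run would simply print the result of `orphans`, and `--apply` would prune each listed run's worktrees.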

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- hew_core::loop_summary gains WorkerSlice + worker_slice() +
  render_parallel_breakdown(): one row per worker with iters / closed
  / runtime / tokens / stop and an aggregate totals row.
- run_summary in loop_cmd branches on manifest.json presence. Parallel
  runs render the breakdown first, then build an aggregate Run from
  the union of all workers' iter logs (manifest's started_at /
  completed_at give an honest wall-clock window) so the existing
  summary block reports cross-worker token + outcome totals.
- Serial runs (no manifest) unchanged.
- Fixture: hew-core/tests/fixtures/parallel-run-2workers/ — 2 workers,
  5 iter logs covering claude + codex runtimes and a no_close.
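A reduced version of the per-worker breakdown might look like the following. `WorkerSlice` is named in the commit message, but its fields here (a subset without runtime or stop reason), the `totals` helper, and the column layout are assumptions for illustration only.

```rust
/// Simplified per-worker summary row (runtime and stop reason omitted).
#[derive(Debug, Clone, PartialEq)]
pub struct WorkerSlice {
    pub worker_n: u32,
    pub iters: u32,
    pub closed: u32,
    pub tokens: u64,
}

/// Aggregate (iters, closed, tokens) across all workers.
pub fn totals(slices: &[WorkerSlice]) -> (u32, u32, u64) {
    slices.iter().fold((0, 0, 0), |(i, c, t), s| {
        (i + s.iters, c + s.closed, t + s.tokens)
    })
}

/// One header line, one row per worker, then an aggregate totals row.
pub fn render_breakdown(slices: &[WorkerSlice]) -> String {
    let mut out = String::from("worker  iters  closed  tokens\n");
    for s in slices {
        out.push_str(&format!(
            "{:<6}  {:>5}  {:>6}  {:>6}\n",
            s.worker_n, s.iters, s.closed, s.tokens
        ));
    }
    let (i, c, t) = totals(slices);
    out.push_str(&format!("{:<6}  {:>5}  {:>6}  {:>6}\n", "total", i, c, t));
    out
}
```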

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- docs/LOOP.md gains a Parallel runs section (layout, branch naming,
  merge-back, worked --jobs 2 example, concurrency caveats) and a
  Recovering from a crashed parallel run section pointing at
  `hew loop prune-worktrees`. --jobs added to defaults block.
- CHANGELOG [Unreleased] covers --jobs, prune-worktrees, and the
  per-worker summary breakdown landed in this branch.
- CLAUDE.md How-to-work gets a one-liner + prune-worktrees command
  and a link to the new docs section.
- ARCHITECTURE.md Loop runner section extended with the Epic C bullet
  (dispatcher + worktree + merge_back modules + active_run_ids).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@droidnoob changed the title from "feat(loop): parallel hew loop via per-worker git worktrees (9/12 of hew-6az)" to "feat(loop): parallel hew loop via per-worker git worktrees (hew-6az complete)" on May 29, 2026
@droidnoob merged commit c7c512c into main on May 29, 2026
14 checks passed
@droidnoob deleted the feat/parallel-loop-worktrees branch on May 29, 2026 20:48