Skip to content

feat(hooks): background hook jobs + universal logging + daft hooks jobs CLI#411

Merged
avihut merged 195 commits intomasterfrom
feat/background-hook-jobs
Apr 28, 2026
Merged

feat(hooks): background hook jobs + universal logging + daft hooks jobs CLI#411
avihut merged 195 commits intomasterfrom
feat/background-hook-jobs

Conversation

@avihut
Copy link
Copy Markdown
Owner

@avihut avihut commented Apr 26, 2026

Summary

Big feature branch landing the background-hook-jobs subsystem and everything that grew around it. Three independent sub-projects bundled into one PR (each was originally going to ship separately, but they share enough state — log store, coordinator IPC, retention policy — that splitting them at this point would create more churn than it saves). 194 commits, +35.2k / -302 lines, 119 files.

What's in here

Background hook jobs (sub-project A core). worktree-post-create and friends can declare background: true jobs that run asynchronously after the command returns. A single forked coordinator process per repo manages them — runs jobs as threads, persists results, exposes a Unix-socket IPC for cancel/list/shutdown. New src/coordinator/ module: process.rs (fork + run), client.rs/ipc.rs (Unix-socket protocol), log_store.rs (<state>/jobs/<repo-uuid>/<inv-id>/<job-name>/{meta.json,output.log}), clean_policy.rs (retention/budget/keep-last).

Universal hook invocation logging. Every hook fire — foreground or background, declared or not — produces an invocation record + per-job records. Removes the long-standing gap where pre-remove hooks with only foreground jobs left no audit trail. Cross-worktree resolution lets you address jobs even after the worktree directory is gone.

`daft hooks jobs` CLI. New top-level command tree: `list`, `logs`, `cancel` (per-job + `--all`), `retry` (single + invocation-prefix bulk), `prune` (replaces the deprecated `clean`). Multi-format output (`--format json|yaml|tsv|csv|markdown|toon|ndjson|template`) and a `--template` flag for Tera. Filters: `--worktree`, `--status`, `--hook`, `--all` for cross-repo views. Composite addressing (`worktree::job`, `inv:job`, `worktree:inv:job`) with hex/slash heuristics so users don't have to think about disambiguation.

Log maintenance (sub-project B). Three-pass cleanup: per-log truncation (`max_log_size`, default 10 MB), retention sweep (`retention`, default 7 d, with `keep_last` floor of 3), per-repo budget (`max_total_size`, default 500 MB, LRU eviction). Auto-fires once per 24 h via a detached `daft __clean-logs` child (gated by `DAFT_NO_LOG_CLEAN` and CI detection). Manual escape hatch: `daft hooks jobs prune [--dry-run] [--older-than]`. Repo-level policy (`max_total_size`/`keep_last`/`stale_running_after`) persisted to a `repo-policy.json` sidecar so cleanup never has to re-read `daft.yml`.

UX polish. Outline renderer for the listing's vertical-spine timeline; column alignment across invocations; proper sub-second duration rendering (`36ms` instead of `<1m`); tab completion for hooks-jobs commands (job names, invocation prefixes, worktree drill-down); deprecation warnings for the old `hooks jobs clean` verb.

Ultrareview bug fixes (final 20 commits). Wrapping up four findings from a cloud review:

  1. Background-job cancellation was non-functional end-to-end. `ChildPidMap` was never populated (`run_command` didn't expose the spawned PID); per-job cancel produced `Failed` not `Cancelled`; worktree-removal cancel compared filesystem path against branch slug; `branch-delete` path didn't call the cancel helper at all; the coordinator IPC server is one-request-per-stream but the cancel helper reused a single client.
  2. Documented `log` / `background_output` fields were silent no-ops. Top-level `log:` block never propagated to per-job `log_config`; `background_output: silent` parsed and validated but no execution path consulted it; `log.path` was parsed and merged but no writer ever read it (stripped, since the subsystem is unreleased).
  3. `write_repo_policy` clobbered persisted tuning. Truncate-write of an all-`None` policy (the common case for hooks without log blocks) silently reset user values to defaults. Now field-merges with on-disk so unset fields preserve.
  4. Cleanup summary accounting was wrong. `enforce_budget` returned only the eviction count, dropping the per-invocation byte/job totals it had already computed. `prune --dry-run` always reported `across 0 invocation(s)`. Fixed to return a full `BudgetOutcome { evicted_invocations, freed_bytes, freed_jobs }` and to hoist the dry-run tally above the early-return.

Why now

The three sub-projects feed each other tightly. Sub-project A introduced the log store; B added the cleanup machinery that needs A's persisted retention; the universal-logging work sits between them. Splitting the merge at this stage would mean either landing B without enough of A's machinery to test against (`prune` with no logs to prune), or landing the universal-logging changes on top of an unmerged sub-project A. Cleaner to ship the bundle.

Spec/plan trail

  • `docs/superpowers/specs/2026-03-27-background-hook-jobs-design.md` — sub-project A.
  • `docs/superpowers/specs/2026-04-11-universal-hook-logging.md` — universal logging.
  • `docs/superpowers/specs/2026-04-25-hooks-jobs-log-maintenance.md` — sub-project B.
  • `docs/superpowers/plans/2026-04-26-hooks-jobs-ultrareview-fixes.md` — final 4-bug fixup.

User-facing docs in `docs/guide/hooks.md` (`Background Jobs`, `Log Configuration`) and `docs/cli/daft-hooks-jobs.md`.

Deferred follow-ups

Tracked in `~/.claude/projects/.../memory/project_hooks_jobs_followups.md` so they don't drop on the floor:

  • `cancel_background_jobs_for_worktree` cross-module placement (lives in `prune.rs` but called from `branch_delete.rs` too).
  • `write_repo_policy` atomicity (tmpfile+rename) and version-downgrade handling.
  • `prune --dry-run` summary gate is asymmetric (defense-in-depth tighten).
  • Argument-bag refactor for `yaml_jobs_to_specs` (8 args) / `execute_yaml_hook_with_rc` (11 args) / `run_single_background_job` (7 args).
  • Composite-address parser quirks (`worktree:job` requires three-segment form).

Test plan

  • `mise run ci` green locally — 484 manual scenarios, 1802 steps, all 4 matrix combos (default+gitoxide × bash+yaml). Total ~20 min.
  • `mise run test:unit` — 1375 tests pass, clippy clean, fmt clean.
  • Per-task unit tests for each of the 4 ultrareview bugs (PID register/deregister, per-job cancel marks Cancelled, silent log delete on success / preserve on failure / preserve on cancellation, repo-policy field-merge preserves and overrides, budget outcome reports bytes+jobs, dry-run tallies invocations).
  • Manual scenarios for each of the 4 ultrareview bugs (`bg-job-cancel-by-name.yml`, `bg-job-cancel-on-worktree-remove.yml`, `repo-log-defaults-applied.yml`, `repo-policy-not-clobbered.yml`, `prune-budget-accounting.yml`, `prune-dry-run-invocations.yml`, `bg-output-silent.yml`). Mutation-tested the silent-stderr negative assertion to confirm it has teeth across the fork+setsid boundary.
  • Sandbox sanity walk after merge: per-job cancel terminates child, `--all` cancel terminates all, `git-worktree-branch -D` triggers auto-cancel, top-level `log: { max_total_size: 100MB }` shows `104857600` in `repo-policy.json` and survives a subsequent hook fire without the log block, `prune` with tight budget reports non-zero `freed`, `prune --dry-run` reports correct `across N invocation(s)`, silent bg job leaves no `output.log` on success.

🤖 Generated with Claude Code

@avihut avihut self-assigned this Apr 26, 2026
@avihut avihut added the feat New feature label Apr 26, 2026
@avihut avihut added this to the Public Launch milestone Apr 26, 2026
@avihut avihut linked an issue Apr 26, 2026 that may be closed by this pull request
4 tasks
@avihut avihut force-pushed the feat/background-hook-jobs branch 6 times, most recently from 51996ed to 93a6c89 Compare April 27, 2026 21:03
avihut and others added 20 commits April 28, 2026 22:07
Adds STATE_DIR_ENV constant and daft_state_dir() function following the
same pattern as daft_config_dir() and daft_data_dir(). Falls back to
~/.local/state/daft on macOS (no native XDG state dir equivalent).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ML schema

Adds BackgroundOutput enum and LogConfig struct, along with new optional
fields to YamlConfig (log), HookDef (background), and JobDef (background,
background_output, log) to support background job execution configuration.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Defines BackgroundOutput and LogConfig in executor/mod.rs (avoiding a
circular dependency), re-exports them from hooks/yaml_config.rs, and
updates yaml_jobs_to_specs to pass background/background_output/log
through from JobDef to JobSpec.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…omotion

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements partition_foreground_background() which splits a JobSpec slice
into two phases: jobs that must complete before daft exits (foreground) and
jobs that can outlive the command (background). Background jobs transitively
needed by any foreground job are promoted to foreground to preserve DAG
validity.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Creates src/coordinator/ module with LogStore that manages on-disk storage
of background job metadata (meta.json) and output logs, using daft_state_dir()
for the XDG-compliant base path. Adds serde feature to chrono dependency.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…und execution

Add CoordinatorState for managing background hook jobs and fork_coordinator()
for spawning a detached child process that runs jobs as threads with log output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Unix socket IPC so CLI commands can communicate with a running
coordinator to list jobs, cancel them, or request shutdown. The protocol
uses newline-delimited JSON over Unix domain sockets.

- Define CoordinatorRequest/CoordinatorResponse protocol types in mod.rs
- Add CoordinatorClient with connect, list_jobs, cancel_job, cancel_all
- Add socket listener thread to coordinator process (non-blocking accept)
- Add cancellation support via shared AtomicBool and child PID tracking
- Write PID file on coordinator startup, clean up on exit
- Add helper functions: coordinator_socket_path, coordinator_pid_path

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…spatch

Modify execute_yaml_hook_with_rc() to partition jobs into foreground and
background phases, run foreground inline, and fork a coordinator for
background jobs. Includes DAFT_NO_BACKGROUND_JOBS escape hatch and
non-Unix fallback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds the `daft hooks jobs` CLI command with subcommands for listing,
viewing logs, cancelling, retrying, and cleaning background hook jobs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add field-level merging for the `log` section in merge_configs(), so that
overlay's retention/path fields take precedence while base fields are
preserved when overlay is None.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Before removing a worktree (before the worktree-pre-remove hook runs),
connect to the coordinator and cancel any background jobs that are
actively running for that worktree. This prevents background jobs from
writing to a directory that no longer exists.

Cancellation is best-effort: if no coordinator is running, or if the
coordinator cannot be reached, the error is silently ignored and removal
proceeds normally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add Background Jobs section to hooks guide (config, promotion, logging, CLI)
- Create CLI reference page for daft hooks jobs
- Add DAFT_NO_BACKGROUND_JOBS to configuration env vars
- Update SKILL.md with background job fields and commands

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…inator

Background jobs now appear individually in the hook output with a
(background) indicator and optional description, giving immediate
visibility into what was dispatched without running daft hooks jobs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Background jobs now appear in the hook summary section alongside
foreground jobs, with a blue ↻ indicator and (background) label,
instead of only appearing as an on_message line. Adds Background
variant to JobOutcome and on_job_background to JobPresenter trait.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
avihut and others added 17 commits April 28, 2026 22:07
Renumbers stale step comments in run_single_background_job so the
sequence is contiguous (1..10) after the recent PID-channel insert,
clears cancelled_jobs entries after the classifier runs so a
re-invocation of the same job name doesn't inherit a stale Cancelled
flag, and routes the per_job_cancel test killer thread through
cancel_single_job so future regressions in that helper (dropping the
insert or skipping the SIGTERM) surface in the test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JobInfo.worktree carries a branch slug (e.g. "feat/x"); previously
we compared it against the filesystem worktree path, so the
auto-cancel-on-removal flow silently never matched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The coordinator's handle_client_connection services exactly one
request per stream, so reusing a CoordinatorClient across list_jobs
and cancel_job sends produces a Broken-pipe error on the second send,
silently leaving the bg job running. Open a fresh connection for
list_jobs and each cancel_job. Also exposes the helper as pub(crate)
so the branch-delete path can call it (next commit wires that up).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task 3 wired auto-cancel-on-removal into the prune codepath but
left the branch-delete codepath (`git-worktree-branch -D`) silent:
deleting a worktree with a live bg job would tear the worktree
down without ever signalling the coordinator, so the job kept
running and its meta.json stayed at "running" forever.

Hook the same helper into delete_single_branch right before the
pre-remove hook fires.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Locks in cancel semantics: per-job cancel SIGTERMs and records
Cancelled, worktree removal triggers auto-cancel via slug match.

Both scenarios use a `sleep 30` bg job and redirect daft's stdio
to /dev/null on the checkout step so the test framework's capture
pipe doesn't keep the forked coordinator alive past the test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cancel scenarios now poll for the JSON status flip instead of
fixed-duration sleeping. Also harmonizes env -u DAFT_TESTING
usage between the two scenarios so the asymmetry doesn't
mislead future readers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The path field was parsed and merged but no execution code path
read it; both writers hard-coded the XDG state-dir location.
Stripping is preferable to shipping a documented no-op. The
hooks-jobs/log subsystem is new on this branch and hasn't been
released, so this isn't a breaking change for any real user.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to e55d1ce — the job-fields table at line 216 still
listed path as a sub-field of log. Replaced with max_log_size
(the only other per-job log field) so the two log tables in
the file agree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
yaml_jobs_to_specs now accepts the repo-level LogConfig and
merges it under each per-job log block, so max_total_size /
keep_last / stale_running_after configured at the top level
finally reach build_repo_policy. Threaded through
execute_yaml_hook_with_rc; the executor.rs caller passes
yaml_config.log.as_ref().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces inline `crate::executor::LogConfig` and
`crate::hooks::yaml_config_loader::merge_log_configs` paths
with direct uses, matching the file's existing import style.
No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously silent was parsed and validated but had no execution
effect — output.log was always written and the failure stderr
notification always fired. Now: log is deleted on success,
notification suppressed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The silent-mode log deletion previously gated on cmd_result.success.
A job that exited 0 but was cancelled mid-run would have its log
deleted while status was set to Cancelled — losing the captured
output for an interrupted job. Move the delete below the classifier
and gate on node_status == Succeeded so the log survives whenever
the recorded status is anything but a true success.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hooks without a log block submit an all-None RepoPolicy. The
previous truncate-write silently wiped the user's persisted
tuning back to defaults on every such fire. Now: only explicitly
set fields override on-disk; unset fields preserve.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Returns BudgetOutcome { evicted_invocations, freed_bytes,
freed_jobs } instead of just the eviction count, so cleanup
summaries accurately reflect what was deleted. Fixes the
'No old logs to clean.' message printing despite hundreds
of MB freed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hoists candidates_per_inv build above the dry-run early return so
the same logic that counts invocation removals in the live path
(count >= total) also fires in dry-run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Locks in: top-level log: defaults reach repo-policy.json,
RepoPolicy not clobbered by hooks without log blocks, budget
eviction reports full accounting, dry-run reports correct
invocation count, silent background output behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cancel paths (`cancel_single_job`, `CancelAll`, `Shutdown`) signalled the
bare child PID. On macOS, `bash -c "<single-cmd>"` exec-optimises into
the inner command so the PID maps directly to the workload — SIGTERM
killed it. On Ubuntu (`/bin/sh = dash`) the same single-command shape
sometimes fork+waits instead of execing, so the registered PID was
`sh`'s. SIGTERM killed `sh`, the grandchild (e.g. `sleep 30`) was
reparented to init and lived on, leaving the job stuck in `Running` —
which is exactly what the two failing yaml-Linux scenarios caught.

Spawn each bg child in its own session via `pre_exec` + `setsid()` so
its PID equals its PGID, then send SIGTERM to `-pid` (the whole group).
Now every descendant receives the signal regardless of whether `sh`
exec'd or fork+waited. `run_command_interactive` is left as-is; it
inherits the parent's TTY and must stay in the parent's session.

Verified locally: `mise run test:manual -- --ci` is 484/484 green
(including bg-job-cancel-by-name and bg-job-cancel-on-worktree-remove,
which were the two failures on the PR's yaml-Linux matrix legs).
@avihut avihut force-pushed the feat/background-hook-jobs branch from 93a6c89 to 79b4aba Compare April 28, 2026 19:08
@avihut avihut merged commit 6add06d into master Apr 28, 2026
8 checks passed
@avihut avihut deleted the feat/background-hook-jobs branch April 28, 2026 19:24
avihut added a commit that referenced this pull request Apr 28, 2026
PR #411 carried BREAKING CHANGE: footers in its commit-body for
internal renames of unreleased surfaces (--json, hooks jobs clean).
release-plz parsed those and bumped major. The user-facing changes
are additive, so target 1.10.0 and drop the breaking marker from
the changelog entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
avihut added a commit that referenced this pull request Apr 28, 2026
PR #411 carried BREAKING CHANGE: footers in its commit-body for
internal renames of unreleased surfaces (--json, hooks jobs clean).
release-plz parsed those and bumped major. The user-facing changes
are additive, so target 1.10.0 and drop the breaking marker from
the changelog entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
avihut added a commit that referenced this pull request Apr 28, 2026
* chore: release v2.0.0

* chore: regenerate man pages and CLI docs for new version

* chore: target 1.10.0 instead of 2.0.0 for release

PR #411 carried BREAKING CHANGE: footers in its commit-body for
internal renames of unreleased surfaces (--json, hooks jobs clean).
release-plz parsed those and bumped major. The user-facing changes
are additive, so target 1.10.0 and drop the breaking marker from
the changelog entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: wheatley-the-moronic-ci-bot[bot] <253382460+wheatley-the-moronic-ci-bot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Avihu Turzion <avihu.turzion@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
avihut added a commit that referenced this pull request Apr 28, 2026
The background hook coordinator added in #411 used Unix-only APIs
(`std::os::unix::net::{UnixStream, UnixListener}`, `libc::kill`,
`libc::SIGTERM`, `libc::pid_t`) without conditional compilation. The
v1.10.0 release workflow's `build-local-artifacts (x86_64-pc-windows-msvc)`
job consequently failed with 16 errors in `src/coordinator/{client,process}.rs`.

Gate all Unix-only socket and `libc` code in `client.rs` and `process.rs`
behind `#[cfg(unix)]` (including their test modules), and provide a
`#[cfg(not(unix))]` `CoordinatorClient` stub whose `connect()` always
returns `Ok(None)`. Existing call sites in `commands/hooks/jobs.rs` and
`core/worktree/prune.rs` already handle the `None` case as "no
coordinator running", and the only invokers of `fork_coordinator` are
already gated, so the public API stays compatible.

To prevent this class of regression from slipping past PR review again,
add a `windows-check` job to `test.yml` that runs `cargo check
--all-targets` on `windows-latest`. This catches Windows compile breaks
on every PR rather than only at release tag time.

Fixes #424
avihut added a commit that referenced this pull request Apr 28, 2026
…426)

* fix(hooks): gate coordinator IPC behind cfg(unix) so Windows builds

The background hook coordinator added in #411 used Unix-only APIs
(`std::os::unix::net::{UnixStream, UnixListener}`, `libc::kill`,
`libc::SIGTERM`, `libc::pid_t`) without conditional compilation. The
v1.10.0 release workflow's `build-local-artifacts (x86_64-pc-windows-msvc)`
job consequently failed with 16 errors in `src/coordinator/{client,process}.rs`.

Gate all Unix-only socket and `libc` code in `client.rs` and `process.rs`
behind `#[cfg(unix)]` (including their test modules), and provide a
`#[cfg(not(unix))]` `CoordinatorClient` stub whose `connect()` always
returns `Ok(None)`. Existing call sites in `commands/hooks/jobs.rs` and
`core/worktree/prune.rs` already handle the `None` case as "no
coordinator running", and the only invokers of `fork_coordinator` are
already gated, so the public API stays compatible.

To prevent this class of regression from slipping past PR review again,
add a `windows-check` job to `test.yml` that runs `cargo check
--all-targets` on `windows-latest`. This catches Windows compile breaks
on every PR rather than only at release tag time.

Fixes #424

* fix(hooks): gate PermissionsExt import in executor tests for Windows

The new `windows-check` PR job (cargo check --all-targets) caught an
un-gated `use std::os::unix::fs::PermissionsExt;` at the top of
`src/hooks/executor.rs`'s test module. Its only consumer is already
inside a `#[cfg(unix)]` block, so the import just needs the same gate.

Pre-existing: this didn't break the release workflow because dist's
build doesn't compile the lib-test target — only the `--all-targets`
check exercises the test modules on Windows.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feat New feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Background Jobs

1 participant