feat(hooks): background hook jobs + universal logging + daft hooks jobs CLI (#411)
Merged
Adds STATE_DIR_ENV constant and daft_state_dir() function following the same pattern as daft_config_dir() and daft_data_dir(). Falls back to ~/.local/state/daft on macOS (no native XDG state dir equivalent). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
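A minimal sketch of what such a resolver might look like. This is a hypothetical reconstruction: the assumption that `STATE_DIR_ENV` maps to `XDG_STATE_HOME`, and the exact signatures, are guesses — the PR's real implementation may differ.

```rust
use std::env;
use std::path::PathBuf;

// Assumption: the PR's STATE_DIR_ENV constant names the XDG override.
const STATE_DIR_ENV: &str = "XDG_STATE_HOME";

// Pure helper so the override logic is testable without mutating the env.
fn state_dir_from(xdg_state: Option<&str>, home: &str) -> PathBuf {
    match xdg_state {
        Some(dir) if !dir.is_empty() => PathBuf::from(dir).join("daft"),
        // macOS has no native XDG state-dir equivalent, so fall back
        // uniformly to ~/.local/state/daft.
        _ => PathBuf::from(home).join(".local").join("state").join("daft"),
    }
}

fn daft_state_dir() -> PathBuf {
    let xdg = env::var(STATE_DIR_ENV).ok();
    let home = env::var("HOME").unwrap_or_else(|_| ".".into());
    state_dir_from(xdg.as_deref(), &home)
}
```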
…ML schema Adds BackgroundOutput enum and LogConfig struct, along with new optional fields to YamlConfig (log), HookDef (background), and JobDef (background, background_output, log) to support background job execution configuration. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Defines BackgroundOutput and LogConfig in executor/mod.rs (avoiding a circular dependency), re-exports them from hooks/yaml_config.rs, and updates yaml_jobs_to_specs to pass background/background_output/log through from JobDef to JobSpec. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…omotion Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements partition_foreground_background() which splits a JobSpec slice into two phases: jobs that must complete before daft exits (foreground) and jobs that can outlive the command (background). Background jobs transitively needed by any foreground job are promoted to foreground to preserve DAG validity. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
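The promotion pass described above amounts to a reachability walk from the declared-foreground jobs over their dependency edges. The following is an illustrative sketch, not the PR's code — the minimal `JobSpec` shape and the `needs` field name are assumptions:

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical minimal JobSpec; the real struct carries command, log
// config, background_output, etc.
struct JobSpec {
    name: String,
    needs: Vec<String>,
    background: bool,
}

/// Split jobs into (foreground, background). Any background job that a
/// foreground job transitively depends on is promoted to foreground so
/// the DAG stays valid when the background phase is deferred.
fn partition(jobs: &[JobSpec]) -> (Vec<&JobSpec>, Vec<&JobSpec>) {
    let by_name: HashMap<&str, &JobSpec> =
        jobs.iter().map(|j| (j.name.as_str(), j)).collect();
    // Seed with declared-foreground jobs, then walk their dependencies;
    // everything reached runs in the foreground phase.
    let mut foreground: HashSet<&str> = HashSet::new();
    let mut stack: Vec<&str> = jobs
        .iter()
        .filter(|j| !j.background)
        .map(|j| j.name.as_str())
        .collect();
    while let Some(name) = stack.pop() {
        if foreground.insert(name) {
            if let Some(j) = by_name.get(name) {
                stack.extend(j.needs.iter().map(|n| n.as_str()));
            }
        }
    }
    jobs.iter().partition(|j| foreground.contains(j.name.as_str()))
}
```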
Creates src/coordinator/ module with LogStore that manages on-disk storage of background job metadata (meta.json) and output logs, using daft_state_dir() for the XDG-compliant base path. Adds serde feature to chrono dependency. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…und execution Add CoordinatorState for managing background hook jobs and fork_coordinator() for spawning a detached child process that runs jobs as threads with log output. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Unix socket IPC so CLI commands can communicate with a running coordinator to list jobs, cancel them, or request shutdown. The protocol uses newline-delimited JSON over Unix domain sockets. - Define CoordinatorRequest/CoordinatorResponse protocol types in mod.rs - Add CoordinatorClient with connect, list_jobs, cancel_job, cancel_all - Add socket listener thread to coordinator process (non-blocking accept) - Add cancellation support via shared AtomicBool and child PID tracking - Write PID file on coordinator startup, clean up on exit - Add helper functions: coordinator_socket_path, coordinator_pid_path Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
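The newline-delimited-JSON shape can be illustrated with a self-contained round trip over a Unix socket. The message bodies (`"ListJobs"`/`"Jobs"`) are invented for illustration — the real `CoordinatorRequest`/`CoordinatorResponse` types live in `coordinator/mod.rs` and will differ:

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::{UnixListener, UnixStream};
use std::thread;

// One request per connection, one JSON object per line.
fn demo_round_trip() -> std::io::Result<String> {
    let path = std::env::temp_dir()
        .join(format!("daft-ipc-demo-{}.sock", std::process::id()));
    let _ = std::fs::remove_file(&path);
    let listener = UnixListener::bind(&path)?;

    // Coordinator side: service exactly one request per accepted stream.
    let server = thread::spawn(move || -> std::io::Result<()> {
        let (stream, _) = listener.accept()?;
        let mut line = String::new();
        BufReader::new(stream.try_clone()?).read_line(&mut line)?;
        let mut stream = stream;
        if line.trim() == r#"{"type":"ListJobs"}"# {
            writeln!(stream, r#"{{"type":"Jobs","jobs":[]}}"#)?;
        }
        Ok(())
    });

    // Client side: send one request, read one newline-terminated reply.
    let mut stream = UnixStream::connect(&path)?;
    writeln!(stream, r#"{{"type":"ListJobs"}}"#)?;
    let mut reply = String::new();
    BufReader::new(stream).read_line(&mut reply)?;

    server.join().unwrap()?;
    let _ = std::fs::remove_file(&path);
    Ok(reply.trim().to_string())
}
```

Because the server hangs up after one reply, a client that wants to issue a second request must open a fresh connection — the exact constraint a later fix on this branch enforces.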
…spatch Modify execute_yaml_hook_with_rc() to partition jobs into foreground and background phases, run foreground inline, and fork a coordinator for background jobs. Includes DAFT_NO_BACKGROUND_JOBS escape hatch and non-Unix fallback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…background spawning
Adds the `daft hooks jobs` CLI command with subcommands for listing, viewing logs, cancelling, retrying, and cleaning background hook jobs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add field-level merging for the `log` section in merge_configs(), so that overlay's retention/path fields take precedence while base fields are preserved when overlay is None. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
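Field-level merging of this kind might look like the sketch below. The field names (`retention`, `path`) come from the commit message; the real `LogConfig` has more fields and the real `merge_configs()` handles the whole YAML document:

```rust
// Hedged sketch: overlay fields win when set, base fields survive when
// the overlay leaves them None (or the whole overlay section is absent).
#[derive(Clone, Debug, PartialEq)]
struct LogConfig {
    retention: Option<String>,
    path: Option<String>,
}

fn merge_log_configs(base: Option<LogConfig>, overlay: Option<LogConfig>) -> Option<LogConfig> {
    match (base, overlay) {
        (None, o) => o,
        (b, None) => b,
        (Some(b), Some(o)) => Some(LogConfig {
            retention: o.retention.or(b.retention),
            path: o.path.or(b.path),
        }),
    }
}
```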
Before removing a worktree (before the worktree-pre-remove hook runs), connect to the coordinator and cancel any background jobs that are actively running for that worktree. This prevents background jobs from writing to a directory that no longer exists. Cancellation is best-effort: if no coordinator is running, or if the coordinator cannot be reached, the error is silently ignored and removal proceeds normally. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add Background Jobs section to hooks guide (config, promotion, logging, CLI) - Create CLI reference page for daft hooks jobs - Add DAFT_NO_BACKGROUND_JOBS to configuration env vars - Update SKILL.md with background job fields and commands Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…inator Background jobs now appear individually in the hook output with a (background) indicator and optional description, giving immediate visibility into what was dispatched without running daft hooks jobs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Background jobs now appear in the hook summary section alongside foreground jobs, with a blue ↻ indicator and (background) label, instead of only appearing as an on_message line. Adds Background variant to JobOutcome and on_job_background to JobPresenter trait. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Renumbers stale step comments in run_single_background_job so the sequence is contiguous (1..10) after the recent PID-channel insert, clears cancelled_jobs entries after the classifier runs so a re-invocation of the same job name doesn't inherit a stale Cancelled flag, and routes the per_job_cancel test killer thread through cancel_single_job so future regressions in that helper (dropping the insert or skipping the SIGTERM) surface in the test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JobInfo.worktree carries a branch slug (e.g. "feat/x"); previously we compared it against the filesystem worktree path, so the auto-cancel-on-removal flow silently never matched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The coordinator's handle_client_connection services exactly one request per stream, so reusing a CoordinatorClient across list_jobs and cancel_job sends produces a broken-pipe error on the second send, silently leaving the bg job running. Open a fresh connection for list_jobs and each cancel_job. Also exposes the helper as pub(crate) so the branch-delete path can call it (next commit wires that up). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task 3 wired auto-cancel-on-removal into the prune codepath but left the branch-delete codepath (`git-worktree-branch -D`) silent: deleting a worktree with a live bg job would tear the worktree down without ever signalling the coordinator, so the job kept running and its meta.json stayed at "running" forever. Hook the same helper into delete_single_branch right before the pre-remove hook fires. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Locks in cancel semantics: per-job cancel SIGTERMs and records Cancelled, worktree removal triggers auto-cancel via slug match. Both scenarios use a `sleep 30` bg job and redirect daft's stdio to /dev/null on the checkout step so the test framework's capture pipe doesn't keep the forked coordinator alive past the test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cancel scenarios now poll for the JSON status flip instead of fixed-duration sleeping. Also harmonizes env -u DAFT_TESTING usage between the two scenarios so the asymmetry doesn't mislead future readers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The path field was parsed and merged but no execution code path read it; both writers hard-coded the XDG state-dir location. Stripping is preferable to shipping a documented no-op. The hooks-jobs/log subsystem is new on this branch and hasn't been released, so this isn't a breaking change for any real user. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to e55d1ce — the job-fields table at line 216 still listed path as a sub-field of log. Replaced with max_log_size (the only other per-job log field) so the two log tables in the file agree. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
yaml_jobs_to_specs now accepts the repo-level LogConfig and merges it under each per-job log block, so max_total_size / keep_last / stale_running_after configured at the top level finally reach build_repo_policy. Threaded through execute_yaml_hook_with_rc; the executor.rs caller passes yaml_config.log.as_ref(). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces inline `crate::executor::LogConfig` and `crate::hooks::yaml_config_loader::merge_log_configs` paths with direct uses, matching the file's existing import style. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously `silent` was parsed and validated but had no execution effect — `output.log` was always written and the failure stderr notification always fired. Now: the log is deleted on success and the notification is suppressed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The silent-mode log deletion previously gated on cmd_result.success. A job that exited 0 but was cancelled mid-run would have its log deleted while status was set to Cancelled — losing the captured output for an interrupted job. Move the delete below the classifier and gate on node_status == Succeeded so the log survives whenever the recorded status is anything but a true success. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hooks without a log block submit an all-None RepoPolicy. The previous truncate-write silently wiped the user's persisted tuning back to defaults on every such fire. Now: only explicitly set fields override on-disk; unset fields preserve. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Returns BudgetOutcome { evicted_invocations, freed_bytes, freed_jobs } instead of just the eviction count, so cleanup summaries accurately reflect what was deleted. Fixes the 'No old logs to clean.' message printing despite hundreds of MB freed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
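The richer accounting described above could be sketched roughly as follows — the `BudgetOutcome` field names follow the commit message, while the `Invocation` shape and the oldest-first eviction loop are illustrative:

```rust
// Outcome of a budget pass: full accounting, not just a count.
struct BudgetOutcome {
    evicted_invocations: usize,
    freed_bytes: u64,
    freed_jobs: usize,
}

// Hypothetical per-invocation summary; the real store reads meta.json.
struct Invocation {
    bytes: u64,
    jobs: usize,
}

/// Evict oldest-first (invocations assumed pre-sorted in LRU order)
/// until the total size fits the budget, accounting for everything
/// deleted so the cleanup summary can report it accurately.
fn enforce_budget(mut invocations: Vec<Invocation>, max_total: u64) -> BudgetOutcome {
    let mut total: u64 = invocations.iter().map(|i| i.bytes).sum();
    let mut outcome = BudgetOutcome { evicted_invocations: 0, freed_bytes: 0, freed_jobs: 0 };
    while total > max_total && !invocations.is_empty() {
        let inv = invocations.remove(0);
        total -= inv.bytes;
        outcome.evicted_invocations += 1;
        outcome.freed_bytes += inv.bytes;
        outcome.freed_jobs += inv.jobs;
    }
    outcome
}
```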
Hoists candidates_per_inv build above the dry-run early return so the same logic that counts invocation removals in the live path (count >= total) also fires in dry-run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Locks in: top-level log: defaults reach repo-policy.json, RepoPolicy not clobbered by hooks without log blocks, budget eviction reports full accounting, dry-run reports correct invocation count, silent background output behavior. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cancel paths (`cancel_single_job`, `CancelAll`, `Shutdown`) signalled the bare child PID. On macOS, `bash -c "<single-cmd>"` exec-optimises into the inner command so the PID maps directly to the workload — SIGTERM killed it. On Ubuntu (`/bin/sh` = dash) the same single-command shape sometimes fork+waits instead of execing, so the registered PID was `sh`'s. SIGTERM killed `sh`, the grandchild (e.g. `sleep 30`) was reparented to init and lived on, leaving the job stuck in `Running` — which is exactly what the two failing yaml-Linux scenarios caught.

Spawn each bg child in its own session via `pre_exec` + `setsid()` so its PID equals its PGID, then send SIGTERM to `-pid` (the whole group). Now every descendant receives the signal regardless of whether `sh` exec'd or fork+waited. `run_command_interactive` is left as-is; it inherits the parent's TTY and must stay in the parent's session.

Verified locally: `mise run test:manual -- --ci` is 484/484 green (including bg-job-cancel-by-name and bg-job-cancel-on-worktree-remove, which were the two failures on the PR's yaml-Linux matrix legs).
avihut added a commit that referenced this pull request on Apr 28, 2026
PR #411 carried BREAKING CHANGE: footers in its commit-body for internal renames of unreleased surfaces (--json, hooks jobs clean). release-plz parsed those and bumped major. The user-facing changes are additive, so target 1.10.0 and drop the breaking marker from the changelog entry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: release v2.0.0
* chore: regenerate man pages and CLI docs for new version
* chore: target 1.10.0 instead of 2.0.0 for release

PR #411 carried BREAKING CHANGE: footers in its commit-body for internal renames of unreleased surfaces (--json, hooks jobs clean). release-plz parsed those and bumped major. The user-facing changes are additive, so target 1.10.0 and drop the breaking marker from the changelog entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: wheatley-the-moronic-ci-bot[bot] <253382460+wheatley-the-moronic-ci-bot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Avihu Turzion <avihu.turzion@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The background hook coordinator added in #411 used Unix-only APIs (`std::os::unix::net::{UnixStream, UnixListener}`, `libc::kill`, `libc::SIGTERM`, `libc::pid_t`) without conditional compilation. The v1.10.0 release workflow's `build-local-artifacts (x86_64-pc-windows-msvc)` job consequently failed with 16 errors in `src/coordinator/{client,process}.rs`.

Gate all Unix-only socket and `libc` code in `client.rs` and `process.rs` behind `#[cfg(unix)]` (including their test modules), and provide a `#[cfg(not(unix))]` `CoordinatorClient` stub whose `connect()` always returns `Ok(None)`. Existing call sites in `commands/hooks/jobs.rs` and `core/worktree/prune.rs` already handle the `None` case as "no coordinator running", and the only invokers of `fork_coordinator` are already gated, so the public API stays compatible.

To prevent this class of regression from slipping past PR review again, add a `windows-check` job to `test.yml` that runs `cargo check --all-targets` on `windows-latest`. This catches Windows compile breaks on every PR rather than only at release tag time.

Fixes #424
…426)

* fix(hooks): gate coordinator IPC behind cfg(unix) so Windows builds

The background hook coordinator added in #411 used Unix-only APIs (`std::os::unix::net::{UnixStream, UnixListener}`, `libc::kill`, `libc::SIGTERM`, `libc::pid_t`) without conditional compilation. The v1.10.0 release workflow's `build-local-artifacts (x86_64-pc-windows-msvc)` job consequently failed with 16 errors in `src/coordinator/{client,process}.rs`.

Gate all Unix-only socket and `libc` code in `client.rs` and `process.rs` behind `#[cfg(unix)]` (including their test modules), and provide a `#[cfg(not(unix))]` `CoordinatorClient` stub whose `connect()` always returns `Ok(None)`. Existing call sites in `commands/hooks/jobs.rs` and `core/worktree/prune.rs` already handle the `None` case as "no coordinator running", and the only invokers of `fork_coordinator` are already gated, so the public API stays compatible.

To prevent this class of regression from slipping past PR review again, add a `windows-check` job to `test.yml` that runs `cargo check --all-targets` on `windows-latest`. This catches Windows compile breaks on every PR rather than only at release tag time.

Fixes #424

* fix(hooks): gate PermissionsExt import in executor tests for Windows

The new `windows-check` PR job (cargo check --all-targets) caught an un-gated `use std::os::unix::fs::PermissionsExt;` at the top of `src/hooks/executor.rs`'s test module. Its only consumer is already inside a `#[cfg(unix)]` block, so the import just needs the same gate.

Pre-existing: this didn't break the release workflow because dist's build doesn't compile the lib-test target — only the `--all-targets` check exercises the test modules on Windows.
Summary
Big feature branch landing the background-hook-jobs subsystem and everything that grew around it. Three independent sub-projects bundled into one PR (each was originally going to ship separately, but they share enough state — log store, coordinator IPC, retention policy — that splitting them at this point would create more churn than it saves). 194 commits, +35.2k / -302 lines, 119 files.
What's in here
Background hook jobs (sub-project A core).
`worktree-post-create` and friends can declare `background: true` jobs that run asynchronously after the command returns. A single forked coordinator process per repo manages them — runs jobs as threads, persists results, exposes a Unix-socket IPC for cancel/list/shutdown. New `src/coordinator/` module: `process.rs` (fork + run), `client.rs`/`ipc.rs` (Unix-socket protocol), `log_store.rs` (`<state>/jobs/<repo-uuid>/<inv-id>/<job-name>/{meta.json,output.log}`), `clean_policy.rs` (retention/budget/keep-last).

Universal hook invocation logging. Every hook fire — foreground or background, declared or not — produces an invocation record + per-job records. Removes the long-standing gap where pre-remove hooks with only foreground jobs left no audit trail. Cross-worktree resolution lets you address jobs even after the worktree directory is gone.
`daft hooks jobs` CLI. New top-level command tree: `list`, `logs`, `cancel` (per-job + `--all`), `retry` (single + invocation-prefix bulk), `prune` (replaces the deprecated `clean`). Multi-format output (`--format json|yaml|tsv|csv|markdown|toon|ndjson|template`) and a `--template` flag for Tera. Filters: `--worktree`, `--status`, `--hook`, `--all` for cross-repo views. Composite addressing (`worktree::job`, `inv:job`, `worktree:inv:job`) with hex/slash heuristics so users don't have to think about disambiguation.
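The composite addressing forms could be parsed along these lines — a purely hypothetical sketch whose hex heuristic is a guess at the real disambiguation rules, with an invented `JobAddress` enum:

```rust
// Illustrative parse of the address forms listed above.
#[derive(Debug, PartialEq)]
enum JobAddress {
    WorktreeJob { worktree: String, job: String },
    InvJob { inv: String, job: String },
    Full { worktree: String, inv: String, job: String },
    Bare(String),
}

fn parse_address(s: &str) -> JobAddress {
    // `worktree::job` is unambiguous thanks to the double colon.
    if let Some((wt, job)) = s.split_once("::") {
        return JobAddress::WorktreeJob { worktree: wt.into(), job: job.into() };
    }
    let parts: Vec<&str> = s.split(':').collect();
    match parts.as_slice() {
        [wt, inv, job] => JobAddress::Full {
            worktree: (*wt).into(),
            inv: (*inv).into(),
            job: (*job).into(),
        },
        [head, job] => {
            // Heuristic (assumed): a hex-looking head is an invocation-id
            // prefix; anything else (e.g. containing '/') is a worktree slug.
            if head.chars().all(|c| c.is_ascii_hexdigit()) {
                JobAddress::InvJob { inv: (*head).into(), job: (*job).into() }
            } else {
                JobAddress::WorktreeJob { worktree: (*head).into(), job: (*job).into() }
            }
        }
        _ => JobAddress::Bare(s.into()),
    }
}
```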
Log maintenance (sub-project B). Three-pass cleanup: per-log truncation (`max_log_size`, default 10 MB), retention sweep (`retention`, default 7 d, with `keep_last` floor of 3), per-repo budget (`max_total_size`, default 500 MB, LRU eviction). Auto-fires once per 24 h via a detached `daft __clean-logs` child (gated by `DAFT_NO_LOG_CLEAN` and CI detection). Manual escape hatch: `daft hooks jobs prune [--dry-run] [--older-than]`. Repo-level policy (`max_total_size`/`keep_last`/`stale_running_after`) persisted to a `repo-policy.json` sidecar so cleanup never has to re-read `daft.yml`.
UX polish. Outline renderer for the listing's vertical-spine timeline; column alignment across invocations; proper sub-second duration rendering (`36ms` instead of `<1m`); tab completion for hooks-jobs commands (job names, invocation prefixes, worktree drill-down); deprecation warnings for the old `hooks jobs clean` verb.
Ultrareview bug fixes (final 20 commits). Wrapping up four findings from a cloud review.
Why now
The three sub-projects feed each other tightly. Sub-project A introduced the log store; B added the cleanup machinery that needs A's persisted retention; the universal-logging work sits between them. Splitting the merge at this stage would mean either landing B without enough of A's machinery to test against (`prune` with no logs to prune), or landing the universal-logging changes on top of an unmerged sub-project A. Cleaner to ship the bundle.
Spec/plan trail
User-facing docs in `docs/guide/hooks.md` (`Background Jobs`, `Log Configuration`) and `docs/cli/daft-hooks-jobs.md`.
Deferred follow-ups
Tracked in `~/.claude/projects/.../memory/project_hooks_jobs_followups.md` so they don't drop on the floor:
Test plan
🤖 Generated with Claude Code