Background hook coordinator ignores needs: between background jobs #454

@avihut

Summary

The background hook coordinator spawns every background job as a thread in one tight loop and joins them all at the end, without consulting the needs: field. As a result, needs: is honored on paper (parsed, recorded in meta JSON, used by partition logic) but ignored at runtime when the dependency is also a background job.

Repro

daft.yml from a real-world repo (tax-analyzer):

- name: db-migrate
  background: true
  needs: [install, ensure-db]
  run: prisma migrate deploy

- name: db-seed
  background: true
  needs: [db-migrate]      # ← honored on paper, ignored at runtime
  run: prisma db seed

After daft repo clone …, both worktrees show the same pattern in their job meta JSON ($DAFT_STATE_DIR/jobs/<inv>/<wt>/{db-migrate,db-seed}/meta.json):

job         started_at                   finished_at                  exit
db-migrate  2026-05-02T09:46:14.588257Z  2026-05-02T09:46:16.963859Z  0
db-seed     2026-05-02T09:46:14.588324Z  2026-05-02T09:46:16.746757Z  1

db-seed started 67μs after db-migrate was spawned, roughly 2.4s before db-migrate finished. The Prisma seed therefore fails with P2021: The table 'public.FactType' does not exist. Both worktrees show the identical race.

Root cause

src/coordinator/process.rs:95-131 (run_all_with_cancel):

for job in &self.jobs {
    let handle = std::thread::spawn(move || {
        run_single_background_job(...);
    });
    handles.push(handle);
}
for handle in handles { handle.join().ok(); }

Every background job is spawned as a thread immediately, in parallel. There is no DAG walk, no waiting on needs:, no result-propagation between threads. job.needs is only ever read at process.rs:224 to copy into the meta JSON for display.

The foreground side honors needs: (src/hooks/yaml_executor/partition.rs even has a test_partition_background_promoted_by_foreground_dependency test for the case where a foreground job depends on a background one). Once the partition is done and the background bucket reaches the coordinator, dependency information is discarded.

Why nobody caught it before

Most other background-on-background dependencies in the wild are forgiving (e.g. lint needs: [typecheck] where typecheck is fast and lint just emits stale errors). Prisma's P2021 is the unforgiving case — schema literally doesn't exist for ~2s during migration, and seed hits it dead-on. The tax-analyzer repo is the first real-world setup we've seen that surfaces it sharply.

Proposed fix

Two shapes:

  1. Wave-based scheduler (recommended): topo-sort background jobs by needs:, spawn each wave as parallel threads, wait for the whole wave to finish before spawning the next. Matches the existing partition.rs mental model. Simpler.
  2. Per-job gates: before each thread runs run_command, block on a per-dep Arc<(Mutex, Condvar)> (or oneshot::channel) signalled by the dep thread on completion. More concurrency but only matters with deep chains, which we don't have today.
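To make option 1 concrete, here is a minimal sketch of the wave planning step only. `JobSpec` and `plan_waves` are hypothetical names, not the coordinator's real types. The sketch assumes that any `needs:` entry not present in the background bucket (e.g. a foreground job such as install) has already been handled by the partition step and counts as satisfied.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a background job: a name plus its `needs:` list.
#[derive(Debug, Clone)]
pub struct JobSpec {
    pub name: String,
    pub needs: Vec<String>,
}

/// Groups jobs into waves so that every job in wave N depends only on
/// background jobs from earlier waves. Dependencies that are not in `jobs`
/// are treated as already satisfied (assumption, see above).
/// Returns Err with the names of the stuck jobs if there is a cycle.
pub fn plan_waves(jobs: &[JobSpec]) -> Result<Vec<Vec<String>>, Vec<String>> {
    let known: HashSet<&str> = jobs.iter().map(|j| j.name.as_str()).collect();
    let mut done: HashSet<&str> = HashSet::new();
    let mut remaining: Vec<&JobSpec> = jobs.iter().collect();
    let mut waves: Vec<Vec<String>> = Vec::new();

    while !remaining.is_empty() {
        // A job is ready when each dep is either outside the bucket or done.
        let (ready, blocked): (Vec<&JobSpec>, Vec<&JobSpec>) =
            remaining.into_iter().partition(|j| {
                j.needs
                    .iter()
                    .all(|d| !known.contains(d.as_str()) || done.contains(d.as_str()))
            });

        if ready.is_empty() {
            // Nothing can make progress: the blocked jobs form a cycle.
            return Err(blocked.iter().map(|j| j.name.clone()).collect());
        }

        for j in ready.iter().copied() {
            done.insert(j.name.as_str());
        }
        waves.push(ready.iter().map(|j| j.name.clone()).collect::<Vec<String>>());
        remaining = blocked;
    }
    Ok(waves)
}
```

For the tax-analyzer shape this yields two waves, db-migrate first and db-seed second. The coordinator would then spawn one wave at a time and join it before moving on.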

Either way, classify a dep as "satisfied" only when its JobStatus becomes Completed. If a dep ends Failed/Cancelled/Skipped, downstream jobs should be Skipped (not silently spawned anyway).
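The "satisfied only when Completed" rule can be sketched with the per-job gate from option 2. `Gate` and `run_gated` are hypothetical names, and the `JobStatus` enum below only mirrors the variant names used in this issue; it is not the real type. The job body is reduced to a closure returning success or failure.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Mirrors the status names used in this issue (not the real enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobStatus {
    Completed,
    Failed,
    Cancelled,
    Skipped,
}

/// Hypothetical per-job gate: holds the final status once the job ends.
#[derive(Default)]
pub struct Gate {
    status: Mutex<Option<JobStatus>>,
    cv: Condvar,
}

impl Gate {
    /// Records the final status and wakes every waiter.
    pub fn finish(&self, status: JobStatus) {
        *self.status.lock().unwrap() = Some(status);
        self.cv.notify_all();
    }

    /// Blocks until the job has a final status, then returns it.
    pub fn wait(&self) -> JobStatus {
        let mut guard = self.status.lock().unwrap();
        while guard.is_none() {
            guard = self.cv.wait(guard).unwrap();
        }
        (*guard).unwrap()
    }
}

/// Waits on every dependency gate. Runs `body` only if all deps Completed;
/// otherwise marks this job Skipped without running it.
pub fn run_gated<F>(deps: &[Arc<Gate>], own: &Gate, body: F) -> JobStatus
where
    F: FnOnce() -> bool,
{
    let deps_ok = deps
        .iter()
        .map(|dep| dep.wait())
        .all(|status| status == JobStatus::Completed);

    let status = if !deps_ok {
        JobStatus::Skipped
    } else if body() {
        JobStatus::Completed
    } else {
        JobStatus::Failed
    };
    own.finish(status);
    status
}
```

Because a skipped job also signals its own gate with Skipped, the skip propagates down a chain without any extra bookkeeping. Cancellation is not modelled here beyond the status value.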

Regression test

Cover with a unit test in src/coordinator/process.rs that defines two background jobs A → B (B needs: [A]) where A sleeps 200ms and writes a marker file, and B reads-or-fails. After run_all, assert:

  • B.started_at >= A.finished_at
  • both Completed
  • file content as expected
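A rough sketch of what such a test could look like, written against a toy wave runner rather than the real coordinator. `Timing`, `Job`, `run_waves` and `a_then_b_is_ordered` are all hypothetical, and an in-memory marker stands in for the marker file to keep the example self-contained.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical per-job timing, standing in for started_at / finished_at.
pub struct Timing {
    pub started_at: Instant,
    pub finished_at: Instant,
}

/// A job body, boxed so different closures can share one Vec.
pub type Job = Box<dyn FnOnce() + Send>;

/// Toy wave runner: spawns one thread per job in a wave and joins the
/// whole wave before the next wave is spawned.
pub fn run_waves(waves: Vec<Vec<Job>>) -> Vec<Vec<Timing>> {
    waves
        .into_iter()
        .map(|wave| {
            let handles: Vec<thread::JoinHandle<Timing>> = wave
                .into_iter()
                .map(|job| {
                    thread::spawn(move || {
                        let started_at = Instant::now();
                        job();
                        Timing { started_at, finished_at: Instant::now() }
                    })
                })
                .collect();
            handles
                .into_iter()
                .map(|h| h.join().expect("job thread panicked"))
                .collect::<Vec<Timing>>()
        })
        .collect()
}

/// A sleeps 200ms and writes the marker; B (which needs A) reads it.
/// Returns true when B started after A finished and saw A's marker.
pub fn a_then_b_is_ordered() -> bool {
    let marker = Arc::new(Mutex::new(None::<String>));
    let seen = Arc::new(Mutex::new(None::<String>));
    let (marker_a, marker_b, seen_b) = (marker.clone(), marker.clone(), seen.clone());

    let a: Job = Box::new(move || {
        thread::sleep(Duration::from_millis(200));
        *marker_a.lock().unwrap() = Some("ready".to_string());
    });
    let b: Job = Box::new(move || {
        let value = marker_b.lock().unwrap().clone();
        *seen_b.lock().unwrap() = value;
    });

    let timings = run_waves(vec![vec![a], vec![b]]);
    let ordered = timings[1][0].started_at >= timings[0][0].finished_at;
    let content_ok = seen.lock().unwrap().as_deref() == Some("ready");
    ordered && content_ok
}
```

Putting both jobs in a single wave reproduces today's behaviour, where B would most likely read the marker before A has written it.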

Plus a YAML scenario in tests/manual/scenarios/hooks/ that mirrors the tax-analyzer shape (two background jobs, second needs: first) and asserts ordering via timestamps in the recorded meta.
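For the timestamp assertion, a small helper along these lines might be enough. `started_after_dep` is a hypothetical name. It assumes the meta timestamps are always UTC with a trailing Z and the same number of fractional digits, as in the table above, so that string order matches chronological order; if the format can vary, a proper date-time parser would be needed instead.

```rust
/// Returns Some(true) when the job started no earlier than its dependency
/// finished, Some(false) when it started too early, and None when the two
/// timestamps are not in the assumed fixed-width UTC format.
pub fn started_after_dep(dep_finished_at: &str, job_started_at: &str) -> Option<bool> {
    let comparable = dep_finished_at.len() == job_started_at.len()
        && dep_finished_at.ends_with('Z')
        && job_started_at.ends_with('Z');
    // Fixed-width timestamps in one timezone sort lexicographically.
    comparable.then(|| job_started_at >= dep_finished_at)
}
```

Fed with the values from the repro (db-migrate finished_at, db-seed started_at), it returns Some(false), which is the failure the scenario should catch.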

Severity

Bug, not feature. The YAML schema documents needs: as a dependency declaration, the docs say so, the foreground executor honors it, the partition logic honors it, and the meta JSON faithfully records it. The runtime contract is broken in exactly one place.

Labels

bug (Something isn't working), fix (Bug fix), hooks
