Summary
The background hook coordinator spawns every background job as a thread in one tight loop and joins them all at the end, without consulting the needs: field. As a result, needs: is honored on paper (parsed, recorded in meta JSON, used by partition logic) but ignored at runtime when the dependency is also a background job.
Repro
daft.yml from a real-world repo (tax-analyzer):
- name: db-migrate
  background: true
  needs: [install, ensure-db]
  run: prisma migrate deploy
- name: db-seed
  background: true
  needs: [db-migrate]   # ← honored on paper, ignored at runtime
  run: prisma db seed
After daft repo clone …, both worktrees show the same pattern in their job meta JSON ($DAFT_STATE_DIR/jobs/<inv>/<wt>/{db-migrate,db-seed}/meta.json):
| job | started_at | finished_at | exit |
|-----|------------|-------------|------|
| db-migrate | 2026-05-02T09:46:14.588257Z | 2026-05-02T09:46:16.963859Z | 0 |
| db-seed | 2026-05-02T09:46:14.588324Z | 2026-05-02T09:46:16.746757Z | 1 |
db-seed started 67μs after db-migrate was spawned — well before it finished. Prisma seed hits P2021: The table 'public.FactType' does not exist. Both worktrees, identical race.
Root cause
src/coordinator/process.rs:95-131 (run_all_with_cancel):
for job in &self.jobs {
    // spawned unconditionally — job.needs is never consulted here
    let handle = std::thread::spawn(move || {
        run_single_background_job(...);
    });
    handles.push(handle);
}
// all joined at the end; no ordering enforced between jobs
for handle in handles { handle.join().ok(); }
Every background job is spawned as a thread immediately, in parallel. There is no DAG walk, no waiting on needs:, no result-propagation between threads. job.needs is only ever read at process.rs:224 to copy into the meta JSON for display.
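The race reproduces in isolation with a minimal sketch of the same spawn-all/join-all shape (job names and sleep durations are hypothetical stand-ins for the real commands):

```rust
use std::thread;
use std::time::{Duration, Instant};

// Returns (seed_started, migrate_finished), both measured relative to t0.
fn spawn_all_join_all() -> (Duration, Duration) {
    let t0 = Instant::now();
    // Stand-ins: "db-migrate" takes ~300 ms, "db-seed" is near-instant.
    let jobs: Vec<(&str, u64)> = vec![("db-migrate", 300), ("db-seed", 10)];
    let mut handles = Vec::new();
    for (name, ms) in jobs {
        // Same shape as run_all_with_cancel: spawn immediately, never consult needs:.
        handles.push(thread::spawn(move || {
            let started = t0.elapsed();
            thread::sleep(Duration::from_millis(ms));
            (name, started, t0.elapsed())
        }));
    }
    let results: Vec<_> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    let migrate_finished = results[0].2;
    let seed_started = results[1].1;
    (seed_started, migrate_finished)
}

fn main() {
    let (seed_started, migrate_finished) = spawn_all_join_all();
    // The dependent job starts microseconds after its dep — long before it finishes.
    assert!(seed_started < migrate_finished);
}
```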
The foreground side honors needs: (src/hooks/yaml_executor/partition.rs even has a test_partition_background_promoted_by_foreground_dependency test for the case where a foreground job depends on a background one). Once the partition is done and the background bucket reaches the coordinator, dependency information is discarded.
Why nobody caught it before
Most other background-on-background dependencies in the wild are forgiving (e.g. lint needs: [typecheck] where typecheck is fast and lint just emits stale errors). Prisma's P2021 is the unforgiving case — schema literally doesn't exist for ~2s during migration, and seed hits it dead-on. The tax-analyzer repo is the first real-world setup we've seen that surfaces it sharply.
Proposed fix
Two shapes:
- Wave-based scheduler (recommended): topo-sort background jobs by needs:, spawn each wave as parallel threads, and wait for the whole wave to finish before spawning the next. Matches the existing partition.rs mental model. Simpler.
- Per-job gates: before each thread runs run_command, block on a per-dep Arc<(Mutex, Condvar)> (or oneshot::channel) signalled by the dep thread on completion. More concurrency, but that only matters with deep chains, which we don't have today.
Either way, classify a dep as "satisfied" only when its JobStatus becomes Completed. If a dep ends Failed/Cancelled/Skipped, downstream jobs should be Skipped (not silently spawned anyway).
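Under those assumptions, the wave computation is small; a sketch (Job is a hypothetical stand-in for the coordinator's job type, and unschedulable leftovers map to Skipped):

```rust
use std::collections::HashSet;

// Hypothetical minimal job record; the real type lives in the coordinator.
struct Job {
    name: &'static str,
    needs: Vec<&'static str>,
}

// Topo-sort into waves: wave N holds every job whose needs: are all satisfied
// by earlier waves. The coordinator would spawn one wave as parallel threads
// and join it fully before spawning the next.
fn waves(jobs: &[Job]) -> Vec<Vec<&'static str>> {
    let mut done: HashSet<&str> = HashSet::new();
    let mut remaining: Vec<&Job> = jobs.iter().collect();
    let mut out = Vec::new();
    while !remaining.is_empty() {
        let (ready, rest): (Vec<&Job>, Vec<&Job>) = remaining
            .into_iter()
            .partition(|j| j.needs.iter().all(|d| done.contains(d)));
        if ready.is_empty() {
            // Cycle or unknown dep: leftovers never run (mark them Skipped).
            break;
        }
        for j in &ready {
            done.insert(j.name);
        }
        out.push(ready.into_iter().map(|j| j.name).collect());
        remaining = rest;
    }
    out
}

fn main() {
    let jobs = vec![
        Job { name: "install", needs: vec![] },
        Job { name: "ensure-db", needs: vec![] },
        Job { name: "db-migrate", needs: vec!["install", "ensure-db"] },
        Job { name: "db-seed", needs: vec!["db-migrate"] },
    ];
    assert_eq!(
        waves(&jobs),
        vec![vec!["install", "ensure-db"], vec!["db-migrate"], vec!["db-seed"]]
    );
}
```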
Regression test
Cover with a unit test in src/coordinator/process.rs that defines two background jobs A → B (B needs: [A]) where A sleeps 200ms and writes a marker file, and B reads-or-fails. After run_all, assert:
- B.started_at >= A.finished_at
- both jobs end Completed
- file content as expected
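The shape of those assertions can be sketched standalone with std threads (the coordinator wiring is assumed; the marker path and A's 200 ms sleep are illustrative):

```rust
use std::fs;
use std::thread;
use std::time::{Duration, Instant};

// Returns (ordering_held, marker_content) for the A → B scenario.
fn run_scenario() -> (bool, String) {
    let marker = std::env::temp_dir().join("daft_needs_regression.marker");
    let _ = fs::remove_file(&marker);

    // Wave 1: job A sleeps 200 ms, then writes the marker file.
    let m = marker.clone();
    let a = thread::spawn(move || {
        thread::sleep(Duration::from_millis(200));
        fs::write(&m, "ready").unwrap();
        Instant::now() // A.finished_at
    });
    let a_finished = a.join().unwrap(); // join the whole wave first

    // Wave 2: job B is only spawned after wave 1 completed.
    let m = marker.clone();
    let b = thread::spawn(move || {
        let started = Instant::now(); // B.started_at
        (started, fs::read_to_string(&m).expect("marker must exist"))
    });
    let (b_started, content) = b.join().unwrap();
    let _ = fs::remove_file(&marker);

    (b_started >= a_finished, content)
}

fn main() {
    let (ordered, content) = run_scenario();
    assert!(ordered);             // B.started_at >= A.finished_at
    assert_eq!(content, "ready"); // file content as expected
}
```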
Plus a YAML scenario in tests/manual/scenarios/hooks/ that mirrors the tax-analyzer shape (two background jobs, second needs: first) and asserts ordering via timestamps in the recorded meta.
Severity
Bug, not feature. needs: is documented as a dependency declaration in both the YAML schema and the docs; the foreground executor honors it, the partition logic honors it, and the meta JSON faithfully records it. The runtime contract is broken in exactly one place.