fix(prefork): propagate cancel_running_job to running children (#82) #141

Merged: pratyush618 merged 5 commits into master from fix/prefork-cancel-running on May 6, 2026
@pratyush618
Collaborator

Summary

  • queue.cancel_running_job(job_id) now stops a task running in a prefork child within ~100 ms, provided the task checks for cancellation cooperatively. Previously the request was ignored until the task completed naturally.
  • Adds a deterministic side channel: WorkerDispatcher::notify_cancel (default no-op) → bounded crossbeam channel → cancel-router thread in PreforkPool → in-band Cancel { job_id } IPC message → child-side stdin reader thread → local cancel set consulted by current_job.check_cancelled() via a registered hook.
  • Cross-platform: no SIGUSR1 or signal-handling — pure stdio pipes, so behaviour is identical on Linux, macOS, and (when prefork is supported) Windows.
  • Child is not killed by cancel; subsequent jobs continue to be served on the same pool.
  • Storage cancel flag (Storage::request_cancel) is still set, so pending-job cancel and the existing thread-pool path are unchanged.
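
The side channel described above can be sketched in a single process. This is only an illustration of the message flow, not the PR's Rust implementation: `CancelSideChannel`, the JSON line format, and the queue capacity are all hypothetical, a `queue.Queue` stands in for the bounded crossbeam channel, and an `os.pipe()` stands in for the child's stdin.

```python
import json
import os
import queue
import threading
import time


class CancelSideChannel:
    """Hypothetical model: notify_cancel -> bounded queue -> router thread
    -> in-band Cancel message on a pipe -> reader thread -> local cancel set."""

    def __init__(self, capacity=64):
        # Bounded queue: stand-in for the bounded crossbeam channel.
        self._requests = queue.Queue(maxsize=capacity)
        # Pipe: stand-in for the child's stdin.
        read_fd, write_fd = os.pipe()
        self._pipe_in = os.fdopen(read_fd, "r")
        self._pipe_out = os.fdopen(write_fd, "w")
        self._cancelled = set()
        self._lock = threading.Lock()
        self._threads = [
            threading.Thread(target=self._route, daemon=True),
            threading.Thread(target=self._read, daemon=True),
        ]
        for thread in self._threads:
            thread.start()

    def notify_cancel(self, job_id):
        """Parent side: enqueue without blocking; report whether it was queued."""
        try:
            self._requests.put_nowait(job_id)
            return True
        except queue.Full:
            return False

    def _route(self):
        """Parent side: forward each request as one message line (format assumed)."""
        with self._pipe_out as pipe:
            while True:
                job_id = self._requests.get()
                if job_id is None:  # shutdown sentinel
                    return
                pipe.write(json.dumps({"type": "Cancel", "job_id": job_id}) + "\n")
                pipe.flush()

    def _read(self):
        """Child side: record cancelled job ids until the pipe reaches EOF."""
        with self._pipe_in as pipe:
            for line in pipe:
                message = json.loads(line)
                if message["type"] == "Cancel":
                    with self._lock:
                        self._cancelled.add(message["job_id"])

    def check_cancelled(self, job_id):
        """Child side: the hook a cooperative task would poll."""
        with self._lock:
            return job_id in self._cancelled

    def close(self):
        self._requests.put(None)
        for thread in self._threads:
            thread.join()


def demo():
    channel = CancelSideChannel()
    before = channel.check_cancelled("job-1")
    channel.notify_cancel("job-1")
    # Poll until the message has travelled through both threads (1 s cap).
    deadline = time.monotonic() + 1.0
    while not channel.check_cancelled("job-1") and time.monotonic() < deadline:
        time.sleep(0.005)
    after = channel.check_cancelled("job-1")
    unrelated = channel.check_cancelled("job-2")
    channel.close()
    return before, after, unrelated
```

Calling `demo()` shows the flag for `job-1` flipping only after the message has crossed the pipe, while `job-2` is unaffected. Nothing in the sketch sends a signal or terminates a process, which mirrors the signal-free design the summary describes.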

Closes #82.

Test plan

  • cargo check --workspace (default + postgres + redis + native-async)
  • cargo clippy --workspace --all-targets -- -D warnings
  • cargo test --workspace
  • uv run python -m pytest tests/python/ (498 passed, 9 skipped)
  • New test_prefork_cancel_running_job_stops_quickly and test_prefork_cancel_does_not_kill_child exercise the full end-to-end path
  • uv run ruff check py_src/ tests/
  • uv run mypy py_src/taskito/ --no-incremental
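
The two behaviours the new tests are named after (a cancelled task stops quickly, and the child keeps serving jobs) can be illustrated with a toy model. This is not the code of `test_prefork_cancel_running_job_stops_quickly` or `test_prefork_cancel_does_not_kill_child`: `Worker` is a hypothetical thread-based stand-in for one prefork child, and the step counts and timings are arbitrary.

```python
import queue
import threading
import time


class Worker:
    """Hypothetical stand-in for one prefork child serving jobs one at a time."""

    def __init__(self):
        self.jobs = queue.Queue()
        self.results = {}
        self._cancelled = set()
        self._lock = threading.Lock()
        self.thread = threading.Thread(target=self._serve, daemon=True)
        self.thread.start()

    def cancel(self, job_id):
        with self._lock:
            self._cancelled.add(job_id)

    def _check_cancelled(self, job_id):
        with self._lock:
            return job_id in self._cancelled

    def _serve(self):
        while True:
            item = self.jobs.get()
            if item is None:  # shutdown sentinel
                return
            job_id, steps = item
            self.results[job_id] = self._run(job_id, steps)

    def _run(self, job_id, steps):
        # Cooperative task: check for cancellation between 10 ms units of work.
        for _ in range(steps):
            if self._check_cancelled(job_id):
                return "cancelled"
            time.sleep(0.01)
        return "completed"

    def wait_for(self, job_id, timeout=5.0):
        deadline = time.monotonic() + timeout
        while job_id not in self.results and time.monotonic() < deadline:
            time.sleep(0.005)
        return self.results.get(job_id)


def cancel_latency_and_survival():
    worker = Worker()
    worker.jobs.put(("slow", 500))  # roughly 5 s if never cancelled
    time.sleep(0.05)                # let the job start running
    started = time.monotonic()
    worker.cancel("slow")
    first = worker.wait_for("slow")
    latency = time.monotonic() - started
    worker.jobs.put(("next", 2))    # the same worker must still serve jobs
    second = worker.wait_for("next")
    worker.jobs.put(None)
    worker.thread.join()
    return first, latency, second
```

In this model the first job ends as `"cancelled"` well before its natural runtime, and the follow-up job completes on the same worker. The latency bound depends entirely on how often the task polls, which is why the ~100 ms figure applies only to tasks that check cooperatively.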

In-process pools (native-async, classic async) implement notify_cancel
as a no-op, so installing them on `self.dispatcher` is unnecessary — and
breaks shutdown when native-async is enabled, because the parent thread
keeps an `Arc<PyObject>` alive (async executor → PyResultSender → result
channel sender), preventing the drain loop's `recv` from observing
disconnection until the 30s drain timeout. Restore the original
"dispatcher lives only inside the runtime task" ownership for in-process
pools.
pratyush618 merged commit 79479fc into master on May 6, 2026
19 checks passed
pratyush618 deleted the fix/prefork-cancel-running branch on May 6, 2026 at 18:10
Development

Successfully merging this pull request may close these issues.

Prefork: queue.cancel(job_id) does not propagate to running children