
fix: cancel_invocation auto-escalates when graceful cancel isn't enough#976

Merged
gregmagolan merged 1 commit into main from cancel_fix
Mar 27, 2026

@gregmagolan gregmagolan commented Mar 27, 2026

Summary

cancel_invocation() now automatically escalates when a graceful cancel isn't enough to stop a Bazel build.

How it works

cancel_invocation() sends the 1st SIGINT to the Bazel client (graceful cancel) and returns a Cancellation object. When wait() is called, if the build hasn't stopped within force_kill_after_ms (default 5000ms), it escalates following Bazel's 3-SIGINT protocol:

| Step | Signal | Bazel behavior |
| --- | --- | --- |
| cancel_invocation() | 1st SIGINT | CancelRequest RPC: stop scheduling new actions |
| force() / auto-escalation | 2nd SIGINT | Repeated CancelRequest (server already cancelling) |
| force() / auto-escalation | 3rd SIGINT | Bazel's built-in KillServerProcess: server SIGKILL'd, client exits |
| Last resort (after 5s) | SIGKILL | We SIGKILL both client and server ourselves |

If the client has already crashed, force() falls back to sending SIGKILL directly to the server daemon (PID read from <output_base>/server/server.pid.txt). If the server isn't busy, force() is a no-op.
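
The decision logic behind force() can be sketched as a pure function, separate from any real signal delivery. This is an illustration only, not the code in cancel.rs: `ForceStep`, `ProcState`, and `force_plan` are hypothetical names, and the sketch assumes the client/server state has already been observed.

```rust
/// One rung of the escalation ladder that force() may climb.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ForceStep {
    /// 2nd SIGINT to the client: repeated CancelRequest.
    SecondSigint,
    /// 3rd SIGINT to the client: Bazel kills its own server.
    ThirdSigint,
    /// Last resort: SIGKILL sent by us, only if the process is still alive.
    Sigkill,
}

/// Hypothetical snapshot of what we know about the Bazel processes.
#[derive(Debug, Clone, Copy)]
struct ProcState {
    client_alive: bool,
    server_busy: bool,
}

/// Returns the steps force() would attempt, in order.
/// Later steps are only needed if the earlier ones did not stop the build.
fn force_plan(state: ProcState) -> Vec<ForceStep> {
    if !state.server_busy {
        // Idle server: force() is a no-op, never kill it.
        return Vec::new();
    }
    if !state.client_alive {
        // Client already crashed: there is nothing to SIGINT,
        // so go straight to SIGKILL on the server daemon.
        return vec![ForceStep::Sigkill];
    }
    vec![
        ForceStep::SecondSigint,
        ForceStep::ThirdSigint,
        ForceStep::Sigkill,
    ]
}
```

Keeping the plan as data makes the "idle server" and "crashed client" branches easy to unit test without spawning Bazel.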

AXL examples

Default — auto-escalates after 5s:

cancellation = ctx.bazel.cancel_invocation()
cancellation.wait()

Custom grace period:

cancellation = ctx.bazel.cancel_invocation(force_kill_after_ms = 10000)
cancellation.wait()

Manual control:

cancellation = ctx.bazel.cancel_invocation(force_kill_after_ms = 0)
if not cancellation.wait(timeout_ms = 5000):
    cancellation.force()
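
The manual-control pattern relies on wait(timeout_ms) returning False on timeout instead of blocking. A minimal sketch of that behavior in Rust, assuming a polling design: `wait_until_idle`, the `is_busy` closure, and the poll interval are hypothetical and may not match how cancel.rs actually waits.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Polls `is_busy` until it reports false or `timeout` elapses.
/// Returns true if the build stopped in time, false on timeout.
fn wait_until_idle<F>(mut is_busy: F, timeout: Duration, poll: Duration) -> bool
where
    F: FnMut() -> bool,
{
    let deadline = Instant::now() + timeout;
    loop {
        if !is_busy() {
            return true;
        }
        let now = Instant::now();
        if now >= deadline {
            return false;
        }
        // Never sleep past the deadline.
        sleep(poll.min(deadline - now));
    }
}
```

A caller in manual mode would check the boolean result and only then decide whether to escalate, mirroring the `if not cancellation.wait(...)` branch above.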

Safety rails

  • wait(timeout_ms=N) with force_kill_after_ms > 0 → error (ambiguous: pick one mode)
  • wait() with force_kill_after_ms = 0 and no timeout_ms → error (would hang forever)
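
The two rules above, plus the two valid combinations, fit in a single match. The function name validate_wait_params comes from the test plan, but the signature, the `WaitMode` enum, and the error strings below are assumptions made for illustration.

```rust
/// Which mode wait() runs in once its parameters are accepted (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum WaitMode {
    /// Wait up to `grace_ms`, then escalate automatically.
    AutoEscalate { grace_ms: u64 },
    /// Wait up to `timeout_ms`, then return false; the caller decides what to do.
    Timeout { timeout_ms: u64 },
}

/// Sketch of the parameter check, assuming `timeout_ms` is optional and
/// `force_kill_after_ms = 0` disables auto-escalation.
fn validate_wait_params(
    timeout_ms: Option<u64>,
    force_kill_after_ms: u64,
) -> Result<WaitMode, String> {
    match (timeout_ms, force_kill_after_ms) {
        // Both set: ambiguous, the caller must pick one mode.
        (Some(_), f) if f > 0 => {
            Err("timeout_ms cannot be combined with force_kill_after_ms > 0".to_string())
        }
        // Neither set: wait() could hang forever.
        (None, 0) => {
            Err("wait() needs timeout_ms when force_kill_after_ms = 0".to_string())
        }
        (Some(t), _) => Ok(WaitMode::Timeout { timeout_ms: t }),
        (None, f) => Ok(WaitMode::AutoEscalate { grace_ms: f }),
    }
}
```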

Changes

  • cancel.rs: wait() supports timeout_ms and auto-escalation. force() sends 2nd + 3rd SIGINT (Bazel's 3-stage protocol), monitors client PID, falls back to SIGKILL. Guards against killing an idle server.
  • mod.rs: cancel_invocation accepts force_kill_after_ms param.
  • info.rs: Added server_pid_nonblocking() (reads PID from disk via output_base). Capture stderr from bazel info for diagnostics.
  • axl.axl: New test cases for force-cancel, auto-escalation on a slow build, repeated cancel calls, and stale cancellation objects.
  • examples/slow_build/: Test fixture with a 30s sleep genrule for cancel tests.
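
Reading the server PID from disk, as server_pid_nonblocking() is described as doing, might look roughly like the following. Only the path <output_base>/server/server.pid.txt comes from this PR; the function name `read_server_pid`, its signature, and the choice to return `None` on any failure are assumptions.

```rust
use std::fs;
use std::path::Path;

/// Reads the Bazel server PID from `<output_base>/server/server.pid.txt`.
/// Returns None if the file is missing, unreadable, or not a valid PID,
/// so the caller never has to run a blocking `bazel info`.
fn read_server_pid(output_base: &Path) -> Option<u32> {
    let pid_file = output_base.join("server").join("server.pid.txt");
    let text = fs::read_to_string(pid_file).ok()?;
    // Tolerate a trailing newline or surrounding whitespace.
    text.trim().parse::<u32>().ok()
}
```

Returning `Option` rather than an error keeps the fallback path simple: with no PID there is nothing to SIGKILL.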

Test plan

  • Rust unit tests for validate_wait_params — all 4 combinations of timeout_ms/force_kill_after_ms
  • AXL integration tests:
    • Test 6: cancel_invocation() with bad startup flags
    • Test 7: Manual force() on a running build
    • Test 8: Auto-escalation on a sleep 30 build with force_kill_after_ms=1000
    • Test 9: wait(timeout_ms) on non-busy server returns True
    • Test 10: Three cancel_invocation() calls in a row (discarded results)
    • Test 11: busy/wait()/force() on a stale Cancellation from a finished build

🤖 Generated with Claude Code


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 57d9545d3a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread crates/axl-runtime/src/engine/bazel/mod.rs Outdated
@gregmagolan gregmagolan force-pushed the cancel_fix branch 8 times, most recently from f27700e to b4bc4f4 Compare March 27, 2026 17:05
@gregmagolan gregmagolan changed the title fix: cancel_invocation no longer fails when bazel server info is unavailable fix: cancel_invocation auto-escalates when graceful cancel isn't enough Mar 27, 2026
@gregmagolan gregmagolan force-pushed the cancel_fix branch 10 times, most recently from 467feb7 to b81b940 Compare March 27, 2026 17:50
A single SIGINT only requests graceful cancellation from Bazel, which
often isn't enough to stop a running build. This caused wait() to hang
indefinitely.

cancel_invocation() now accepts force_kill_after_ms (default 5000ms).
When wait() is called and the build hasn't stopped within the grace
period, it automatically escalates: sends a second SIGINT to the Bazel
client (triggering Bazel's forceful cancel protocol), or if the client
has crashed, sends SIGKILL to the server daemon via the PID read from
<output_base>/server/server.pid.txt.

API:
  # Happy path — auto-escalates after 5s (default)
  cancellation = ctx.bazel.cancel_invocation()
  cancellation.wait()

  # Manual control
  cancellation = ctx.bazel.cancel_invocation(force_kill_after_ms = 0)
  if not cancellation.wait(timeout_ms = 5000):
      cancellation.force()

Additional fixes:
- wait(timeout_ms) returns False on timeout instead of hanging forever
- Mixing timeout_ms and force_kill_after_ms is a hard error
- wait() with no timeout and no force_kill_after_ms is a hard error
- Capture stderr from bazel info for diagnostic logging
@gregmagolan gregmagolan merged commit 123ef00 into main Mar 27, 2026
4 checks passed
@gregmagolan gregmagolan deleted the cancel_fix branch March 27, 2026 17:56