feat: add BUILDKITE_JOB_TIMED_OUT env var for hooks #3871
Open
mastermanas805 wants to merge 3 commits into buildkite:main from
When a Buildkite job is cancelled because of a job-level timeout, the post-command hook now sees `BUILDKITE_JOB_TIMED_OUT=true` alongside the existing `BUILDKITE_JOB_CANCELLED`, so hooks can distinguish a timeout from a manual cancellation without doing arithmetic on the job start time and timeout configuration.

The job runner already polls `GetJobState` and cancels on the `canceling`/`canceled` states. It now also recognises the `timing_out`/`timed_out` states and routes them through a new `CancelReasonJobTimeout`. Before sending the interrupt signal, the agent drops a marker file at a path it shares with the bootstrap subprocess via `BUILDKITE_AGENT_JOB_TIMEOUT_FILE`. The executor's `Cancel` handler reads that path and, if the file is present, sets `BUILDKITE_JOB_TIMED_OUT` on the shell env, which then propagates into the post-command hook just as `BUILDKITE_JOB_CANCELLED` does today (buildkite#3213).

The marker file lives in the same temp directory used for the job env files and is removed during cleanup. Failing to write or remove it is non-fatal: the cancellation still proceeds, and the hook just sees the existing `BUILDKITE_JOB_CANCELLED`.

Fixes buildkite#3592
Description
Adds `BUILDKITE_JOB_TIMED_OUT` for hooks (and `CancelReasonJobTimeout`) so post-command hooks can distinguish a timeout from other cancellations. Mirrors #3213's `BUILDKITE_JOB_CANCELLED` design.

Fixes #3592
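From a hook's point of view, the distinction works by checking `BUILDKITE_JOB_TIMED_OUT` before `BUILDKITE_JOB_CANCELLED`, since a timed-out job carries both. The Go sketch below shows that ordering; `cancellationKind` is a hypothetical helper, and it assumes `BUILDKITE_JOB_CANCELLED` is set to the literal value `true` (the PR states this value only for `BUILDKITE_JOB_TIMED_OUT`).

```go
package main

import (
	"fmt"
	"os"
)

// cancellationKind (hypothetical) classifies how the job ended, using only the
// two variables the agent exposes to the post-command hook. The timeout check
// must come first because a timed-out job also has BUILDKITE_JOB_CANCELLED set.
func cancellationKind(getenv func(string) string) string {
	switch {
	case getenv("BUILDKITE_JOB_TIMED_OUT") == "true":
		return "timeout"
	case getenv("BUILDKITE_JOB_CANCELLED") == "true":
		return "cancelled"
	default:
		return "not cancelled"
	}
}

func main() {
	// A fake environment for a timed-out job.
	fake := map[string]string{
		"BUILDKITE_JOB_CANCELLED": "true",
		"BUILDKITE_JOB_TIMED_OUT": "true",
	}
	fmt.Println(cancellationKind(func(k string) string { return fake[k] })) // prints "timeout"

	// Inside a real hook, read the process environment instead.
	fmt.Println(cancellationKind(os.Getenv))
}
```

Taking `getenv` as a parameter keeps the classification testable without mutating the process environment.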
The agent doesn't enforce timeouts itself: `JobRunner.jobCancellationChecker` polls `GetJobState` and now routes the `timing_out`/`timed_out` server states through `Cancel(CancelReasonJobTimeout)`. The signal travels from the agent to the bootstrap and on to the hook env through a per-job marker file, whose path is passed via `BUILDKITE_AGENT_JOB_TIMEOUT_FILE` (protected from job-level override). The agent writes the marker before signaling, and the bootstrap's `Cancel` reads it and sets `BUILDKITE_JOB_TIMED_OUT=true` on the hook env.

Testing

- `go test ./...` clean
- `go test -race ./agent/ ./internal/job/ ./env/` clean
- `go tool gofumpt -extra -d .` clean
- `golangci-lint run`: 0 issues
- `executor.Cancel` (no-marker / marker-present / bad-path), `jobTimeoutFilePath`, and `CancelReason.String()`

Disclosures / Credits
Used Claude Code (Opus 4.7) to design the marker-file plumbing and draft the patch. Reviewed and tested locally.