Global setting for per step timeout #170

evilmarty · 2016-07-22T04:24:37Z

Per-step timeout is supported in build pipelines via timeout_in_seconds and via the interface, but it would be great to set a default timeout_in_minutes either as an agent option or as a build setting. By default the value could be zero to indicate an indefinite timeout.

The reason for this is to avoid agents being stuck on jobs that either run exceptionally long or hang because of bugs. Making sure every step is configured with an automatic timeout is difficult to manage, especially with numerous projects that include pipeline definitions in source control.

ozbillwang · 2017-05-03T04:32:41Z

@evilmarty

The link you provided for timeout_in_seconds no longer mentions a timeout option.

I can currently define a timeout via #36 through the web interface.

But how do I put this timeout option in pipeline.yml?

Is it the same question raised here?

Updates

Thanks, @evilmarty

I searched again and found the documentation: https://buildkite.com/docs/pipelines/command-step

I can add it in pipeline.yml now.

timeout_in_minutes: 60

evilmarty · 2017-05-03T04:47:52Z

The docs have been updated and the step declaration examples have been removed. It is in the master branch of your docs, so maybe a regression?

My question is how can I set a global timeout in the absence of one being set in the UI or in a YAML file?

avtar · 2018-02-12T21:45:50Z

I'm curious about this as well. Is there a way to have a global timeout that doesn't involve the web interface?

avtar · 2018-03-07T19:17:19Z

Anyone? Bueller?

pda · 2018-06-27T03:36:33Z

I'd very much like to see an agent-level default job timeout so that frozen jobs don't run forever.

This is especially important because the scaling policy for https://github.com/buildkite/elastic-ci-stack-for-aws currently requires zero running jobs before scaling in. So a single frozen job can prevent scale-in and cost lots of money on a large stack.

A configuration option on https://github.com/buildkite/agent would be great — however I did a bit of exploration in the hopes of opening a PR but it looks like the timeout is driven server-side so there's no good way to add the option on the agent without some backend changes.

pda · 2018-06-27T04:01:38Z

Trying to think how this could work as agent configuration when timeouts are backend-driven.

It would be possible to implement an agent-side timeout. However I don't think there's an existing way for the agent to communicate that it was a timeout; it would look like a general command failure. And the agent timeout could race the server-side step timeout if they're similar. The agent API could be extended to allow agent-driven timeouts, but it would still be racy and inconsistent with per-step timeouts. I don't think this is a good idea.

Instead, when an agent connects to the backend it could advertise the default timeout. Then it can be visible on the agent listing etc. When a job is allocated to an agent, it would use the per-step timeout if present, otherwise the agent default timeout. Enforcing the timeout (per-step or per-agent) remains backend driven. That doesn't seem like such a bad option.
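To make the proposed fallback concrete, here is a minimal sketch of the resolution rule: use the per-step timeout if present, otherwise the agent's advertised default. This is not Buildkite's backend code. The function name, the `NO_TIMEOUT` constant, and the "zero means no limit" convention (borrowed from the original post's suggestion) are all assumptions for illustration.

```python
from typing import Optional

# Assumption: zero means "no timeout", as suggested in the original post.
NO_TIMEOUT = 0


def effective_timeout_minutes(
    step_timeout: Optional[int],
    agent_default: int = NO_TIMEOUT,
) -> Optional[int]:
    """Return the timeout (in minutes) to enforce for a job, or None for no limit.

    Hypothetical helper: the per-step timeout wins when it is set,
    otherwise the agent-advertised default applies.
    """
    if step_timeout is not None:
        chosen = step_timeout
    else:
        chosen = agent_default
    return None if chosen == NO_TIMEOUT else chosen


if __name__ == "__main__":
    print(effective_timeout_minutes(10, agent_default=60))    # step value wins
    print(effective_timeout_minutes(None, agent_default=60))  # falls back to agent default
    print(effective_timeout_minutes(None))                    # nothing set: no limit
```

Because enforcement would stay on the backend in this proposal, a rule like this would only decide which value is used. It avoids the race between agent-side and server-side timers described above.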

keithpitt · 2018-07-03T02:07:03Z

I think this is an important thing to fix! Will move discussion over to the PR.

BRMatt · 2019-06-12T11:05:24Z

Just want to chime in to say this would be really useful - our elastic stack bill went through the roof because we didn't notice a few stuck jobs that prevented our stack from scaling down for ~3w. 😱

If a step in a build hangs or takes an unusually long time, CI would previously let it continue, occupying machines forever. In lieu of a global timeout (buildkite/feedback#170, https://forum.buildkite.community/t/pipeline-timeouts/722), we can manually apply a timeout to every step, as a last resort to catch slow or hung builds.

This uses the optional `timeout_in_minutes` attribute (https://buildkite.com/docs/pipelines/command-step#command-step-attributes):

> The number of minutes a job created from this step is allowed to run. If the job does not finish within this limit, it will be automatically canceled and the build will fail.

Our steps currently range from ~30 seconds to ~10 minutes, so 30 minutes should be a safe "something serious is wrong" timeout.

See: #905
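The per-step workaround can also be automated when a pipeline is generated programmatically. The sketch below assumes the pipeline has already been loaded into a plain Python dict with a `steps` list. The helper name `apply_default_timeout` and the 30-minute default are illustrative, not part of any Buildkite tooling. Only `timeout_in_minutes` itself comes from the documentation quoted above.

```python
import copy

# Illustrative default, matching the 30-minute figure mentioned above.
DEFAULT_TIMEOUT_MINUTES = 30


def apply_default_timeout(pipeline, default_minutes=DEFAULT_TIMEOUT_MINUTES):
    """Return a copy of the pipeline where every command step has a timeout.

    Hypothetical helper: steps that already set timeout_in_minutes keep their
    value, and non-command entries (for example a bare "wait") are left alone.
    """
    result = copy.deepcopy(pipeline)
    for step in result.get("steps", []):
        is_command_step = isinstance(step, dict) and (
            "command" in step or "commands" in step
        )
        if is_command_step and "timeout_in_minutes" not in step:
            step["timeout_in_minutes"] = default_minutes
    return result


if __name__ == "__main__":
    pipeline = {
        "steps": [
            {"label": "tests", "command": "make test"},
            "wait",
            {"label": "slow job", "command": "make integration", "timeout_in_minutes": 60},
        ]
    }
    for step in apply_default_timeout(pipeline)["steps"]:
        print(step)
```

The function works on a deep copy, so the input pipeline is unchanged, and explicit per-step values are never overwritten.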

goodspark · 2021-10-07T00:40:13Z

Sorry for yet another +1 comment, but this would be really useful.

heidimhurst · 2022-05-10T09:30:20Z

+1, would be very useful

samsarkleio · 2022-06-07T16:14:45Z

+1 would be very useful

heidimhurst · 2022-07-26T11:17:41Z

fwiw this appears to now be available in the UI pipeline settings > builds; see Changelog notes

Suggest closing this issue @evilmarty

gtirloni mentioned this issue Apr 16, 2018

notification for stuck builds #333

Closed

pda mentioned this issue Jun 27, 2018

Proposal for --default-timeout-in-minutes flag buildkite/agent#788

Open

2 tasks

huonw mentioned this issue Feb 20, 2020

Add a timeout to every CI step to halt hung builds stellargraph/stellargraph#906

Merged
