Add optional timeout for waiting for jobs in ephemeral mode? #60

Closed
nwf opened this issue May 16, 2022 · 5 comments


@nwf

nwf commented May 16, 2022

Could we get a way to bound the amount of time spent waiting for a job when running with --ephemeral? The wait happens at this call in the runner:

err := vssConnection.RequestWithContext(xctx, "c3a054f6-7a8a-49c0-944e-3a8e5d7adfd7", "5.1-preview", "GET", map[string]string{
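
To make the ask concrete, here is a minimal sketch of what I mean, assuming the wait can be handed a child context with a deadline (the function and parameter names are hypothetical, not the runner's actual code):

```go
package sketch

import (
	"context"
	"errors"
	"time"
)

// waitForJob illustrates the requested behaviour: derive a child context
// with a deadline and hand it to the existing long poll (stood in for here
// by doRequest), so an --ephemeral runner cannot wait forever for a job
// that was cancelled before the runner came up. Names are illustrative only.
func waitForJob(parent context.Context, timeout time.Duration, doRequest func(ctx context.Context) error) error {
	xctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()

	err := doRequest(xctx)
	if errors.Is(xctx.Err(), context.DeadlineExceeded) {
		// No job arrived in time; the runner could stop waiting and
		// tear itself down here instead of hanging.
		return xctx.Err()
	}
	return err
}
```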

In particular, the concerning scenarios are of this form: if someone creates a PR which kicks off a request for ephemeral runners, and then cancels the workflow before the ephemeral runner is actually ready, the runner will subsequently come up and get stuck, here, because no jobs remain in GitHub's queue.

This is tangentially related to #59, in that the runner doesn't know what job it was created for and so doesn't know that the job has already been cancelled. Moreover, while we do get a cancellation push message (a workflow_job message indicating "completed" but with a null runner), we can't teardown the environment of the runner associated by job ID, because it might have picked up a different job with the same labels in the same repository. It's all kind of sad. :(

Anyway, if there's existing support for this and I've merely overlooked it, I'm sorry for the noise.

@ChristopherHX
Owner

ChristopherHX commented May 16, 2022

I'm not sure you need a timeout to do that.

The following feature is not documented and differs from actions/runner:

Send SIGINT to the runner process. If you only send it once, the runner keeps any running job alive and stops as soon as no job is running.

Another SIGINT / SIGTERM will cancel any running job, which is not what you want.

A timeout to trigger the same behavior could be added.
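
A rough sketch of how a wrapper process could turn that single-SIGINT behaviour into a timeout today, assuming a runner already configured with --ephemeral; the invocation and the 30-minute limit are placeholders:

```go
package main

import (
	"log"
	"os"
	"os/exec"
	"syscall"
	"time"
)

func main() {
	// Assumes a runner already configured as ephemeral; adjust the
	// command and working directory for your setup.
	cmd := exec.Command("./github-act-runner", "run")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}

	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()

	select {
	case err := <-done:
		log.Printf("runner exited on its own: %v", err)
	case <-time.After(30 * time.Minute):
		// One SIGINT only: the runner stops waiting for new jobs and
		// exits once idle. A second SIGINT/SIGTERM would cancel a
		// running job, which we want to avoid.
		_ = cmd.Process.Signal(syscall.SIGINT)
		log.Printf("runner exited after SIGINT: %v", <-done)
	}
}
```

If the runner is still waiting when the timer fires, the single SIGINT makes it exit; if it has already picked up a job, it finishes that job first and then stops.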

@nwf
Author

nwf commented May 16, 2022

Oh, fantastic! Yes, that should work great. Thanks!

@nwf nwf closed this as completed May 16, 2022
@nwf-msr
Contributor

nwf-msr commented May 25, 2022

Ah, but one minor issue: it looks like ephemeral runners don't clean themselves up if told to stop waiting with a single SIGINT, so they linger as registered on GitHub and need to be manually cleaned up.

@ChristopherHX
Owner

> ephemeral runners don't clean themselves up

I tried to do it, but it seems like the registered agent doesn't have enough permission to delete itself from the service. The Actions service seems to delete an ephemeral runner only after it has received one job; otherwise you either have to delete it with a runner registration/delete token or a PAT, or wait 30 days until GitHub does it for you.

You will see the same behavior for actions/runner.

@nwf
Copy link
Author

nwf commented May 29, 2022

Thanks for investigating! I will see about adding logic on the management node to de-register the runner if it sees a runner time out while waiting.
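
For reference, a minimal sketch of that cleanup step, assuming a repository-level runner and a PAT with admin access to the repo; the function name and parameters are illustrative:

```go
package sketch

import (
	"context"
	"fmt"
	"net/http"
)

// deregisterRunner sketches the cleanup described above: if the management
// node sees a runner time out while waiting, it removes the registration
// via the GitHub REST API
// (DELETE /repos/{owner}/{repo}/actions/runners/{runner_id}) using a PAT.
func deregisterRunner(ctx context.Context, owner, repo string, runnerID int64, pat string) error {
	url := fmt.Sprintf("https://api.github.com/repos/%s/%s/actions/runners/%d", owner, repo, runnerID)
	req, err := http.NewRequestWithContext(ctx, http.MethodDelete, url, nil)
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer "+pat)
	req.Header.Set("Accept", "application/vnd.github+json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	// The API responds 204 No Content when the runner was deleted.
	if resp.StatusCode != http.StatusNoContent {
		return fmt.Errorf("unexpected status deleting runner %d: %s", runnerID, resp.Status)
	}
	return nil
}
```

The runner ID can be looked up beforehand via GET /repos/{owner}/{repo}/actions/runners and matching on the runner name.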
