
self hosted runner is not accepting jobs from queue. #592

Closed
npalm opened this issue Jul 15, 2020 · 24 comments
Labels
bug Something isn't working

Comments

@npalm

npalm commented Jul 15, 2020

Describe the bug
Self-hosted idle runner is not consuming queued jobs (runner version 263, release version).

To Reproduce
Steps to reproduce the behavior:

  1. Assume you have a self-hosted runner in the offline state
  2. A new action workflow is triggered
  3. Create a new runner (config and run; see the sketch after this list). We scale them automatically
  4. The new runner does not consume the queued jobs.
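
For reference, "config and run" in step 3 typically looks roughly like this (a minimal sketch with a placeholder URL, token and labels, not the exact automation we use):

# download and unpack a runner release (version shown as an example)
curl -o actions-runner.tar.gz -L \
  https://github.com/actions/runner/releases/download/v2.263.0/actions-runner-linux-x64-2.263.0.tar.gz
tar xzf actions-runner.tar.gz

# register the runner against the repo/org with a registration token, then start listening for jobs
./config.sh --url https://github.com/<org>/<repo> --token <REGISTRATION_TOKEN> --labels self-hosted,linux --unattended
./run.sh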

This was working perfectly for a few months, but recently broke.

  1. It gets even stranger: trigger a new workflow and the runner will see the job, start an upgrade to the pre-release (see Auto updater wants to update to a pre-release version #581), and the job starts. The queued jobs remained stuck today (2020-07-15, between 12:00 and 18:00 CET); around 20:30 CET the queued jobs were consumed as well. It is still strange that the second trigger's event caused an update to a non-released version. Should 267 not be released?

Expected behavior
A runner in the idle state should consume queued jobs (the labels match).

Runner Version and Platform

GitHub cloud + runner version 263

OS of the machine running the runner? OSX/Windows/Linux/...

What's not working?

No error message, see behavior above.

Job Log Output

If applicable, include the relevant part of the job / step log output here. All sensitive information should already be masked out, but please double-check before pasting here.

Runner and Worker's Diagnostic Logs

n/a

reference: our setup is available here: https://github.com/philips-labs/terraform-aws-github-runner

@npalm npalm added the bug Something isn't working label Jul 15, 2020
@N2D4

N2D4 commented Jul 15, 2020

EDIT: In our case, it turned out to be a misconfiguration on our side. As soon as the runner exited, we rebooted the machine, which caused the update to fail. I'll leave this comment here in case it helps someone else regardless.

Original:

Same issue here: the runner seems to try updating from 2.263.0 to 2.267.1 every time we queue a job, and does not actually run the job. Logs:

2020-07-15 19:37:23Z: Listening for Jobs
Runner update in progress, do not shutdown runner.
Downloading 2.267.1 runner
Waiting for current job finish running.
Generate and execute update script.
Runner will exit shortly for update, should back online within 10 seconds.

√ Connected to GitHub

2020-07-15 19:38:55Z: Listening for Jobs
Runner update in progress, do not shutdown runner.
Downloading 2.267.1 runner
Waiting for current job finish running.
Generate and execute update script.
Runner will exit shortly for update, should back online within 10 seconds.

√ Connected to GitHub

2020-07-15 19:39:34Z: Listening for Jobs

This seems to have started a few hours ago.

@mforutan

mforutan commented Aug 10, 2020

We are seeing the same issue. Our setup has two jobs: the first scales up the runner if needed and waits for it, and the second runs the actual pipeline. Sometimes the second job doesn't start when the runner's state is offline, even after the runner is ready to accept jobs. This does not happen if the runner is in the idle state instead of the offline state.

Edit: Adding a 60s sleep at the end of the first job, combined with a needs: scale-job attribute on the second job, works as a workaround for us.

@patrickmscott

We are having the same issue. If jobs are queued when all runners are offline, those jobs are never run when runners come back online.

@PickledChris

PickledChris commented Aug 24, 2020

+1, we have this issue as well. We keep an offline runner for each repo, then spin up a runner with the most recent runner version when we detect a workflow.
Manually restarting the job works, since there is then a runner scheduled for it to be deployed to, but I'd have to build some automation to automatically restart jobs that had been queued...
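
Something along these lines could be the basis for that automation (only a sketch; OWNER, REPO and the token are placeholders, and it assumes the standard GitHub REST API endpoints for listing, cancelling and re-running workflow runs):

# find workflow runs that are stuck in the queue
curl -s -H "Authorization: token $GITHUB_TOKEN" \
  "https://api.github.com/repos/OWNER/REPO/actions/runs?status=queued" \
  | jq -r '.workflow_runs[].id' \
  | while read run_id; do
      # cancel the stuck run (mirrors the manual workaround), give it time to settle, then re-run it
      curl -s -X POST -H "Authorization: token $GITHUB_TOKEN" \
        "https://api.github.com/repos/OWNER/REPO/actions/runs/$run_id/cancel"
      sleep 30
      curl -s -X POST -H "Authorization: token $GITHUB_TOKEN" \
        "https://api.github.com/repos/OWNER/REPO/actions/runs/$run_id/rerun"
    done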

@MartinNowak

It behaves as if workflow runs are scheduled onto specific runners when they are created/triggered, so they don't get rescheduled when new runners are added.
Not sure what the problem is (in particular since @npalm mentioned it did work a while ago).
Is the source code available somewhere?

@vitobotta

Having this issue right now. I have idle workers "listening for jobs" and nothing happens. Workflows are not started. I tried cancelling and restarting but it doesn't seem to help. I started to see this today. Anyone else experiencing this at the moment?

@deeno35

deeno35 commented Sep 10, 2020

Same for us. Restarting didn't do anything, but I manually downloaded actions-runner-linux-x64-2.273.1.tar.gz, untarred it right on top of the existing install, and then restarted the daemon. That got jobs flowing again.
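
In case it helps anyone else, the manual fix was roughly the following (a sketch; the install path is from my setup and the svc.sh commands assume the runner was installed as a service, so adjust to however your daemon is managed):

cd /home/ec2-user/actions-runner
sudo ./svc.sh stop

# fetch the release the auto-updater failed to install and unpack it over the existing install
curl -o actions-runner-linux-x64-2.273.1.tar.gz -L \
  https://github.com/actions/runner/releases/download/v2.273.1/actions-runner-linux-x64-2.273.1.tar.gz
tar xzf actions-runner-linux-x64-2.273.1.tar.gz

sudo ./svc.sh start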

I have a feeling the updater was having issues updating from 2.273.0 -> 2.273.1 and the server side was not placing jobs on runners running a previous version.

This has happened to me quite a number of times (5 or 6?) in various ways since I started using GitHub Actions: the runner tried to pick up a new release, jobs stopped running because of it, and manual intervention was needed.

@deeno35

deeno35 commented Sep 15, 2020

It's back! The 2.273.1 -> 2.273.2 auto-update killed our runner. After a manual restart, the instance is sitting around idle, yet jobs are waiting to be picked up. A manual untar of 2.273.2 right on top of my current install plus a service restart fixed it again.

This is the cause of the runner dying in the first place (I've seen this 3 or 4 times):

/home/ec2-user/actions-runner/_work/_update.sh: line 31: nul: Permission denied

Is there a reason /dev/null isn't being used in _update.sh? Is this to be platform agnostic?
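
For context on that error: on Linux, nul is not a special device, so a redirection like > nul just tries to create a regular file named nul in the current directory (which fails with Permission denied if the directory isn't writable), whereas /dev/null is the actual discard device. A quick illustration:

# on Linux this creates (or fails to create) an ordinary file called "nul"
echo "discard me" > nul 2>&1

# this discards the output properly
echo "discard me" > /dev/null 2>&1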

@lokesh755
Contributor

lokesh755 commented Sep 16, 2020

We’re aware of the current issue where runners added after queuing a run will not pick up the jobs if the added runner is not the latest version. We’re rolling out the fix and will update the issue again once it’s deployed everywhere. In the meantime, you can unblock yourselves by adding the latest runner every time.
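
For anyone scripting that workaround, the latest runner version can be looked up before registering the runner (a sketch; it assumes jq is available and uses the public releases API):

# resolve the most recent runner release tag, e.g. "v2.273.2", and strip the leading "v"
LATEST=$(curl -s https://api.github.com/repos/actions/runner/releases/latest | jq -r '.tag_name' | sed 's/^v//')

# download and unpack exactly that version before running config.sh / run.sh
curl -o actions-runner.tar.gz -L \
  "https://github.com/actions/runner/releases/download/v${LATEST}/actions-runner-linux-x64-${LATEST}.tar.gz"
tar xzf actions-runner.tar.gz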

@lanen

lanen commented Sep 18, 2020

Is it all right now?

My workflow also hangs in the queue. My self-hosted runner version is: actions-runner-linux-arm64-2.272.3.tar.gz

[2020-09-18 03:01:20Z WARN GitHubActionsService] Authentication failed with status code 401.
Transfer-Encoding: chunked
WWW-Authenticate: Bearer
Strict-Transport-Security: max-age=2592000
X-TFS-ProcessId: c244f0e2-fc5b-4e43-9aad-94f7924d2494
ActivityId: 6333337f-d4cd-464f-8b37-d79565ebf02c
X-TFS-Session: b10f7d39-de6b-4a31-90a0-7a218a3041b8
X-VSS-E2EID: 2da95130-1a17-4e57-86a2-e70746624609
X-VSS-SenderDeploymentId: 13a19993-c6bc-326c-afb4-32c5519f46f0
X-TFS-ServiceError: The+user+%27System%3aPublicAccess%3baaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa%27+is+not+authorized+to+access+this+resource.
X-VSS-S2STargetService: 0000005A-0000-8888-8000-000000000000/visualstudio.com
X-MSEdge-Ref: Ref A: 9DC762193D88407F83C306B199AF8276 Ref B: HKBEDGE0306 Ref C: 2020-09-18T03:01:20Z
Date: Fri, 18 Sep 2020 03:01:20 GMT

@lokesh755
Contributor

lokesh755 commented Sep 30, 2020

The fix has been deployed, and new runners (even with older versions) should now be able to pick up jobs that were queued before they were added.

@lokesh755
Contributor

@lanen It seems like a different issue. Did you check in the UI whether the runner still exists or has been deleted?

@pamidu-A

@lokesh755 I am still facing the same issue (runner version: v2.273.5): runners didn't pick up the jobs that were queued before they were added.

@joseproura

The same happens to me with the latest version, 2.276.1. I have been able to work around it by adding a delay to the creation of the Fargate task that runs the runner, but when 2 jobs fire too close together, one of them is still missed.

@PavelSusloparov

Having a similar issue, specifically with an organization setup.
A personal GitHub account queue works fine for the same scenario and jobs are getting picked up.

@lokesh755
Contributor

@PavelSusloparov Could you provide your org and repo details?

@RobinDaugherty

Same issue here with actions-runner-osx-x64-2.278.0

@darwinz

darwinz commented Jun 16, 2021

We also had the same issue running version 2.277.1 on Ubuntu (actions-runner-linux-x64-2.277.1.tar.gz). We ended up upgrading manually to version 2.278.0 to get past the issue

@igagis

igagis commented Jun 22, 2021

I have a self-hosted runner on an ARM machine, and in about 50% of cases it does not pick up the job, regardless of whether the job is queued. Cancelling the run and starting it again makes it pick up the job. It is very annoying that I have to cancel and re-run jobs for almost every commit because of this issue.

Runner version 2.278.0

@ViacheslavKudinov

ViacheslavKudinov commented Jul 9, 2021

Hi @lokesh755, this is still a valid issue on 2.278.0.
We are hitting it now.
In the run log:

Found online and idle self-hosted runner(s) in the current repository's organization/enterprise account that matches the required labels: 'self-hosted, ****'
Waiting for a self-hosted runner to pick up this job...

but at the same time 2 new runners were spun up.

@project-administrator

I've just re-created the runner. The first problem is that it does not pick up the job:

Found online and idle self-hosted runner(s) in the current repository that matches the required labels: ...
Waiting for a self-hosted runner to pick up this job...

The second problem is that I am not able to delete the old offline runner from the web UI:

Sorry, there was a problem deleting your runner.

@chrisdone

Second problem is that I am not able to delete the old offline runner from the web-ui:

Sorry, there was a problem deleting your runner.

Me neither.

[Screenshot from 2021-12-06 14-45-04]

Not happy with this at all.

I'll just have to create a fresh runner on GitHub and make my CI scripts stop using the old one.

@igagis

igagis commented Dec 6, 2021

Because of this issue, I nowadays don't use the official GitHub runner on self-hosted machines. I switched to using this alternative runner and it works perfectly for me. It can even be installed via a Debian package.

@btmurrell

btmurrell commented Nov 16, 2023

We had the same problem, and the solution was very obscure. A support person asked me, "Did you recently change this repo from private to public?" Yes, I did... why would that matter?

There is a security setting in your runner group (something I never configured) which, by default, prevents a self-hosted runner in that group from picking up jobs from public repos. Change it to suit your needs, heeding the warning.
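
If you'd rather manage that setting outside the web UI, something like this should work (a sketch; ORG and GROUP_ID are placeholders, and it assumes the organization runner-group endpoints of the REST API with the allows_public_repositories field):

# list the org's runner groups to find the group id
curl -s -H "Authorization: token $GITHUB_TOKEN" \
  https://api.github.com/orgs/ORG/actions/runner-groups

# allow runners in that group to pick up jobs from public repositories
curl -s -X PATCH -H "Authorization: token $GITHUB_TOKEN" \
  https://api.github.com/orgs/ORG/actions/runner-groups/GROUP_ID \
  -d '{"allows_public_repositories": true}'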
