macOS runners getting stuck on "Job is about to start running on the hosted runner" #2609

Open
j-bennet opened this issue May 18, 2023 · 7 comments
Labels: bug (Something isn't working)


@j-bennet

j-bennet commented May 18, 2023

Describe the bug

The macOS jobs on this repo keep getting stuck. In the UI they appear to hang on the "Run tests" step, but the tests produce no output. The raw log shows only this:

2023-05-18T16:47:33.7552355Z Requested labels: macos-latest
2023-05-18T16:47:33.7552694Z Job defined at: coiled/dask-bigquery/.github/workflows/tests.yml@refs/heads/try-fix-ci
2023-05-18T16:47:33.7552805Z Waiting for a runner to pick up this job...
2023-05-18T16:47:34.1259321Z Job is waiting for a hosted runner to come online.
2023-05-18T16:47:40.7586904Z Job is about to start running on the hosted runner: GitHub Actions 6 (hosted)

This is happening on https://github.com/coiled/dask-bigquery.

To Reproduce

Steps to reproduce the behavior:

  1. Open a PR into the repo.
  2. See the macOS jobs get stuck.

Expected behavior

Jobs finish.

Runner Version and Platform

Version of your runner?

macos-latest

What's not working?

macOS jobs hang while running tests and then time out. The tests produce no output until the job is canceled or times out. Of the 3 jobs, sometimes 1 or 2 finish successfully.


Job Log Output

Identical to the raw log shown above.

Runner and Worker's Diagnostic Logs

The log is attached. The job was canceled after hanging for 15 to 20 minutes without producing any output.

stuck-runner-log.txt

@j-bennet j-bennet added the bug Something isn't working label May 18, 2023
@j-bennet
Author

cc @ncclementi

@ruvceskistefan
Contributor

Hey @j-bennet,

I looked into your repo's Actions runs and don't see any stuck macOS jobs. Is this still happening for you, or can we close the issue?

@ncclementi

@ruvceskistefan They failed because they timed out after a long time. We can't get consistent runs; for some reason the jobs hang. For example, the failed jobs in https://github.com/coiled/dask-bigquery/actions/runs/5027634794 all failed due to timeouts.

@jellespijker

We're experiencing similar behaviour on our repo https://github.com/Ultimaker/cura-binary-data/actions/

@sentrivana

sentrivana commented Jun 20, 2023

We're seeing the same on non-macOS runners with our ubuntu-20.04 workflows.

Some workflows never get a runner and remain in "Waiting" or "Queued" indefinitely; others do get one but then get stuck on a variation of:

Requested labels: ubuntu-20.04
Job defined at: getsentry/sentry-python/.github/workflows/test-integration-aiohttp.yml@refs/pull/2181/merge
Waiting for a runner to pick up this job...
Job is waiting for a hosted runner to come online.
Job is about to start running on the hosted runner: GitHub Actions 184 (hosted).

I've now cancelled and re-run one of the latter, and it succeeded, so this is intermittent.

@jacsamell

We're seeing the same for our builds; they get stuck at "Job is about to start":
https://github.com/.../actions/runs/5323119633/jobs/9640530290

Requested labels: ubuntu-latest
Job defined at: .../.github/workflows/root.yml@refs/heads/master
Waiting for a runner to pick up this job...
Job is waiting for a hosted runner to come online.
Job is about to start running on the hosted runner: GitHub Actions 27 (hosted)

Also, when we try to cancel the job in order to retry it, the cancellation hangs as well.

@FearlessHyena

If a job hasn't started within a few minutes of "Job is about to start running on the hosted runner: ***", the runner most likely has issues and the job is never going to start.
Rather than waiting longer and then timing out, it would be better for Actions to automatically try rescheduling the job on another runner at least once before failing.
I've also noticed that the job status, when queried via the GH API, still shows as queued instead of in_progress, even though a runner has already been selected at that point, so the job is technically no longer queued.
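The status check described in the comment above can be sketched in Python. This is a minimal, hypothetical sketch: it assumes the GitHub REST endpoint GET /repos/{owner}/{repo}/actions/jobs/{job_id}, which returns a job object with `status` and `created_at` fields. The helper names `fetch_job` and `looks_stuck` and the 10-minute threshold are illustrative assumptions, not part of the Actions API or anything proposed in this thread.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

API = "https://api.github.com"


def fetch_job(owner: str, repo: str, job_id: int, token: str) -> dict:
    """Fetch one workflow job via GET /repos/{owner}/{repo}/actions/jobs/{job_id}."""
    req = urllib.request.Request(
        f"{API}/repos/{owner}/{repo}/actions/jobs/{job_id}",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def looks_stuck(
    job: dict,
    now: datetime,
    threshold: timedelta = timedelta(minutes=10),  # hypothetical cutoff
) -> bool:
    """Hypothetical heuristic: the job is still reported as 'queued'
    long after it was created."""
    if job.get("status") != "queued":
        return False
    # GitHub timestamps look like "2023-05-18T16:47:33Z"; make them tz-aware.
    created = datetime.fromisoformat(job["created_at"].replace("Z", "+00:00"))
    return now - created > threshold
```

A job flagged this way would still need to be canceled and re-run manually; as noted above, re-running worked for some jobs, although canceling itself sometimes hangs.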


7 participants