Mac OS X jobs frequently fail for no apparent reason with missing logs and without ever running if: always () steps #736

Closed
1 of 5 tasks
JasonGross opened this issue Apr 17, 2020 · 31 comments
Labels: Area: Apple · investigate (Collect additional information, like space on disk, other tool incompatibilities etc.) · OS: macOS


@JasonGross

Describe the bug
My Mac OS X jobs fail frequently. Normally, when a job fails, the subsequent steps are displayed as canceled and take 0s, while the if: always() steps still run, as in:
[screenshot]
(From https://github.com/mit-plv/fiat-crypto/runs/593609449?check_suite_focus=true )
However, the Mac job failures, such as https://github.com/mit-plv/fiat-crypto/pull/753/checks?check_run_id=593732097 , display broken logs: the failing step has no contents (no expand arrow), and all subsequent steps, including the if: always() steps, fail in 0s:
[screenshot]
Furthermore, if I click the three dots and then "View raw logs", the logs are missing. I am directed to a page like https://github.com/mit-plv/fiat-crypto/commit/c31a955db7a356f1788d979ac9b1ed1a4fc67674/checks/593732097/logs which says only:

2020-04-16T22:40:17.1881748Z ##[section]Starting: Request a runner to run this job
2020-04-16T22:40:17.9419267Z Requesting a hosted runner in current repository's account/organization with labels: 'macos-latest', require runner match: True
2020-04-16T22:40:18.0349714Z Labels matched hosted runners has been found, waiting for one of them get assigned for this job.
2020-04-16T22:40:18.0610578Z ##[section]Finishing: Request a runner to run this job

Area for Triage: Apple

Question, Bug, or Feature?: Bug

Virtual environments affected

  • macOS 10.15
  • Ubuntu 16.04 LTS
  • Ubuntu 18.04 LTS
  • Windows Server 2016 R2
  • Windows Server 2019

Expected behavior
I should get sensible logs, or, better, the jobs should not fail at all. They succeed if I restart the job enough times, they work consistently on Linux, and they often work on Windows.

Actual behavior
See above. Link: https://github.com/mit-plv/fiat-crypto/pull/753/checks?check_run_id=593732097

@github-actions github-actions bot added Area: Apple bug Something isn't working needs triage labels Apr 17, 2020
@JasonGross JasonGross changed the title Mac OS X jobs frequently fail for no apparent reason without ever running if: always () steps and missing logs Mac OS X jobs frequently fail for no apparent reason with missing logs and without ever running if: always () steps Apr 17, 2020
@maxim-lobanov
Contributor

@JasonGross, thank you for reporting this issue!
Unfortunately, this repository manages only the image content, but we will try to escalate this issue to the appropriate team.

@TingluoHuang, @ericsciple, @alepauly, is this something that actions/runner does?

@TingluoHuang
Member

@maxim-lobanov, can you check the runner diagnostics for this job to see why the runner can't upload logs? I don't know how to access the runner logs for the hosted Mac pool.

@Darleev Darleev added investigate Collect additional information, like space on disk, other tool incompatibilities etc. OS: macOS and removed bug Something isn't working needs triage labels Apr 20, 2020
@JasonGross
Author

I'm now seeing this happen on our Linux jobs too, such as https://github.com/mit-plv/fiat-crypto/pull/766/checks?check_run_id=606437846
[screenshot]
On Linux it seems to happen at the artifact upload step. Maybe the machines are running out of disk space?

@maxim-lobanov
Contributor

Hello, just a quick update: the issue may come from a bug on our backend. We are still looking into it.

@maxim-lobanov
Contributor

@JasonGross, hello!
Could you please check if you still see the same issues?

@maxim-lobanov
Contributor

Closing this for now, but please let us know if you still see the same issue.

@JasonGross
Author

It's been happening less often, but it just happened again: https://github.com/JasonGross/fiat-crypto/runs/697783422
[screenshot]

I've also seen the Mac OS jobs frequently show up as "cancelled" when I didn't cancel them, and I don't believe anyone else did, either.

So I guess this issue should be reopened.

@maxim-lobanov maxim-lobanov reopened this May 22, 2020
@svenmuennich

We've had the same issue on one of our builds scheduled to run Monday through Friday at 01:30 UTC. It would be cancelled randomly after running for about 10-15 minutes.

Last night it succeeded for the first time in weeks but I will continue to monitor.

@joehinkle11

I had this problem, so I made an example project to show GitHub Support: https://github.com/joehinkle11/Mac-GitHub-Actions-Test/actions

They also responded with an email saying:

Hi Joe,

Thank you for your continued patience while we investigated these issues. For context, there is an existing issue tracking this:

#736

Due to similar reliability reports and errors when using our current MacOS platform for GitHub Actions, we have decided to make larger changes that will provide a long-term solution.

We understand that you may continue to experience reliability issues while on the current platform, and hope to provide a better experience as soon as possible. If you notice any issues with billing on the next billing cycle, please reach out.

At this time we have improvements planned for early July and will keep our customers up to date through our blogs and changelog.

Please let us know if you have any questions or concerns!

Cheers,
GitHub Support

Hope this helps anyone who is working on an Action and doesn't yet realize it's a bug with GitHub and not their scripts.

@JasonGross
Author

I've also had frequent random cancellations of GH Actions jobs (especially Mac OS), with missing logs, such as https://github.com/mit-plv/fiat-crypto/runs/791672004?check_suite_focus=true
[screenshot]
And here's one where the logs are present https://github.com/mit-plv/fiat-crypto/runs/791678094?check_suite_focus=true :
[screenshot]

GitHub won't even tell me who canceled these jobs, or why they were canceled. (Was it because I pushed another commit that triggered the workflow? Is GitHub now forcibly canceling jobs on old commits, even those which are on the tip of their branch but are not the newest one running across all branches?)

@cytopia

cytopia commented Jun 21, 2020

I can also confirm that jobs on MacOS are cancelled for no apparent reason: https://github.com/cytopia/pwncat/pull/80/checks?check_run_id=792119613

Additionally, there are no logs or other information about why it was cancelled.

@TingluoHuang
Member

@TingluoHuang
Member

We fixed a configuration issue in the service that caused Mac hosted builds to hit this error every day between 1:00 and 2:00 AM UTC.

@svenmuennich

> We fixed a configuration issue in the service that caused Mac hosted builds to hit this error every day between 1:00 and 2:00 AM UTC.

That sounds promising 🎉 Is that fix already live?

@TingluoHuang
Member

@svenmuennich the fix is already live, and I can confirm from the telemetry that the fix works as expected.

👇 we no longer have the big spike every night.
[screenshot]

@svenmuennich

Great! Thank you 🥇

@maxim-lobanov
Contributor

We will keep this issue open for a few more days. @svenmuennich, @cytopia, @JasonGross, could you please report back if you still see the same issues?

@JasonGross
Author

We still see the same issues. Here is a build from 8 hours ago (Wed, 24 Jun 2020 08:08:20 GMT) that failed in this way: https://github.com/mit-plv/fiat-crypto/pull/817/checks?check_run_id=802532325
[screenshot]

Attempting to fetch the raw logs gives:

2020-06-24T08:08:06.3800369Z ##[section]Starting: Request a runner to run this job
2020-06-24T08:08:06.6634916Z Can't find any online and idle self-hosted runner in current repository that matches the required labels: 'macos-latest'
2020-06-24T08:08:06.6634949Z Can't find any online and idle self-hosted runner in current repository's account/organization that matches the required labels: 'macos-latest'
2020-06-24T08:08:06.6634965Z Found online and idle hosted runner in current repository's account/organization that matches the required labels: 'macos-latest'
2020-06-24T08:08:06.8916332Z ##[section]Finishing: Request a runner to run this job

which is bizarre.

@maxim-lobanov
Contributor

@TingluoHuang, could this be something different?

@TingluoHuang
Member

@JasonGross
Author

https://github.com/github/c2c-actions-compute/issues/643 is a 404 for me; is there any issue I can track about this (other than this present one)?

@svenmuennich

Last night our scheduled build failed again. This time we got an error though:

An error occurred while provisioning resources (Error Type: Disconnect).

No idea whether that is related to this issue.

@maxim-lobanov
Contributor

Hello everyone!
We have recently done some changes on our side. Could you please check if you still see the same issue (steps without logs)?

@alexanderkasten

alexanderkasten commented Oct 7, 2020

> Hello everyone!
> We have recently done some changes on our side. Could you please check if you still see the same issue (steps without logs)?

I have the same problem on macOS: https://github.com/atlas-engine/AtlasStudio/runs/1215960239.
[Screenshot 2020-10-07 at 09.52.58]

@maxim-lobanov

@j-stephan

@maxim-lobanov: We also observe this behaviour from time to time. Example: https://github.com/alpaka-group/alpaka/runs/2708464529?check_suite_focus=true

@JasonGross
Author

@maxim-lobanov, would you reopen this bug? https://github.com/mit-plv/fiat-crypto/runs/2979458972 has the log-less red ❌s with missing raw logs:
[screenshot]

2021-07-03T15:24:52.2128487Z Can't find any online and idle self-hosted or hosted runner in the current repository, account/organization that matches the required labels: 'macos-latest'
2021-07-03T15:24:52.2128608Z Found online and busy hosted runner(s) in the current repository's organization account that matches the required labels: 'macos-latest'. Hit concurrency limits on the hosted runners. Waiting for one of them to get assigned for this job.
2021-07-03T15:24:52.2128637Z Waiting for a hosted runner in 'organization' to pick this job...

Using "Download log archive" produces an archive that simply contains no logs for any of the red ❌s, only the same incomplete raw logs.

@miketimofeev
Contributor

Hi @JasonGross! Sorry to hear that.
I've checked the telemetry, and the root cause seems to be the same as in #3517.
We will notify the engineering team about these new cases.

@deivid-rodriguez

We're also getting this. Just letting you know in case more examples help isolate the root cause. Example run: https://github.com/rubygems/rubygems/runs/2987926029.

@adam84luong

[screenshot]

I still have this issue: the job keeps running for a very long time and is finally canceled automatically for no reason.
While that step is in progress, no logs are generated, and if I try "View raw logs", it shows the following:

[screenshot]

@JanBessai

Same issue. The "Run tests" step of macos-latest jobs in https://github.com/combinators/cls-scala randomly hangs or takes forever.
I decided to factor those jobs out into a separate workflow so I can restart them more easily until this gets resolved.

@JanBessai

Any update on this?
