Pipes appear to not pass on EOF on macOS runners #884
Comments
Debugging: I've figured out enough to make a guess by diffing the logs. Travis and Actions use slightly different log formats -- and they set …

Full diff (- is Travis, + is Actions here): …

Points in this diff that stand out to me:
In https://github.com/kousu/hanging-actions/tree/14572ba1b8b6554599a4b64f64a80f002a68922b/ I'm testing whether the same bug shows up with … As I expected, this hung too: https://github.com/kousu/hanging-actions/runs/1609762569 -- but I tried writing my own infinite sequence and got two different results: …
And in all cases, Travis is still fine! Does the runner do something with …
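(For reference, a hand-rolled infinite sequence of the kind mentioned above could be as simple as the sketch below; the exact commands tried in hanging-actions, and their differing results, are not reproduced here.)

```sh
# A shell-loop stand-in for yes(1), piped into a consumer that closes the pipe
# after one line -- the same shape as `yes | some_script`:
while true; do echo y; done | head -n 1
```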
We're wondering if maybe it has to do with some stray …
Does this repro on any self-hosted runner?
@TingluoHuang, yes, I can reproduce this issue on a macOS self-hosted runner. EDIT: …
Thanks for investigating this, @maxim-lobanov :) :)
Any fixes?
I just encountered this issue. It would be great if it could be resolved, to save debugging time for others who run into it.
I am running into this here: https://github.com/tavianator/homebrew-tap/runs/4874104560?check_suite_focus=true. I think I found the issue: on macOS the runner launches each script through a Node.js invoker:

runner/src/Runner.Worker/Handlers/ScriptHandler.cs, lines 270 to 279 at c95d5ea

runner/src/Misc/layoutbin/macos-run-invoker.js, lines 1 to 13 at c95d5ea

Old versions of Node leave SIGPIPE ignored in the processes they spawn, so `yes` never gets killed when the read end of the pipe closes. I suspect using Node 16 instead of Node 12 would fix it, but I can't confirm easily because I don't have hands-on access to macOS.
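To see why an ignored SIGPIPE matters for the `yes | some_script` case, here is a rough local simulation (a sketch, not the runner's actual invocation; the exact behavior also depends on which `yes` implementation is installed):

```sh
# With the default disposition, head exits after one line, yes is killed by
# SIGPIPE, and the pipeline returns promptly:
yes | head -n 1

# If the parent ignores SIGPIPE before spawning the pipeline -- roughly what an
# invoker that ignores SIGPIPE does for the whole job step -- the ignored
# disposition survives exec. A `yes` that never checks write() errors then spins
# on EPIPE forever instead of dying, and the pipeline never finishes:
bash -c 'trap "" PIPE; yes | head -n 1'
```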
Looks like #1621 does this. Can anyone check whether it fixes this issue?
GH actions macOS runners have a bug that makes scripts run with SIGPIPE ignored, ultimately causing `yes` to loop forever instead of exiting when a pipe breaks. Work around it by explicitly resetting the handler using `env` from coreutils.
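For concreteness, the workaround described in that commit message might look like the sketch below. It assumes GNU coreutils has been installed via Homebrew (brew install coreutils), since the stock macOS /usr/bin/env has no --default-signal option; `some_script` stands in for whatever consumes the pipe.

```sh
# Use GNU env to reset SIGPIPE to its default disposition for `yes`,
# so it is killed normally once the consumer closes its end of the pipe.
GNU_ENV="$(brew --prefix coreutils)/libexec/gnubin/env"
"$GNU_ENV" --default-signal=PIPE yes | ./some_script
```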
We've run into this same issue testing the new alpha M1 macOS self-hosted runner. Our runner locally has Node 18 installed, and I think part of the work to release the new M1 runner was to convert GitHub first-party actions to Node 16.
It's unclear that this has to do with Node. It seems that ProcessInvoker neither catches the final output line from @kousu's repro case (…
Describe the bug
(first reported at actions/runner-images#2352)
Using

```
yes | some_script
```

anywhere in a workflow on macOS hangs forever. `some_script` will terminate but `yes` will keep going, blocking the rest of the workflow.

To Reproduce
I've made a minimal test case here: https://github.com/kousu/hanging-actions/. It basically just tests … plus collecting some context for debugging.
To reproduce, just fork my repo and watch its Actions tab. Linux (and Travis!) will pass easily, but Actions on macOS hangs until cancelled.
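A single step of the same shape (a hypothetical minimal version; the repository's actual test steps are not reproduced here) is enough to show the difference:

```sh
# On Linux runners (and on Travis) this prints one "y" followed by
# "pipeline finished" almost instantly; on GitHub-hosted macOS runners
# `yes` never exits, so the step hangs until the job is cancelled.
yes | head -n 1
echo "pipeline finished"
```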
Expected behavior
All platforms should succeed in approximately the same time and way.
Runner Version and Platform
From my logs:
Request a runner to run this job
Operating System
Virtual Environment
Thanks to @maxim-lobanov we know that it only happens under actions/runner, and not under https://github.com/microsoft/azure-pipelines-agent (even though Azure Pipelines runs the same macOS images), nor when connected over VNC or SSH.
What's not working?
The symptom is that the Linux builds finish immediately while the macOS build hangs in yellow forever:
In fact, even Travis on a nearly (but not exactly) identical version of macOS works fine.
For example, consider this Actions workflow and this Travis script for comparison: everything finishes in about a minute except for Actions on macOS.
This should only take a moment to finish, and on Travis it does:
but on Actions it's at 3 minutes and counting. I've had jobs hang much longer too -- up to their 6-hour limit -- before I noticed what was going on:
The only way to stop the job is to cancel it; the deadlock never resolves on its own.
Job Log Output
Runner and Worker's Diagnostic Logs
I don't have access to these! If I install a runner locally and reproduce there, I'll update this.