Support configure runner as ephemeral. #660
Conversation
If someone uses this on a beta version of GHES, how will we handle error messages?
Hi, this looks like a great change and I am really looking forward to it. It would perhaps also be good to extend docs/design/auth.md with this information. Many thanks in advance, and if you need someone to test this on GHES, I would be happy to help.
It is a very limited solution for creating fresh environments.
@TingluoHuang I tried this out, and one thing I found is that the process doesn't seem to exit if the runner was auto-updated prior to running its one job. Is that something you are aware of?
May I know how to use ephemeral with run.sh/runsvc.sh now?
What's the status on this? I'm assuming this is still waiting for server-side changes; if so, is that publicly being tracked anywhere? I've been working around the "single use" self-hosted runner issues by creating an orchestrator of sorts, which keeps N runners running (all inside Docker containers) with the `--once` flag. This has been fairly unreliable for several reasons.
The upside is that this has provided a very nice way to get semi-isolated environments for runners/jobs, as each runner runs in a fresh Docker container, but with the downside of additional action containers running on the same host.
Also interested in an answer to @Shegox's question. Given that one is running unknown code on a runner: to safely run unknown code we'd like to reset a VM to a snapshot after every run, so one can run an ephemeral runner there. Since the unknown code has root permissions (to access Docker or install packages), the assumption is that this code could also alter or access the runner process itself. Is it guaranteed that, with the access token of the ephemeral runner, malicious code on this runner cannot pull another workflow onto this instance? E.g. a malicious workflow could extract API tokens from the runner and start a second runner process to pick up another workflow job, and would then be able to extract secrets from that workflow. Based on https://github.com/actions/runner/blob/main/docs/design/auth.md, I would expect that as long as the initial workflow on the ephemeral runner has not finished, its token is valid, and malicious code could use that token to fetch more workflow jobs and extract secrets from them. Or does the API actually ensure that only a single workflow job can be pulled with the token of the ephemeral runner and not any other ones in GitHub Enterprise? (It is already done this way on github.com.)
Summary: Pull Request resolved: #56929. Artifacts were failing to unzip since they already existed in the current tree, so this just forces the zip to go through no matter what. Was observing that test phases will fail if attempting to zip over an already existing directory: https://github.com/pytorch/pytorch/runs/2424525136?check_suite_focus=true. In the long run, however, it'd be good to have these binaries built out as part of the regular cmake process instead of being one-off builds like they are now. **NOTE**: This wouldn't be an issue if `--ephemeral` workers were a thing, see: actions/runner#660
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: janeyx99
Differential Revision: D28004271
Pulled By: seemethere
fbshipit-source-id: c138bc85caac5d411a0126d27cc42c60fe88de60
Hi, just wanted to check if this is still on the roadmap? We have an autoscaling group of self-hosted runners, but it's very unreliable: we often just get "this check failed" with no log output after jobs time out, which I assume is because the service is allocating jobs to runners that are scaling down. We really need to be able to configure the runners as ephemeral, but if that's not going to ship any time soon we will have to look at another approach.
@bryanmacfarlane @TingluoHuang 👋 could you please provide an update on whether this will be merged (or on general support for ephemeral runners)?
@sethvargo We're actively working on this. We should merge this sometime this month, probably sooner :)
@lokesh755 @TingluoHuang Any updates on when this will hit production?
Can I already use `--ephemeral`?
This shipped in the latest release, v2.282.0. |
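For reference, a minimal sketch of how an ephemeral runner is typically set up with that release or later; the URL, token, and runner name below are placeholders:

```bash
# Register the runner for exactly one job; the service removes the
# registration after that job completes.
./config.sh --url https://github.com/SOME_ORG/SOME_REPO \
            --token REGISTRATION_TOKEN \
            --name my-ephemeral-runner \
            --ephemeral \
            --unattended

# Start the listener; with an ephemeral registration it exits after
# running a single job.
./run.sh
```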
Oh, I would like to know if I need to update GHES on the server side, as I got an error when trying it.
With the new --ephemeral flag, is there a way to have config.sh wait until the runner has de-registered? As an example, if I make a Docker image that runs a shell script on startup to register an ephemeral runner, what is the best way to have the script wait until the runner is done so the container doesn't exit?
I use a shell script as a Docker entrypoint, which calls `config.sh` and then `run.sh`. An older version of this was based on `--once`; I'm working on a PR to update that to use `--ephemeral`.
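A rough sketch of such an entrypoint, assuming the runner package is already unpacked in the image and that RUNNER_URL and RUNNER_TOKEN are injected into the container (both of these are assumptions; real setups usually also need to request a fresh registration token):

```bash
#!/usr/bin/env bash
# Hypothetical single-use runner entrypoint.
set -euo pipefail

# Register as ephemeral so the service sends at most one job and then
# removes the registration.
./config.sh --url "${RUNNER_URL}" \
            --token "${RUNNER_TOKEN}" \
            --ephemeral \
            --unattended

# run.sh blocks until the single job has finished, so the container
# exits once the job is done and the runner has been de-registered.
exec ./run.sh
```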
This PR currently breaks GitHub Actions in GHES. --once does not work anymore and --ephemeral is not supported. Great. And the GitHub Actions runner force-updates itself to the newest version.
@zetaab I don't think we changed any behavior for `--once`.
I think that should only give you a warning but not actually fail anything.
I'm getting an error saying `--once` is not a valid option.
@TingluoHuang As far as I can tell, --once is no longer usable with this update. As @rofafor said, it is no longer accepted as a valid flag. See https://github.com/actions/runner/pull/660/files#diff-b1f59ae3d34d9d3811ce43ed0214576cb4d9f3373a6734adf1318b5ab7e535eeL35
Like @rofafor said: I think your idea was to deprecate the flag, but you actually removed it as well. So now the problem is that --once does not work anymore, and when using GHES, --ephemeral does not work either.
@aidan-mundy, @zetaab can you confirm that you are unable to use the `--once` flag? If it doesn't work, please file an issue and provide your runner version and OS.
Here is what I just tried.
We do print out an error that the flag is no longer recognized, but the runner is still able to connect to the server, run a single job, and exit. Am I missing something here?
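The exact commands and output were not preserved in this thread; presumably the test was something along these lines (placeholder URL and token):

```bash
# Configure a runner the usual way...
./config.sh --url https://github.com/SOME_ORG/SOME_REPO \
            --token REGISTRATION_TOKEN \
            --unattended

# ...then start it with the legacy single-job flag. Per the comment
# above, an error is printed that the flag is not recognized, but the
# runner still connects, runs one job, and exits.
./run.sh --once
```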
@rofafor We will keep the `--once` flag.
I want to add a summary here so it's obvious if you land here wondering about `--ephemeral` and `--once`.
If you have any issues with ephemeral/once, please feel free to reach out (this issue works, but you can also use the community support forums, which might have better support for customer questions and let us file support tickets to help you). More information can be found in this runner issue.
@hross When you say "next version" do you mean in a quarter (with V3.3.0) or in a couple of weeks (with V3.2.1)?
Disclaimer: not a GitHub employee. GitHub normally releases features only in minor (3.x) releases and not in patch releases (3.2.x), so I wouldn't actually expect it before 3.3.x (and maybe even later, but that's up to GitHub to confirm). The GitHub roadmap currently doesn't specify any concrete date for it.
@Shegox is right. We will land it in 3.3.x (next version meaning "next major release").
For those of you that are enterprise server users and are waiting for this functionality, 3.3.0.rc1 is now available for preview. It includes support for ephemeral runners. (Looks like my estimate of "in a quarter" was slightly pessimistic; happy to see the prompt update from the GHES team!)
Is there an easy way to run a command after the ephemeral runner has finished its job and exited?
@Manouchehri I've achieved this by running the ephemeral runner under systemd and then using a unit option that runs a follow-up command once the service stops.
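For a shell-only variant, here is a sketch (not the systemd setup described above); the environment variables and the cleanup command are placeholders:

```bash
#!/usr/bin/env bash
set -euo pipefail

# One ephemeral registration, one job.
./config.sh --url "${RUNNER_URL}" \
            --token "${RUNNER_TOKEN}" \
            --ephemeral \
            --unattended

# run.sh returns once the single job is done and the runner has been
# de-registered by the service.
./run.sh

# Anything placed after run.sh executes once the job has finished,
# e.g. resetting a VM snapshot or signalling an orchestrator.
/usr/local/bin/after-job-cleanup.sh
```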
The service will make sure to only ever send one job to this runner.
The service will remove the runner registration after the job finishes.