release: Use Jobs to spawn release-runner in separate containers #216
Conversation
This is similar to what commit 5fe95e6 did for the cockpit/tests container. It makes it easier to work in OpenShift environments with unprivileged random UIDs. Also drop the unused /build/rpmbuild directory; this fixes running release-runner in the /build directory, as that needs to be empty for `git clone` to actually work.
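For illustration, a minimal sketch of the kind of Job this approach spawns. The image and mount point follow this PR, but the remaining field values are assumptions, not the PR's literal manifest:

```yaml
# Hypothetical sketch, not the PR's actual manifest: a Job that runs
# release-runner in its own cockpit/release container, with the release
# secrets mounted at /run/secrets/release.
apiVersion: batch/v1
kind: Job
metadata:
  # Assumed naming scheme; see the later discussion about making the
  # name unique per project and release tag.
  name: release-cockpit-ostree-177
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: release-runner
          image: cockpit/release            # image name from the PR; tag assumed
          volumeMounts:
            - name: release-secrets
              mountPath: /run/secrets/release
              readOnly: true
      volumes:
        - name: release-secrets
          secret:
            secretName: cockpit-release-secrets   # assumed secret name
```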
I rebuilt the container, rolled it out to Docker Hub, and deployed this on CentOS CI. I tried it once more with my cockpit-ostree fork and got a successful release: GitHub, COPR, log. Logs from the webhook container:
Logs from the spawned Job pod:
Unfortunately the job object stays around after success:
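As a possible follow-up (my assumption, not something this PR implements): finished Jobs can be deleted explicitly with `oc delete job <name>`, or, on clusters with the TTLAfterFinished feature enabled, cleaned up automatically:

```yaml
# Hypothetical cleanup option, assuming a cluster that supports the
# TTLAfterFinished feature: the Job object (and its pods) are deleted
# automatically a fixed time after the Job finishes.
apiVersion: batch/v1
kind: Job
metadata:
  name: release-cockpit-ostree-177
spec:
  ttlSecondsAfterFinished: 300   # delete 5 minutes after completion
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: release-runner
          image: cockpit/release
```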
@stefwalter: Are you interested in reviewing this? (This might also be beneficial for the learn container; happy to help and explain things.) If not, I'll ask someone else.
This needs some work. Apparently it isn't able to create a second "release-job" Job even after the first one completes. So the naming needs to become less static, perhaps with the project name and release tag attached.
Cockpit 180 release also went through without a hitch: https://fedorapeople.org/groups/cockpit/logs/release-180/log
Just "release-<version>" is not good enough any more in these days and age where we use Cockpituous to release many different projects. Add the project name to the identifier.
Add a restricted "system:serviceaccount:cockpit:create-job" service account that can only handle Jobs to spawn new pods. This needs to be run by the cluster administrator.
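A sketch of what such a restricted account could look like. The account name and namespace match the commit message; the exact RBAC rules are my assumption, not necessarily the PR's manifest:

```yaml
# Hypothetical RBAC sketch: a service account that may only manage Jobs,
# so a compromised webhook cannot read the release secrets directly.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: create-job
  namespace: cockpit
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: create-job
  namespace: cockpit
rules:
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create", "get", "list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: create-job
  namespace: cockpit
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: create-job
subjects:
  - kind: ServiceAccount
    name: create-job
    namespace: cockpit
```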
I fixed the ambiguous Job object name, and also fixed the release log name on the sink while I was at it. Everything has been re-rolled out. I updated the above comment with the new URLs; the cockpit-ostree 177 release was successful.
Looks good! Just have 1 question.
See https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/

This provides several benefits:

- Allow multiple releases to happen in parallel. Right now, pushing a release tag to one project while another release is running (or hanging) will be silently ignored.
- Simplify/robustify the structure of our webhook, as a release request does not need to be handled synchronously.
- Separate the credentials for the webhook and all the release secrets, providing better isolation in case the web server gets compromised.

In cockpit-release.yaml, drop the release secrets and use the new "create-job" service account.

Copy webhook's home directory initialization logic to release-runner, so that this can run in a freshly spawned cockpit/release container with /run/secrets/release and a random UID.

Simplify webhook's setup() to drop the passwd setup (now unnecessary, as this doesn't do any actual work aside from `oc create`), and only set up the temporary $HOME with the webhook secret. In theory, the latter could also be done statically in the Dockerfile with a symlink; this is a simplification that can be done in a follow-up PR.

In webhook, replace the direct invocation of release-runner with a Job that creates a new cockpit/release pod. This is fast, so it can happen synchronously in the HTTP handler.

Closes #216
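To make the credential separation concrete, a sketch of what the webhook side of cockpit-release.yaml might then contain. The field values are assumed; the point is only the serviceAccountName and the absence of secret mounts:

```yaml
# Hypothetical excerpt: the webhook pod no longer mounts the release
# secrets; it only carries the restricted service account that may
# create Jobs. The spawned Job's pod is what mounts the secrets.
apiVersion: v1
kind: Pod
metadata:
  name: webhook
  namespace: cockpit
spec:
  serviceAccountName: create-job
  containers:
    - name: webhook
      image: cockpit/release          # image/tag assumed
      # note: no release-secrets volumeMounts here any more
```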
Looks good!