release: Use Jobs to spawn release-runner in separate containers #216

Merged · 4 commits merged into cockpit-project:master on Oct 16, 2018
Conversation

martinpitt (Member)

See https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/

This provides several benefits:

  • Allow multiple releases to happen in parallel. Right now, pushing a
    release tag to one project while another release is running (or
    hanging) will be silently ignored.
  • Simplify/robustify the structure of our webhook, as a release request
    does not need to be handled synchronously.
  • Separate the credentials for the webhook and all the release secrets,
    providing better isolation in case the web server gets compromised.

In cockpit-release.yaml, drop the release secrets and use the new
"create-job" service account.

Copy webhook's home directory initialization logic to release-runner, so
that this can run in a freshly spawned cockpit/release container with
/run/secrets/release and a random UID.
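For illustration, a minimal sketch of what such a $HOME initialization can look like when the container runs with a random UID; the real release-runner logic may differ, and everything here except the /run/secrets/release mount point is an assumption:

# Sketch only: make $HOME usable for an arbitrary, unprivileged UID.
echo "> Initializing $HOME"
mkdir -p "$HOME"
# Random OpenShift UIDs have no passwd entry; some tools (ssh, git) want one.
# This assumes the image left /etc/passwd group-writable for that purpose.
if ! getent passwd "$(id -u)" >/dev/null; then
    echo "user:x:$(id -u):0::$HOME:/bin/bash" >> /etc/passwd
fi
# Make the mounted release secrets reachable from $HOME (path layout assumed).
ln -sfn /run/secrets/release "$HOME/.release-secrets"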

Simplify the webhook's setup() to drop the passwd setup (now
unnecessary, as it doesn't do any actual work aside from `oc create`),
and only set up the temporary $HOME with the webhook secret. In theory,
the latter could also be done statically in the Dockerfile with a
symlink; this simplification can be done in a follow-up PR.

In webhook, replace the direct invocation of release-runner with a job
that creates a new cockpit/release pod. This is fast, so it can happen
synchronously in the HTTP handler.

This is similar to what commit 5fe95e6 did for the cockpit/tests
container. This makes it easier to work in OpenShift environments with
unprivileged random UIDs.

Also drop the unused /build/rpmbuild directory. This fixes running
release-runner in the /build directory, as that directory needs to be
empty for `git clone` to actually work.
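To make the flow concrete, here is a rough sketch of the kind of Job the webhook could feed to `oc create`; the names that also appear in the logs below (the job name, the cockpit/release image, /run/secrets/release) are taken from this PR, while the secret name, the resource layout, and the release-runner arguments are assumptions:

# Sketch of the webhook-side "oc create" call; details are illustrative.
oc create -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: release-job-cockpit-ostree-177
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: release
        image: cockpit/release
        # how the repository, tag, and script get passed is an assumption
        command: ["release-runner", "https://github.com/martinpitt/cockpit-ostree.git",
                  "177", "./cockpituous-release"]
        volumeMounts:
        - name: secrets
          mountPath: /run/secrets/release
          readOnly: true
      volumes:
      - name: secrets
        secret:
          secretName: cockpit-release-secrets   # assumed secret name
EOF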
martinpitt (Member, Author) commented on Oct 12, 2018

I rebuilt the container, rolled it out to Docker Hub, and deployed this on CentOS CI. I tried it once again with my cockpit-ostree fork, and got a successful release: GitHub, COPR, log.

Logs from webhook container:

$ oc logs release-jtlkm  -f
DEBUG:root:event: create, path: /cockpituous-release
DEBUG:root:{"ref":"177","ref_type":"tag","master_branch":"master", [...]
INFO:root:Releasing project https://github.com/martinpitt/cockpit-ostree.git, tag 177, script ./cockpituous-release
job "release-job-cockpit-ostree-177" created

Logs from spawned Job pod:

$ oc logs -f release-job-cockpit-ostree-177-rz7s4
> Initializing /home/user
Cloning into '.'...
HEAD is now at 238af8d disable fedora
http://fedorapeople.org/groups/cockpit/logs/release-cockpit-ostree-177/
> Initializing /home/user
> Starting: release-source
[...]
  09:04:36 Build 808612: running
  09:09:09 Build 808612: succeeded
> Completed job

Unfortunately the job object stays around after success:

$ oc get jobs
NAME                             DESIRED   SUCCESSFUL   AGE
release-job-cockpit-ostree-177   1         1            4m


$ oc get pods | grep job
release-job-cockpit-ostree-177-rz7s4   0/1       Completed   0          5m

The ttlSecondsAfterFinished: 0 option in the Job spec (see the Kubernetes docs) is supposed to clean this up, but it's apparently not yet implemented in the relatively old OpenShift 3.6 setup on CentOS CI. So for now these completed job objects and pods stay around, and we have to clean them up from time to time. I don't see this as a big issue, as they don't take up significant resources. As a workaround, a follow-up PR could remove these regularly; I filed issue #218 about this.
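For reference, that option sits directly in the Job spec; it only landed upstream as an alpha feature around Kubernetes 1.12, which is presumably why the Kubernetes 1.6 based OpenShift 3.6 here ignores it. Until then, a manual cleanup along these lines would work as a stopgap (just a sketch, not necessarily what #218 will end up doing):

# Workaround sketch: delete Jobs whose pods finished successfully.
for job in $(oc get jobs -o name); do
    if [ "$(oc get "$job" -o jsonpath='{.status.succeeded}')" = "1" ]; then
        oc delete "$job"
    fi
done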

martinpitt (Member, Author)

@stefwalter: Are you interested in reviewing this? (This might also be beneficial for the learn container; happy to help and explain things.) If not, I'll ask someone else.

martinpitt (Member, Author)

This needs some work. Apparently it isn't possible to create a second "release-job" Job even after the first one completes, so the naming needs to become less static, perhaps with the project name and release tag attached.
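A sketch of the kind of naming this points at (the variable names are illustrative, not the actual webhook code):

# Illustrative only: derive a unique Job name per project and release tag.
project=$(basename "$clone_url" .git)          # e.g. cockpit-ostree
job_name="release-job-${project}-${tag}"       # e.g. release-job-cockpit-ostree-177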

martinpitt (Member, Author)

Cockpit 180 release also went through without a hitch: https://fedorapeople.org/groups/cockpit/logs/release-180/log

Just "release-<version>" is not good enough any more in these days and
age where we use Cockpituous to release many different projects. Add the
project name to the identifier.
Add a restricted "system:serviceaccount:cockpit:create-job" service
account that can only handle Jobs to spawn new pods.

This needs to be run by the cluster administrator.
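For illustration, a sketch of the one-time setup this implies for the cluster admin; the role rules and exact commands are assumptions (an OpenShift 3.6 `oc` may well spell them differently):

# Illustrative only: a service account that may only manage Jobs in "cockpit".
oc create serviceaccount create-job -n cockpit
oc create role job-creator --verb=create,get,list,watch --resource=jobs -n cockpit
oc create rolebinding create-job --role=job-creator \
    --serviceaccount=cockpit:create-job -n cockpit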
martinpitt (Member, Author)

I fixed the ambiguous job object name, and also fixed the release log name on the sink while I was at it. Everything has been rolled out again. I updated the above comment for the new URLs; the cockpit-ostree 177 release was successful.

croissanne (Contributor) left a comment


Looks good! Just have 1 question.

(review comment on release/release-runner, resolved)
croissanne (Contributor) left a comment


Looks good!

croissanne merged commit 5e79fa7 into cockpit-project:master on Oct 16, 2018
martinpitt deleted the release-job branch on October 16, 2018 08:36