release: Use Jobs to spawn release-runner in separate containers #216

Merged · 4 commits merged into cockpit-project:master on Oct 16, 2018
Conversation

martinpitt (Member)

See https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/

This provides several benefits:

  • Allow multiple releases to happen in parallel. Right now, pushing a
    release tag to one project while another release is running (or
    hanging) will be silently ignored.
  • Simplify/robustify the structure of our webhook, as a release request
    does not need to be handled synchronously.
  • Separate the credentials for the webhook and all the release secrets,
    providing better isolation in case the web server gets compromised.

In cockpit-release.yaml, drop the release secrets and use the new
"create-job" service account.

Copy webhook's home directory initialization logic to release-runner, so
that this can run in a freshly spawned cockpit/release container with
/run/secrets/release and a random UID.
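For illustration, a minimal sketch of what such a $HOME initialization can look like when the container runs with a random UID; the real release-runner logic may differ, and everything here except the /run/secrets/release mount point is an assumption:

# Sketch only: make $HOME usable for an arbitrary, unprivileged UID.
echo "> Initializing $HOME"
mkdir -p "$HOME"
# Random OpenShift UIDs have no passwd entry; some tools (ssh, git) want one.
# This assumes the image left /etc/passwd group-writable for that purpose.
if ! getent passwd "$(id -u)" >/dev/null; then
    echo "user:x:$(id -u):0::$HOME:/bin/bash" >> /etc/passwd
fi
# Make the mounted release secrets reachable from $HOME (path layout assumed).
ln -sfn /run/secrets/release "$HOME/.release-secrets"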

Simplify the webhook's setup() to drop the passwd setup (now
unnecessary, as it doesn't do any actual work aside from `oc create`),
and only set up the temporary $HOME with the webhook secret. In theory,
the latter could also be done statically in the Dockerfile with a
symlink; this simplification can be done in a follow-up PR.

In webhook, replace the direct invocation of release-runner with a job
that creates a new cockpit/release pod. This is fast, so it can happen
synchronously in the HTTP handler.

This is similar to what commit 5fe95e6 did for the cockpit/tests
container. This makes it easier to work in OpenShift environments with
unprivileged random UIDs.

Also drop the unused /build/rpmbuild directory. This fixes running
release-runner in the /build directory, as that directory needs to be
empty for `git clone` to actually work.
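To make the flow concrete, here is a rough sketch of the kind of Job the webhook could feed to `oc create`; the names that also appear in the logs below (the job name, the cockpit/release image, /run/secrets/release) are taken from this PR, while the secret name, the resource layout, and the release-runner arguments are assumptions:

# Sketch of the webhook-side "oc create" call; details are illustrative.
oc create -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: release-job-cockpit-ostree-177
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: release
        image: cockpit/release
        # how the repository, tag, and script get passed is an assumption
        command: ["release-runner", "https://github.com/martinpitt/cockpit-ostree.git",
                  "177", "./cockpituous-release"]
        volumeMounts:
        - name: secrets
          mountPath: /run/secrets/release
          readOnly: true
      volumes:
      - name: secrets
        secret:
          secretName: cockpit-release-secrets   # assumed secret name
EOF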
martinpitt (Member, Author) commented on Oct 12, 2018

I rebuilt the container, rolled it out to Docker Hub, and deployed this on CentOS CI. I tried it once again with my cockpit-ostree fork, and got a successful release: GitHub, COPR, log.

Logs from webhook container:

$ oc logs release-jtlkm  -f
DEBUG:root:event: create, path: /cockpituous-release
DEBUG:root:{"ref":"177","ref_type":"tag","master_branch":"master", [...]
INFO:root:Releasing project https://github.com/martinpitt/cockpit-ostree.git, tag 177, script ./cockpituous-release
job "release-job-cockpit-ostree-177" created

Logs from spawned Job pod:

$ oc logs -f release-job-cockpit-ostree-177-rz7s4
> Initializing /home/user
Cloning into '.'...
HEAD is now at 238af8d disable fedora
http://fedorapeople.org/groups/cockpit/logs/release-cockpit-ostree-177/
> Initializing /home/user
> Starting: release-source
[...]
  09:04:36 Build 808612: running
  09:09:09 Build 808612: succeeded
> Completed job

Unfortunately the job object stays around after success:

$ oc get jobs
NAME                             DESIRED   SUCCESSFUL   AGE
release-job-cockpit-ostree-177   1         1            4m


$ oc get pods | grep job
release-job-cockpit-ostree-177-rz7s4   0/1       Completed   0          5m

The ttlSecondsAfterFinished: 0 option in the Job spec (see the Kubernetes docs) is supposed to clean this up, but it's apparently not yet implemented in the relatively old OpenShift 3.6 setup on CentOS CI. So for now these completed job objects and pods stay around, and we have to clean them up from time to time. I don't see this as a big issue, as they don't take up significant resources. As a workaround, a follow-up PR could remove these regularly; I filed issue #218 about this.
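For reference, that option sits directly in the Job spec; it only landed upstream as an alpha feature around Kubernetes 1.12, which is presumably why the Kubernetes 1.6 based OpenShift 3.6 here ignores it. Until then, a manual cleanup along these lines would work as a stopgap (just a sketch, not necessarily what #218 will end up doing):

# Workaround sketch: delete Jobs whose pods finished successfully.
for job in $(oc get jobs -o name); do
    if [ "$(oc get "$job" -o jsonpath='{.status.succeeded}')" = "1" ]; then
        oc delete "$job"
    fi
done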

martinpitt (Member, Author)

@stefwalter: Are you interested in reviewing this? (This might also be beneficial for the learn container; happy to help and explain things.) If not, I'll ask someone else.

martinpitt (Member, Author)

This needs some work. Apparently it isn't possible to create a second "release-job" Job even after the first one completes, so the naming needs to become less static, perhaps with the project name and release tag attached.
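A sketch of the kind of naming this points at (the variable names are illustrative, not the actual webhook code):

# Illustrative only: derive a unique Job name per project and release tag.
project=$(basename "$clone_url" .git)          # e.g. cockpit-ostree
job_name="release-job-${project}-${tag}"       # e.g. release-job-cockpit-ostree-177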

martinpitt (Member, Author)

Cockpit 180 release also went through without a hitch: https://fedorapeople.org/groups/cockpit/logs/release-180/log

Just "release-<version>" is not good enough any more in these days and
age where we use Cockpituous to release many different projects. Add the
project name to the identifier.
Add a restricted "system:serviceaccount:cockpit:create-job" service
account that can only handle Jobs to spawn new pods.

This needs to be run by the cluster administrator.
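For illustration, a sketch of the one-time setup this implies for the cluster admin; the role rules and exact commands are assumptions (an OpenShift 3.6 `oc` may well spell them differently):

# Illustrative only: a service account that may only manage Jobs in "cockpit".
oc create serviceaccount create-job -n cockpit
oc create role job-creator --verb=create,get,list,watch --resource=jobs -n cockpit
oc create rolebinding create-job --role=job-creator \
    --serviceaccount=cockpit:create-job -n cockpit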
martinpitt (Member, Author)

I fixed the ambiguous job object name, and also fixed the release log name on the sink while I was at it. Everything has been rolled out again. I updated the above comment for the new URLs; the cockpit-ostree 177 release was successful.

croissanne (Contributor) left a comment


Looks good! Just have 1 question.

(review comment on release/release-runner, resolved)
croissanne (Contributor) left a comment


Looks good!

croissanne merged commit 5e79fa7 into cockpit-project:master on Oct 16, 2018
martinpitt deleted the release-job branch on October 16, 2018 08:36