RFC: Kubernetes Runtime #2

Open · wants to merge 5 commits into base: master

Conversation

@topherbullock
Member

topherbullock commented Apr 17, 2018

Proposal

Status: draft

There are still more details and thoughts to collect, specifically around how the new GC changes impact how a K8s runtime should clean up resources. I'm sure more questions will come up around using K8s effectively.

Please submit comments and feedback on individual lines of the proposal markdown, so that we can track open conversations.

topherbullock added some commits Apr 11, 2018

provides an interface the ATC can leverage to manage volumes similar to how
Baggageclaim Volumes are managed. Could a replica set place a worker on each K8s
Node and use `HostPath` Volumes or create Persistent Volumes to store Resource
Caches, Task Caches, and Image Resource Caches, etc.?
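
A minimal sketch of the per-Node worker idea, assuming a DaemonSet (rather than a ReplicaSet) and illustrative image, args, and cache paths; none of these values are part of the proposal:

```yaml
# Hypothetical sketch only: one Concourse worker per K8s Node, caching volumes on the host.
# The image, args, and paths are illustrative assumptions, not part of the RFC.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: concourse-worker
spec:
  selector:
    matchLabels:
      app: concourse-worker
  template:
    metadata:
      labels:
        app: concourse-worker
    spec:
      containers:
      - name: worker
        image: concourse/concourse   # assumed image
        args: ["worker"]             # assumed entrypoint/args
        volumeMounts:
        - name: work-dir
          mountPath: /worker-state   # where resource/task/image caches would live
      volumes:
      - name: work-dir
        hostPath:
          path: /var/lib/concourse-worker
          type: DirectoryOrCreate
```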

@edude03

edude03 Jun 28, 2018

Persistent Volume Claims are likely the right abstraction for this; however, one thing I don't understand about Concourse is how it manages volumes. Right now, if you run Concourse on Kubernetes and give the worker a Persistent Volume to put the btrfs image on, you often run into an issue where the state the ATC expects the worker to be in is not the same as the actual state.

It seems to me that this approach should work fine - when the worker restarts, the PV has the same volume image, and thus inputs, resources, etc. should still be there.

Does baggageclaim do something unusual with how it references its images that causes it to fail if the volume is unmounted and remounted?

@vito

vito Jul 4, 2018

Member

@edude03 I don't think I quite understand persistent volume claims enough yet to know what you mean by the state desync, but I can theorize. The gist of things is that the set of volumes a worker has is associated by name to the worker's registration in the ATC's database. If a worker comes back with a new name but is somehow pointing to a disk with a bunch of volumes already present, the ATC won't know they exist. Similarly, if a worker comes back under the same name but all of its volumes are gone, the ATC will get a bit confused and still try to use them.

As far as BaggageClaim is concerned, its API is just a direct reflection of the filesystem (a GET request to list volumes corresponds to a directory listing for example). It has no idea when its volumes are mounted/unmounted to/from containers, it's just preparing the paths on the host that we then bind-mount. Its sole purpose is to manage the volumes through direct API calls from the orchestrator. There's no state in its head and no automatic garbage collection (beyond volume TTLs, which aren't used by Concourse anymore but are still implemented in the API) or outbound interaction with the ATC to "re-sync".

As for how BaggageClaim fits into the K8s model, I wonder if it would be a good fit as a CSI? Maybe this way we could abstract over whether the volume has to be transferred over the network or a local copy-on-write can be constructed. (Disclaimer: I've only briefly looked at this...)
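
If baggageclaim were exposed through CSI, consumers might reference it via a StorageClass. A hedged sketch, with a made-up provisioner name and parameters:

```yaml
# Hypothetical: a StorageClass backed by a baggageclaim CSI driver.
# The provisioner name and parameters are made up for illustration.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: concourse-baggageclaim
provisioner: baggageclaim.concourse-ci.org   # assumed driver name
parameters:
  strategy: copy-on-write   # e.g. local COW vs. streaming the volume over the network
volumeBindingMode: WaitForFirstConsumer
```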

@edude03

edude03 Jul 5, 2018

@vito - I'm going from memory, so the situation might have changed; however, using the Kubernetes helm chart, the first problem is that the workers crash fairly frequently.

This leads to / exacerbates the second problem, which is that I get an error about not being able to find the volume, causing the worker to get stuck - it's running, but none of the jobs assigned to it actually run.

I've never understood this problem because the helm chart is implemented as a StatefulSet which means that the worker has a stable name (it will always be called concourse-worker-$n) and it uses a PersistentVolumeClaim so it will restart with the same "Hard drive" (I'm avoiding calling it a volume as that term is overloaded in this context) as the previous run.

With this setup, it should have the same guarantees as if run on a physical machine (host scheduling notwithstanding, of course), and yet workers break fairly often.

As for the statefulness of ATC/BaggageClaim - it seems to me that one solution would be to make volume requests an upsert operation, i.e. create the volume if it doesn't exist, OR to remove the cleverness of trying to schedule jobs on machines where the volume already exists. Of course, figuring out why volumes can't be found on Kubernetes is probably an even better solution.

As for a CSI - it's either a great idea or not the right thing at all, though I'm not sure I understand baggageclaim enough to say.

A CSI can run as a sidecar in a pod, listening to the Kubernetes API and responding to requests for storage (Add/Update/Delete operations).

This sounds like what baggageclaim is for, although I imagine a Pod would need a PersistentVolumeClaim mounted so that baggageclaim has somewhere to put its own volumes?

In the K8S model, would baggageclaim map one "Volume" to one "hard drive" or would it get one "hard drive" and create a btrfs image and put all the volumes in there?
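
For reference, a trimmed-down sketch of the StatefulSet shape described above - stable Pod names plus a per-replica PersistentVolumeClaim that survives restarts - in the single-claim variant of the question just asked. Image, args, paths, and sizes are illustrative assumptions, not the actual helm chart:

```yaml
# Hypothetical: stable worker identity (concourse-worker-0, -1, ...) with a per-replica
# PVC that survives Pod restarts. In this single-claim variant, baggageclaim would keep
# all of its volumes on the one mounted "hard drive" at /concourse-work-dir.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: concourse-worker
spec:
  serviceName: concourse-worker
  replicas: 2
  selector:
    matchLabels:
      app: concourse-worker
  template:
    metadata:
      labels:
        app: concourse-worker
    spec:
      containers:
      - name: worker
        image: concourse/concourse   # assumed image
        args: ["worker"]             # assumed entrypoint/args
        volumeMounts:
        - name: concourse-work-dir
          mountPath: /concourse-work-dir
  volumeClaimTemplates:
  - metadata:
      name: concourse-work-dir
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 50Gi
```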

create a container, and K8s will cache these images. In order to support this
as the preferred way to define container images, we will need to find a viable
solution which saves the exported contents from `image_resource` to the K8s
registry.

@edude03

edude03 Jun 28, 2018

Two options come to mind - hosting an object storage server on the cluster such as Minio and using the S3 API, or using a shared persistent volume. The latter is likely difficult in non-cloud environments, however.
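
A sketch of the shared-volume option, assuming a placeholder storage class; `ReadWriteMany` is only available with provisioners that support it:

```yaml
# Hypothetical shared claim for exported image_resource contents.
# ReadWriteMany needs a backing store that supports it (e.g. an NFS-style provisioner).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: concourse-image-cache
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: shared-nfs   # placeholder
  resources:
    requests:
      storage: 100Gi
```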

Could a CustomResourceDefinition (CRD) be used to represent the Containers
created for Concourse Tasks and Resources? This would allow a user or operator
to easily recognize and differentiate Concourse Containers and their
corresponding workloads from other workloads on the K8s Cluster.
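
A hedged sketch of what such a CRD might look like; the group, kind, and fields below are illustrative, not part of the proposal:

```yaml
# Hypothetical CRD so Concourse-created workloads are first-class, queryable objects.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: concoursecontainers.concourse-ci.org
spec:
  group: concourse-ci.org
  scope: Namespaced
  names:
    kind: ConcourseContainer
    plural: concoursecontainers
    singular: concoursecontainer
  versions:
  - name: v1alpha1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              pipeline: {type: string}
              job:      {type: string}
              step:     {type: string}   # task / get / put / check
```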

@edude03

edude03 Jun 28, 2018

A CRD is a great way to manage workers on the cluster IMO 👍

Tagging of Concourse Jobs for specific workers might need to change to
accommodate K8s Pod `nodeSelector`s which allow users to select specific K8s
Nodes to schedule the necessary Workloads on. Affinity and anti-affinity also

@edude03

edude03 Jun 28, 2018

IMO, neither affinity nor anti-affinity should be exposed to the user.

@ornous

ornous Jun 29, 2018

Hey edude03. Why do you believe that?

@edude03

edude03 Jul 5, 2018

I think that it makes the mental model of workload scheduling more confusing. At the very least, it shouldn't be part of an MVP.

@topherbullock

topherbullock Jul 11, 2018

Member

@edude03 @ornous
Tags already allow Concourse users to define some aspects of how steps are scheduled across the set of workers, but those are all exposed by workers themselves.

It would be confusing for users to need to reason about what nodes their resources are scheduled on, but maybe the CRD for workers could tie the thread from worker tags -> node.
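
One hypothetical way to tie that thread, with made-up label names: the worker CRD (or an operator) translates a Concourse step tag into a node label, and the scheduler does the rest via `nodeSelector`:

```yaml
# Hypothetical: a step tagged [gpu] ends up as a Pod constrained to matching Nodes.
# The label key and value are assumptions for illustration.
apiVersion: v1
kind: Pod
metadata:
  name: concourse-task-example
spec:
  nodeSelector:
    concourse-ci.org/tag: gpu   # derived from the step's `tags: [gpu]`
  containers:
  - name: task
    image: busybox
    command: ["sh", "-c", "echo running tagged task"]
```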

@pivotal-nader-ziada

pivotal-nader-ziada commented Jul 9, 2018

One approach I investigated for Volume management is the Local Persistent Volumes feature. Each worker node would have a local persistent volume and a persistent volume claim, which can then be mounted in the Pod executing a given task. PersistentVolume nodeAffinity enables the Kubernetes scheduler to schedule Pods using local volumes to the correct node.
On this local persistent volume, folders represent each required volume and the subPath is mounted in the Pod that needs access to the volume. Here is a note about the subpath details and fixes to a recent vulnerability.
https://kubernetes.io/blog/2018/04/04/fixing-subpath-volume-vulnerability/

An example of creating a local persistent volume is in the following yaml files:
https://gist.github.com/pivotal-nader-ziada/15d430bb16397c672e337f2e9275164c

This solution requires significant changes to how Concourse currently manages volumes, and at the same time does not use a Kubernetes feature as-is.
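
For readers skimming the gist, the core of the approach is roughly the following: a local PV pinned to a Node via `nodeAffinity`, and task Pods mounting individual volume folders through `subPath`. All names and values here are illustrative; see the linked yaml for the full version:

```yaml
# Hypothetical local PV pinned to one node; folders on it stand in for Concourse volumes.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: concourse-worker-node1
spec:
  capacity:
    storage: 100Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/concourse-volumes
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values: ["node1"]
---
# A task Pod mounting one folder of that PV via subPath (claim name assumed).
apiVersion: v1
kind: Pod
metadata:
  name: concourse-task
spec:
  containers:
  - name: task
    image: busybox
    command: ["sh", "-c", "ls /tmp/build/input"]
    volumeMounts:
    - name: worker-volumes
      mountPath: /tmp/build/input
      subPath: resource-cache-abc123   # one folder per Concourse volume
  volumes:
  - name: worker-volumes
    persistentVolumeClaim:
      claimName: concourse-worker-node1-claim   # assumed claim bound to the PV above
```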

topherbullock referenced this pull request in pivotal-jwinters/rfcs Nov 20, 2018

introduce user-artfact API handler
Signed-off-by: Jamie Klassen <cklassen@pivotal.io>
@hairyhenderson

hairyhenderson commented Dec 8, 2018

Just wanted to drop this here: https://blog.drone.io/drone-goes-kubernetes-native/ - discusses how Drone (a similar CI/CD tool) has leveraged Kubernetes. Some of the implementation details are potentially relevant (or not!)
