RFC: Kubernetes Runtime #2
Persistent Volume Claims are likely the right abstraction for this; however, one thing I don't understand about Concourse is how it manages volumes. Right now, if you run Concourse on Kubernetes and give the worker a Persistent Volume to put the btrfs image on, you often run into an issue where the state the ATC expects the worker to be in is not the same as the actual state.
It seems to me that this approach should work fine - when the worker restarts the PV has the same volume image and thus inputs, resource etc should still be there.
Does baggageclaim do something unusual with how it references its images that causes it to fail if the volume is unmounted and remounted?
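For reference, a minimal sketch of the kind of claim being discussed here (the name, size, and annotation are illustrative, not taken from the RFC):

```yaml
# Hypothetical PVC backing a single Concourse worker's state directory.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: concourse-worker-workdir   # illustrative name
spec:
  accessModes:
    - ReadWriteOnce                # one worker Pod mounts it at a time
  resources:
    requests:
      storage: 50Gi                # sized to hold the btrfs volume image
```

The expectation in the comment above is that remounting this claim after a worker restart should preserve inputs and resource caches, since the underlying volume image is unchanged.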
provides an interface the ATC can leverage to manage volumes similar to how
Baggageclaim Volumes are managed. Could a replica set place a worker on each K8s
Node and use `HostPath` Volumes or create Persistent Volumes to store Resource
Caches, Task Caches, Image Resource Caches, etc.?
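One possible shape of the `HostPath` approach asked about above. Note that a DaemonSet, rather than a ReplicaSet, is the usual way to place exactly one Pod per Node; all names and paths here are illustrative assumptions, not part of the proposal:

```yaml
# Hypothetical one-worker-per-node deployment using a hostPath cache.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: concourse-worker
spec:
  selector:
    matchLabels:
      app: concourse-worker
  template:
    metadata:
      labels:
        app: concourse-worker
    spec:
      containers:
        - name: worker
          image: concourse/concourse   # illustrative image reference
          volumeMounts:
            - name: worker-state
              mountPath: /worker-state
      volumes:
        - name: worker-state
          hostPath:
            path: /var/lib/concourse   # per-node cache location, illustrative
            type: DirectoryOrCreate
```

A hostPath cache survives Pod restarts on the same Node but is invisible to the scheduler, which is part of why the Local Persistent Volume discussion later in this thread exists.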
create a container, and K8s will cache these images. In order to support this
as the preferred way to define container images, we will need to find a viable
solution which saves the exported contents from `image_resource` to a
registry the K8s cluster can pull from.
Could a CustomResourceDefinition (CRD) be used to represent the Containers
created for Concourse Tasks and Resources? This would allow a user or operator
to easily recognize and differentiate Concourse Containers and their
corresponding workloads from other workloads on the K8s Cluster.
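A rough sketch of what such a CRD might look like; the API group, kind, and spec fields are all hypothetical, invented only to make the idea concrete:

```yaml
# Hypothetical CRD marking Concourse-owned workloads on the cluster.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: concoursecontainers.concourse.example.com  # invented API group
spec:
  group: concourse.example.com
  scope: Namespaced
  names:
    plural: concoursecontainers
    singular: concoursecontainer
    kind: ConcourseContainer
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                pipeline:
                  type: string
                job:
                  type: string
                containerType:
                  type: string   # e.g. task, check, get, put
```

With something like this, `kubectl get concoursecontainers` would list Concourse workloads separately from everything else in the cluster, which is the differentiation the comment above is after.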
Tagging of Concourse Jobs for specific workers might need to change to
accommodate K8s Pod `nodeSelector`s, which allow users to select the specific K8s
Nodes to schedule the necessary Workloads on. Affinity and anti-affinity also
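As a sketch of how a Concourse worker tag could map onto Pod scheduling, assuming a hypothetical node label derived from the tag (the label key and value are invented for illustration):

```yaml
# Hypothetical task Pod pinned to nodes labelled with a Concourse tag.
apiVersion: v1
kind: Pod
metadata:
  name: task-pod
spec:
  nodeSelector:
    concourse/worker-tag: bosh-lite   # invented label; mirrors `tags: [bosh-lite]`
  containers:
    - name: task
      image: busybox
      command: ["sh", "-c", "echo hello"]
```

`nodeSelector` gives the simple exact-match behaviour closest to Concourse tags today, while affinity/anti-affinity rules would allow softer, preference-based placement.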
One approach I investigated for Volume management is the Local Persistent Volumes feature. Each worker node would have a local persistent volume, and a persistent volume claim, that can then be mounted in the Pod executing a given task. PersistentVolume nodeAffinity enables the Kubernetes scheduler to schedule Pods using local volumes to the correct node. An example of creating a local persistent volume is shown in the following YAML files. This solution requires significant changes to how Concourse currently manages volumes, and at the same time does not use a Kubernetes feature as-is.
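Since the YAML files referenced above aren't included in this thread, here is a sketch of what a Local Persistent Volume with nodeAffinity typically looks like; the names, path, and node hostname are illustrative:

```yaml
# Hypothetical local PV pinned to one worker node via nodeAffinity.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: worker-local-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /mnt/disks/worker-cache      # illustrative path on the node's disk
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-node-1        # pins the PV to this specific node
```

Unlike hostPath, the nodeAffinity here is visible to the scheduler, so Pods claiming this volume are automatically placed on the node that holds the data.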
Signed-off-by: Jamie Klassen <cklassen@pivotal.io>
Just wanted to drop this here: https://blog.drone.io/drone-goes-kubernetes-native/ - discusses how Drone (a similar CI/CD tool) has leveraged Kubernetes. Some of the implementation details are potentially relevant (or not!)
added mention of Discord for asking whether to submit an RFC, per #2
@vito Why was this closed? Is this never going to happen?
@aedstrom woops! Must have typoed the number in the commit that closed it. Reopening!
Glad to hear it! This is a really interesting idea. Has anyone tried a spike of this? Or is it such a huge undertaking that such a spike doesn't even seem feasible?
@topherbullock As part of this investigation it might be interesting to consider what a garden backend for Kubernetes might look like, and how it would fit into our lifecycle. Also, baggageclaim would need ...a new driver? ...something higher level? We would likely have to rework some stuff, but it could be an interesting approach.
I personally think that native integration with Kubernetes is a lot more work, but it will open way more doors for Concourse in the open-source community. Going the k8s garden backend + k8s baggageclaim driver route might be technically feasible, but people may still not treat it as "Kubernetes native". It might add more points of failure, and likely the experience of people who run Concourse on Kubernetes will still be suboptimal. In other words, it would be great if we can look at it as "how can we make Concourse the #1 go-to option for people who run CI/CD on Kubernetes" as opposed to "just adding Kubernetes support to Concourse". Just my 2c... hopefully this makes sense :)
I guess it all depends on what the "native kubernetes abstraction" looks like. It could be that the garden backend interface is too restrictive and we lose out on some useful native k8s features. But if at the end of the day all we're doing is creating
But yeah I totally agree that we want to provide a better user experience than "just a shim". Plus it would be really cool if whatever we come up with could be extensible enough so that we can support a Concourse Docker runtime.
@andrewedstrom Yeeeaaaah... implementing this is a giant leap rather than a few small steps. As a conversation piece this RFC has served its purpose for now, and a v2 would cover some more specific, actionable steps to leverage [something that isn't Garden] for Containers and Volumes.
I agree that we will need to consider the K8s runtime at a higher level than the granularity we have with Garden for now. We're (likely) dealing with a higher order of abstractions in K8s (Pods/Deployments/Jobs, rather than containers directly), and we don't have a fine-grained way to control Volumes and their lifecycle separate from the Pod our container(s) are a part of.

Concourse has evolved a lot on this path thanks to recent discussions about the design of the "Runtime" and the interface between Runtime and Core, new scheduling strategies, and the investigation of ephemeral check containers. I feel like we're slowly moving toward a path where the "runtime" has a higher-order interface which is leveraged by the business logic in Core+API, which warrants a separate RFC for "What does a unified interface for a swappable runtime look like?" separate from K8s discussions.

Knative has spawned a separate initiative for CI/CD, which means the conversations around the K8s runtime specifically have also evolved beyond this RFC.
Closing and creating 2 new discussions from this RFC:
Proposal
Status: draft
Still more details and thoughts to collect; specifically around how the new GC changes impact how a K8s runtime should clean up resources. I'm sure more questions will come up around using K8s effectively.
Please submit comments and feedback on individual lines of the proposal markdown, so that we can track open conversations.