Arbitrary image pinning support (i.e. to exclude from kubelet's image GC) #6930
Seems like kubernetes/kubernetes#109936 and its corresponding PR are trying to achieve pinning of arbitrary images by adding a new kubelet config, thereby avoiding reliance on containerd to provide the list of images to be pinned.
We have a GC in kubelet and one in containerd, and users/admins have tools... we definitely need to work together on this and other image services in the k8s.io namespace.
kubernetes/kubernetes#109936 brought up kubernetes/kubernetes@d3ae0a3#diff-583fcb2b2b9fed64e8ac45392565a924918b4a64a7d4b66df0d4bccd298e8769R1259. Maybe it should be more than a recommendation?
IMO there should be levels of priority for image retention in the GC process, beyond the least-recently-used/size metric used today. But that's a different issue from pinning. IMO a pin is a recommendation not to delete an image during GC (last to remove in its level, if you have levels).
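The "levels of priority" idea above can be read as a sort key in the GC candidate ordering. A minimal sketch of that reading (hypothetical, not kubelet's or containerd's actual algorithm): unpinned images are evicted first in least-recently-used order, and pinned images are only considered last.

```python
from dataclasses import dataclass

@dataclass
class Image:
    name: str
    size: int          # bytes on disk
    last_used: float   # unix timestamp of last use
    pinned: bool       # a recommendation: remove last, not never

def gc_order(images):
    """Order images for GC: unpinned first, then least-recently-used,
    breaking ties by freeing larger images first."""
    return sorted(images, key=lambda i: (i.pinned, i.last_used, -i.size))

imgs = [
    Image("pause", 700_000, 100.0, True),
    Image("app", 50_000_000, 200.0, False),
    Image("job", 10_000_000, 50.0, False),
]
print([i.name for i in gc_order(imgs)])  # → ['job', 'app', 'pause']
```

A "locked" image, as discussed below, would simply be excluded from the candidate list altogether rather than sorted last.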
Locking an image against removal for security or other administrative reasons would be different from a pin, though it depends on the definition of pin. If the pin is tied to a security process, that would be interesting.
Yep. But for the purpose of this issue, I am trying to limit the scope to only generalizing the …
I am ok with keeping it as a recommendation, if the contract around …
FWIW, in side-loaded image scenarios we want 100% never-GCed images (consider pause) that may never have been pulled and may not be pullable (airgapped, etc.). KIND preloads core images and so does GKE; I wouldn't be surprised if others do as well. For those images, GCing them is counterproductive, and the kubelet GC is rather naive.
In this scenario, is the preloading for warming the cache, or because the image may not otherwise be available to pull? From a cleanup perspective, if the space is needed, they should be collectible. Having a naive implementation when they should be prioritized or scored in some way makes that complicated. For images not needed for running pods, the recommended way is not using …
For kind, the former: we may be running an unpushed development build of Kubernetes. Other reasons the images may not be pullable for KIND users: …
Today we're essentially forced to disable the GC entirely, and users must periodically create a new cluster (which may do zero runtime pulling) or perform "GC" themselves. This works well in many environments, but is less than ideal.
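For reference, the "disable the GC entirely" workaround is typically done through the kubelet's image GC thresholds; setting the high threshold to 100 means disk usage never crosses the trigger point, so image GC effectively never runs. A blunt instrument, not a per-image pin:

```yaml
# KubeletConfiguration fragment: effectively disables kubelet image GC
# by setting the trigger threshold to 100% of disk usage.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
imageGCHighThresholdPercent: 100
```

The obvious downside, raised below, is that nothing reclaims disk at all, including images that genuinely could be re-pulled.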
Er - kubelet makes the selection today, and kubelet's selection is naive. Kubelet is just trying to free up disk by any means, but doesn't have knobs for this (it has one deprecated knob for pause).
I don't think that's being discussed here, only CRI.
Storing it in an OCI image on disk just increases the space necessary, and still effectively creates a "don't GC this" flag, in a roundabout way? Storing directly in containerd is more compact than also storing an intermediate OCI image on the target machine. I'm not sure about the "getting content directly from another containerd namespace" approach; can you elaborate on what that should look like? The cluster administrator should know what they're doing when images are marked.
Here are my two cents from previous disasters we have had to deal with because of image GC.
Indeed. However, there should still be a way to retain certain cluster-critical images whose absence can take the entire cluster down. Take the following example.
Other than the duplication that @BenTheElder mentioned, this also involves having to split the images into individual tarballs via a docker save-like mechanism so that only the selected images can be loaded, whereas we generally prefer doing a bulk save and then splitting using …
This is not always going to be possible, right? We have a collection of images that correspond to a set of Kubernetes Jobs that run during an event. In such cases these images can't be kept in a different namespace, and since these images won't have a running container, the GC will happily clean them up, which is less than ideal for running jobs. Also, a separate namespace would mean the image needs to be side-loaded first and then moved to the right namespace when required, or during remediation, which means there is still going to be duplication of artifacts on the system. Keeping two namespaces for images would be an additional overhead.
I am also putting our case here, and why we need to exclude images from GC: we preload the GPU driver installer images on nodes with GPUs attached; more specifically, we preload different driver installer images onto different nodes based on hardware type. To make it easier for our customers to use, we give the same name to all the driver installer images regardless of their versions, so customers don't need to worry about the versions and can simply … Since we use the same name for all images, we cannot make the images pullable: the nodes won't be able to tell what exact image to pull based on the names. On the other hand, the driver installer images are only used in an initContainer, so they will not be in use (and will become targets of GC) once the installation finishes.
@ruiwen-zhao have we reached any conclusion here? IMO the easy/secure way to support exempting arbitrary images from GC is a config.toml option; we may not want to give all users access to pin images via annotation. One more thought: what is the need to involve containerd here at all? We could provide images to kubelet directly via config. One possible reason is that in air-gapped environments images are preloaded via containerd, so it would be easier for the user to pin them via containerd.
I don't think we have reached any conclusion yet. I think containerd is involved here mostly because: a. in CRI v1, kubelet expects the …
This is exactly our pain point. We don't want images to be GCed in an airgap environment. You can GC unused containers and snapshots because they can be recreated. Can we at least provide an option to disable GC on images?
containerd itself doesn't do any GC of images, so that'd be a question for modifying the kubelet. |
Right, the images were GCed by kubelet garbage collector. |
What is the problem you're trying to solve
Containerd users should be able to pin (i.e. to exclude from kubelet's image GC) arbitrary images.
There is an existing PR (#6456) that pins the sandbox image. However, we may want to generalize the functionality and allow users to pin arbitrary images. I'm creating this issue to discuss and (hopefully) finalize the expected behavior.
Describe the solution you'd like
We have 3 options here:
Option 1: containerd takes “pinning requests” through the PullImageRequest.Image.Annotations field.
This is suggested in https://github.com/containerd/containerd/pull/6456/files#r803279310
Pros:
No extra user-facing API/flag/config needed. Users can pin arbitrary images by specifying pod annotations (because kubelet copies pod annotations into PullImageRequest.Image.Annotations).
Cons:
Anyone who can modify pod spec can pin images. This might be too much privilege for pod owners, and may be especially bad for multi-tenant clusters.
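Under Option 1, pinning might look like an ordinary pod annotation that kubelet forwards to containerd on pull. The annotation key below is hypothetical; no key has been agreed on in this issue:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: critical-job
  annotations:
    # Hypothetical key: kubelet would copy this into
    # PullImageRequest.Image.Annotations, and containerd would pin on pull.
    io.containerd.image.pin: "true"
spec:
  containers:
  - name: main
    image: registry.example.com/critical-job:v1
```

This illustrates the con as well: anyone with permission to create this pod spec gets to pin the image.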
Option 2: containerd takes “pinning requests” through an additional field in the CRI API.
Pros:
Limited access to image pinning ability. (Depending on how k8s will expose the functionality.)
Cons:
Kubelet and CRI changes required.
K8s needs to figure out how to expose this functionality to end users.
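Under Option 2, the CRI change could be as small as a new boolean on the image type, along these lines (a sketch only; the field name, placement, and number are illustrative, not an agreed design):

```protobuf
// Sketch of a CRI addition: mark an image as pinned so that
// the kubelet image GC skips it. Field number is illustrative.
message Image {
    // ... existing fields ...
    bool pinned = 6;
}
```

Kubelet would then read this field from ListImages/ImageStatus responses and exclude pinned images from its GC candidate set.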
Option 3: containerd exposes the option through containerd config. (which toml section is TBD)
Pros:
Limited access to image pinning ability. Only users who have access to containerd config would be able to pin images.
No CRI/k8s change needed. Containerd will have the sole authority to decide which images to pin.
Cons:
K8s users/admins will not be able to decide what image to pin.
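Under Option 3, a hypothetical config.toml shape could look like the following. Since the toml section is explicitly TBD above, both the section and the key name here are illustrative:

```toml
# Hypothetical containerd config fragment: images listed here would be
# marked pinned and exempted from image GC.
[plugins."io.containerd.grpc.v1.cri"]
  pinned_images = [
    "registry.k8s.io/pause:3.9",
    "registry.example.com/gpu-driver-installer:latest",
  ]
```

A restart or config reload of containerd would be needed to change the pinned set, which is part of the trade-off noted in the cons.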
I propose option 3 here because it is both relatively simple (avoiding the back and forth between kubelet and containerd) and it limits the privilege of pinning images. However, I am open to discussion and other options.