
Support daemonsets with VPA #799

Closed
vflaux opened this issue Nov 11, 2021 · 11 comments
Labels
feature New feature or request

Comments

@vflaux

vflaux commented Nov 11, 2021

Tell us about your request
Are daemonsets with a Vertical Pod Autoscaler supported?

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
If I understand correctly, when Karpenter decides on a new node size, it does so using the currently unschedulable pods plus the future pods that the daemonsets will create on that new node. The resources used for this decision are the ones in the daemonset specs.
But if you use a Vertical Pod Autoscaler that targets a daemonset, this behavior may not work, because the containers' resource requests may differ from the spec (the VPA admission controller updates the container requests). This may result in nodes that are too small to run all their assigned pods (or nodes that are always too big).

Are you currently working around this issue?
I'm not using karpenter.

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@vflaux vflaux added the feature New feature or request label Nov 11, 2021
@ellistarn
Contributor

Interesting! It seems like the Cluster Autoscaler may have a similar issue. I'm not deep on the VPA webhook. Is it possible to look up the VPA's current values for a given daemonset?

@vflaux
Author

vflaux commented Nov 12, 2021

Cluster Autoscaler does not assign pods to nodes, so if the new nodes are not sufficient to handle the pods, some pods will remain unschedulable and more nodes will be added. I'm not even sure it considers daemonsets in its logic.
Karpenter assigns pods to the new nodes it creates, so if a node happens to be too small, some pods could be evicted by the daemonset pods (or be unable to run). The pods replacing the evicted ones should trigger the creation of new nodes and be able to run, but it would take some time.

The VPA recommended values (used to override the containers' resource requests) are stored in the VPA custom resource's status. It shouldn't be too difficult to retrieve, but it means supporting a custom resource from an external project.
Another way would be to submit the daemonset pods in dry-run mode to the API server and use the resulting resource requests.
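
For illustration, a minimal sketch of the first option: reading the recommendation from the VPA status with client-go's dynamic client, so the VPA project's Go types don't have to be vendored. The namespace, VPA name, and kubeconfig wiring below are placeholders, not anything Karpenter actually does.

```go
// Hypothetical sketch (not Karpenter code): read a daemonset's VPA target
// requests from the VerticalPodAutoscaler status via the dynamic client,
// avoiding a compile-time dependency on the VPA project's Go types.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

var vpaGVR = schema.GroupVersionResource{
	Group:    "autoscaling.k8s.io",
	Version:  "v1",
	Resource: "verticalpodautoscalers",
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := dynamic.NewForConfigOrDie(cfg)

	// "kube-system" and "node-exporter" are placeholder names.
	vpa, err := client.Resource(vpaGVR).Namespace("kube-system").
		Get(context.TODO(), "node-exporter", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	// The VPA admission controller writes these targets into pod requests,
	// so they are what new daemonset pods would actually ask for.
	recs, found, err := unstructured.NestedSlice(vpa.Object,
		"status", "recommendation", "containerRecommendations")
	if err != nil || !found {
		panic(fmt.Errorf("no recommendation in status yet: %v", err))
	}
	for _, r := range recs {
		rec := r.(map[string]interface{})
		fmt.Printf("container %v -> target %v\n", rec["containerName"], rec["target"])
	}
}
```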

@JacobGabrielson
Contributor

@ellistarn I know it's not exactly the same thing, but this feels reminiscent of PV/PVC support, where the PV/PVC attached to the Pod needs to be taken into account when scheduling the Pod?

@ellistarn
Contributor

It shouldn't be too difficult to retrieve but it means supporting a custom resource from an external project.

I agree -- I'm a bit wary of this dependency

Another way would be to submit the daemonset pods in dry-run mode to the API server and use the resulting resource requests.

This may be all we've got in the short term.

@sergiomacedo

Hello,

Any updates or recommendations on using Karpenter with VPA?

@ellistarn
Contributor

Right now we don't have a great path forward. I wonder what the solution is for Cluster Autoscaler.

@sergiomacedo

Right now we don't have a great path forward. I wonder what the solution is for Cluster Autoscaler.

AFAIK their recommendation is "don't use them together". But it has been a long time since I saw this recommendation.

@ellistarn
Contributor

From my understanding, this is a limitation of VPA's architecture. There's not much we can do on Karpenter's side.

@billrayburn
Contributor

Closing in favor of kubernetes-sigs/karpenter#731.

@underrun

Issue kubernetes-sigs/karpenter#731 isn't really an alternative to this issue. This issue is about the difference between what Karpenter uses to determine how to consolidate (the original issue says daemonset requests) and what VPA uses to control vertical scale (individual container requests on pods that are rewritten by a webhook).

Issue kubernetes-sigs/karpenter#731 is a problem no matter what, and one I recently ran into. I have to manually exclude node sizes in my provisioner that are too small to fit all of my cluster-critical priority daemonsets, because those pods don't exist until after Karpenter provisions nodes, and Karpenter will not create a new node if a daemonset pod can't schedule (which makes sense because it wouldn't land on any other node anyway, except that what should really happen is that the node should be replaced with a larger one).

This issue is less about daemonsets and more about VPA managing requests for a workload while Karpenter is trying to consolidate workloads at the same time.

An alternative to the suggested path of looking at the VPA recommendations is to look at the container requests that actually exist on the pods, instead of those on the controlling resource (if that's what Karpenter uses), to determine the resources required and make consolidation decisions.
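
As a rough sketch of that alternative (placeholder names, not Karpenter's internals), the observed requests can be read off the daemonset's running pods, which already carry any webhook mutations:

```go
// Hypothetical sketch: derive a daemonset's per-node cost from the requests on
// pods that actually exist (already mutated by the VPA webhook), rather than
// from the DaemonSet pod template. Returns the largest per-pod total seen,
// as a conservative estimate.
package sketch

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func observedDaemonRequests(ctx context.Context, kube kubernetes.Interface, ds *appsv1.DaemonSet) (corev1.ResourceList, error) {
	pods, err := kube.CoreV1().Pods(ds.Namespace).List(ctx, metav1.ListOptions{
		LabelSelector: metav1.FormatLabelSelector(ds.Spec.Selector),
	})
	if err != nil {
		return nil, err
	}
	worst := corev1.ResourceList{}
	for _, p := range pods.Items {
		if p.Status.Phase != corev1.PodRunning {
			continue
		}
		// Sum the (possibly webhook-mutated) requests of this pod's containers.
		perPod := corev1.ResourceList{}
		for _, c := range p.Spec.Containers {
			for name, qty := range c.Resources.Requests {
				total := perPod[name]
				total.Add(qty)
				perPod[name] = total
			}
		}
		// Keep the largest value seen per resource across all running pods.
		for name, qty := range perPod {
			if cur, ok := worst[name]; !ok || qty.Cmp(cur) > 0 {
				worst[name] = qty
			}
		}
	}
	return worst, nil
}
```

Taking the largest per-pod total is just one conservative choice; averaging would also work if the VPA gives different pods different targets.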

Another way to make VPA adjusting pods and Karpenter consolidating them play nicer together is to have VPA only set requests on rollout rather than evicting pods to adjust them. That way we don't have two controllers evicting pods to shape the cluster/workload.

It seems very reasonable to use VPA + Karpenter for non-horizontally-scalable replicated workloads like HA Prometheus, where the pods have anti-affinity or topology spread constraints that make consolidation neither desirable nor possible.

Please consider re-opening this issue and making some effort to either recommend ways that VPA can be used with Karpenter and/or adjust how Karpenter works to play nicer with systems where webhooks adjust the requests of incoming pods.

gfcroft pushed a commit to gfcroft/karpenter-provider-aws that referenced this issue Nov 25, 2023
@jtdoepke

Another way would be to submit the daemonset pods in dry-run mode to the API server and use the resulting resource requests.

This seems like a good idea since it would support not only VPA but also any other mutating webhooks that might modify pod resources.
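
As a rough illustration of what that could look like (a sketch under assumptions, not Karpenter's implementation): a server-side dry-run create makes the API server run mutating admission webhooks, including the VPA's, without persisting anything, and the returned object carries the mutated requests.

```go
// Hypothetical sketch: server-side dry-run of a daemonset's pod template so the
// API server runs mutating admission (e.g. the VPA admission controller)
// without persisting anything, then read back the mutated requests.
package sketch

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func dryRunDaemonRequests(ctx context.Context, kube kubernetes.Interface, ds *appsv1.DaemonSet) (corev1.ResourceList, error) {
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			GenerateName: ds.Name + "-",
			Namespace:    ds.Namespace,
			Labels:       ds.Spec.Template.Labels,
		},
		Spec: ds.Spec.Template.Spec,
	}
	// DryRunAll asks the API server to run admission and validation but never
	// persist the object; the returned pod carries any webhook mutations.
	mutated, err := kube.CoreV1().Pods(ds.Namespace).Create(ctx, pod,
		metav1.CreateOptions{DryRun: []string{metav1.DryRunAll}})
	if err != nil {
		return nil, err
	}
	// Sum the post-webhook requests across the pod's containers.
	requests := corev1.ResourceList{}
	for _, c := range mutated.Spec.Containers {
		for name, qty := range c.Resources.Requests {
			total := requests[name]
			total.Add(qty)
			requests[name] = total
		}
	}
	return requests, nil
}
```

One caveat: a mutating webhook has to declare itself dry-run safe (sideEffects: None or NoneOnDryRun) for the API server to invoke it during a dry-run create.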
