Topology Spread Constraints calculations don't apply across mutually exclusive provisioners #975
This is a super tricky problem and definitely an edge case for topology. Thanks for pointing this out. Can you dig in a bit more on the use case for a single topology spread constraint across distinct provisioners? Provisioners manage distinct sets of capacity and are unaware of each other. In order to compute topology, we need to know the set of viable zones to spread across; right now, that uses the zones of whichever provisioner was selected. The case you're pointing out requires cross-provisioner topology awareness. It's possible to construct topology selectors across multiple deployments, but this is not recommended.
In order to achieve what you're describing, we would need to look at all of the viable zones for all provisioners that are able to schedule the pod, and then compute topology that way. This seems somewhat computationally expensive and would require significant changes to the code. Anything you can provide to elaborate on the use case will help us decide if/how to prioritize this.
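For reference, the kind of constraint under discussion is a zonal spread with hard (`DoNotSchedule`) enforcement. A minimal, hypothetical pod-spec fragment, with an illustrative label selector:

```yaml
spec:
  topologySpreadConstraints:
    # Hard constraint: the number of matching pods per zone may
    # differ by at most maxSkew; otherwise the pod stays Pending.
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: example   # illustrative selector
```

The crux is that satisfying this requires knowing the full set of zones the pods can spread across, and today that set comes from a single selected provisioner.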
This is something for us to consider regarding how topology spread works across provisioners, but I suspect that part of the issue here is that kOps IGs are often set up with only one AZ, which would require multiple provisioners. Ideally, one IG would span all zones available for worker nodes, and you shouldn't run into weird topology spreads like the one you're seeing. Is there a use case for multiple provisioners?
So kOps will by default use one AZ per IG because of limitations with ASGs. There is no particular reason to do this by default for karpenter.sh, since that ASG limitation does not apply. On the other hand, I do have use cases for Provisioners with individual zones (related to avoiding cross-AZ traffic across VPCs and accounts), so with even more Provisioners (one for this workload, one for generic workloads) this can still work. Similar to how Karpenter bails on Pod affinities, it may be that it should bail on spread constraints that span multiple provisioners. I also think the documentation should be very clear about how Karpenter handles spread constraints and that it may not satisfy workload requirements.
Agree that we need to resolve this longer term, but it seems like the use case is fairly contrived. I'll think a bit about the easiest way to enforce DoNotSchedule across multiple provisioners.
One way of doing it is something like
To me, this seems reasonable. In our use case we need those zone-specific Provisioners for the zone-specific workloads, and then we need an extra Provisioner that handles Deployments with zonal spread constraints. Note that for kOps, we currently hard-code the subnet selector.
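A sketch of what that extra Provisioner could look like, assuming the karpenter.sh/v1alpha5 API that shipped with the v0.5.x releases (the name and zone values are illustrative):

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: zonal-spread   # hypothetical name
spec:
  requirements:
    # Span every zone with worker capacity, so zonal
    # topologySpreadConstraints have the full set of zones to spread across.
    - key: topology.kubernetes.io/zone
      operator: In
      values: ["us-east-1a", "us-east-1b", "us-east-1c"]
```

Workloads with zonal spread constraints would then target this Provisioner, while the zone-specific Provisioners keep serving the zone-pinned workloads.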
Removing the burning label from this for a couple of reasons. Right now we recommend that only one provisioner be viable per pod, and this bug doesn't cause problems in that case. In the future we may introduce other features like provisioner priority (#783), where this would become a bigger deal. Marking this as blocked until then.
This issue is stale because it has been open 25 days with no activity. Remove stale label or comment or this will be closed in 5 days.
Bump. 25 days is aggressive.
WIP for this here: #1673
Version
Karpenter: v0.5.2
Kubernetes: v1.23.0
Expected Behavior
A `topologySpreadConstraint`, at least with `whenUnsatisfiable: DoNotSchedule`, should never allow scheduling that violates it.
Actual Behavior
If there are two Provisioners with mutually exclusive subnets, Karpenter will just pick one and schedule all instances and Pods in the subnets that Provisioner has, going far above the configured `maxSkew`. This happens even if there are other Nodes in the cluster with other `topologyKey` values.
Steps to Reproduce the Problem
1. Create two Provisioners with mutually exclusive or only partially overlapping subnets.
2. Create a Deployment with a handful of replicas and a spread constraint with `maxSkew: 1` and `topologyKey` on zone.
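For step 1, a sketch of two mutually exclusive Provisioners, again assuming the karpenter.sh/v1alpha5 API from v0.5.x (zone values are illustrative; the real setup pins subnets per Provisioner):

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: zone-a
spec:
  requirements:
    # This Provisioner can only launch capacity in a single zone.
    - key: topology.kubernetes.io/zone
      operator: In
      values: ["us-east-1a"]
---
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: zone-b
spec:
  requirements:
    - key: topology.kubernetes.io/zone
      operator: In
      values: ["us-east-1b"]
```

Pairing these with a Deployment that carries the spread constraint shown earlier in the thread, whichever Provisioner Karpenter picks only knows about its own zone, so all replicas land there despite `maxSkew: 1`.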