
CA isn't scaling up due to unfamiliar extended resource requests #6804

Open · fmuyassarov opened this issue May 8, 2024 · 8 comments
Labels: kind/bug (Categorizes issue or PR as related to a bug.)

@fmuyassarov (Member)

Which component are you using?:

cluster-autoscaler

What version of the component are you using?:

CA: 9.37.0
K8s: 1.29

What k8s version are you using (kubectl version)?:
Client Version: v1.28.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.3-eks-adc7111

What environment is this in?:
Amazon EKS

What did you expect to happen?:
Hi. We have a scenario where extended resources are added to nodes only after the nodes start. However, I see that CA is not creating nodes for pods that specify extended resources in their spec. My understanding was that CA would ignore resources it is unaware of and proceed with scaling up, but that isn't the case.

What happened instead?:
CA didn't scale up the nodes.

How to reproduce it (as minimally and precisely as possible):

  1. Create an EKS cluster
  2. Deploy CA
  3. Scale up a workload whose Pods request extended resources, to trigger CA to scale up the nodes

Anything else we need to know?:
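For illustration, a pod along these lines stays Pending. This is a minimal client-go sketch, not from the original report: the resource name example.com/myresource, the image, and the namespace are all placeholders. Note that extended resources must appear in limits (requests default to, and must equal, limits).

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig from the default location (~/.kube/config).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// A pod requesting a hypothetical extended resource. If no node (and no
	// node-group template) advertises example.com/myresource, CA decides a
	// scale-up wouldn't help and the pod stays Pending.
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "extended-resource-demo"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  "app",
				Image: "registry.k8s.io/pause:3.9",
				Resources: corev1.ResourceRequirements{
					Requests: corev1.ResourceList{
						corev1.ResourceCPU: resource.MustParse("100m"),
						corev1.ResourceName("example.com/myresource"): resource.MustParse("1"),
					},
					Limits: corev1.ResourceList{
						corev1.ResourceName("example.com/myresource"): resource.MustParse("1"),
					},
				},
			}},
		},
	}

	created, err := clientset.CoreV1().Pods("default").Create(context.TODO(), pod, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("created pod:", created.Name)
}
```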

@fmuyassarov added the kind/bug label on May 8, 2024
@fmuyassarov (Member, Author)

Hi @MaciekPytel. I found a similar issue in #3852, and from it I got the impression that CA ignores extended resources it is unfamiliar with and moves on with scaling up. Has that behavior changed recently?

@fmuyassarov (Member, Author)

@x13n any thoughts on this?

@x13n (Member) commented May 10, 2024

CA will observe other nodes in a node group to get an idea of what new nodes will look like - if those nodes contain extended resources that a pod later requires, a scale-up should get triggered. However, if there are no nodes, or if there's a mix of nodes with and without the extended resource, CA will have no idea about the extended resource and won't trigger a scale-up, because it will conclude that a new node wouldn't help the pod.
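For illustration, one way to check that precondition is to list the nodes of a group and see whether they all advertise the resource. A minimal client-go sketch; the EKS managed node group label value and the resource name are placeholders:

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// Select the nodes of one node group; adjust the selector to however
	// your node groups are labeled ("my-group" is a placeholder).
	nodes, err := clientset.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{
		LabelSelector: "eks.amazonaws.com/nodegroup=my-group",
	})
	if err != nil {
		panic(err)
	}

	// A zero quantity on any node means CA's template for the group may not
	// include the resource.
	for _, node := range nodes.Items {
		q := node.Status.Allocatable[corev1.ResourceName("example.com/myresource")]
		fmt.Printf("%s: example.com/myresource allocatable = %s\n", node.Name, q.String())
	}
}
```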

@fmuyassarov (Member, Author)

> CA will observe other nodes in a node group

Is there a way to tell CA to ignore resources it is unaware of?

@x13n (Member) commented May 13, 2024

I don't think so - why do you ask? What would be the use case for ignoring some resources?

@fmuyassarov (Member, Author)

> I don't think so - why do you ask? What would be the use case for ignoring some resources?

In my situation, there's a controller in place that adds extended resources to new nodes. These extended resources are later requested by some pods, just like the native CPU and memory resources. To take an example: imagine I have two nodes in my cluster that are almost maxed out, leaving no free resources for new pods. Now I want to create a workload Pod that requests extended resources. CA realizes it can't fit this Pod onto the existing nodes due to the resource shortage (in this case, the extended resource example.com/myresource). Since CA doesn't recognize this extended resource, it would be helpful if it could still proceed with creating new nodes; meanwhile, my external controller would do its job of adding the extended resources to the new nodes. Currently, these pods remain pending because there's no node with enough of the extended resource, and CA won't create new nodes.
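For context, a controller like this typically advertises the resource by patching the node's status subresource. A minimal client-go sketch of that standard approach; the node name, quantity, and example.com/myresource are placeholders, and the `~1` in the patch path is the JSON-patch escape for `/`:

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// Add example.com/myresource to the node's capacity; the kubelet then
	// reflects it in allocatable. This only happens after the node exists,
	// which is why CA's simulated template node never sees the resource.
	patch := []byte(`[{"op": "add", "path": "/status/capacity/example.com~1myresource", "value": "10"}]`)
	if _, err := clientset.CoreV1().Nodes().Patch(
		context.TODO(), "ip-10-0-0-1.ec2.internal",
		types.JSONPatchType, patch, metav1.PatchOptions{}, "status",
	); err != nil {
		panic(err)
	}
}
```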

@x13n (Member) commented May 14, 2024

If CA ignored extended resources on both pods and nodes, it could scale up a node group that won't have the extended resource later on. It could also create nodes for a pod requesting, say, example.com/myresource: 1000, even though your controller can only set the allocatable up to, say, example.com/myresource: 10. Or it could create nodes with example.com/myresource even though the pod requested example.com/myotherresource. Such pods would stay pending, only now there would also be an empty node around. I guess in a controlled environment you could ensure these edge conditions won't happen, so some kind of optional feature in CA might make sense.

On the other hand, isn't this a use case for Dynamic Resource Allocation? The current version of DRA doesn't really work with autoscaling, but IIUC there is ongoing work to make it compatible.

CC @towca @MaciekPytel

@towca (Collaborator) commented May 21, 2024

I think DRA would solve this eventually, if the controller that adds the custom resources is migrated to do it via DRA instead. I know there are also discussions about integrating existing custom resources with DRA "automatically", so it might not even require changes to the controller. FWIW, the plan is to get Cluster Autoscaler working with Structured Parameters DRA in Kubernetes 1.31.

If this is needed earlier, or on older minor versions, a new CustomResourcesProcessor would have to be written to simulate the resource. Or, as suggested above, a workaround could be to keep at least one node around in the node group; that way CA will take an existing node as the template instead of trying to simulate one from scratch.
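For illustration, a rough sketch of what such a processor could look like, against the CustomResourcesProcessor interface in cluster-autoscaler/processors/customresources (method signatures may differ slightly between CA versions). The label, resource name, and filtering logic here are hypothetical, loosely mirroring how the default GPU processor holds nodes back until their resource is advertised; the real GPU processor also rewrites node statuses in allNodes, which this sketch skips:

```go
package customresources

import (
	apiv1 "k8s.io/api/core/v1"

	"k8s.io/autoscaler/cluster-autoscaler/cloudprovider"
	"k8s.io/autoscaler/cluster-autoscaler/context"
)

const (
	// Hypothetical label a node group sets to declare that an external
	// controller will add the resource shortly after the node starts.
	pendingResourceLabel = "example.com/has-myresource"
	// Hypothetical extended resource added by the external controller.
	resourceName = apiv1.ResourceName("example.com/myresource")
)

// MyResourceProcessor keeps nodes that are supposed to get the extended
// resource, but don't advertise it yet, out of the ready set, so CA treats
// them as still starting up instead of concluding the scale-up didn't help.
type MyResourceProcessor struct{}

// FilterOutNodesWithUnreadyResources drops promised-but-not-yet-advertising
// nodes from readyNodes.
func (p *MyResourceProcessor) FilterOutNodesWithUnreadyResources(ctx *context.AutoscalingContext, allNodes, readyNodes []*apiv1.Node) ([]*apiv1.Node, []*apiv1.Node) {
	newReady := make([]*apiv1.Node, 0, len(readyNodes))
	for _, node := range readyNodes {
		q, found := node.Status.Allocatable[resourceName]
		if node.Labels[pendingResourceLabel] == "true" && (!found || q.IsZero()) {
			continue // resource promised but not advertised yet: treat as unready
		}
		newReady = append(newReady, node)
	}
	return allNodes, newReady
}

// GetNodeResourceTargets is used for resource-limit accounting; this sketch
// doesn't enforce limits on the custom resource.
func (p *MyResourceProcessor) GetNodeResourceTargets(ctx *context.AutoscalingContext, node *apiv1.Node, nodeGroup cloudprovider.NodeGroup) []CustomResourceTarget {
	return nil
}

// CleanUp cleans up internal structures; nothing to do here.
func (p *MyResourceProcessor) CleanUp() {}
```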
