
Missing status subresource when using custom VPA recommender with GKE native VPA setup #6828

Closed
FrancoisPoinsot opened this issue May 14, 2024 · 8 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

FrancoisPoinsot commented May 14, 2024

Which component are you using?:

vertical-pod-autoscaler

What version of the component are you using?:

Component version: 1.1.1

What k8s version are you using (kubectl version)?:

kubectl version Output
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.12", GitCommit:"12031002905c0410706974560cbdf2dad9278919", GitTreeState:"clean", BuildDate:"2024-03-15T02:15:31Z", GoVersion:"go1.21.8", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.12-gke.1115000", GitCommit:"885327b7f1bebce409c843425b4688e3eeed33f4", GitTreeState:"clean", BuildDate:"2024-03-28T09:16:53Z", GoVersion:"go1.21.8 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"}  

What environment is this in?:

GKE

What did you expect to happen?:

I expected that I could deploy a custom recommender and that it would interface cleanly with everything else that GKE deploys natively for the VPA.

What happened instead?:

When deploying a custom recommender in GKE with GKE's VPA enabled, the custom recommender failed when attempting to update the status subresource:

recommender.go:128] Cannot update VPA test-francois/test-francois object. Reason: verticalpodautoscalers.autoscaling.k8s.io "test-francois" not found

How to reproduce it (as minimally and precisely as possible):

  1. have a GKE cluster with GKE's VPA enabled
  2. deploy only a VPA recommender (so without the updater or admission controller), passing --recommender-name, along with its service account and permissions. I used cowboysysops's Helm chart as a base.
  3. deploy only the CRD that GKE has not deployed: vpaCheckpoints

Anything else we need to know?:

I found a very small difference between the VerticalPodAutoscaler CRD that GKE deploys and the one available in this repo.
GKE's: spec.subresources: {}
vs upstream: spec.subresources.status: {}
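Spelled out as manifest fragments, the difference looks like this (field paths reproduced from the two CRDs as described above):

```yaml
# CRD as deployed by GKE (as observed): empty subresources block,
# so no /status endpoint is served for VPA objects.
spec:
  subresources: {}
---
# Upstream vpa-v1-crd-gen.yaml (vpa >= 1.0): status declared as a subresource.
spec:
  subresources:
    status: {}
```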

And indeed, editing the CRD to add the status field under subresources solves the problem.
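For illustration only, here is that edit expressed as a patch over the parsed CRD manifest. The helper names are hypothetical, and the field path assumes the `spec.subresources` layout quoted above:

```python
# Illustration only: check and enable the /status subresource on a parsed
# CRD manifest (dict). Helper names are hypothetical; the field path follows
# the spec.subresources layout quoted in the issue.
import copy

def has_status_subresource(crd: dict) -> bool:
    """True if the CRD declares the /status subresource."""
    return "status" in (crd.get("spec", {}).get("subresources") or {})

def enable_status_subresource(crd: dict) -> dict:
    """Return a copy of the CRD with spec.subresources.status: {} set."""
    patched = copy.deepcopy(crd)
    patched.setdefault("spec", {}).setdefault("subresources", {})["status"] = {}
    return patched
```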


But here is the issue. I wanted to:

  • minimise the changes needed to add a custom recommender in GKE
  • keep GKE's native support

Editing the CRD deployed by GKE sounds unreliable to me, as there is a risk it will be reverted later.

Am I missing some simpler way to deploy a custom recommender in GKE?
Or is there a more reliable way to update the CRD that would have no risk to be reverted?

It isn't obvious to me why this CRD difference causes the issue, though.
With GKE's VPA, a status is eventually set on each VPA object, so declaring status as a subresource shouldn't be mandatory.
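As a toy model (not actual client-go or API-server code) of why the error reads "not found" rather than "forbidden": declaring the subresource is what makes the API server register the extra `/status` route that a status update targets, so without it the request hits a path that was never registered.

```python
# Toy model (not client-go or API-server code) of why the failure surfaces
# as "not found": the /status route only exists when the CRD declares the
# status subresource, so a status update hits an unregistered path.

BASE = "/apis/autoscaling.k8s.io/v1/namespaces/{ns}/verticalpodautoscalers/{name}"

def status_update_path(ns: str, name: str) -> str:
    """Path targeted by a status-subresource update for one VPA object."""
    return BASE.format(ns=ns, name=name) + "/status"

def served_paths(ns: str, name: str, status_subresource: bool) -> set:
    """Paths the API server would register for that object."""
    obj = BASE.format(ns=ns, name=name)
    return {obj, obj + "/status"} if status_subresource else {obj}
```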

@FrancoisPoinsot FrancoisPoinsot added the kind/bug Categorizes issue or PR as related to a bug. label May 14, 2024
FrancoisPoinsot (Author) commented May 14, 2024

I understand well that the community here is not responsible for GKE's implementation.
I am not asking for adding status in GKE's version of the CRD.

I am asking for either guidance,
or maybe a fix in-code, if my assumption about "status field shouldn't be mandatory" is true.

@FrancoisPoinsot (Author)

I think the VPA CRD deployed by GKE is just an older version
Probably that one: https://github.com/kubernetes/autoscaler/blame/b7d68c05248fed09bd0758759f70293b104f43ca/vertical-pod-autoscaler/deploy/vpa-v1-crd-gen.yaml

voelzmo (Contributor) commented May 15, 2024

Right, the /status subresource was introduced with vpa-1.0. If GKE's CRD doesn't have this, it seems it is based on an older version. For your own deployment, you could switch back to vpa 0.14 then, which is the version right before 1.0.

@FrancoisPoinsot (Author)

For posterity: I have confirmed that any upgrade of the GKE cluster reverts the CRD to its original form.
So editing the CRD is definitely a bad idea.

@FrancoisPoinsot (Author)

I confirm that I could get a working custom recommender using GKE's VPA, doing the following:

  • use v0.14.0 for the recommender image
  • grant the recommender update and patch permissions on the VPA objects.
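The second bullet, sketched as an RBAC fragment (the ClusterRole name is hypothetical; the verbs are the point): with a 0.14-era recommender writing status through the main resource, `update`/`patch` on the VPA objects is required in addition to the usual read verbs.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-vpa-recommender-vpa-writer   # hypothetical name
rules:
  - apiGroups: ["autoscaling.k8s.io"]
    resources: ["verticalpodautoscalers"]
    verbs: ["get", "list", "watch", "update", "patch"]
```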

I am going to close this issue, as I don't see anything that can be done on the VPA project side.
The only sane fix would be to update the CRD deployed by GCP, but I don't know a good channel for filing such a request.

@marevers

@FrancoisPoinsot I know this is closed already, but just as an FYI: I experienced this same issue on AKS (both 1.27 and 1.29). The fix you proposed to use the 0.14.0 image rather than 1.0.0 fixed it for me too.

FrancoisPoinsot (Author) commented May 28, 2024

I am currently talking with GCP to see whether the CRD deployed there can be upgraded. I am fairly sure that if it happens, it will apply to everyone's clusters, not only mine.

I hadn't faced this issue on Azure, because I am not relying on the native VPA feature there.
Thanks for the added info that the same thing is going on there.

Also: the credit for the fix goes to @voelzmo

@FrancoisPoinsot (Author)

For GKE, here is the public issue tracker that got created as a result: https://issuetracker.google.com/issues/345166946
