AEP-7862: Decouple Startup CPU Boost from VPA modes - updates #8175

Merged
merged 1 commit into from
Jun 26, 2025

Conversation

laoj2
Contributor

@laoj2 laoj2 commented May 27, 2025

What type of PR is this?

/kind documentation
/kind feature

What this PR does / why we need it:

Changes to AEP-7862:

  • Decouple the startup boost config from regular VPA so that the API is simpler for users who only want CPU boost without regular VPA updates (we no longer require a new VPA mode, and there is a way to specify the startup boost config at the VPA/workload level).
  • Added more YAML examples

#7862
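
For illustration, a minimal sketch of the decoupled shape described in the bullets above, using draft field names from the AEP (these are assumptions, not the final API):

```yaml
# Hypothetical: CPU startup boost with regular VPA updates turned off.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: boost-only-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"     # no regular VPA updates
  startupBoost:           # assumed placement at the VPA/workload level
    cpu:
      type: Factor
      value: "2"          # double the recommended CPU request at startup
```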

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/documentation Categorizes issue or PR as related to documentation. kind/feature Categorizes issue or PR as related to a new feature. area/vertical-pod-autoscaler needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 27, 2025
@k8s-ci-robot
Contributor

Hi @laoj2. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 27, 2025
@laoj2
Contributor Author

laoj2 commented May 27, 2025

/assign @omerap12
/assign @adrianmoisey

@laoj2 laoj2 force-pushed the aep-7862-fixes-to-api branch from 954bbbc to 0b42a31 Compare May 28, 2025 17:18
@laoj2 laoj2 requested review from adrianmoisey and omerap12 May 28, 2025 20:18
@laoj2 laoj2 force-pushed the aep-7862-fixes-to-api branch from 0b42a31 to ea041d5 Compare June 6, 2025 16:07
@laoj2 laoj2 requested a review from raywainman June 6, 2025 16:08
@omerap12
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 16, 2025
@omerap12
Member

@shubhamrai1993, something is wrong with GitHub and I can't comment on your review, so I'll do it here.

This is indeed a valid concern. I don't see any reasonable solution for that to be honest.

force recreation of the pod on container crash so that it starts in a boosted state

I don’t think this is a good solution. Evicting the entire pod means that the pod would go through the full scheduling cycle again. In the worst-case scenario, the original node might no longer have available resources, and the pod’s spot could be taken. This could lead to the need to provision another node or preempt another pod - depending on the user’s configuration.

transition back to boosted resources if pod has reached unhealthy state or has crashed. This might not be possible anymore if the node where the pod was scheduled no longer has enough resources.

True. But if the pod has reached an unhealthy state, it should generally be evicted anyway. So in my opinion, the user should handle the eviction explicitly in such cases.

Maybe @laoj2 has some ideas?

@adrianmoisey
Member

Unrelated to the changes in this PR specifically, but what would happen in a scenario where the CPU resources for a pod have been scaled down by VPA after readiness/liveness probe success, and the pod subsequently crashes for some reason (OOM, transition to an unhealthy state)?

This is a problem because if a restart is attempted with the non-boosted (lower) resources, then the initial CPU spike will get throttled, and the pod might even get stuck in CrashLoopBackOff with repeated readiness/liveness failures.

This is indeed a valid concern. I don't see any reasonable solution for that to be honest.

I have some thoughts on this. The AEP is suggesting a CPU boost only. I don't think that CPU throttling can be the cause of an OOM (I guess it can, due to slower garbage collection). But if the non-boosted CPU is the cause of a problem, then that means the CPU limits are too low anyway, and the VPA needs to increase them.

Also, this feature isn't even created yet, and will be in alpha. It's possible we want to test this in the real world a bit.

@omerap12
Member

Agree with @adrianmoisey's comment above. We can open a discussion about how we want to handle memory boost when the time comes.

@laoj2
Contributor Author

laoj2 commented Jun 19, 2025

Unrelated to the changes in this PR specifically, but what would happen in a scenario where the CPU resources for a pod have been scaled down by VPA after readiness/liveness probe success, and the pod subsequently crashes for some reason (OOM, transition to an unhealthy state)?

In the current design, the containers get boosted via the admission-controller webhook. So, you're right that it'll require a pod recreation to get the resources boosted again.

For option (1), starting a container with lower requests and boosting them afterwards can be problematic. The resize request itself may be slow (O(10s) to finish?), which may be too long for some applications. And for some apps (like JVM-based apps), the container detection mechanism won't reflect resources increased after the application has already started, so the boosted resources will be wasted.

For option (2), we could consider adding an option to evict the pod upon a container restart. But I agree with @omerap12 and @adrianmoisey here: we can start simple, test this feature out in the real world, then we can consider tweaks to the logic and see if we need to handle these cases.

@laoj2 laoj2 force-pushed the aep-7862-fixes-to-api branch from ea041d5 to 61a7116 Compare June 23, 2025 20:58
Member

@omerap12 omerap12 left a comment


Overall LGTM, there are just a few notes from me.

updateMode: "Off"
startupBoost:
  cpu:
    value: 3.0
Member


As @adrianmoisey pointed out, using floats is discouraged: #8026 (comment).

Perhaps we should apply the same approach I suggested here: https://github.com/kubernetes/autoscaler/pull/8026/files#diff-b666c5599d02971edb0c08090acb86761b7c726c97c4c6c1bbf00ff5883a2364R87.

Contributor Author


Good catch! I changed this to a string. Value is a "factor" when Type=Factor or a quantity when Type=Quantity, so we need a string to give us enough flexibility to handle the quantity cases. We can handle the conversion logic in the controller code.
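
For illustration, a minimal sketch of what that controller-side conversion could look like, assuming hypothetical BoostType constants and field names (the actual implementation may differ):

```go
// Hedged sketch: interpret the string Value as either a multiplicative
// factor or an absolute quantity, depending on Type.
package controller

import (
	"fmt"
	"strconv"

	"k8s.io/apimachinery/pkg/api/resource"
)

type BoostType string

const (
	BoostTypeFactor   BoostType = "Factor"
	BoostTypeQuantity BoostType = "Quantity"
)

// boostedCPU computes the boosted CPU request from the recommended request
// and a string value whose meaning depends on the boost type.
func boostedCPU(recommended resource.Quantity, t BoostType, value string) (resource.Quantity, error) {
	switch t {
	case BoostTypeFactor:
		factor, err := strconv.ParseFloat(value, 64)
		if err != nil {
			return resource.Quantity{}, fmt.Errorf("invalid factor %q: %w", value, err)
		}
		// Multiply the recommendation in milliCPU to limit float drift.
		boosted := resource.NewMilliQuantity(int64(float64(recommended.MilliValue())*factor), resource.DecimalSI)
		return *boosted, nil
	case BoostTypeQuantity:
		q, err := resource.ParseQuantity(value)
		if err != nil {
			return resource.Quantity{}, fmt.Errorf("invalid quantity %q: %w", value, err)
		}
		return q, nil
	default:
		return resource.Quantity{}, fmt.Errorf("unknown boost type %q", t)
	}
}
```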

Member


Thanks! Seems OK to me.

@laoj2 laoj2 force-pushed the aep-7862-fixes-to-api branch from 61a7116 to eba5f1e Compare June 24, 2025 17:36
@laoj2 laoj2 requested a review from omerap12 June 24, 2025 17:38
@omerap12
Member

/lgtm
/assign @adrianmoisey

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 24, 2025
@adrianmoisey
Member

/title AEP-7862: Decouple Startup CPU Boost from VPA modes - updates

@adrianmoisey
Member

/retitle AEP-7862: Decouple Startup CPU Boost from VPA modes - updates

@k8s-ci-robot k8s-ci-robot changed the title Decouple Startup CPU Boost from VPA modes AEP-7862: Decouple Startup CPU Boost from VPA modes - updates Jun 25, 2025
multiplier for the recommended CPU request. For example, a value of `2` will
double the CPU request.
* `Quantity`: The `StartupBoost.CPU.Value` field will be interpreted as an
absolute CPU resource quantity (e.g., `"500m"`, `"1"`) to be used as the CPU
Member


I just want to sanity check this. If Type was set to Factor and Value was set to 500m, would that be invalid, and would the API reject it?

Contributor Author


That's right, @adrianmoisey. I updated the Validation section with this information.
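
A hedged sketch of that validation check (the function and field names are assumptions): when Type is Factor, Value must parse as a plain number, so a quantity string like "500m" is rejected.

```go
// Hypothetical validation for the Factor case.
package validation

import (
	"fmt"
	"strconv"
)

func validateBoostValue(boostType, value string) error {
	if boostType == "Factor" {
		// A factor must be a bare number; quantity suffixes like "m" fail here.
		if _, err := strconv.ParseFloat(value, 64); err != nil {
			return fmt.Errorf("StartupBoost.CPU.Value %q is not a valid factor when Type=Factor", value)
		}
	}
	return nil
}
```

For example, validateBoostValue("Factor", "500m") would return an error, while validateBoostValue("Factor", "2") would pass.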

@adrianmoisey
Member

I'm generally happy with this
Pinging more folks for visibility: @voelzmo @raywainman @maxcao13

@laoj2 laoj2 force-pushed the aep-7862-fixes-to-api branch from eba5f1e to 4d5a01c Compare June 25, 2025 20:15
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 25, 2025
@laoj2 laoj2 force-pushed the aep-7862-fixes-to-api branch from 4d5a01c to ecb4297 Compare June 25, 2025 20:16
during the startup boost phase.
* [Optional] `StartupBoost.CPU.Duration`: if specified, it indicates for how
long to keep the pod boosted **after** it goes to `Ready`.
The new `StartupBoost` parameter will be added to both:
Member


Is adding the option to both the spec and the container resource policy simply for convenience?

Contributor Author


I expect that people will want to configure the CPU boost only for the main container of a pod. The container resource policy helps with that, so you don't get all of the pod's containers boosted.
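
A hedged example of what that could look like, with startupBoost under containerPolicies (the placement and field names are assumptions based on the AEP draft):

```yaml
# Hypothetical: boost only the "main" container, not the sidecars.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  resourcePolicy:
    containerPolicies:
      - containerName: main
        startupBoost:       # assumed per-container placement
          cpu:
            type: Factor
            value: "3"      # triple the recommended CPU request at startup
```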

Member


Makes sense to me, and looks good to me. :)

@adrianmoisey
Member

/approve

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 26, 2025
Member

@omerap12 omerap12 left a comment


Great work! Looking forward to the implementation of this feature!
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 26, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adrianmoisey, laoj2, omerap12

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit af75d6e into kubernetes:master Jun 26, 2025
7 checks passed
@adrianmoisey
Member

Hey @kubernetes/sig-autoscaling-leads
In the spirit of #8277, could you give this a retrospective review?

@raywainman
Member

raywainman commented Jul 2, 2025

Overall LGTM. Just one comment on my end... (and apologies for not getting this in time)

I'm not a huge fan of overloading the value field for both the factor and quantity cases; it feels a bit icky to have to parse the string depending on the use case. It would also be really nice to be able to use the resource.Quantity type for the quantity case, for example.

Otherwise we will need to parse the string field to handle all the various ways someone could encode a CPU value (e.g. 500m, 0.5, 5x10^-1, etc.).

You could have two fields:

  • factorValue (type int)
  • quantity (type resource.Quantity)

Then we have some simple validation to make sure only one of these is used at a time.

I know the API is slightly uglier this way, but in my opinion it's a lot less ambiguous.

(This may have been discussed previously - I did a quick grep of the comments but couldn't find anything but curious what you all think).

cc @laoj2 @adrianmoisey @omerap12
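
A minimal sketch of the two-field shape proposed above; the field names come from the comment, and int32 (rather than plain int) is an assumption based on common Kubernetes API conventions:

```go
// Hedged sketch, not a final API.
package vpa

import "k8s.io/apimachinery/pkg/api/resource"

// StartupBoostCPU replaces the overloaded string Value with two typed,
// mutually exclusive fields.
type StartupBoostCPU struct {
	// FactorValue multiplies the recommended CPU request, e.g. 2 doubles it.
	FactorValue *int32 `json:"factorValue,omitempty"`
	// Quantity is an absolute CPU request to apply during boost, e.g. "500m".
	Quantity *resource.Quantity `json:"quantity,omitempty"`
}
```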

@adrianmoisey
Member

(This may have been discussed previously - I did a quick grep of the comments but couldn't find anything but curious what you all think).

It was discussed in #8175 (comment)

I think you're right though. I had misread the HPA Horizontalpodautoscaler.spec.metrics.resource.target field and assumed that a single value was overloaded, which is why I made that suggestion. However, I was wrong:

$ kubectl explain Horizontalpodautoscaler.spec.metrics.resource.target
GROUP:      autoscaling
KIND:       HorizontalPodAutoscaler
VERSION:    v2

FIELD: target <MetricTarget>


DESCRIPTION:
    target specifies the target value for the given metric
    MetricTarget defines the target value, average value, or average utilization
    of a specific metric

FIELDS:
  averageUtilization	<integer>
    averageUtilization is the target value of the average of the resource metric
    across all relevant pods, represented as a percentage of the requested value
    of the resource for the pods. Currently only valid for Resource metric
    source type

  averageValue	<Quantity>
    averageValue is the target value of the average of the metric across all
    relevant pods (as a quantity)

  type	<string> -required-
    type represents whether the metric type is Utilization, Value, or
    AverageValue

  value	<Quantity>
    value is the target value of the metric (as a quantity).

You could have two fields:

  • factorValue (type int)
  • quantity (type resource.Quantity)

Agreed, so in summary:

StartupBoost.CPU.Type - Could be Factor or Quantity. 
StartupBoost.CPU.factorValue
StartupBoost.CPU.quantity

Depending on which Type is selected, we need to enforce that its "value" counterpart is also set, so these are the only valid combinations:

StartupBoost.CPU.Type - Factor
StartupBoost.CPU.factorValue - 2
StartupBoost.CPU.Type - Quantity
StartupBoost.CPU.quantity - 200m

And if it's unset, it's equivalent to:

StartupBoost.CPU.Type - Factor
StartupBoost.CPU.factorValue - 1

OR

StartupBoost.CPU.Type - Quantity
StartupBoost.CPU.quantity - 0

I'm unsure if we want (or need) to set defaults for either of those.

@omerap12
Member

omerap12 commented Jul 2, 2025

Thanks @raywainman! I agree.
@adrianmoisey, IMO we don't need any defaults.

@adrianmoisey
Member

@laoj2 Just want to make sure you saw this discussion, sorry for having it post-merge

@laoj2
Contributor Author

laoj2 commented Jul 7, 2025

Thanks for the recommendations @raywainman @adrianmoisey.

@adrianmoisey, regarding your summary: I think we can skip the StartupBoost.CPU.Type field?

And only one of StartupBoost.CPU.quantity or StartupBoost.CPU.factorValue can be set at a time (and this can be enforced in the controller).

Yeah, we probably don't need defaults, and if we need to "disable" the feature for a single container, we can set StartupBoost.CPU.factorValue=1 in the ContainerPolicy.

I'll send a PR with the updates.
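
For illustration, a hedged sketch of the controller-side check described above: at most one of factorValue or quantity may be set. Field names are assumptions from this discussion, not the final API.

```go
// Hypothetical mutual-exclusion check enforced in the controller.
package vpa

import (
	"errors"

	"k8s.io/apimachinery/pkg/api/resource"
)

func validateStartupBoostCPU(factorValue *int32, quantity *resource.Quantity) error {
	if factorValue != nil && quantity != nil {
		return errors.New("at most one of StartupBoost.CPU.factorValue or StartupBoost.CPU.quantity may be set")
	}
	return nil
}
```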

@adrianmoisey
Member

regarding your summary. I think we can skip the StartupBoost.CPU.Type field?

It's a pattern I've noticed in the HPA.
It has Horizontalpodautoscaler.spec.metrics.type and also Horizontalpodautoscaler.spec.metrics.resource.target.type

Maybe it's worth asking in #sig-api-machinery (on Slack) what they recommend?

@raywainman
Member

Maybe it's worth asking in #sig-api-machinery (on Slack) what they recommend?

Good idea - then let's put together a quick PR to update the AEP with the outcome.

Thanks all!

@laoj2
Contributor Author

laoj2 commented Jul 9, 2025

Thanks folks! I asked #sig-api-machinery about this here: https://kubernetes.slack.com/archives/C0EG7JC6T/p1751910562504969
