
Rolling update of deployments creates MAX number of pods X 2 according to HPA, regardless of current load #84142

Closed
paalkr opened this issue Oct 21, 2019 · 30 comments · Fixed by #85027
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling.

Comments

@paalkr

paalkr commented Oct 21, 2019

What happened:
The bug described in #72775 is still present in 1.14.8.
When a change is made to a running deployment that requires a rolling update, both the current ReplicaSet and the new ReplicaSet are scaled out to the maximum number of pods, regardless of actual load.

What you expected to happen:
The bug to be fixed
#79035
#79708

How to reproduce it (as minimally and precisely as possible):
Follow this guide https://gist.github.com/skaji/575e4a383fac0f0d11b5b51220a6049c
The cluster does NOT have to be an EKS cluster, the bug is not related to the underlying platform or infrastructure.

Anything else we need to know?:
This bug is really critical and should be addressed and hotfixed as soon as possible.

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:13:54Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"windows/amd64"} Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.8", GitCommit:"211047e9a1922595eaa3a1127ed365e9299a6c23", GitTreeState:"clean", BuildDate:"2019-10-15T12:02:12Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: AWS EC2
  • OS (e.g: cat /etc/os-release): Container Linux by CoreOS 2247.5.0 (Rhyolite)
  • Kernel (e.g. uname -a): 4.19.78-coreos
  • Install tools: kube-aws v.0.14.2 (https://github.com/kubernetes-incubator/kube-aws)
@paalkr paalkr added the kind/bug Categorizes issue or PR as related to a bug. label Oct 21, 2019
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Oct 21, 2019
@paalkr
Author

paalkr commented Oct 21, 2019

@kubernetes/sig-autoscalling-bugs

@paalkr
Author

paalkr commented Oct 22, 2019

This bug only applies to the autoscaling/v2beta2 and autoscaling/v2beta1 API versions, not to the autoscaling/v1 API.

@thdrnsdk
Contributor

Same issue here. I think this is a really critical issue, so I hope it gets hotfixed as soon as possible.

@neolit123
Member

/sig autoscaling

@k8s-ci-robot k8s-ci-robot added sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 24, 2019
@JoelSpeed
Contributor

I've been looking into this and have made some interesting observations about the behaviour within our systems at least:

Firstly, during the update, we see `New size: x; reason: All metrics below target` even though x is greater than the current replica count, and this continues to increase over time. This suggests the bug may be somewhere in here:

if desiredReplicas < currentReplicas {
	rescaleReason = "All metrics below target"
}
desiredReplicas = a.normalizeDesiredReplicas(hpa, key, currentReplicas, desiredReplicas)

currentReplicas is set to scale.Spec.Replicas and so I believe desiredReplicas is being increased (to greater than currentReplicas) by the normalizeDesiredReplicas method.
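
For what it's worth, one plausible mechanism is the downscale stabilization that normalizeDesiredReplicas layers on top of the raw recommendation: recent recommendations are remembered and the highest one still inside the stabilization window wins, so an inflated recommendation produced while pods were missing metrics can keep desiredReplicas above currentReplicas for several minutes. A minimal sketch of that idea (my own simplification, not the upstream code):

```go
// Simplified sketch of downscale stabilization (not the upstream code):
// the controller keeps recent recommendations and uses the highest one
// still inside the stabilization window, so an inflated recommendation
// recorded during the rollout wins over the lower, "correct" one until
// the window expires.
package main

import (
	"fmt"
	"time"
)

type recommendation struct {
	replicas int32
	at       time.Time
}

// stabilize returns the highest recommendation seen within the window,
// including the latest one.
func stabilize(history []recommendation, latest int32, window time.Duration, now time.Time) int32 {
	max := latest
	for _, r := range history {
		if now.Sub(r.at) <= window && r.replicas > max {
			max = r.replicas
		}
	}
	return max
}

func main() {
	now := time.Now()
	history := []recommendation{
		{replicas: 6, at: now.Add(-2 * time.Minute)}, // inflated while Pod metrics were missing
	}
	// Metrics now justify only 1 replica, but the 5-minute window keeps 6.
	fmt.Println(stabilize(history, 1, 5*time.Minute, now)) // prints 6
}
```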

Secondly, we see the scale doubling (approximately) during the rolling update. The observation I've made is that the scale seems to be getting updated to the total pod count of the old and new ReplicaSets combined, as if the number of pods matching the deployment's label selector were being used as the desired count when the scale is finally updated. I have yet to find a piece of code that verifies this, though; it is merely an observation.

Hopefully this helps narrow down the bug; it may also be a wild goose chase 😅 I can do more testing if anyone has any ideas, just let me know.

@qoobic

qoobic commented Nov 7, 2019

Just for information, I am seeing similar behaviour using a custom metric, not CPU - I am using sum(rate(http_server_requests_seconds_count[5m])). In our dev environment the config is a min of 1 and a max of 3. Even when there are no requests coming in, the HPA is scaling to 3 or 4 pods, then cooling back down to 3 pods after 5 minutes.

@shibataka000
Contributor

shibataka000 commented Nov 9, 2019

In Step 3

  • CPU Utilization of Pod-1 is 1 / 100 = 0.01.
  • CPU Utilization of Pod-2 is 1 / 100 = 0.01.

So CPU Utilization of Deployment is (1 + 1) / (100 + 100) = 0.01.

replicaCount = ceil(readyPodCount * CurrentResourceUtilization / desiredResourceUtilization) = ceil(2 * 0.01 / 0.5) = 1

So the calculated number of replicas is 1. But minReplicas is 2, so, as a result, the new number of replicas is 2.

In Step 5

New Pod-3 is created due to RollingUpdate with maxSurge: 1 and ~~minUnavailable~~ maxUnavailable: 0.

  • CPU Utilization of Pod-1 is 1 / 100 = 0.01.
  • CPU Utilization of Pod-2 is 1 / 100 = 0.01.
  • CPU Utilization of Pod-3 is 100 / 100 = 1.0, because metrics-server can't yet provide metrics for the freshly created Pod-3, and the HPA controller treats a pod's resource utilization as 1.0 when the deployment tends to scale in.

So CPU Utilization of Deployment is (1 + 1 + 100) / (100 + 100 + 100) = 0.34.

replicaCount = ceil(len(metrics) * CurrentResourceUtilization / desiredResourceUtilization) = ceil(3 * 0.34 / 0.5) = 3

So the new number of replicas is 3, in spite of CurrentResourceUtilization / desiredResourceUtilization < 1.0.
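
Putting the two steps together, here is a minimal sketch of the calculation (my own simplification with an assumed 50% target, where a pod with missing metrics is counted at 100% utilization; not the upstream calcPlainMetricReplicas code):

```go
// Minimal sketch of the replica calculation described above (not the
// upstream calcPlainMetricReplicas code). A pod whose metrics are not
// yet available is counted at 1.0 (100% of its request), which is what
// inflates the recommendation during a rolling update.
package main

import (
	"fmt"
	"math"
)

// desiredReplicas returns ceil(podCount * currentUtilization / targetUtilization).
func desiredReplicas(utilizations []float64, target float64) int {
	sum := 0.0
	for _, u := range utilizations {
		sum += u
	}
	current := sum / float64(len(utilizations))
	return int(math.Ceil(float64(len(utilizations)) * current / target))
}

func main() {
	target := 0.5 // assumed 50% CPU utilization target

	// Step 3: two idle pods, both reporting metrics.
	fmt.Println(desiredReplicas([]float64{0.01, 0.01}, target)) // 1, then clamped up to minReplicas=2

	// Step 5: Pod-3 was just created by the rolling update and has no
	// metrics yet, so it is counted as 1.0.
	fmt.Println(desiredReplicas([]float64{0.01, 0.01, 1.0}, target)) // 3
}
```

With the Step 5 inputs this reproduces the jump to 3 replicas even though the real utilization is far below target.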

@shibataka000
Contributor

I'd like to fix this issue in #85027 .

@max-rocket-internet

@shibataka000 as a workaround, can we just set minUnavailable: 1 to solve it?

@paalkr
Author

paalkr commented Nov 11, 2019

@shibataka000, just to make sure this is not forgotten:
Specifying maxUnavailable: 0 also results in an uncontrolled rollout. Please verify that the fix in #85027 applies to the maxSurge: 1 and maxUnavailable: 0 scenario as well.

Or is your #84142 (comment) about minUnavailable: 0 just a typo? Your PR actually specifies maxUnavailable: 0, which is what I reported in this bug.

> In Step 5
> New Pod-3 is created due to RollingUpdate with maxSurge: 1 and minUnavailable: 0.

@paalkr
Author

paalkr commented Nov 11, 2019

@max-rocket-internet, if I'm not mistaken this bug is related to maxUnavailable: 0, not minUnavailable: 0, so specifying minUnavailable: 1 won't help.

Edit: There is actually no such thing as minUnavailable, which wouldn't make any logical sense either:
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#deploymentstrategy-v1-apps

@max-rocket-internet

Sorry, I meant maxUnavailable: 0 but copy pasted the incorrect text. I think minUnavailable is simply a typo.

So, as a workaround, can we just set maxUnavailable: 1 to solve it?

@paalkr
Author

paalkr commented Nov 11, 2019

Unfortunately, maxUnavailable: 1 is really not a good workaround. Say you have a deployment with 200 pods: maxUnavailable: 1 would slow down the rolling update dramatically, as only one new pod could be created at a time instead of all 200 at once.

If the startup time for your pod is 10 seconds, replacing all 200 pods would take 10 seconds with maxUnavailable: 0, but 2000 seconds (33.3 minutes) with maxUnavailable: 1.

And according to the documentation, setting maxUnavailable to anything other than 0 will reduce the number of pods (basically reducing capacity) during a rolling update. We cannot tolerate any reduced capacity during a rolling update, as the service powered by the pods might get saturated with incoming customer requests.

> Example: when this is set to 30%, the old ReplicaSet can be scaled down to 70% of desired pods immediately when the rolling update starts. Once new pods are ready, old ReplicaSet can be scaled down further, followed by scaling up the new ReplicaSet, ensuring that the total number of pods available at all times during the update is at least 70% of desired pods.

My understanding is that the maxUnavailable setting exists to ensure that your cluster (worker nodes) is not saturated during a rolling update. In our case we have enough free capacity in the cluster to always be able to immediately replace all pods of any running service.

@shibataka000
Contributor

@paalkr @max-rocket-internet I'm sorry, minUnavailable is a typo. maxUnavailable is correct.

@shibataka000
Contributor

@max-rocket-internet

> as a workaround, can we just set maxUnavailable: 1 to solve it?

Unfortunately, no. When I try the scenario described in https://gist.github.com/skaji/575e4a383fac0f0d11b5b51220a6049c with maxUnavailable: 1, the HPA still scales the deployment regardless of current load; len(metrics) becomes 3 depending on timing.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 9, 2020
@paalkr
Author

paalkr commented Feb 9, 2020

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 9, 2020
@paalkr
Author

paalkr commented Feb 9, 2020

What is the status of this issue? I still believe it's a critical bug that needs to be addressed.

@caiohasouza

+1

@danielgblanco

We may be running into the same problem, or one that is connected to this. We run a deployment with

  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1

The HPA, which targets the deployment, is set to scale on a custom metric which is scraped every 30 seconds (as the value is calculated as a rate, it takes 60 seconds for the first value to appear).

When the rolling update runs, even with a completely idle deployment at 1 replica, the HPA starts to add replicas to the deployment until it either hits the max replicas (8 for this deployment) or the first Pod generates a metric. The calcPlainMetricReplicas method works as intended: it uses 100% of the target value for pods missing metrics, so, with 2 replicas running, one at 100% and the other at 1%, it still returns 2 replicas as the desired count.

I believe the problem is related to how the current number of replicas is taken from the deployment spec during HPA reconciliation (code). During a rolling update, we saw the deployment spec and the deployment status being out of sync for long periods of time (over 3 minutes).

$ while sleep 1; do
>     date
>     kubectl get deployment my-deployment -o=jsonpath='{.spec.replicas} {.status.replicas}'
>     echo
> done

Mon  2 Mar 2020 14:42:26 GMT
1 1
[...]
Mon  2 Mar 2020 14:43:22 GMT
1 1
Mon  2 Mar 2020 14:43:23 GMT
1 2
[...]
Mon  2 Mar 2020 14:43:28 GMT
2 3
[...]
Mon  2 Mar 2020 14:43:42 GMT
3 4
[...]
Mon  2 Mar 2020 14:43:57 GMT
4 5
[...]
Mon  2 Mar 2020 14:44:13 GMT
5 6
[...]
Mon  2 Mar 2020 14:44:27 GMT
6 7
[...]
Mon  2 Mar 2020 14:47:50 GMT
6 6
[...]
Mon  2 Mar 2020 14:49:28 GMT
5 5
[...]
Mon  2 Mar 2020 14:49:43 GMT
4 4
[...]
Mon  2 Mar 2020 14:49:58 GMT
3 3
[...]
Mon  2 Mar 2020 14:52:28 GMT
2 2
[...]
Mon  2 Mar 2020 14:52:58 GMT
1 1

This does not always happen, but it happens most of the times we do a rolling update. The HPA status is also confusing, as it indicates that the metric is above target when in fact it is well below target:

  Type    Reason             Age                 From                       Message
  ----    ------             ----                ----                       -------
  Normal  SuccessfulRescale  2m44s (x3 over 5d)  horizontal-pod-autoscaler  New size: 2; reason: pods metric my_custom_metric above target
  Normal  SuccessfulRescale  2m29s (x3 over 5d)  horizontal-pod-autoscaler  New size: 3; reason: pods metric my_custom_metric above target
  Normal  SuccessfulRescale  2m14s (x3 over 5d)  horizontal-pod-autoscaler  New size: 4; reason: pods metric my_custom_metric above target
  Normal  SuccessfulRescale  119s (x3 over 5d)   horizontal-pod-autoscaler  New size: 5; reason: pods metric my_custom_metric above target
  Normal  SuccessfulRescale  104s                horizontal-pod-autoscaler  New size: 6; reason: pods metric my_custom_metric above target

Should the HPA reconciliation use the current number of replicas from the status rather than from the spec?
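
For reference, the same spec-versus-status divergence can be watched programmatically; here is a small client-go sketch (assuming a recent client-go where Get takes a context, a kubeconfig at the default location, and a deployment named my-deployment in the default namespace):

```go
// Illustrative client-go sketch: print a deployment's spec vs status
// replica counts once per second, mirroring the kubectl loop above.
// Assumes a kubeconfig at the default location and a deployment named
// "my-deployment" in the "default" namespace.
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	for {
		d, err := clientset.AppsV1().Deployments("default").Get(context.TODO(), "my-deployment", metav1.GetOptions{})
		if err != nil {
			panic(err)
		}
		// .spec.replicas is what the HPA controller reads via the scale
		// subresource; .status.replicas counts pods across the old and new
		// ReplicaSets, so the two diverge during a rolling update.
		fmt.Printf("%s spec=%d status=%d\n",
			time.Now().Format(time.RFC3339), *d.Spec.Replicas, d.Status.Replicas)
		time.Sleep(time.Second)
	}
}
```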

@josephburnett
Member

@paalkr thanks for the report. I am able to reproduce this on 1.14.10 with your test yaml.

I will find the root cause and come back to you shortly.

@josephburnett
Member

/assign josephburnett

@bgpat

bgpat commented Mar 19, 2020

This issue reproduces in my environment (v1.15.9) too, and it's resolved by replacing the PodsMetricSource metric type with ObjectMetricSource. I suspect the formula calculating the average is wrong.

@josephburnett
Member

I've cherry-picked #85027 on top of 1.14.8 and it indeed fixes the bug. I've approved the PR and requested that it be fixed in calcPlainMetricReplicas as well.

@paalkr
Author

paalkr commented Mar 24, 2020

@josephburnett, thanks a lot! So this bug will be fixed in the next releases of 1.14, 1.15, 1.16, and 1.17?

@gjtempleton
Member

Unless I'm mistaken, this won't be cherry-picked back to 1.14, as it has fallen out of support as of the GA release of 1.17, per the support policy.

At best it'll make it back to 1.15.

@josephburnett
Member

I'll work on backporting this to 1.15 - 1.18.

@josephburnett
Member

Cherry pick process (for reference): https://github.com/kubernetes/community/blob/83af19fd21fe3af7cd88630e7239b974091645d2/contributors/devel/sig-release/cherry-picks.md#cherry-pick-review

1.15: #89514
1.16: #89515
1.17: #89516
1.18: #89517

@shyamjos

shyamjos commented May 2, 2020

This bug is affecting us badly. Is this fix merged in a 1.16.x release? If not, is there a temporary solution for this issue?

@djjayeeta

@shyamjos
This fix is available from 1.16.9.

chenchun pushed a commit to chenchun/kubernetes that referenced this issue Mar 20, 2024
Fix bug about unintentional scale out during updating deployment.
**MR description / purpose**:
kubernetes#89465

**Related issue**:
Fixes kubernetes#84142

**Code review notes**:

**Does this MR affect users?**:

```release-note

```