
Common Horizontal Pod Autoscaler events in AKS cluster #1137

Closed
brobichaud opened this issue Aug 5, 2019 · 14 comments
@brobichaud

What happened:
I am frequently seeing warning events regarding metrics and the horizontal pod autoscaler as seen below:

LAST SEEN   TYPE      REASON                         OBJECT                                          MESSAGE
109s        Warning   FailedGetResourceMetric        horizontalpodautoscaler/dev-burns-portal-hpa    unable to get metrics for resource cpu: no metrics returned from resource metrics API
26m         Warning   FailedComputeMetricsReplicas   horizontalpodautoscaler/dev-burns-portal-hpa    failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API

It's unclear to me whether this is truly an AKS setup issue or really a k8s or HPA issue, so I'm starting here. I didn't see these when I had a raw AKS-Engine based cluster not long ago.

What you expected to happen:
For the HPA to successfully acquire metrics on every call.

How to reproduce it (as minimally and precisely as possible):
Set up a simple AKS cluster with a Windows nodepool, deploy some pods, and set up simple CPU-based autoscale rules, such as:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: dev-burns-portal-hpa
spec:
  minReplicas: 2
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        averageUtilization: 80
        type: Utilization
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: dev-burns-portal

Monitor system events and you should see events similar to those referenced above.
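For context on what the `averageUtilization: 80` target in the spec above actually drives, here is a minimal sketch (not AKS code, just an illustration of the scaling rule documented for the HPA: desired = ceil(current * currentMetric / targetMetric), clamped to min/max, with a default ~10% tolerance band):

```python
import math

def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=2, max_replicas=4, tolerance=0.1):
    """Sketch of the HPA scaling rule: desired = ceil(current * ratio),
    clamped to [min, max]; no change while inside the tolerance band."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))

# With the spec above (target 80%), 2 pods averaging 120% CPU scale to 3
print(desired_replicas(2, 120, 80))  # 3
```

This also shows why the `FailedGetResourceMetric` events matter: when the metrics API returns nothing, the controller has no `current_utilization` input and simply cannot run this calculation.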

Anything else we need to know?:
I was not seeing these with a raw AKS-Engine based cluster of similar configuration.

Environment:

  • Kubernetes version (use kubectl version): v1.14.3
  • Size of cluster (how many worker nodes are in the cluster?): Windows nodepool with 2 F8s_v2 nodes
  • General description of workloads in the cluster (e.g. HTTP microservices, Java app, Ruby on Rails, machine learning, etc.): All Windows, mix of AspNetCore web apps and DotNetFramework workers.
  • Others:
@mikkelhegn
Contributor

Could this be caused by this issue with slow response from the api? kubernetes/kubernetes#75752

I've seen metrics become unavailable through kubectl top nodes, with e.g. 10 containers on a Windows node.

@mikkelhegn mikkelhegn self-assigned this Aug 12, 2019
@brobichaud
Author

Hmmm, that issue is definitely suspicious looking, but it is hard for me to say whether it would actually cause this. I do see "kubectl top nodes" return "unknown" on my Windows nodepool currently.

@cpunella

Hi, have you set resource limits in the deployment YAML definition?

@brobichaud
Author

@cpunella Yes I do have both CPU and RAM limits in all of my deployments. Does that somehow affect hpa or access to metrics data?

@zhiweiv

zhiweiv commented Aug 18, 2019

It should be the metrics API issue; you can check the metrics-server pod logs:
kubectl logs -n kube-system --tail=100 metrics-server-66dbbb67db-vtzq9

you will find something like:
Failed to get kubelet_summary:10.10.0.252:10255 response in time

@brobichaud
Author

Indeed @zhiweiv I am seeing events like:
Failed to get kubelet_summary...

mixed in with a ton of:
No metrics for pod default/dev

Thanks for the steps to confirm.

@cpunella

cpunella commented Sep 5, 2019

@brobichaud yes, I found that if you don't set limits in the deployment, metrics are not collected ...

@brobichaud
Author

> @brobichaud yes, I found that if you don't set limits in the deployment, metrics are not collected ...

But I do have limits on all of my deployments, yet metrics seem hit or miss...
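For reference on the limits discussion above: per the Kubernetes HPA documentation, resource-based utilization is computed as a percentage of the container's resource *requests*, so requests in particular need to be set on every container in the target deployment. A hedged sketch of what that fragment might look like (container name and image are hypothetical):

```yaml
# Hypothetical deployment fragment: HPA CPU utilization is computed
# against 'requests', so each container needs cpu requests set.
spec:
  template:
    spec:
      containers:
      - name: portal            # hypothetical container name
        image: example/portal   # hypothetical image
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
          limits:
            cpu: "1"
            memory: 1Gi
```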

@partha-sarathi-sarkar

partha-sarathi-sarkar commented Nov 28, 2019

Hi @brobichaud
Can you please try this spec and let me know whether it is working or not?

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: dev-burns-portal-hpa
spec:
  minReplicas: 2
  maxReplicas: 4
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name:
  targetCPUUtilizationPercentage: 80
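As a side note, an autoscaling/v1 spec of this shape (with the deployment name filled in, e.g. the dev-burns-portal deployment from earlier in this thread) can also be generated with a single kubectl command:

```
kubectl autoscale deployment dev-burns-portal --min=2 --max=4 --cpu-percent=80
```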

And make sure that in your deployment file you have allocated resources for your pods.

@brobichaud
Author

brobichaud commented Mar 2, 2020

Does anyone know whether it is accurate to say that this should be addressed by this change: kubernetes/kubernetes#74991

@mikkelhegn
Contributor

That's my understanding - @marosset to confirm.

@marosset

marosset commented Mar 2, 2020

I think so.
Also, the fix is still only available in 1.18+ (my PRs to backport the fix to 1.15-1.17 are still in limbo)

@brobichaud
Author

> I think so.
> Also, the fix is still only available in 1.18+ (my PRs to backport the fix to 1.15-1.17 are still in limbo)

If you do end up getting this into earlier k8s releases could you update this issue?

@mikkelhegn
Contributor

This got fixed in kubernetes/kubernetes#87730, but introduced this issue: kubernetes/kubernetes#90554, which is now fixed and deployed in AKS from 1.16.9+, 1.17.5+ and 1.18.1+

@Azure Azure locked as resolved and limited conversation to collaborators Jul 23, 2020