
Common Horizontal Pod Autoscaler events in AKS cluster #1137

Closed
brobichaud opened this issue Aug 5, 2019 · 14 comments
@brobichaud

What happened:
I am frequently seeing warning events regarding metrics and the horizontal pod autoscaler as seen below:

LAST SEEN   TYPE      REASON                         OBJECT                                          MESSAGE
109s        Warning   FailedGetResourceMetric        horizontalpodautoscaler/dev-burns-portal-hpa    unable to get metrics for resource cpu: no metrics returned from resource metrics API
26m         Warning   FailedComputeMetricsReplicas   horizontalpodautoscaler/dev-burns-portal-hpa    failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API

It's unclear to me whether this is truly an AKS setup issue or really a k8s or HPA issue, so I'm starting here. I didn't see these when I had a raw AKS-Engine based cluster not long ago.

What you expected to happen:
For the HPA to successfully acquire metrics on every call.

How to reproduce it (as minimally and precisely as possible):
Set up a simple AKS cluster with a Windows nodepool, deploy some pods, and set up simple CPU-based autoscale rules, such as:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: dev-burns-portal-hpa
spec:
  minReplicas: 2
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        averageUtilization: 80
        type: Utilization
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: dev-burns-portal

Monitor system events and you should see events similar to those referenced above.
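For context on what the `averageUtilization: 80` target in the spec above actually drives, here is a minimal sketch (not AKS code, just an illustration of the scaling rule documented for the HPA: desired = ceil(current * currentMetric / targetMetric), clamped to min/max, with a default ~10% tolerance band):

```python
import math

def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=2, max_replicas=4, tolerance=0.1):
    """Sketch of the HPA scaling rule: desired = ceil(current * ratio),
    clamped to [min, max]; no change while inside the tolerance band."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))

# With the spec above (target 80%), 2 pods averaging 120% CPU scale to 3
print(desired_replicas(2, 120, 80))  # 3
```

This also shows why the `FailedGetResourceMetric` events matter: when the metrics API returns nothing, the controller has no `current_utilization` input and simply cannot run this calculation.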

Anything else we need to know?:
I was not seeing these with a raw AKS-Engine based cluster of similar configuration.

Environment:

  • Kubernetes version (use kubectl version): v1.14.3
  • Size of cluster (how many worker nodes are in the cluster?): Windows nodepool with 2 F8s_v2 nodes
  • General description of workloads in the cluster (e.g. HTTP microservices, Java app, Ruby on Rails, machine learning, etc.): All Windows, mix of AspNetCore web apps and DotNetFramework workers.
  • Others:
@mikkelhegn
Contributor

Could this be caused by this issue with slow response from the api? kubernetes/kubernetes#75752

I've seen metrics become unavailable through kubectl top nodes, with e.g. 10 containers on a Windows node.

@mikkelhegn mikkelhegn self-assigned this Aug 12, 2019
@brobichaud
Author

Hmmm, that issue is definitely suspicious looking, but it is hard for me to say whether it would actually cause this. I do see "kubectl top nodes" return "unknown" on my Windows nodepool currently.

@cpunella

Hi, have you set resource limits in the deployment YAML definition?

@brobichaud
Author

@cpunella Yes I do have both CPU and RAM limits in all of my deployments. Does that somehow affect hpa or access to metrics data?

@zhiweiv

zhiweiv commented Aug 18, 2019

It should be the metrics API issue; you can check the metrics-server pod logs:
kubectl logs -n kube-system --tail=100 metrics-server-66dbbb67db-vtzq9

you will find something like:
Failed to get kubelet_summary:10.10.0.252:10255 response in time

@brobichaud
Author

Indeed @zhiweiv I am seeing events like:
Failed to get kubelet_summary...

mixed in with a ton of:
No metrics for pod default/dev

Thanks for the steps to confirm.

@cpunella

cpunella commented Sep 5, 2019

@brobichaud yes, I found that if you don't set limits in the deployment, metrics are not collected ...

@brobichaud
Author

> @brobichaud yes, I found that if you don't set limits in the deployment, metrics are not collected ...

But I do have limits on all of my deployments, yet metrics seem hit or miss...
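For reference on the limits discussion above: per the Kubernetes HPA documentation, resource-based utilization is computed as a percentage of the container's resource *requests*, so requests in particular need to be set on every container in the target deployment. A hedged sketch of what that fragment might look like (container name and image are hypothetical):

```yaml
# Hypothetical deployment fragment: HPA CPU utilization is computed
# against 'requests', so each container needs cpu requests set.
spec:
  template:
    spec:
      containers:
      - name: portal            # hypothetical container name
        image: example/portal   # hypothetical image
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
          limits:
            cpu: "1"
            memory: 1Gi
```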

@partha-sarathi-sarkar

partha-sarathi-sarkar commented Nov 28, 2019

Hi @brobichaud
Can you please try this spec and let me know whether it is working or not?

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: dev-burns-portal-hpa
spec:
  minReplicas: 2
  maxReplicas: 4
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name:
  targetCPUUtilizationPercentage: 80
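As a side note, an autoscaling/v1 spec of this shape (with the deployment name filled in, e.g. the dev-burns-portal deployment from earlier in this thread) can also be generated with a single kubectl command:

```
kubectl autoscale deployment dev-burns-portal --min=2 --max=4 --cpu-percent=80
```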

And make sure that in your deployment file you have allocated resources for your pods.

@brobichaud
Author

brobichaud commented Mar 2, 2020

Does anyone know whether it is accurate to say that this should be addressed by this change: kubernetes/kubernetes#74991

@mikkelhegn
Contributor

That's my understanding - @marosset to confirm.

@marosset

marosset commented Mar 2, 2020

I think so.
Also, the fix is still only available in 1.18+ (my PRs to backport the fix to 1.15-1.17 are still in limbo)

@brobichaud
Author

> I think so.
> Also, the fix is still only available in 1.18+ (my PRs to backport the fix to 1.15-1.17 are still in limbo)

If you do end up getting this into earlier k8s releases could you update this issue?

@mikkelhegn
Contributor

This got fixed in kubernetes/kubernetes#87730, but introduced this issue: kubernetes/kubernetes#90554, which is now fixed and deployed in AKS from 1.16.9+, 1.17.5+ and 1.18.1+

@Azure Azure locked as resolved and limited conversation to collaborators Jul 23, 2020