This repository has been archived by the owner on Jan 22, 2021. It is now read-only.

Strange HPA behaviour with custom insights metric (pods don't scale down) #34

Closed
larsmaes opened this issue Oct 1, 2018 · 14 comments
Labels
bug Something isn't working

Comments

@larsmaes

larsmaes commented Oct 1, 2018

I am using v0.4.1 of the custom metrics adapter and the HPA is acting strangely.

For example, my current value is 1 and my targetAverageValue is 100k. It should not scale up; it should scale down.

This is a describe example:

Name:                                  wordpress-database-hpa
Namespace:                             default
Labels:                                <none>
Annotations:                           <none>
CreationTimestamp:                     Mon, 01 Oct 2018 23:02:45 +0200
Reference:                             StatefulSet/wordpress-database-mariadb-slave
Metrics:                               ( current / target )
  "custom-requestspersecond" on pods:  1 / 100k
Min replicas:                          1
Max replicas:                          5
Conditions:
  Type            Status  Reason            Message
  ----            ------  ------            -------
  AbleToScale     True    ReadyForNewScale  the last scale time was sufficiently old as to warrant a new scale
  ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from pods metric custom-requestspersecond
  ScalingLimited  True    TooManyReplicas   the desired replica count is more than the maximum replica count
Events:           <none>
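
For reference, the expectation above follows from the replica rule in the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The sketch below is only an illustration of that rule. It ignores the controller's tolerance band and readiness handling, and the current replica count of 5 is an assumption, since it is not stated at this point in the thread.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas, max_replicas):
    """Documented HPA rule, clamped to the min/max replica bounds.

    Simplified: the real controller also applies a tolerance band and
    treats unready or missing pods specially.
    """
    raw = math.ceil(current_replicas * current_value / target_value)
    return max(min_replicas, min(max_replicas, raw))


# Assumed: 5 replicas currently running (not stated in the thread).
# Metric 1 against a target of 100k, with min 1 and max 5 replicas.
print(desired_replicas(5, 1, 100_000, 1, 5))  # prints 1
```

With a per-pod value this far below the target, the plain formula points at the minimum replica count, which is why a desired count above the maximum looks wrong.
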
@jsturtevant
Collaborator

There don't appear to be any current scaling events. A few questions:

  • Did you scale up and then scale back down? I assume you are referring to the ScalingLimited condition as the strange behavior?
  • What is the current replica count of the stateful set (kubectl get StatefulSet/wordpress-database-mariadb-slave)?
  • What version of Kubernetes are you running?

@larsmaes
Author

larsmaes commented Oct 1, 2018

Well, I was just trying to understand it, but it just wants to scale up. There seems to be something wrong with the replica calculation. Is there anywhere I can check some debugging output for this?

I am running v1.10.7 on Azure AKS.
After deleting the HPA and resetting the replica count to 1 in the deployment, this is what starts happening:

Name:                                  wordpress-frontend-hpa
Namespace:                             default
Labels:                                <none>
Annotations:                           <none>
CreationTimestamp:                     Tue, 02 Oct 2018 01:13:49 +0200
Reference:                             Deployment/wordpress-frontend-wordpress
Metrics:                               ( current / target )
  "custom-requestspersecond" on pods:  1 / 10
Min replicas:                          2
Max replicas:                          10
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    SucceededRescale    the HPA controller was able to update the target scale to 5
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from pods metric custom-requestspersecond
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type    Reason             Age   From                       Message
  ----    ------             ----  ----                       -------
  Normal  SuccessfulRescale  6m    horizontal-pod-autoscaler  New size: 3; reason: pods metric custom-requestspersecond above target
  Normal  SuccessfulRescale  3m    horizontal-pod-autoscaler  New size: 4; reason: pods metric custom-requestspersecond above target
  Normal  SuccessfulRescale  9s    horizontal-pod-autoscaler  New size: 5; reason: pods metric custom-requestspersecond above target

@jsturtevant
Collaborator

Could you share your HPA and the deployment you are trying to scale? The two examples you shared are different from each other and it is difficult to see a pattern.

Are you using a custom metric in App Insights or a built-in metric? Looking at the first example, there could be a mismatch between the quantity types that are being compared. See this info on metric quantities from the Kubernetes docs.
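
To make the quantity point concrete: Kubernetes prints metric values using quantity suffixes, so "100k" means 100000 and a value such as "21770m" means 21.77. Below is a minimal sketch of a parser for that notation. The function name parse_quantity is hypothetical, and it covers only the common decimal and binary suffixes, so treat it as an illustration rather than a complete implementation of the Kubernetes Quantity type.

```python
from decimal import Decimal

# Common Kubernetes quantity suffixes (a subset, for illustration only).
_SUFFIXES = {
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30,
}


def parse_quantity(text):
    """Convert a quantity string such as '100k' or '21770m' to a Decimal."""
    # Try the two-character suffixes before the one-character ones.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(text)  # plain number, no suffix


print(parse_quantity("100k"))    # 100000
print(parse_quantity("21770m"))  # 21.770
print(parse_quantity("3"))       # 3
```

Converting both the current value and the target to plain numbers this way makes it easier to see whether they are on the same scale.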

If you are using a custom metric that you are reporting yourself, could you give some details on what it is and the value that is being reported to AI? This will help replicate the issue.

There are a couple ways to debug:

  • View the values in App Insights (https://dev.applicationinsights.io/apiexplorer/metrics) to make sure they are what you are expecting.
  • View the raw value returned from the adapter: kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/<your-namespace>/pods/*/custom-requestspersecond"
  • View the logs of the metric adapter (kubectl logs <metricadapter-pod>) to see how the request is built and to make sure it is going to the correct App Insights endpoint.
  • If those don't help, you could look at the controller logs, though this would be the last place to look, as it is much more likely that something isn't configured properly.
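
When looking at the raw adapter output, it can also help to check that every returned item describes an actual pod, since the HPA matches pod metrics by pod name. The following is a small sketch of such a check. The helper name unmatched_items and the pod names are hypothetical, and the JSON is a trimmed stand-in for the output of the kubectl get --raw call above.

```python
import json


def unmatched_items(metric_value_list_json, pod_names):
    """Return describedObject names that do not match any known pod."""
    doc = json.loads(metric_value_list_json)
    names = [item["describedObject"]["name"] for item in doc["items"]]
    return [name for name in names if name not in pod_names]


# Trimmed, illustrative MetricValueList with a single item.
raw = """
{
  "kind": "MetricValueList",
  "items": [
    {
      "describedObject": {"kind": "Pod", "name": "customMetrics-requestspersecond"},
      "metricName": "customMetrics-requestspersecond",
      "value": "3"
    }
  ]
}
"""

# Hypothetical pod names, e.g. as listed by `kubectl get pods`.
pods = {"frontend-pod-a", "frontend-pod-b"}
print(unmatched_items(raw, pods))  # ['customMetrics-requestspersecond']
```

An empty list means every metric item maps to a real pod. Anything else is worth a closer look.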

@larsmaes
Author

larsmaes commented Oct 3, 2018

I am using a custom metric that is provided by my application (customMetrics/requestspersecond).
In Insights it looks OK. The API explorer also shows the correct values. The raw get on the k8s API also returns correct results, and I see nothing strange in the custom-metrics logs. I almost suspect the HPA replica calculator has a bug with raw values.

hpa

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: wordpress-frontend-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: wordpress-frontend-wordpress 
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metricName: customMetrics-requestspersecond
      targetAverageValue: 8
---

deployment:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: 2018-09-26T19:08:10Z
  generation: 107
  labels:
    app: wordpress-frontend-wordpress
    chart: wordpress-3.0.1
    heritage: Tiller
    release: wordpress-frontend
  name: wordpress-frontend-wordpress
  namespace: default
  resourceVersion: "728340"
  selfLink: /apis/extensions/v1beta1/namespaces/default/deployments/wordpress-frontend-wordpress
  uid: 8290085e-c1bf-11e8-99a4-b6f735be52dd
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: wordpress-frontend-wordpress
      release: wordpress-frontend
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: wordpress-frontend-wordpress
        chart: wordpress-3.0.1
        release: wordpress-frontend
    spec:
      containers:
      - env:
        - name: ALLOW_EMPTY_PASSWORD
          value: "yes"
        - name: MARIADB_HOST
          value: wordpress-database-mariadb
        - name: MARIADB_PORT_NUMBER
          value: "3306"
        - name: WORDPRESS_DATABASE_NAME
          value: wordpress
        - name: WORDPRESS_DATABASE_USER
          value: wordpress
        - name: WORDPRESS_DATABASE_PASSWORD
          valueFrom:
            secretKeyRef:
              key: db-password
              name: wordpress-frontend-externaldb
        - name: WORDPRESS_USERNAME
          value: admin
        - name: WORDPRESS_PASSWORD
          valueFrom:
            secretKeyRef:
              key: wordpress-password
              name: wordpress-frontend-wordpress
        - name: WORDPRESS_EMAIL
          value: user@example.com
        - name: WORDPRESS_FIRST_NAME
          value: FirstName
        - name: WORDPRESS_LAST_NAME
          value: LastName
        - name: WORDPRESS_BLOG_NAME
          value: User's Blog!
        - name: WORDPRESS_TABLE_PREFIX
          value: wp_
        - name: SMTP_HOST
        - name: SMTP_PORT
        - name: SMTP_USER
        - name: SMTP_PASSWORD
          valueFrom:
            secretKeyRef:
              key: smtp-password
              name: wordpress-frontend-wordpress
        - name: SMTP_USERNAME
        - name: SMTP_PROTOCOL
        image: docker.io/bitnami/wordpress:4.9.8-debian-9
        imagePullPolicy: IfNotPresent
        name: wordpress-frontend-wordpress
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        - containerPort: 443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 6
          httpGet:
            path: /wp-login.php
            port: http
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /bitnami/apache
          name: wordpress-data
          subPath: apache
        - mountPath: /bitnami/wordpress
          name: wordpress-data
          subPath: wordpress
        - mountPath: /bitnami/php
          name: wordpress-data
          subPath: php
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: wordpress-data
        persistentVolumeClaim:
          claimName: wordpress-frontend-wordpress
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: 2018-10-01T22:57:07Z
    lastUpdateTime: 2018-10-03T09:28:16Z
    message: ReplicaSet "wordpress-frontend-wordpress-6f7bbfc6c6" has successfully
      progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: 2018-10-03T09:35:29Z
    lastUpdateTime: 2018-10-03T09:35:29Z
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 107
  readyReplicas: 1
  replicas: 2
  unavailableReplicas: 1
  updatedReplicas: 2

api results

HTTP/1.1 200
content-type: application/json; charset=utf-8

{
  "value": {
    "start": "2018-10-03T10:35:49.153Z",
    "end": "2018-10-03T10:40:49.153Z",
    "interval": "PT1M",
    "segments": [
      {
        "start": "2018-10-03T10:35:49.153Z",
        "end": "2018-10-03T10:36:00.000Z",
        "customMetrics/requestspersecond": {
          "avg": 3
        }
      },
      {
        "start": "2018-10-03T10:36:00.000Z",
        "end": "2018-10-03T10:37:00.000Z",
        "customMetrics/requestspersecond": {
          "avg": 3
        }
      },
      {
        "start": "2018-10-03T10:37:00.000Z",
        "end": "2018-10-03T10:38:00.000Z",
        "customMetrics/requestspersecond": {
          "avg": 3
        }
      },
      {
        "start": "2018-10-03T10:38:00.000Z",
        "end": "2018-10-03T10:39:00.000Z",
        "customMetrics/requestspersecond": {
          "avg": 3
        }
      }
    ]
  }
}

raw result:

{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/%2A/customMetrics-requestspersecond"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "default",
        "name": "customMetrics-requestspersecond",
        "apiVersion": "/__internal"
      },
      "metricName": "customMetrics-requestspersecond",
      "timestamp": "2018-10-03T10:42:02Z",
      "value": "3"
    }
  ]
}

describe HPA (scaling up??)

Name:                                         wordpress-frontend-hpa
Namespace:                                    default
Labels:                                       <none>
Annotations:                                  kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"autoscaling/v2beta1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"wordpress-frontend-hpa","namespace":"default"...
CreationTimestamp:                            Wed, 03 Oct 2018 11:40:51 +0200
Reference:                                    Deployment/wordpress-frontend-wordpress
Metrics:                                      ( current / target )
  "customMetrics-requestspersecond" on pods:  3 / 8
Min replicas:                                 1
Max replicas:                                 10
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     False   BackoffBoth         the time since the previous scale is still within both the downscale and upscale forbidden windows
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from pods metric customMetrics-requestspersecond
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type    Reason             Age   From                       Message
  ----    ------             ----  ----                       -------
  Normal  SuccessfulRescale  49m   horizontal-pod-autoscaler  New size: 1; reason: All metrics below target
  Normal  SuccessfulRescale  40m   horizontal-pod-autoscaler  New size: 2; reason: pods metric customMetrics-requestspersecond above target
  Normal  SuccessfulRescale  1m    horizontal-pod-autoscaler  New size: 3; reason: pods metric customMetrics-requestspersecond above target

@jsturtevant
Collaborator

Thanks for the detail. I am going to attempt to recreate this. It looks like you are using the helm wordpress deployment (https://hub.kubeapps.com/charts/stable/wordpress)?

@jsturtevant added the bug label Oct 4, 2018
@larsmaes
Author

larsmaes commented Oct 4, 2018

Yes, it's just the helm install of wordpress. I have used the Insights plugin to be able to send metrics and created a function that sends the requests-per-second value from Apache's mod_status module every time wp-login.php is requested. This is done by the readiness probe every minute.

If you need the code snippets, let me know. Or you can of course just send a number to simulate it.

@larsmaes
Author

larsmaes commented Oct 5, 2018

Meanwhile, I have set it up with the Prometheus custom metrics adapter. There is a difference in the output of the raw get request:

{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/%2A/apache_accesses"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "default",
        "name": "wordpress-frontend-wordpress-864bf5f7bb-h6qw8",
        "apiVersion": "/__internal"
      },
      "metricName": "apache_accesses",
      "timestamp": "2018-10-05T16:16:22Z",
      "value": "21770m"
    },
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "default",
        "name": "wordpress-frontend-wordpress-864bf5f7bb-pgx9j",
        "apiVersion": "/__internal"
      },
      "metricName": "apache_accesses",
      "timestamp": "2018-10-05T16:16:22Z",
      "value": "20088m"
    }
  ]
}

Here you can see that the "name" element contains the pod's name. Maybe the calculation is going wrong because the adapter doesn't return per-pod metrics but just a single one.
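
One way to reason about this hypothesis is with a simplified model of how the HPA controller of that era appears to handle pods that have no metric: when the average is below target, each missing pod is conservatively assumed to be at the target value. The sketch below is an assumption based on a reading of the upstream replica calculator, not something confirmed in this thread. The function name, the pod names and the 0.1 tolerance default are illustrative.

```python
import math


def replicas_with_missing_pods(pod_names, metrics, target, tolerance=0.1):
    """Simplified model of the pods-metric replica calculation.

    metrics maps a name to a value. Names that match no pod are kept,
    and pods without an entry count as "missing".
    """
    current = len(pod_names)
    values = dict(metrics)
    missing = [p for p in pod_names if p not in values]
    ratio = sum(values.values()) / len(values) / target

    if not missing:
        if abs(1.0 - ratio) <= tolerance:
            return current
        return math.ceil(ratio * current)

    # Conservative fill: below target, assume missing pods sit at the
    # target; above target, assume they contribute nothing.
    fill = float(target) if ratio < 1.0 else 0.0
    for pod in missing:
        values[pod] = fill
    new_ratio = sum(values.values()) / len(values) / target

    # No change if within tolerance or if the fill flipped the direction.
    if abs(1.0 - new_ratio) <= tolerance or (ratio < 1.0) != (new_ratio < 1.0):
        return current
    return math.ceil(new_ratio * len(values))


single = {"customMetrics-requestspersecond": 3}  # one item, not a pod name
print(replicas_with_missing_pods(["pod-a"], single, 8))           # 2
print(replicas_with_missing_pods(["pod-a", "pod-b"], single, 8))  # 3
per_pod = {"pod-a": 3, "pod-b": 3}
print(replicas_with_missing_pods(["pod-a", "pod-b"], per_pod, 8))  # 1
```

Under these assumptions, a single metric item whose name matches no pod pushes the replica count up by one on each evaluation, which is at least consistent with the 1, 2, 3 rescale events in the describe output above. With per-pod items the same values scale down as expected.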

@jsturtevant
Collaborator

Sorry for the delay here, was traveling last week. We were able to reproduce similar behavior but haven't identified the root cause.

@jsturtevant
Collaborator

Quick update: I have identified the issue and have a fix that I am testing.

@larsmaes
Author

Hey! Thanks for the update!

@jsturtevant changed the title from "Strange HPA behaviour with custom insights metric" to "Strange HPA behaviour with custom insights metric (pods don't scale down)" Nov 1, 2018
@jsturtevant
Collaborator

Fixed by #30. Thanks @larsmaes for the pointer to the pod count!

@mlbloxer

It's awesome to see this thread. I am experiencing a similar issue with custom metric scale-down. Thanks!

@jsturtevant
Collaborator

It should be resolved in the latest version. Are you still experiencing it?

@mlbloxer

> It should be resolved in the latest version. Are you still experiencing it?

Ah, sorry, my bad. I confused this project with the prometheus-adapter.


3 participants