Bug with operator when syncing VMAgent.spec.replicas in statefulmode with hpa enabled. #2190

@vkasljevic-super

Hi,

In short, the bug happens when an operator-managed VMAgent runs as a StatefulSet and has HPA enabled (a recent feature). When it runs as a Deployment, everything works correctly: the operator knows not to sync the replica count and lets the HPA do its job. When it runs as a StatefulSet, however, both the operator and the HPA try to control the replica count. As a result, pods are constantly created and deleted, and the HPA cannot scale with the workload.

Operator version: 0.62.1 (Helm chart version), v0.69.0 (app version)
VMAgent image version: v1.136.0

apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAgent
metadata:
  name: dummy
spec:
  replicaCount: 1

  hpa:
    minReplicas: 2
    maxReplicas: 3
    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 80

  statefulMode: true

Operator logs:

{"level":"info","ts":"2026-05-19T10:22:24Z","logger":"controller.VMAgent","msg":"updating Statefulset name=vmcluster/vmagent-dummy, is_prev_nil=false","vmagent":"dummy","namespace":"vmcluster","spec_diff":{"spec.replicas":{"--":2,"++":1}}}
{"level":"info","ts":"2026-05-19T10:22:39Z","logger":"controller.VMAgent","msg":"updating Statefulset name=vmcluster/vmagent-dummy, is_prev_nil=false","vmagent":"dummy","namespace":"vmcluster","spec_diff":{"spec.replicas":{"--":2,"++":1}}}
{"level":"info","ts":"2026-05-19T10:22:54Z","logger":"controller.VMAgent","msg":"updating Statefulset name=vmcluster/vmagent-dummy, is_prev_nil=false","vmagent":"dummy","namespace":"vmcluster","spec_diff":{"spec.replicas":{"--":2,"++":1}}}

HPA events:

Normal   SuccessfulRescale             2m53s (x27 over 19m)  horizontal-pod-autoscaler  New size: 2; reason: Current number of replicas below Spec.MinReplicas

In addition to fixing the bug, would it be possible to add an option to disable replicaCount sync by the operator?

It looks like the fix is to have this code look something like this code.
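
To make the expected behaviour concrete, here is a minimal sketch of the decision I would expect during reconciliation. It is not the operator's actual code: `resolveReplicas` and its parameters are hypothetical names I made up for illustration. The idea is that when HPA is enabled and the StatefulSet already exists, the operator keeps the live replica count instead of writing back `replicaCount`, so no `spec.replicas` diff like the one in the logs above is produced.

```go
package main

import "fmt"

// resolveReplicas decides which replica count should be written to the
// StatefulSet. All names are hypothetical and not taken from the operator.
//   - specReplicas: the replicaCount value from the VMAgent spec (may be nil)
//   - liveReplicas: the value currently on the StatefulSet (nil if it does not exist yet)
//   - hpaEnabled:   whether spec.hpa is set on the VMAgent
func resolveReplicas(specReplicas, liveReplicas *int32, hpaEnabled bool) *int32 {
	if hpaEnabled && liveReplicas != nil {
		// The HPA owns the replica count: keep the live value so the
		// update carries no spec.replicas change.
		v := *liveReplicas
		return &v
	}
	// No HPA, or first creation: fall back to the value from the custom resource.
	return specReplicas
}

func main() {
	spec, live := int32(1), int32(2)
	// HPA enabled: the live value set by the HPA is kept.
	fmt.Println(*resolveReplicas(&spec, &live, true))
	// HPA disabled: the value from the VMAgent spec is applied.
	fmt.Println(*resolveReplicas(&spec, &live, false))
}
```

With the manifest above (replicaCount: 1, minReplicas: 2), this would leave the StatefulSet at the 2 replicas chosen by the HPA instead of resetting it to 1 on every reconcile.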

Labels: bug
