
operator loops forever when creating a vmagent or a vmalert #277

Closed
hamelg opened this issue Jul 8, 2021 · 7 comments
Labels
bug Something isn't working

Comments

@hamelg

hamelg commented Jul 8, 2021

Creating this VMAgent custom resource triggers a fast, never-ending loop.

apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAgent
metadata:
  name: testxyz
  namespace: w6-ops
spec:
  remoteWrite:
    - basicAuth:
        password:
          key: password
          name: credentials
        username:
          key: username
          name: credentials
      url: 'https://victoria/api/v1/write'
  replicaCount: 1
  scrapeInterval: 1m

We see these logs being produced rapidly in the operator log:

{"level":"info","ts":1625776400.8874278,"logger":"controllers.VMAgent","msg":"Reconciling","vmagent":"w6-ops/testxyz"}
{"level":"info","ts":1625776400.9768188,"logger":"factory","msg":"creating default clusterrole for vmagent","controller":"vmagent.crud"}
{"level":"info","ts":1625776400.9845016,"logger":"factory","msg":"selected ServiceScrapes","servicescrapes":"w6-ops/vmagent-testxyz","namespace":"w6-ops","vmagent":"testxyz"}
{"level":"info","ts":1625776400.984525,"logger":"factory","msg":"selected PodScrapes","podscrapes":"","namespace":"w6-ops","vmagent":"testxyz"}
{"level":"info","ts":1625776400.9845345,"logger":"factory","msg":"filtering namespaces to select vmProbes from","namespace":"w6-ops","vmagent":"testxyz"}
{"level":"info","ts":1625776400.9845395,"logger":"factory","msg":"selected VMProbes","vmProbes":"","namespace":"w6-ops","vmagent":"testxyz"}
{"level":"info","ts":1625776400.9845507,"logger":"factory","msg":"selected VMNodeScrapes","vmagent":"testxyz","VMNodeScrapes":""}
{"level":"info","ts":1625776400.984558,"logger":"factory","msg":"selected StaticScrapes","staticScrapes":"","namespace":"w6-ops","vmagent":"testxyz"}
{"level":"info","ts":1625776401.002099,"logger":"factory","msg":"updating VMAgent configuration secret skipped, no configuration change"}
{"level":"info","ts":1625776401.0021315,"logger":"factory","msg":"selected PodScrapes","podscrapes":"","namespace":"w6-ops","vmagent":"testxyz"}
{"level":"info","ts":1625776401.0021732,"logger":"factory","msg":"selected ServiceScrapes","servicescrapes":"w6-ops/vmagent-testxyz","namespace":"w6-ops","vmagent":"testxyz"}
{"level":"info","ts":1625776401.0337205,"logger":"factory","msg":"create or update vm agent deploy","controller":"vmagent.crud"}
{"level":"info","ts":1625776401.0385828,"logger":"factory","msg":"vmagent deploy reconciled","controller":"vmagent.crud","vmagent.deploy.name":"vmagent-testxyz","vmagent.deploy.namespace":"w6-ops"}
{"level":"info","ts":1625776401.0431418,"logger":"controllers.VMAgent","msg":"reconciled vmagent","vmagent":"w6-ops/testxyz"}
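To put a number on how fast these logs are produced, the excerpt above can be measured directly. Each line is a JSON object with an epoch-seconds "ts" field, and every pass starts with a "Reconciling" message that names the object as "<namespace>/<name>" under the "vmagent" key. The Python sketch below is a hypothetical helper, not part of the operator: it assumes the log format shown here, counts the passes per object, and reports reconciles per second, skipping any line that is not JSON.

```python
import json
from collections import defaultdict


def reconcile_rates(lines, key_field="vmagent"):
    """Return {object: reconciles_per_second} from JSON operator log lines.

    Assumes one JSON object per line with a numeric epoch "ts" field and a
    "Reconciling" msg at the start of each pass, as in the excerpt above.
    """
    stamps = defaultdict(list)
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip banners, stack traces and other non-JSON lines
        if not isinstance(entry, dict) or entry.get("msg") != "Reconciling":
            continue
        key = entry.get(key_field)
        if key is None:
            continue  # a different controller's object, or no object key logged
        stamps[key].append(float(entry["ts"]))

    rates = {}
    for key, ts in stamps.items():
        span = max(ts) - min(ts)
        # n passes span n-1 intervals; a single pass has no measurable rate.
        rates[key] = (len(ts) - 1) / span if span > 0 else 0.0
    return rates
```

For example, reconcile_rates(open("operator.log")) on a saved operator log. A healthy object should show a rate near zero once it has settled; a looping one shows several passes per second. For vmalert objects, key_field="vmalert" is an assumption, since the excerpt above only shows the VMAgent controller.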

The operator continuously updates the service account vmagent-testxyz: its last update time follows the wall clock.


$ oc get sa vmagent-testxyz -o yaml
apiVersion: v1
imagePullSecrets:
- name: vmagent-testxyz-dockercfg-sm86c
kind: ServiceAccount
metadata:
  creationTimestamp: "2021-07-08T20:26:54Z"
  finalizers:
  - apps.victoriametrics.com/finalizer
  labels:
    app.kubernetes.io/component: monitoring
    app.kubernetes.io/instance: testxyz
    app.kubernetes.io/name: vmagent
    managed-by: vm-operator
  managedFields:
...
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:imagePullSecrets: {}
      f:secrets:
        k:{"name":"vmagent-testxyz-dockercfg-sm86c"}:
          .: {}
          f:name: {}
    manager: openshift-controller-manager
    operation: Update   <<<<###############
    time: "2021-07-08T20:48:23Z"  <<<<############### Follow the wall clock
  name: vmagent-testxyz
  namespace: w6-ops
  ownerReferences:
  - apiVersion: operator.victoriametrics.com/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: VMAgent
    name: testxyz
    uid: 5937c8f4-44cb-43ff-831b-4d094bef48eb
...
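The marked managedFields entry is the main clue: the openshift-controller-manager Update timestamp keeps advancing. One way to confirm which field manager keeps writing is to save two snapshots a few seconds apart (for example oc get sa vmagent-testxyz -n w6-ops -o json > before.json, then the same command into after.json) and compare their managedFields timestamps. The Python sketch below is a hypothetical helper for that comparison; it reads only the manager, operation and time keys of each metadata.managedFields entry.

```python
import json


def load_snapshot(path):
    """Load an object saved with `oc get ... -o json`."""
    with open(path) as f:
        return json.load(f)


def _field_times(obj):
    """Map (manager, operation) -> time for every managedFields entry."""
    entries = obj.get("metadata", {}).get("managedFields") or []
    return {(e.get("manager") or "?", e.get("operation") or "?"): e.get("time")
            for e in entries}


def changed_managers(before, after):
    """Return the managers whose managedFields timestamp changed (or appeared)
    between two snapshots of the same object, sorted by name."""
    old = _field_times(before)
    new = _field_times(after)
    return sorted({mgr for (mgr, op), t in new.items() if old.get((mgr, op)) != t})
```

Calling changed_managers(load_snapshot("before.json"), load_snapshot("after.json")) lists every writer that touched the object in between. If both the operator's entry and openshift-controller-manager's entry keep moving, the two controllers are likely undoing each other's changes.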

Any ideas on how to troubleshoot this and understand the behavior?

@f41gh7
Collaborator

f41gh7 commented Jul 9, 2021

Can you share your Kubernetes version? It will help track down the issue.

@hamelg
Author

hamelg commented Jul 9, 2021

$ oc version
Client Version: openshift-clients-4.6.0-202006250705.p0-168-g02c110006
Server Version: 4.6.25
Kubernetes Version: v1.19.0+a5a0987

@kenankule

We are currently seeing the same problem with vmsingle and vmagent.
The operator (v0.15.2) seems to go into a continuous reconciliation loop and gets OOMKilled.
Same openshift/kubernetes version.

@f41gh7
Collaborator

f41gh7 commented Jul 9, 2021

Thanks, will investigate it.

@f41gh7 f41gh7 added the bug Something isn't working label Jul 9, 2021
@hamelg
Author

hamelg commented Jul 9, 2021

We have other clusters with the same version where the operator works fine. We haven't been able to find out what causes the issue.

f41gh7 added a commit that referenced this issue Jul 11, 2021
adds /debug/pprof handler from main VM repo
#277
f41gh7 added a commit that referenced this issue Jul 11, 2021
* fixes serviceAccount reconcilation
adds /debug/pprof handler from main VM repo
#277

* updates rbac for vmauth config reloader
@f41gh7
Collaborator

f41gh7 commented Jul 11, 2021

The fix for the serviceAccount endless loop will be included in the next release.
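The thread does not show what the fix changed. One plausible reading of the symptoms, which is an assumption and not confirmed here: if the operator writes the ServiceAccount from its own template, it drops the secrets and imagePullSecrets entries that openshift-controller-manager adds (such as vmagent-testxyz-dockercfg-sm86c above), the OpenShift controller re-adds them, and each write wakes the other side up again. A common way out of such a cycle is to compare only the fields the operator owns and skip the update when they already match. The Python sketch below illustrates that pattern on plain dicts; reconcile_service_account and OWNED_METADATA_KEYS are hypothetical names, and the real operator is written in Go against the Kubernetes API.

```python
import copy

# Hypothetical: metadata keys the operator is assumed to own. Everything else,
# e.g. "secrets" and "imagePullSecrets" filled in by openshift-controller-manager,
# is carried over from the live object untouched.
OWNED_METADATA_KEYS = ("labels", "annotations", "ownerReferences")


def reconcile_service_account(existing, desired):
    """Merge operator-owned metadata from `desired` into a copy of `existing`.

    Returns (updated_object, needs_update). When needs_update is False the
    caller should skip the API write entirely, so the operator generates no
    watch event of its own for an object that is already correct.
    """
    updated = copy.deepcopy(existing)  # never mutate the cached live object
    meta = updated.setdefault("metadata", {})
    changed = False
    for key in OWNED_METADATA_KEYS:
        want = desired.get("metadata", {}).get(key)
        if want is not None and meta.get(key) != want:
            meta[key] = copy.deepcopy(want)
            changed = True
    return updated, changed
```

Because the merge starts from the live object, fields added by other controllers survive, and because an unchanged object produces needs_update=False, a resync does not turn into another write.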

Also, starting with the next release, the operator will expose /debug/pprof handlers on 0.0.0.0:8435. For the OOM issue, it would be useful to collect a memory profile with the command curl http://localhost:8435/debug/pprof/heap > operator_heap.pprof and share it in this issue.
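The curl command captures a single snapshot. For an OOM, a few heap profiles taken some time apart are often more informative, because the difference between them shows what keeps growing. The Python sketch below does that with only the standard library. It assumes port 8435 of the operator pod is reachable at localhost, for example via oc port-forward <operator-pod> 8435:8435 (the pod name is a placeholder), and the file names are illustrative.

```python
import os
import shutil
import time
import urllib.request


def collect_heap_profiles(base_url="http://localhost:8435", count=3,
                          interval=60.0, out_dir="."):
    """Save `count` heap profiles from <base_url>/debug/pprof/heap,
    `interval` seconds apart, and return the file paths in order."""
    url = base_url.rstrip("/") + "/debug/pprof/heap"
    paths = []
    for i in range(count):
        path = os.path.join(out_dir, f"operator_heap_{i + 1}.pprof")
        # Same request as the curl one-liner, streamed straight to disk.
        with urllib.request.urlopen(url, timeout=30) as resp, open(path, "wb") as out:
            shutil.copyfileobj(resp, out)
        paths.append(path)
        if i + 1 < count:
            time.sleep(interval)  # no sleep after the last snapshot
    return paths
```

The first and last files can then be compared with go tool pprof -base operator_heap_1.pprof operator_heap_3.pprof, which reports only the allocations that appeared between the two snapshots.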

@hamelg
Author

hamelg commented Jul 15, 2021

I have just updated our current VM operator to the latest version (v0.16.0) and all works fine.
Thank you for resolving this issue.

@hamelg hamelg closed this as completed Jul 15, 2021

3 participants