vmstorage statefulset not correct update when the container of vmBackup has extra "vmstorage-db" mount #366

Closed
wu0407 opened this issue Nov 1, 2021 · 5 comments
Labels
bug Something isn't working

Comments


wu0407 commented Nov 1, 2021

If the vmBackup container has an extra vmstorage-db volume mount and that extra mount is later removed from the VMCluster, the StatefulSet still keeps the old vmstorage-db mount after upgrading the operator from 0.19.0 to 0.20.1.

old vmcluster:

 vmstorage:
    extraArgs:
      search.maxUniqueTimeseries: "600000"
    image:
      tag: v1.67.0-cluster
    replicaCount: 3
    resources:
      limits:
        cpu: "4"
        memory: 16Gi
      requests:
        cpu: 50m
        memory: 1Gi
    storage:
      volumeClaimTemplate:
        metadata: {}
        spec:
          resources:
            requests:
              storage: 100Gi
          storageClassName: cbs-ssd-prepaid
        status: {}
    storageDataPath: /vm-data
    vmBackup:
      acceptEULA: true
      credentialsSecret:
        key: credentials-backup.yaml
        name: credentials-backup
      customS3Endpoint: https://cos.ap-shanghai.myqcloud.com
      destination: s3://test-thanos/$(NODENAME)
      disableMonthly: true
      disableWeekly: true
      extraArgs:
        keepLastDaily: "30"
        keepLastHourly: "72"
        runOnStart: "true"
      extraEnvs:
      - name: AWS_DEFAULT_REGION
        value: ap-shanghai
      - name: NODENAME
        valueFrom:
          fieldRef:
            fieldPath: metadata.name
      volumeMounts:
      - mountPath: /vm-data
        name: vmstorage-db

old vmstorage statefulset:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    meta.helm.sh/release-name: victoria-metrics-k8s-stack
    meta.helm.sh/release-namespace: victoria
  creationTimestamp: "2021-11-01T06:24:32Z"
  finalizers:
  - apps.victoriametrics.com/finalizer
  generation: 3
  labels:
    app.kubernetes.io/component: monitoring
    app.kubernetes.io/instance: victoria-metrics-k8s-stack
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: vmstorage
    app.kubernetes.io/version: 1.67.0
    helm.sh/chart: victoria-metrics-k8s-stack-0.5.4
    managed-by: vm-operator
  name: vmstorage-victoria-metrics-k8s-stack
  namespace: victoria
  ownerReferences:
  - apiVersion: operator.victoriametrics.com/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: VMCluster
    name: victoria-metrics-k8s-stack
    uid: 4073bd0e-4a12-4afb-b85f-90776685cca0
  resourceVersion: "333641442"
  selfLink: /apis/apps/v1/namespaces/victoria/statefulsets/vmstorage-victoria-metrics-k8s-stack
  uid: c25e0965-f8b5-4a5a-aab8-dd3018ca39aa
spec:
  podManagementPolicy: Parallel
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: monitoring
      app.kubernetes.io/instance: victoria-metrics-k8s-stack
      app.kubernetes.io/name: vmstorage
      managed-by: vm-operator
  serviceName: vmstorage-victoria-metrics-k8s-stack
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: monitoring
        app.kubernetes.io/instance: victoria-metrics-k8s-stack
        app.kubernetes.io/name: vmstorage
        managed-by: vm-operator
    spec:
      containers:
      - args:
        - -dedup.minScrapeInterval=1ms
        - -httpListenAddr=:8482
        - -retentionPeriod=14d
        - -search.maxUniqueTimeseries=600000
        - -storageDataPath=/vm-data
        - -vminsertAddr=:8400
        - -vmselectAddr=:8401
        image: victoriametrics/vmstorage:v1.67.0-cluster
        imagePullPolicy: IfNotPresent
        name: vmstorage
        ports:
        - containerPort: 8482
          name: http
          protocol: TCP
        - containerPort: 8400
          name: vminsert
          protocol: TCP
        - containerPort: 8401
          name: vmselect
          protocol: TCP
        readinessProbe:
          failureThreshold: 10
          httpGet:
            path: /health
            port: 8482
            scheme: HTTP
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          limits:
            cpu: "4"
            memory: 16Gi
          requests:
            cpu: 50m
            memory: 1Gi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /vm-data
          name: vmstorage-db
      - args:
        - -credsFilePath=/etc/vm/creds/credentials-backup.yaml
        - -customS3Endpoint=https://cos.ap-shanghai.myqcloud.com
        - -disableMonthly
        - -disableWeekly
        - -dst=s3://test-thanos/$(NODENAME)
        - -envflag.enable=true
        - -eula
        - -keepLastDaily=30
        - -keepLastHourly=72
        - -runOnStart=true
        - -snapshot.createURL=http://localhost:8482/snapshot/create
        - -snapshot.deleteURL=http://localhost:8482/snapshot/delete
        - -storageDataPath=/vm-data
        env:
        - name: AWS_DEFAULT_REGION
          value: ap-shanghai
        - name: NODENAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        image: victoriametrics/vmbackupmanager:v1.66.2-enterprise
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: 8300
            scheme: HTTP
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 5
        name: vmbackuper
        ports:
        - containerPort: 8300
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 10
          httpGet:
            path: /health
            port: 8300
            scheme: HTTP
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          limits:
            cpu: 500m
            memory: 500Mi
          requests:
            cpu: 150m
            memory: 200Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /victoria-metrics-data
          name: vmstorage-db
          readOnly: true
        - mountPath: /vm-data
          name: vmstorage-db
        - mountPath: /etc/vm/creds
          name: secret-credentials-backup
          readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: vmcluster-victoria-metrics-k8s-stack
      serviceAccountName: vmcluster-victoria-metrics-k8s-stack
      terminationGracePeriodSeconds: 30
      volumes:
      - name: secret-credentials-backup
        secret:
          defaultMode: 420
          secretName: credentials-backup
  updateStrategy:
    type: OnDelete
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: vmstorage-db
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi
      storageClassName: cbs-ssd-prepaid
      volumeMode: Filesystem
    status:
      phase: Pending

new vmcluster:

spec:
  replicationFactor: 2
  retentionPeriod: 14d
  vminsert:
    image:
      tag: v1.68.0-cluster
    replicaCount: 2
    resources:
      limits:
        cpu: "2"
        memory: 1000Mi
      requests:
        cpu: 500m
        memory: 500Mi
  vmselect:
    cacheMountPath: /select-cache
    extraArgs:
      search.maxQueryDuration: 120s
    image:
      tag: v1.68.0-cluster
    replicaCount: 2
    resources:
      limits:
        cpu: "4"
        memory: 2Gi
      requests:
        cpu: 500m
        memory: 512Mi
    storage:
      volumeClaimTemplate:
        metadata: {}
        spec:
          resources:
            requests:
              storage: 20Gi
          storageClassName: cbs-ssd-prepaid
        status: {}
  vmstorage:
    extraArgs:
      search.maxUniqueTimeseries: "600000"
    image:
      tag: v1.68.0-cluster
    replicaCount: 3
    resources:
      limits:
        cpu: "4"
        memory: 16Gi
      requests:
        cpu: 50m
        memory: 1Gi
    storage:
      volumeClaimTemplate:
        metadata: {}
        spec:
          resources:
            requests:
              storage: 100Gi
          storageClassName: cbs-ssd-prepaid
        status: {}
    storageDataPath: /vm-data
    vmBackup:
      acceptEULA: true
      credentialsSecret:
        key: credentials-backup.yaml
        name: credentials-backup
      customS3Endpoint: https://cos.ap-shanghai.myqcloud.com
      destination: s3://test-thanos
      disableMonthly: true
      disableWeekly: true
      extraArgs:
        keepLastDaily: "30"
        keepLastHourly: "72"
        runOnStart: "true"
      extraEnvs:
      - name: AWS_DEFAULT_REGION
        value: ap-shanghai

new vmstorage statefulset:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    meta.helm.sh/release-name: victoria-metrics-k8s-stack
    meta.helm.sh/release-namespace: victoria
  creationTimestamp: "2021-11-01T06:24:32Z"
  finalizers:
  - apps.victoriametrics.com/finalizer
  generation: 4
  labels:
    app.kubernetes.io/component: monitoring
    app.kubernetes.io/instance: victoria-metrics-k8s-stack
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: vmstorage
    app.kubernetes.io/version: 1.68.0
    helm.sh/chart: victoria-metrics-k8s-stack-0.5.7
    managed-by: vm-operator
  name: vmstorage-victoria-metrics-k8s-stack
  namespace: victoria
  ownerReferences:
  - apiVersion: operator.victoriametrics.com/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: VMCluster
    name: victoria-metrics-k8s-stack
    uid: 4073bd0e-4a12-4afb-b85f-90776685cca0
  resourceVersion: "333647449"
  selfLink: /apis/apps/v1/namespaces/victoria/statefulsets/vmstorage-victoria-metrics-k8s-stack
  uid: c25e0965-f8b5-4a5a-aab8-dd3018ca39aa
spec:
  podManagementPolicy: Parallel
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: monitoring
      app.kubernetes.io/instance: victoria-metrics-k8s-stack
      app.kubernetes.io/name: vmstorage
      managed-by: vm-operator
  serviceName: vmstorage-victoria-metrics-k8s-stack
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: monitoring
        app.kubernetes.io/instance: victoria-metrics-k8s-stack
        app.kubernetes.io/name: vmstorage
        managed-by: vm-operator
    spec:
      containers:
      - args:
        - -dedup.minScrapeInterval=1ms
        - -httpListenAddr=:8482
        - -retentionPeriod=14d
        - -search.maxUniqueTimeseries=600000
        - -storageDataPath=/vm-data
        - -vminsertAddr=:8400
        - -vmselectAddr=:8401
        image: victoriametrics/vmstorage:v1.68.0-cluster
        imagePullPolicy: IfNotPresent
        name: vmstorage
        ports:
        - containerPort: 8482
          name: http
          protocol: TCP
        - containerPort: 8400
          name: vminsert
          protocol: TCP
        - containerPort: 8401
          name: vmselect
          protocol: TCP
        readinessProbe:
          failureThreshold: 10
          httpGet:
            path: /health
            port: 8482
            scheme: HTTP
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          limits:
            cpu: "4"
            memory: 16Gi
          requests:
            cpu: 50m
            memory: 1Gi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /vm-data
          name: vmstorage-db
      - args:
        - -credsFilePath=/etc/vm/creds/credentials-backup.yaml
        - -customS3Endpoint=https://cos.ap-shanghai.myqcloud.com
        - -disableMonthly
        - -disableWeekly
        - -dst=s3://test-thanos
        - -envflag.enable=true
        - -eula
        - -keepLastDaily=30
        - -keepLastHourly=72
        - -runOnStart=true
        - -snapshot.createURL=http://localhost:8482/snapshot/create
        - -snapshot.deleteURL=http://localhost:8482/snapshot/delete
        - -storageDataPath=/vm-data
        env:
        - name: AWS_DEFAULT_REGION
          value: ap-shanghai
        image: victoriametrics/vmbackupmanager:v1.66.2-enterprise
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: 8300
            scheme: HTTP
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 5
        name: vmbackuper
        ports:
        - containerPort: 8300
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 10
          httpGet:
            path: /health
            port: 8300
            scheme: HTTP
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          limits:
            cpu: 500m
            memory: 500Mi
          requests:
            cpu: 150m
            memory: 200Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /victoria-metrics-data
          name: vmstorage-db
          readOnly: true
        - mountPath: /etc/vm/creds
          name: secret-credentials-backup
          readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: vmcluster-victoria-metrics-k8s-stack
      serviceAccountName: vmcluster-victoria-metrics-k8s-stack
      terminationGracePeriodSeconds: 30
      volumes:
      - name: secret-credentials-backup
        secret:
          defaultMode: 420
          secretName: credentials-backup
  updateStrategy:
    type: OnDelete
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: vmstorage-db
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi
      storageClassName: cbs-ssd-prepaid
      volumeMode: Filesystem
    status:
      phase: Pending
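
Comparing the two manifests, the vmbackuper container still runs with -storageDataPath=/vm-data, but in the new statefulset vmstorage-db is mounted only at /victoria-metrics-data. A minimal sketch of that consistency check follows; `coveredByMount` is a hypothetical helper written for illustration and is not part of the operator.

```go
package main

import (
	"fmt"
	"strings"
)

// coveredByMount reports whether dataPath equals one of the mount paths
// or lies underneath one of them.
func coveredByMount(dataPath string, mountPaths []string) bool {
	for _, m := range mountPaths {
		m = strings.TrimSuffix(m, "/")
		if dataPath == m || strings.HasPrefix(dataPath, m+"/") {
			return true
		}
	}
	return false
}

func main() {
	// Value of -storageDataPath in the vmbackuper args of both statefulsets.
	dataPath := "/vm-data"

	// vmbackuper mountPath values copied from the old and new statefulsets.
	oldMounts := []string{"/victoria-metrics-data", "/vm-data", "/etc/vm/creds"}
	newMounts := []string{"/victoria-metrics-data", "/etc/vm/creds"}

	fmt.Println("old covered:", coveredByMount(dataPath, oldMounts))
	fmt.Println("new covered:", coveredByMount(dataPath, newMounts))
}
```

With the new mounts the data path is not covered, which would be consistent with vmbackuper failing to open a snapshot directory under /vm-data.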
Author

wu0407 commented Nov 1, 2021

#349

Author

wu0407 commented Nov 1, 2021

Related operator logs:

{"level":"info","ts":1635752221.7648149,"logger":"factory","msg":"create or update vmstorage for cluster","controller":"vmstorage","cluster":"victoria-metrics-k8s-stack"}
{"level":"info","ts":1635752221.7701364,"logger":"factory","msg":"vmstorage was found, updating it","controller":"vmstorage","cluster":"victoria-metrics-k8s-stack"}
{"level":"info","ts":1635752223.7734144,"logger":"client_utils","msg":"sts update is needed","sts":"vmstorage-victoria-metrics-k8s-stack","currentVersion":"vmstorage-victoria-metrics-k8s-stack-64c699b6db","desiredVersion":"vmstorage-victoria-metrics-k8s-stack-d8bdd7f48"}
{"level":"info","ts":1635752223.7734485,"logger":"client_utils","msg":"checking if update needed","controller":"sts.rollingupdate","desiredVersion":"vmstorage-victoria-metrics-k8s-stack-d8bdd7f48","wasRecreated":false}
{"level":"info","ts":1635752223.777197,"logger":"client_utils","msg":"starting pods update, checking updated, by not ready pods","controller":"sts.rollingupdate","desiredVersion":"vmstorage-victoria-metrics-k8s-stack-d8bdd7f48","wasRecreated":false,"updated pods count":1,"desired version":"vmstorage-victoria-metrics-k8s-stack-d8bdd7f48"}
{"level":"info","ts":1635752223.777214,"logger":"client_utils","msg":"checking ready status for already updated pod to desired version","controller":"sts.rollingupdate","desiredVersion":"vmstorage-victoria-metrics-k8s-stack-d8bdd7f48","wasRecreated":false,"pod":"vmstorage-victoria-metrics-k8s-stack-0"}
{"level":"error","ts":1635752313.7865117,"logger":"client_utils","msg":"cannot get ready status for already updated pod","controller":"sts.rollingupdate","desiredVersion":"vmstorage-victoria-metrics-k8s-stack-d8bdd7f48","wasRecreated":false,"pod":"vmstorage-victoria-metrics-k8s-stack-0","error":"timed out waiting for the condition","stacktrace":"github.com/VictoriaMetrics/operator/controllers/factory/k8stools.HandleSTSUpdate\n\tgithub.com/VictoriaMetrics/operator/controllers/factory/k8stools/sts.go:37\ngithub.com/VictoriaMetrics/operator/controllers/factory.createOrUpdateVMStorage\n\tgithub.com/VictoriaMetrics/operator/controllers/factory/vmcluster.go:396\ngithub.com/VictoriaMetrics/operator/controllers/factory.CreateOrUpdateVMCluster\n\tgithub.com/VictoriaMetrics/operator/controllers/factory/vmcluster.go:95\ngithub.com/VictoriaMetrics/operator/controllers.(*VMClusterReconciler).Reconcile\n\tgithub.com/VictoriaMetrics/operator/controllers/vmcluster_controller.go:68\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.9.0/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.9.0/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.9.0/pkg/internal/controller/controller.go:214"}
 kubectl get pod -n victoria vmstorage-victoria-metrics-k8s-stack-0 -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/limit-ranger: 'LimitRanger plugin set: ephemeral-storage request
      for container vmstorage; ephemeral-storage limit for container vmstorage; ephemeral-storage
      request for container vmbackuper; ephemeral-storage limit for container vmbackuper'
  creationTimestamp: "2021-11-01T07:33:24Z"
  generateName: vmstorage-victoria-metrics-k8s-stack-
  labels:
    app.kubernetes.io/component: monitoring
    app.kubernetes.io/instance: victoria-metrics-k8s-stack
    app.kubernetes.io/name: vmstorage
    controller-revision-hash: vmstorage-victoria-metrics-k8s-stack-d8bdd7f48
    managed-by: vm-operator
    statefulset.kubernetes.io/pod-name: vmstorage-victoria-metrics-k8s-stack-0
  name: vmstorage-victoria-metrics-k8s-stack-0
  namespace: victoria
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: vmstorage-victoria-metrics-k8s-stack
    uid: c25e0965-f8b5-4a5a-aab8-dd3018ca39aa
  resourceVersion: "333714826"
  selfLink: /api/v1/namespaces/victoria/pods/vmstorage-victoria-metrics-k8s-stack-0
  uid: c3355c16-8127-4824-acaf-c909a6fa9149
spec:
  containers:
  - args:
    - -dedup.minScrapeInterval=1ms
    - -httpListenAddr=:8482
    - -retentionPeriod=14d
    - -search.maxUniqueTimeseries=600000
    - -storageDataPath=/vm-data
    - -vminsertAddr=:8400
    - -vmselectAddr=:8401
    image: victoriametrics/vmstorage:v1.68.0-cluster
    imagePullPolicy: IfNotPresent
    name: vmstorage
    ports:
    - containerPort: 8482
      name: http
      protocol: TCP
    - containerPort: 8400
      name: vminsert
      protocol: TCP
    - containerPort: 8401
      name: vmselect
      protocol: TCP
    readinessProbe:
      failureThreshold: 10
      httpGet:
        path: /health
        port: 8482
        scheme: HTTP
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 5
    resources:
      limits:
        cpu: "4"
        ephemeral-storage: 7Gi
        memory: 16Gi
      requests:
        cpu: 50m
        ephemeral-storage: 256Mi
        memory: 1Gi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
    volumeMounts:
    - mountPath: /vm-data
      name: vmstorage-db
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: vmcluster-victoria-metrics-k8s-stack-token-d6g9v
      readOnly: true
  - args:
    - -credsFilePath=/etc/vm/creds/credentials-backup.yaml
    - -customS3Endpoint=https://cos.ap-shanghai.myqcloud.com
    - -disableMonthly
    - -disableWeekly
    - -dst=s3://test-thanos
    - -envflag.enable=true
    - -eula
    - -keepLastDaily=30
    - -keepLastHourly=72
    - -runOnStart=true
    - -snapshot.createURL=http://localhost:8482/snapshot/create
    - -snapshot.deleteURL=http://localhost:8482/snapshot/delete
    - -storageDataPath=/vm-data
    env:
    - name: AWS_DEFAULT_REGION
      value: ap-shanghai
    image: victoriametrics/vmbackupmanager:v1.66.2-enterprise
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /health
        port: 8300
        scheme: HTTP
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 5
    name: vmbackuper
    ports:
    - containerPort: 8300
      name: http
      protocol: TCP
    readinessProbe:
      failureThreshold: 10
      httpGet:
        path: /health
        port: 8300
        scheme: HTTP
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 5
    resources:
      limits:
        cpu: 500m
        ephemeral-storage: 7Gi
        memory: 500Mi
      requests:
        cpu: 150m
        ephemeral-storage: 256Mi
        memory: 200Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
    volumeMounts:
    - mountPath: /victoria-metrics-data
      name: vmstorage-db
      readOnly: true
    - mountPath: /etc/vm/creds
      name: secret-credentials-backup
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: vmcluster-victoria-metrics-k8s-stack-token-d6g9v
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: vmstorage-victoria-metrics-k8s-stack-0
  nodeName: 10.12.97.39
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: vmcluster-victoria-metrics-k8s-stack
  serviceAccountName: vmcluster-victoria-metrics-k8s-stack
  subdomain: vmstorage-victoria-metrics-k8s-stack
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 120
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 120
  volumes:
  - name: vmstorage-db
    persistentVolumeClaim:
      claimName: vmstorage-db-vmstorage-victoria-metrics-k8s-stack-0
  - name: secret-credentials-backup
    secret:
      defaultMode: 420
      secretName: credentials-backup
  - name: vmcluster-victoria-metrics-k8s-stack-token-d6g9v
    secret:
      defaultMode: 420
      secretName: vmcluster-victoria-metrics-k8s-stack-token-d6g9v
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-11-01T07:33:24Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2021-11-01T07:33:24Z"
    message: 'containers with unready status: [vmbackuper]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2021-11-01T07:33:24Z"
    message: 'containers with unready status: [vmbackuper]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2021-11-01T07:33:24Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://ef82cea3973932c83a8b90e2413d1f7770ba86f43ceb58f325169dbe34c7ddfb
    image: victoriametrics/vmbackupmanager:v1.66.2-enterprise
    imageID: docker-pullable://victoriametrics/vmbackupmanager@sha256:4e30fe8265afa6ca432ba706c9263ef48c7d7608280b6487aabc3d6c3f791bce
    lastState:
      terminated:
        containerID: docker://ef82cea3973932c83a8b90e2413d1f7770ba86f43ceb58f325169dbe34c7ddfb
        exitCode: 255
        finishedAt: "2021-11-01T09:37:30Z"
        message: ":37:30.296Z\tinfo\tVictoriaMetrics/lib/logger/flag.go:28\tflag \"memory.allowedBytes\"=\"0\"
          (is_set=false)\n2021-11-01T09:37:30.296Z\tinfo\tVictoriaMetrics/lib/logger/flag.go:28\tflag
          \"memory.allowedPercent\"=\"60\" (is_set=false)\n2021-11-01T09:37:30.296Z\tinfo\tVictoriaMetrics/lib/logger/flag.go:28\tflag
          \"metricsAuthKey\"=\"secret\" (is_set=false)\n2021-11-01T09:37:30.296Z\tinfo\tVictoriaMetrics/lib/logger/flag.go:28\tflag
          \"pprofAuthKey\"=\"secret\" (is_set=false)\n2021-11-01T09:37:30.296Z\tinfo\tVictoriaMetrics/lib/logger/flag.go:28\tflag
          \"runOnStart\"=\"true\" (is_set=true)\n2021-11-01T09:37:30.296Z\tinfo\tVictoriaMetrics/lib/logger/flag.go:28\tflag
          \"snapshot.createURL\"=\"http://localhost:8482/snapshot/create\" (is_set=true)\n2021-11-01T09:37:30.296Z\tinfo\tVictoriaMetrics/lib/logger/flag.go:28\tflag
          \"snapshot.deleteURL\"=\"http://localhost:8482/snapshot/delete\" (is_set=true)\n2021-11-01T09:37:30.296Z\tinfo\tVictoriaMetrics/lib/logger/flag.go:28\tflag
          \"storageDataPath\"=\"/vm-data\" (is_set=true)\n2021-11-01T09:37:30.296Z\tinfo\tVictoriaMetrics/lib/logger/flag.go:28\tflag
          \"tls\"=\"false\" (is_set=false)\n2021-11-01T09:37:30.296Z\tinfo\tVictoriaMetrics/lib/logger/flag.go:28\tflag
          \"tlsCertFile\"=\"\" (is_set=false)\n2021-11-01T09:37:30.296Z\tinfo\tVictoriaMetrics/lib/logger/flag.go:28\tflag
          \"tlsKeyFile\"=\"secret\" (is_set=false)\n2021-11-01T09:37:30.296Z\tinfo\tVictoriaMetrics/lib/logger/flag.go:28\tflag
          \"version\"=\"false\" (is_set=false)\n2021-11-01T09:37:30.296Z\tinfo\tVictoriaMetrics/lib/backup/s3remote/s3.go:78\tUsing
          provided custom S3 endpoint: \"https://cos.ap-shanghai.myqcloud.com\"\n2021-11-01T09:37:30.297Z\tinfo\tVictoriaMetrics/lib/httpserver/httpserver.go:82\tstarting
          http server at http://:8300/\n2021-11-01T09:37:30.297Z\tinfo\tVictoriaMetrics/lib/httpserver/httpserver.go:83\tpprof
          handlers are exposed at http://:8300/debug/pprof/\n2021-11-01T09:37:30.524Z\tfatal\tVictoriaMetrics/app/vmbackupmanager/main.go:115\terror
          creating backup: cannot open snapshot at \"/vm-data/snapshots/20211101093730-16B35BD038C906D2\":
          open /vm-data/snapshots/20211101093730-16B35BD038C906D2: no such file or
          directory\n"
        reason: Error
        startedAt: "2021-11-01T09:37:30Z"
    name: vmbackuper
    ready: false
    restartCount: 29
    started: false
    state:
      waiting:
        message: back-off 5m0s restarting failed container=vmbackuper pod=vmstorage-victoria-metrics-k8s-stack-0_victoria(c3355c16-8127-4824-acaf-c909a6fa9149)
        reason: CrashLoopBackOff
  - containerID: docker://b5229a587ebca8e110cb079700f29051175371864af9ffb7bcd34dc154dd3086
    image: victoriametrics/vmstorage:v1.68.0-cluster
    imageID: docker-pullable://victoriametrics/vmstorage@sha256:76dacb432381aa48c3e8e4eea8858d829e1d31d13e483830580651ab7edef3f0
    lastState: {}
    name: vmstorage
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2021-11-01T07:33:59Z"
  hostIP: 10.12.97.39
  phase: Running
  podIP: 10.253.3.220
  podIPs:
  - ip: 10.253.3.220
  qosClass: Burstable
  startTime: "2021-11-01T07:33:24Z"

Author

wu0407 commented Nov 1, 2021

I found the cause: the Helm chart updated the operator and the vmBackup settings at the same time. The 0.19.0 operator handled the vmBackup update first, and then the 0.20.1 operator handled the update, but it got stuck on the pod in CrashLoopBackOff and then timed out.

err := waitForPodReady(ctx, rclient, ns, pod.Name, c, nil)

func waitForPodReady(ctx context.Context, rclient client.Client, ns, podName string, c *config.BaseOperatorConf, cb func(pod *corev1.Pod) error) error {
	// we need some delay
	time.Sleep(c.PodWaitReadyInitDelay)
	return wait.Poll(c.PodWaitReadyIntervalCheck, c.PodWaitReadyTimeout, func() (done bool, err error) {
		pod := &corev1.Pod{}
		err = rclient.Get(ctx, types.NamespacedName{Namespace: ns, Name: podName}, pod)
		if err != nil {
			if errors.IsNotFound(err) {
				return false, nil
			}
			log.Error(err, "cannot get pod", "pod", podName)
			return false, err
		}
		if PodIsReady(*pod) {
			log.Info("pod update finished with revision", "pod", pod.Name, "revision", pod.Labels[podRevisionLabel])
			if cb != nil {
				if err := cb(pod); err != nil {
					return true, fmt.Errorf("errror occured at callback execution: %w", err)
				}
			}
			return true, nil
		}
		return false, nil
	})
}

In this situation, the operator needs to update the StatefulSet first and then delete the pod.

f41gh7 added the bug (Something isn't working) label Nov 1, 2021
f41gh7 added a commit that referenced this issue Nov 5, 2021
now operator performs update for statefulet before checking pod status with rolling update
#366
f41gh7 added a commit that referenced this issue Nov 5, 2021
Collaborator

f41gh7 commented Nov 5, 2021

Sorry for the delay. It should be fixed by the related PR; it was a regression in the statefulset update mechanism. Can you verify it with the docker image victoriametrics/operator:gh-366?

Thanks for the investigation, it helps a lot!

Collaborator

f41gh7 commented Dec 1, 2021

The changes were included in the 0.21.0 release.

f41gh7 closed this as completed Dec 1, 2021