prometheus-operator repeatedly deletes prometheus StatefulSet once pods reach ContainerCreating #2950

Open
chaosaffe opened this issue Jan 11, 2020 · 6 comments
@chaosaffe chaosaffe commented Jan 11, 2020

What happened?
On upgrading to v0.34.0, the prometheus-operator started deleting the prometheus-k8s StatefulSet once its pods reached the ContainerCreating status.

When the operator is scaled to 0 (terminated) after the StatefulSet is created but before the pods enter the ContainerCreating status, the prometheus-k8s pods are able to start and run successfully.

Did you expect to see something different?
Yes, I expected that the StatefulSet would be created and not be repeatedly deleted by the operator.

How to reproduce it (as minimally and precisely as possible):
Unknown. The issue recurs in this environment but has not been seen in other environments.

Environment

  • Prometheus Operator version:

    v0.34.0

  • Kubernetes version information:

Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.0", GitCommit:"70132b0f130acc0bed193d9ba59dd186f0e634cf", GitTreeState:"clean", BuildDate:"2019-12-10T03:03:57Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.0", GitCommit:"70132b0f130acc0bed193d9ba59dd186f0e634cf", GitTreeState:"clean", BuildDate:"2019-12-07T21:12:17Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind:

Kubeadm

  • Manifests:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: prometheus
              operator: In
              values:
              - k8s
          namespaces:
          - monitoring
          topologyKey: kubernetes.io/hostname
        weight: 100
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    kubernetes.io/os: linux
  podMonitorNamespaceSelector: {}
  podMonitorSelector: {}
  replicas: 1
  resources:
    requests:
      memory: 400Mi
  retention: 30d
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  storage:
    volumeClaimTemplate:
      apiVersion: v1
      kind: PersistentVolumeClaim
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: standard
  version: v2.11.0

  • Prometheus Operator Logs:
level=debug ts=2020-01-10T15:31:26.377784425Z caller=operator.go:1151 component=prometheusoperator msg="updating current Prometheus statefulset"
level=debug ts=2020-01-10T15:31:26.381955209Z caller=operator.go:1157 component=prometheusoperator msg="resolving illegal update of Prometheus StatefulSet"
level=debug ts=2020-01-10T15:31:26.676005869Z caller=operator.go:1016 component=prometheusoperator msg="StatefulSet delete"

This recurs repeatedly within the sync loop.
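For context, the Kubernetes API only allows in-place updates to the `replicas`, `template`, and `updateStrategy` fields of a StatefulSet spec; when the operator's desired spec differs from the live one in any other field, the update is rejected and the operator falls back to deleting the StatefulSet so it can be recreated on the next sync. The following is a minimal sketch of that decision, not the operator's actual code; the field names and the `forbidden_fields` helper are illustrative:

```python
# Sketch of the reconcile decision that can produce a delete loop.
# MUTABLE_FIELDS mirrors the Kubernetes rule that only these StatefulSet
# spec fields may be updated in place; everything else is immutable.
MUTABLE_FIELDS = {"replicas", "template", "updateStrategy"}

def forbidden_fields(current_spec: dict, desired_spec: dict) -> set:
    """Return the immutable spec fields that differ between current and desired."""
    keys = set(current_spec) | set(desired_spec)
    return {
        k for k in keys - MUTABLE_FIELDS
        if current_spec.get(k) != desired_spec.get(k)
    }

def reconcile(current_spec: dict, desired_spec: dict) -> str:
    """Mimic the operator: update in place if legal, otherwise delete the
    StatefulSet so it is recreated on the next sync."""
    if forbidden_fields(current_spec, desired_spec):
        return "delete-and-recreate"  # logged as "resolving illegal update"
    return "update"

# If the desired spec keeps differing in an immutable field (for example
# because a new Kubernetes version defaults a subfield differently), every
# sync takes the delete-and-recreate branch -- the loop seen in the logs.
current = {"replicas": 1, "volumeClaimTemplates": [{"storage": "100Gi"}]}
desired = {"replicas": 1, "volumeClaimTemplates": [{"storage": "100Gi",
                                                    "apiVersion": "v1"}]}
print(reconcile(current, desired))          # delete-and-recreate
print(sorted(forbidden_fields(current, desired)))  # ['volumeClaimTemplates']
```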

Anything else we need to know?:

@chaosaffe chaosaffe added the kind/bug label Jan 11, 2020
@brancz brancz commented Jan 13, 2020

That’s indeed odd and shouldn’t happen. It seems like some illegal update is continuously being attempted. It would be good if we logged what the illegal update was, as I believe the StatefulSet API does return this information.
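For reference, the API server's rejection does carry that detail: an illegal StatefulSet update typically fails with an "is invalid: ... Forbidden" message naming the offending path, so surfacing the update error in the operator's log would show it. A small sketch of pulling the field and reason out of such a message (the error string below is the apps/v1 validation wording as I recall it, shown as a plain string for illustration):

```python
import re

# Example of the validation error the API server returns for an illegal
# StatefulSet update (wording reproduced from memory, for illustration).
err_msg = (
    'StatefulSet.apps "prometheus-k8s" is invalid: spec: Forbidden: '
    "updates to statefulset spec for fields other than 'replicas', "
    "'template', and 'updateStrategy' are forbidden"
)

def rejection_detail(message: str) -> str:
    """Pull the field path and reason out of an 'is invalid' API error,
    falling back to the raw message if it does not match the pattern."""
    m = re.search(r"is invalid: (?P<field>[^:]+): (?P<reason>\w+): (?P<detail>.+)",
                  message)
    return f"{m['field']} ({m['reason']}): {m['detail']}" if m else message

print(rejection_detail(err_msg))
# spec (Forbidden): updates to statefulset spec for fields other than
# 'replicas', 'template', and 'updateStrategy' are forbidden
```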

@brancz brancz commented Jan 13, 2020

cc @pgier @s-urbaniak — could this potentially have to do with the controller generation tooling changes?

@chaosaffe chaosaffe commented Jan 13, 2020

I should also note that this occurred after I updated kube-prometheus to master as there were several K8s API Alpha/Beta Group removals in 1.17.

@metalmatze metalmatze commented Jan 15, 2020

It would be great if you could start a kind cluster with 1.17 and check if it happens there too when deploying kube-prometheus master.

@chaosaffe chaosaffe commented Jan 15, 2020

I have tested using the image kindest/node:v1.17.0 and deploying my manifests generated by kube-prometheus, and I see the same behaviour: the StatefulSet is repeatedly deleted once it reaches the Pending phase.

It appears that this behaviour may be related to the changes in Kubernetes 1.17.

@juzhao juzhao commented Jan 21, 2020

Checked; no issue with OCP 4.4.

# kubectl -n openshift-monitoring logs -c prometheus prometheus-k8s-0 | more
level=info ts=2020-01-21T08:25:39.355Z caller=main.go:330 msg="Starting Prometheus" version="(version=2.15.2, branch=rhaos-4.4-rhel-7, revision=26eb549e3fbb176c890dffd3d2ac6b4ebed2ae44)"

# oc -n openshift-monitoring logs prometheus-operator-54c74f6797-zjvlt
ts=2020-01-21T02:33:18.399798794Z caller=main.go:199 msg="Starting Prometheus Operator version '0.34.0'."
ts=2020-01-21T02:33:18.409227432Z caller=main.go:96 msg="Staring insecure server on :8080"
level=info ts=2020-01-21T02:33:18.415414693Z caller=operator.go:441 component=prometheusoperator msg="connection established" cluster-version=v1.17.0
level=info ts=2020-01-21T02:33:18.415426254Z caller=operator.go:219 component=alertmanageroperator msg="connection established" cluster-version=v1.17.0

# kubectl version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.0-4-g38212b5", GitCommit:"e17af88a81bab82a47aa161fb0db99f6e9424661", GitTreeState:"clean", BuildDate:"2020-01-17T09:20:01Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.0", GitCommit:"12de527", GitTreeState:"clean", BuildDate:"2020-01-20T12:51:02Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}
