
[bitnami/etcd] Pods unhealthy #24582

Closed
ujala-singh opened this issue Mar 20, 2024 · 3 comments
Labels: solved · tech-issues (The user has a technical issue about an application) · triage (Triage is needed)

ujala-singh commented Mar 20, 2024

Name and Version

bitnami/etcd 9.15.2

What architecture are you using?

None

What steps will reproduce the bug?

I followed the steps mentioned here on my minikube cluster:

helm install etcd bitnami/etcd -n etcd
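For completeness, the full sequence assuming the Bitnami repo still has to be added and the etcd namespace does not yet exist (both are assumptions about the environment, not chart requirements):

helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
kubectl create namespace etcd
helm install etcd bitnami/etcd -n etcd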

After installing the chart, the readiness probes keep failing; the etcd pod events show:

  Warning  Unhealthy  42s (x93 over 34m)  kubelet  (combined from similar events): Readiness probe failed: {"level":"warn","ts":"2024-03-20T18:04:41.503292Z","logger":"client","caller":"v3@v3.5.12/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0x40000eca80/etcd-0.etcd-headless.etcd.svc.cluster.local:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 10.244.1.15:2379: connect: connection refused\""}
etcd-0.etcd-headless.etcd.svc.cluster.local:2379 is unhealthy: failed to commit proposal: context deadline exceeded
Error: unhealthy cluster
etcd 18:04:41.51 ERROR ==> Unhealthy endpoint!
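The same check can be reproduced by hand inside the pod, since both probes call the health check script shown in the pod description further down:

kubectl exec etcd-0 -n etcd -- /opt/bitnami/scripts/etcd/healthcheck.sh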

Are you using any custom parameters or values?

None; I am using the default values defined in the Helm chart.
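For reference, the defaults in question can be dumped from the chart currently in the local repo cache:

helm show values bitnami/etcd > default-values.yaml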

What is the expected behavior?

Pods should start healthy and I should be able to put and get data from my etcd datastore.
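A minimal smoke test for that expectation, assuming the chart's default root-user RBAC auth (the ETCD_ROOT_PASSWORD variable is already injected into the pod, as shown in the pod description below):

kubectl exec -it etcd-0 -n etcd -- bash -c '
  etcdctl --user root:"$ETCD_ROOT_PASSWORD" --endpoints http://127.0.0.1:2379 put /message hello
  etcdctl --user root:"$ETCD_ROOT_PASSWORD" --endpoints http://127.0.0.1:2379 get /message
'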

What do you see instead?

Pods unhealthy

➜  ~ kubectl describe pod etcd-0 -n etcd       
Name:             etcd-0
Namespace:        etcd
Priority:         0
Service Account:  etcd
Node:             minikube-m02/192.168.49.3
Start Time:       Wed, 20 Mar 2024 22:58:58 +0530
Labels:           app.kubernetes.io/component=etcd
                  app.kubernetes.io/instance=etcd
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=etcd
                  app.kubernetes.io/version=3.5.12
                  controller-revision-hash=etcd-58f66d4956
                  helm.sh/chart=etcd-9.15.2
                  statefulset.kubernetes.io/pod-name=etcd-0
Annotations:      checksum/token-secret: 5e48d4aff2180cd84249e2a450997196d7bb7f53a03ae81d84a9b53b655b5a1f
Status:           Running
IP:               10.244.1.15
IPs:
  IP:           10.244.1.15
Controlled By:  StatefulSet/etcd
Containers:
  etcd:
    Container ID:    docker://b0cba46e0ea80727e0f9412b5a1b3738156118a7402903d43aba89bd03bd251f
    Image:           docker.io/bitnami/etcd:3.5.12-debian-12-r10
    Image ID:        docker-pullable://bitnami/etcd@sha256:10f0765efd483345ef6f6e43c00008402096f9a553b795519c616eb9d6e4cb9e
    Ports:           2379/TCP, 2380/TCP
    Host Ports:      0/TCP, 0/TCP
    SeccompProfile:  RuntimeDefault
    State:           Running
      Started:       Wed, 20 Mar 2024 23:31:21 +0530
    Last State:      Terminated
      Reason:        Error
      Exit Code:     137
      Started:       Wed, 20 Mar 2024 23:09:35 +0530
      Finished:      Wed, 20 Mar 2024 23:31:21 +0530
    Ready:           False
    Restart Count:   4
    Liveness:        exec [/opt/bitnami/scripts/etcd/healthcheck.sh] delay=60s timeout=5s period=30s #success=1 #failure=5
    Readiness:       exec [/opt/bitnami/scripts/etcd/healthcheck.sh] delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:
      BITNAMI_DEBUG:                     false
      MY_POD_IP:                          (v1:status.podIP)
      MY_POD_NAME:                       etcd-0 (v1:metadata.name)
      MY_STS_NAME:                       etcd
      ETCDCTL_API:                       3
      ETCD_ON_K8S:                       yes
      ETCD_START_FROM_SNAPSHOT:          no
      ETCD_DISASTER_RECOVERY:            no
      ETCD_NAME:                         $(MY_POD_NAME)
      ETCD_DATA_DIR:                     /bitnami/etcd/data
      ETCD_LOG_LEVEL:                    info
      ALLOW_NONE_AUTHENTICATION:         no
      ETCD_ROOT_PASSWORD:                <set to the key 'etcd-root-password' in secret 'etcd'>  Optional: false
      ETCD_AUTH_TOKEN:                   jwt,priv-key=/opt/bitnami/etcd/certs/token/jwt-token.pem,sign-method=RS256,ttl=10m
      ETCD_ADVERTISE_CLIENT_URLS:        http://$(MY_POD_NAME).etcd-headless.etcd.svc.cluster.local:2379,http://etcd.etcd.svc.cluster.local:2379
      ETCD_LISTEN_CLIENT_URLS:           http://0.0.0.0:2379
      ETCD_INITIAL_ADVERTISE_PEER_URLS:  http://$(MY_POD_NAME).etcd-headless.etcd.svc.cluster.local:2380
      ETCD_LISTEN_PEER_URLS:             http://0.0.0.0:2380
      ETCD_CLUSTER_DOMAIN:               etcd-headless.etcd.svc.cluster.local
    Mounts:
      /bitnami/etcd from data (rw)
      /opt/bitnami/etcd/certs/token/ from etcd-jwt-token (ro)
      /opt/bitnami/etcd/conf/ from empty-dir (rw,path="app-conf-dir")
      /tmp from empty-dir (rw,path="tmp-dir")
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-etcd-0
    ReadOnly:   false
  empty-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  etcd-jwt-token:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  etcd-jwt-token
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age   From               Message
  ----     ------     ----  ----               -------
  Normal   Scheduled  36m   default-scheduler  Successfully assigned etcd/etcd-0 to minikube-m02
  Normal   Pulled     36m   kubelet            Container image "docker.io/bitnami/etcd:3.5.12-debian-12-r10" already present on machine
  Normal   Created    36m   kubelet            Created container etcd
  Normal   Started    36m   kubelet            Started container etcd
  Warning  Unhealthy  35m   kubelet            Readiness probe failed: {"level":"warn","ts":"2024-03-20T17:30:10.975504Z","logger":"client","caller":"v3@v3.5.12/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0x400016ca80/etcd-0.etcd-headless.etcd.svc.cluster.local:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 10.244.1.15:2379: connect: connection refused\""}
etcd-0.etcd-headless.etcd.svc.cluster.local:2379 is unhealthy: failed to commit proposal: context deadline exceeded
Error: unhealthy cluster
etcd 17:30:10.98 ERROR ==> Unhealthy endpoint!
  Warning  Unhealthy  35m  kubelet  Readiness probe failed: {"level":"warn","ts":"2024-03-20T17:30:20.994494Z","logger":"client","caller":"v3@v3.5.12/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0x400015a380/etcd-0.etcd-headless.etcd.svc.cluster.local:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 10.244.1.15:2379: connect: connection refused\""}
etcd-0.etcd-headless.etcd.svc.cluster.local:2379 is unhealthy: failed to commit proposal: context deadline exceeded
Error: unhealthy cluster
etcd 17:30:21.00 ERROR ==> Unhealthy endpoint!
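Given the Last State: Terminated (exit code 137) and the restart count in the pod description above, the previous container's logs are also worth capturing; a sketch using only standard kubectl:

kubectl logs etcd-0 -n etcd --previous
kubectl get events -n etcd --sort-by=.lastTimestamp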

Additional information

➜  ~ kubectl get pvc -A                        
NAMESPACE   NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
etcd        data-etcd-0      Bound    pvc-84ef1bc2-3f51-4e94-bc11-adaef44cad74   1Gi        RWO            standard       32h
mysql       mysql-pv-claim   Bound    mysql-pv-volume                            1Gi        RWO            manual         6d7h
➜  ~ kubectl get pv -A                         
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                  STORAGECLASS   REASON   AGE
mysql-pv-volume                            1Gi        RWO            Retain           Bound    mysql/mysql-pv-claim   manual                  6d7h
pvc-84ef1bc2-3f51-4e94-bc11-adaef44cad74   1Gi        RWO            Delete           Bound    etcd/data-etcd-0       standard                32h
➜  ~ kubectl describe svc etcd-headless -n etcd
Name:              etcd-headless
Namespace:         etcd
Labels:            app.kubernetes.io/component=etcd
                   app.kubernetes.io/instance=etcd
                   app.kubernetes.io/managed-by=Helm
                   app.kubernetes.io/name=etcd
                   app.kubernetes.io/version=3.5.12
                   helm.sh/chart=etcd-9.15.2
Annotations:       meta.helm.sh/release-name: etcd
                   meta.helm.sh/release-namespace: etcd
                   service.alpha.kubernetes.io/tolerate-unready-endpoints: true
Selector:          app.kubernetes.io/component=etcd,app.kubernetes.io/instance=etcd,app.kubernetes.io/name=etcd
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                None
IPs:               None
Port:              client  2379/TCP
TargetPort:        client/TCP
Endpoints:         10.244.1.15:2379
Port:              peer  2380/TCP
TargetPort:        peer/TCP
Endpoints:         10.244.1.15:2380
Session Affinity:  None
Events:            <none>
@ujala-singh ujala-singh added the tech-issues The user has a technical issue about an application label Mar 20, 2024
@github-actions github-actions bot added the triage Triage is needed label Mar 20, 2024
ujala-singh (Author) commented

@javsalgar Please help me with this.

gecube (Contributor) commented Mar 22, 2024

@ujala-singh did you try the latest chart version? I see that the latest is not bitnami/etcd 9.15.2 but 10.0.0, with appVersion: 3.5.12.
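A sketch of how that upgrade could be tried, reusing the release name and namespace from the original report:

helm repo update
helm upgrade --install etcd bitnami/etcd --version 10.0.0 -n etcd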

ujala-singh (Author) commented

I was able to create the etcd cluster with the config below:

helm install mycertm bitnami/cert-manager --set installCRDs=true

cat <<EOF> user.yaml
initialClusterState: "new"
replicaCount: 3

image:
  debug: true

persistence:
  accessModes:
    - ReadWriteMany
  size: 2Gi

volumePermissions:
  enabled: true

auth:
  rbac:
    enabled: true
    create: true
    allowNoneAuthentication: false
    rootPassword: "secret"
  token:
    enabled: true
    type: jwt
  peer:
    secureTransport: true
    useAutoTLS: false
    existingSecret: etcd-peer-tls
    enableAuthentication: true
    certFilename: tls.crt
    certKeyFilename: tls.key
    caFilename: ca.crt
  client:
    secureTransport: true
    useAutoTLS: false
    existingSecret: etcd-client-tls
    enableAuthentication: true
    certFilename: tls.crt
    certKeyFilename: tls.key
    caFilename: ca.crt
EOF

helm install myetcd bitnami/etcd -f user.yaml 

sleep 60

cat <<EOF>certs.yaml 
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: etcd-client
spec:
  secretName: etcd-client-tls
  issuerRef:
    group: cert-manager.io
    kind: ClusterIssuer
    name: selfsigned-issuer
  usages:
  - server auth
  - client auth
  dnsNames:
  - 'myetcd.default.svc.cluster.local'
  - '*.myetcd.default.svc.cluster.local'
  - 'myetcd-headless.default.svc.cluster.local'
  - '*.myetcd-headless.default.svc.cluster.local'
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: etcd-peer
spec:
  secretName: etcd-peer-tls
  issuerRef:
    group: cert-manager.io
    kind: ClusterIssuer
    name: selfsigned-issuer
  usages:
  - server auth
  - client auth
  dnsNames:
  - 'myetcd.default.svc.cluster.local'
  - '*.myetcd.default.svc.cluster.local'
  - 'myetcd-headless.default.svc.cluster.local'
  - '*.myetcd-headless.default.svc.cluster.local'
EOF

kubectl apply -f certs.yaml

cat <<EOF>issuer.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned-issuer
  namespace: default
spec:
  selfSigned: {}
EOF

kubectl apply -f issuer.yaml
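Once the issuer and certificates reconcile, a quick way to verify the result; the resource names follow from the manifests above, while the label selector is the chart's standard instance label for this release name and is an assumption:

kubectl get clusterissuer,certificate,secret | grep -E 'selfsigned|etcd'
kubectl get pods -l app.kubernetes.io/instance=myetcd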
