PGBouncer Liveness Probe Failure Minikube #558

Closed
2 tasks done
razilevin opened this issue Apr 13, 2022 · 6 comments · Fixed by #560
Labels
kind/bug kind - things not working properly

@razilevin

Checks

Chart Version

airflow-8.5.3

Kubernetes Version

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5", GitCommit:"c285e781331a3785a7f436042c65c5641ce8a9e9", GitTreeState:"clean", BuildDate:"2022-03-17T03:51:43Z", GoVersion:"go1.17.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.3", GitCommit:"816c97ab8cff8a1c72eccca1026f7820e93e0d25", GitTreeState:"clean", BuildDate:"2022-01-25T21:19:12Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"linux/amd64"}

Helm Version

version.BuildInfo{Version:"v3.7.0", GitCommit:"eeac83883cb4014fe60267ec6373570374ce770b", GitTreeState:"clean", GoVersion:"go1.16.8"}

Description

For some reason, the liveness probe for PGBouncer fails, causing restarts. I have exec'd into the pod and run the same command that the liveness probe uses.

It hangs when run as is, but succeeds when the DNS name is replaced with localhost.

I have tested K8s DNS and it does resolve to the correct pod IP.

I have tried to follow the existing issues related to pgbouncer, with no luck.

Not sure what else to try.

Relevant Logs

No response

Custom Helm Values

No response

@razilevin razilevin added the kind/bug kind - things not working properly label Apr 13, 2022
@razilevin razilevin changed the title PGBouncer Liveness Probe Failure Miniikube PGBouncer Liveness Probe Failure Minikube Apr 13, 2022
@razilevin
Author

razilevin commented Apr 13, 2022

I believe this to be a minikube issue.

nslookup -type=a airflow-pgbouncer
Server: 10.96.0.10
Address: 10.96.0.10:53

Name: airflow-pgbouncer.airflow.svc.cluster.local
Address: 10.104.178.111

nc -vz 10.104.178.111 6432
nc: 10.104.178.111 (10.104.178.111:6432): Operation timed out

nc -vz localhost 6432
localhost (127.0.0.1:6432) open

Not sure how to tackle this one. What am I missing?
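
The two nc checks above can be folded into one comparison run from inside the pgbouncer pod. The following is a minimal sketch, assuming bash (for its /dev/tcp redirection) and the coreutils timeout command are available in the container, which may not hold for the chart's image. The helper names tcp_open and compare_paths are hypothetical; the IP and port in the usage note are the ones from the output above.

```shell
#!/usr/bin/env bash
# tcp_open HOST PORT [SECONDS]: succeed if a TCP connection opens in time.
# Hypothetical helper; relies on bash's /dev/tcp and coreutils `timeout`.
tcp_open() {
  local host="$1" port="$2" wait="${3:-5}"
  timeout "$wait" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# compare_paths SERVICE_IP PORT: report loopback vs. Service reachability.
compare_paths() {
  local svc_ip="$1" port="$2" target
  for target in 127.0.0.1 "$svc_ip"; do
    if tcp_open "$target" "$port"; then
      echo "$target:$port open"
    else
      echo "$target:$port unreachable"
    fi
  done
}
```

Running `compare_paths 10.104.178.111 6432` inside the pod would print one line per target. If loopback is open while the Service IP is unreachable, the pod is failing to reach itself through its own Service, which matches the nc results above.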

@thesuperzapper
Member

@razilevin have you tried with the newly released version 8.6.0 of the chart?

I wonder if your problem was fixed by #547.

Also, can you do a kubectl describe on the pgbouncer pods?

@razilevin
Author

Yes, as of last night, with no luck. Here are some command outputs. I do see the startup probe as defined in the chart.

helm list -n airflow

NAME   	NAMESPACE	REVISION	UPDATED                                	STATUS  	CHART        	APP VERSION
airflow	airflow  	1       	2022-04-14 10:09:15.498261421 -0400 EDT	deployed	airflow-8.6.0	2.2.5 

kubectl describe -n airflow pod airflow-pgbouncer-5cbcc9dfc7-ttpzd

Name:         airflow-pgbouncer-5cbcc9dfc7-ttpzd
Namespace:    airflow
Priority:     0
Node:         airflow/192.168.49.2
Start Time:   Thu, 14 Apr 2022 10:09:16 -0400
Labels:       app=airflow
              component=pgbouncer
              pod-template-hash=5cbcc9dfc7
              release=airflow
Annotations:  checksum/secret-config-envs: 158f6fcdcf922f7ea6926ed1a3c17c06ac343ea0c400f3711c21a4f98ef0fad7
              checksum/secret-pgbouncer: 7c131dbfff550340d4411a29115ae828ea04cee5c1aaf93f097d8a9f8de6bf2e
              cluster-autoscaler.kubernetes.io/safe-to-evict: true
Status:       Running
IP:           172.17.0.3
IPs:
  IP:           172.17.0.3
Controlled By:  ReplicaSet/airflow-pgbouncer-5cbcc9dfc7
Containers:
  pgbouncer:
    Container ID:  docker://8036a1803897e2f59fef88490ab7563209b06fa3f05f4c532779555ab50982d5
    Image:         ghcr.io/airflow-helm/pgbouncer:1.17.0-patch.0
    Image ID:      docker-pullable://ghcr.io/airflow-helm/pgbouncer@sha256:e95a323f99e31fc9a8cd5d98b14c6dc2f2edda6aea13f4f548ffc48537a5f25e
    Port:          6432/TCP
    Host Port:     0/TCP
    Command:
      /usr/bin/dumb-init
      --rewrite=15:2
      --
    Args:
      /bin/sh
      -c
      /home/pgbouncer/config/gen_auth_file.sh && \
      exec pgbouncer /home/pgbouncer/config/pgbouncer.ini
    State:          Running
      Started:      Thu, 14 Apr 2022 10:16:16 -0400
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 14 Apr 2022 10:12:46 -0400
      Finished:     Thu, 14 Apr 2022 10:16:16 -0400
    Ready:          True
    Restart Count:  2
    Liveness:       exec [/bin/sh -c psql $(eval $DATABASE_PSQL_CMD) --tuples-only --command="SELECT 1;" | grep -q "1"] delay=5s timeout=60s period=30s #success=1 #failure=3
    Startup:        tcp-socket :6432 delay=5s timeout=15s period=10s #success=1 #failure=30
    Environment Variables from:
      airflow-config-envs  Secret  Optional: false
    Environment:
      DATABASE_USER:               postgres
      DATABASE_PASSWORD:           <set to the key 'postgresql-password' in secret 'airflow-postgresql'>  Optional: false
      REDIS_PASSWORD:              <set to the key 'redis-password' in secret 'airflow-redis'>            Optional: false
      CONNECTION_CHECK_MAX_COUNT:  0
    Mounts:
      /home/pgbouncer/certs from pgbouncer-certs (ro)
      /home/pgbouncer/config from pgbouncer-config (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9v5tg (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  pgbouncer-config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  airflow-pgbouncer
    Optional:    false
  pgbouncer-certs:
    Type:                Projected (a volume that contains injected data from multiple sources)
    SecretName:          airflow-pgbouncer-certs
    SecretOptionalName:  <nil>
  kube-api-access-9v5tg:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  9m42s                  default-scheduler  Successfully assigned airflow/airflow-pgbouncer-5cbcc9dfc7-ttpzd to airflow
  Normal   Pulling    9m40s                  kubelet            Pulling image "ghcr.io/airflow-helm/pgbouncer:1.17.0-patch.0"
  Normal   Pulled     9m36s                  kubelet            Successfully pulled image "ghcr.io/airflow-helm/pgbouncer:1.17.0-patch.0" in 3.841402294s
  Normal   Created    2m42s (x3 over 9m36s)  kubelet            Created container pgbouncer
  Normal   Started    2m42s (x3 over 9m36s)  kubelet            Started container pgbouncer
  Normal   Killing    2m42s (x2 over 6m12s)  kubelet            Container pgbouncer failed liveness probe, will be restarted
  Normal   Pulled     2m42s (x2 over 6m12s)  kubelet            Container image "ghcr.io/airflow-helm/pgbouncer:1.17.0-patch.0" already present on machine
  Warning  Unhealthy  12s (x8 over 8m12s)    kubelet            Liveness probe failed: command "/bin/sh -c psql $(eval $DATABASE_PSQL_CMD) --tuples-only --command=\"SELECT 1;\" | grep -q \"1\"" timed out

@thesuperzapper
Member

thesuperzapper commented Apr 16, 2022

@razilevin I just tested this myself, and I CAN reproduce this with minikube 1.25.2, but it seems to be caused by a very long-standing minikube issue, kubernetes/minikube#1568 (a Pod can't access its own Service unless --cni=true is set).

We can fix this for chart version 8.6.1, by making the liveness probe use localhost as you suggest (or probably 127.0.0.1 is safer), so everything works out-of-the-box for minikube.
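
As an illustration of the proposed change, a loopback-based check might look like the sketch below. This is not the chart's actual probe: the contents of DATABASE_PSQL_CMD are not shown in this thread, so the connection flags are assumptions, as are the function name pgbouncer_alive and the use of PGPASSWORD. The port 6432 and the DATABASE_USER and DATABASE_PASSWORD variables come from the pod description above.

```shell
# pgbouncer_alive: run "SELECT 1" through PgBouncer over loopback.
# Hypothetical helper; the connection flags are assumptions, not the chart's.
# No database name is given, so psql falls back to its default (the user name).
pgbouncer_alive() {
  PGPASSWORD="$DATABASE_PASSWORD" psql \
    --host=127.0.0.1 \
    --port=6432 \
    --username="$DATABASE_USER" \
    --tuples-only \
    --command="SELECT 1;" | grep -q "1"
}
```

Because the target is a literal 127.0.0.1, the check involves no name resolution and never leaves the pod, so it does not depend on the pod being able to reach its own Service.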

Either way, I recommend using something like k3d (what I currently use) or kind; both are much easier to use than minikube and don't have this problem.

@thesuperzapper thesuperzapper added this to Unsorted in Issue Triage and PR Tracking via automation Apr 16, 2022
@thesuperzapper thesuperzapper added this to the airflow-8.6.1 milestone Apr 16, 2022
@thesuperzapper thesuperzapper moved this from Unsorted to Triage | Work Started in Issue Triage and PR Tracking Apr 16, 2022
@thesuperzapper
Member

@razilevin I have raised PR #560 to fix this issue, which will be part of 8.6.1 (I am going to wait a while to see if any other issues come up before cutting the release).

Issue Triage and PR Tracking automation moved this from Triage | Work Started to Done Jun 22, 2022
@thesuperzapper
Member

@razilevin the long-awaited 8.6.1 is out now, and includes the fix for minikube!

Thanks for your help!
