
postgres active/standby deployment - standby unhealthy #2014

@zposloncec

Which example are you working with?
Postgres active/standby deployment on the same Kubernetes cluster

What is the current behavior?
The standby cluster is in an Unhealthy state: the readiness probe of its pod keeps failing.

What is the expected behavior?
Both clusters running with a full set of replicas

Other information (e.g. detailed explanation, related issues, etc.)

Commands used

pgo create cluster cluster-dc1 --node-label=DC=us-east-1a \
  --pgbouncer --replica-count=2 \
  --pgbackrest-storage-type=local,s3 \
  --pgbackrest-s3-key=... \
  --pgbackrest-s3-key-secret=... \
  --pgbackrest-s3-bucket=... \
  --pgbackrest-s3-endpoint=s3.us-east-1.amazonaws.com \
  --pgbackrest-s3-region=us-east-1 \
  --password-superuser=... \
  --password-replication=... \
  --password=...


pgo create cluster cluster-dc2 --standby --node-label=DC=us-east-1b \
  --pgbouncer --replica-count=2 \
  --pgbackrest-storage-type=s3 \
  --pgbackrest-s3-key=... \
  --pgbackrest-s3-key-secret=... \
  --pgbackrest-s3-bucket=... \
  --pgbackrest-s3-endpoint=s3.us-east-1.amazonaws.com \
  --pgbackrest-s3-region=us-east-1 \
  --pgbackrest-repo-path=/backrestrepo/cluster-dc1-backrest-shared-repo \
  --secret-from=cluster-dc1
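The standby is pointed at the primary's pgBackRest repository through `--pgbackrest-repo-path`, and the same value appears as `PGBACKREST_REPO1_PATH` in the pod environment below. A small sanity check for that value, assuming the primary uses the `/backrestrepo/<cluster>-backrest-shared-repo` layout seen here; `repo_path_for` and `check_standby_repo_path` are hypothetical helpers, not part of pgo:

```shell
# Sketch: confirm the standby's repo path targets the primary's shared repo.
# Assumption: repo paths follow /backrestrepo/<cluster>-backrest-shared-repo,
# as in the --pgbackrest-repo-path value used for cluster-dc2.
repo_path_for() {
  printf '/backrestrepo/%s-backrest-shared-repo\n' "$1"
}

# $1: primary cluster name, $2: repo path given to the standby.
check_standby_repo_path() {
  expected=$(repo_path_for "$1")
  if [ "$2" = "$expected" ]; then
    echo "ok: $2"
  else
    echo "mismatch: got $2, expected $expected" >&2
    return 1
  fi
}

check_standby_repo_path cluster-dc1 /backrestrepo/cluster-dc1-backrest-shared-repo
# prints: ok: /backrestrepo/cluster-dc1-backrest-shared-repo
```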

Please tell us about your environment:

  • Operating System:
    RancherOS

  • Where is this running (Local, Cloud Provider):
    AWS

  • Storage being used (NFS, Hostpath, Gluster, etc):
    AWS EBS, S3

  • PostgreSQL Version:
    12.4

  • Platform (Docker, Kubernetes, OpenShift):
    Kubernetes

  • Platform Version:
    RKE rancher 1.19.3

If possible, please run the following Kubernetes (kubectl) or OpenShift (oc) commands and provide the results:
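The output below matches the format of `kubectl describe pod`, followed by what appears to be the log of the `database` container. A minimal sketch of how it could be collected, assuming the namespace, labels and container name shown in that output (`cluster-postgres`, `pg-cluster=cluster-dc2`, `role=master`, `database`); `standby_pod` and `collect_diagnostics` are hypothetical helper names:

```shell
# Sketch: collect the diagnostics shown below for the standby cluster.
# Namespace, labels and container name are copied from the describe output;
# the function names are hypothetical helpers, not part of pgo or kubectl.
NS=cluster-postgres

# Look up the standby leader pod; on cluster-dc2 it carries role=master.
standby_pod() {
  kubectl -n "$NS" get pods -l pg-cluster=cluster-dc2,role=master \
    -o jsonpath='{.items[0].metadata.name}'
}

# Describe the pod and dump the log of its "database" container.
# Pass a pod name to skip the label lookup.
collect_diagnostics() {
  pod=${1:-$(standby_pod)}
  kubectl -n "$NS" describe pod "$pod"
  kubectl -n "$NS" logs "$pod" -c database
}
```

For example, `collect_diagnostics > cluster-dc2-diagnostics.txt` writes both outputs to a single file that can be attached to the issue.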

Name:         cluster-dc2-5fb8c9565-cqskp
Namespace:    cluster-postgres
Priority:     0
Node:         .../...
Start Time:   Mon, 02 Nov 2020 14:28:07 +0100
Labels:       NodeLabelKey=DC
              NodeLabelValue=us-east-1b
              backrest-storage-type=s3
              crunchy-pgha-scope=cluster-dc2
              crunchy-postgres-exporter=false
              deployment-name=cluster-dc2
              name=cluster-dc2
              pg-cluster=cluster-dc2
              pg-cluster-id=f5c3396a-29d5-43f3-a8e8-8094598ab9a5
              pg-pod-anti-affinity=
              pgo-pg-database=true
              pgo-version=4.5.0
              pgouser=admin
              pod-template-hash=5fb8c9565
              role=master
              service-name=cluster-dc2
              vendor=crunchydata
              workflowid=23a013fa-855e-49a3-87ff-957518e6a5c9
Annotations:  status:
                {"conn_url":"postgres://10.42.7.10:5432/postgres","api_url":"http://10.42.7.10:8009/patroni","state":"running","role":"standby_leader","ve...
Status:       Running
IP:           10.42.7.10
IPs:
  IP:           10.42.7.10
Controlled By:  ReplicaSet/cluster-dc2-5fb8c9565
Containers:
  database:
    Container ID:   docker://3a26efc462f194cb1e7e83b2435f2164053f35ff810b4debf63e6db81a5f4967
    Image:          registry.developers.crunchydata.com/crunchydata/crunchy-postgres-ha:centos7-12.4-4.5.0
    Image ID:       docker-pullable://registry.developers.crunchydata.com/crunchydata/crunchy-postgres-ha@sha256:8f8c3e0a385f5d5185ea5fff18f1b875665e1639fde90bd40501ea1e211aa44f
    Ports:          5432/TCP, 8009/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Running
      Started:      Mon, 02 Nov 2020 14:28:17 +0100
    Ready:          False
    Restart Count:  0
    Requests:
      memory:   8Gi
    Liveness:   exec [/opt/cpm/bin/health/pgha-liveness.sh] delay=30s timeout=10s period=15s #success=1 #failure=3
    Readiness:  exec [/opt/cpm/bin/health/pgha-readiness.sh] delay=15s timeout=1s period=10s #success=1 #failure=3
    Environment:
      PGHA_PG_PORT:                       5432
      PGHA_USER:                          postgres
      PGHA_INIT:                          <set to the key 'init' of config map 'cluster-dc2-pgha-config'>  Optional: false
      PATRONI_POSTGRESQL_DATA_DIR:        /pgdata/cluster-dc2
      PGBACKREST_REPO1_S3_BUCKET:         ...
      PGBACKREST_REPO1_S3_ENDPOINT:       s3.us-east-1.amazonaws.com
      PGBACKREST_REPO1_S3_REGION:         us-east-1
      PGBACKREST_REPO1_S3_KEY:            <set to the key 'aws-s3-key' in secret 'cluster-dc2-backrest-repo-config'>         Optional: false
      PGBACKREST_REPO1_S3_KEY_SECRET:     <set to the key 'aws-s3-key-secret' in secret 'cluster-dc2-backrest-repo-config'>  Optional: false
      PGBACKREST_REPO1_S3_CA_FILE:        /sshd/aws-s3-ca.crt
      PGBACKREST_REPO1_HOST_CMD:          /usr/local/bin/archive-push-s3.sh
      PGBACKREST_REPO1_S3_URI_STYLE:      host
      PGHA_PGBACKREST_S3_VERIFY_TLS:      true
      PGBACKREST_STANZA:                  db
      PGBACKREST_REPO1_HOST:              cluster-dc2-backrest-shared-repo
      BACKREST_SKIP_CREATE_STANZA:        true
      PGHA_PGBACKREST:                    true
      PGBACKREST_REPO1_PATH:              /backrestrepo/cluster-dc1-backrest-shared-repo
      PGBACKREST_DB_PATH:                 /pgdata/cluster-dc2
      ENABLE_SSHD:                        true
      PGBACKREST_LOG_PATH:                /tmp
      PGBACKREST_PG1_SOCKET_PATH:         /tmp
      PGBACKREST_PG1_PORT:                5432
      PGBACKREST_REPO1_TYPE:              s3
      PGHA_PGBACKREST_LOCAL_S3_STORAGE:   false
      PGHA_DATABASE:                      cluster-db
      PGHA_CRUNCHYADM:                    true
      PGHA_REPLICA_REINIT_ON_START_FAIL:  true
      PGHA_SYNC_REPLICATION:              false
      PGHA_TLS_ENABLED:                   false
      PGHA_TLS_ONLY:                      false
      PGHA_STANDBY:                       true
      PATRONI_KUBERNETES_NAMESPACE:       cluster-postgres (v1:metadata.namespace)
      PATRONI_KUBERNETES_SCOPE_LABEL:     crunchy-pgha-scope
      PATRONI_SCOPE:                       (v1:metadata.labels['crunchy-pgha-scope'])
      PATRONI_KUBERNETES_LABELS:          {vendor: "crunchydata"}
      PATRONI_LOG_LEVEL:                  INFO
      PGHOST:                             /tmp
    Mounts:
      /crunchyadm from crunchyadm (rw)
      /dev/shm from dshm (rw)
      /etc/pgbackrest/conf.d from pgbackrest-config (rw)
      /etc/podinfo from podinfo (rw)
      /pgconf from pgconf-volume (rw)
      /pgconf/pgreplicator from primary-volume (rw)
      /pgconf/pgsuper from root-volume (rw)
      /pgconf/pguser from user-volume (rw)
      /pgdata from pgdata (rw)
      /recover from recover-volume (rw)
      /sshd from sshd (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from pgo-pg-token-vlxcs (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  pgdata:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  cluster-dc2
    ReadOnly:   false
  user-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cluster-dc2-cluster-secret
    Optional:    false
  primary-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cluster-dc2-primaryuser-secret
    Optional:    false
  sshd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cluster-dc2-backrest-repo-config
    Optional:    false
  root-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cluster-dc2-postgres-secret
    Optional:    false
  recover-volume:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  report:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  crunchyadm:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  dshm:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  pgbackrest-config:
    Type:                Projected (a volume that contains injected data from multiple sources)
    ConfigMapName:       cluster-dc2-config-backrest
    ConfigMapOptional:   0xc00011e47d
    SecretName:          cluster-dc2-config-backrest
    SecretOptionalName:  0xc00011e47e
  pgconf-volume:
    Type:               Projected (a volume that contains injected data from multiple sources)
    ConfigMapName:      cluster-dc2-pgha-config
    ConfigMapOptional:  0xc00011e4b8
  podinfo:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      limits.cpu -> cpu_limit
      requests.cpu -> cpu_request
      limits.memory -> mem_limit
      requests.memory -> mem_request
      metadata.labels -> labels
      metadata.annotations -> annotations
  pgo-pg-token-vlxcs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  pgo-pg-token-vlxcs
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                    From                                    Message
  ----     ------                  ----                   ----                                    -------
  Normal   Scheduled               <unknown>                                                      Successfully assigned cluster-postgres/cluster-dc2-5fb8c9565-cqskp to ip-...
  Warning  FailedScheduling        <unknown>                                                      pod ed029658-f1bb-4fdd-af71-6d032965f868 is in the cache, so can't be assumed
  Normal   SuccessfulAttachVolume  24m                    attachdetach-controller                 AttachVolume.Attach succeeded for volume "pvc-bc9e35cc-18bd-4b00-b150-871837c58874"
  Normal   Pulled                  24m                    kubelet, ip-...                         Container image "registry.developers.crunchydata.com/crunchydata/crunchy-postgres-ha:centos7-12.4-4.5.0" already present on machine
  Normal   Created                 24m                    kubelet, ip-...                         Created container database
  Normal   Started                 24m                    kubelet, ip-...                         Started container database
  Warning  Unhealthy               4m34s (x120 over 24m)  kubelet, ip-...                         Readiness probe failed:
Mon Nov  2 13:28:17 UTC 2020 INFO: postgres-ha pre-bootstrap starting...
Mon Nov  2 13:28:17 UTC 2020 INFO: pgBackRest auto-config disabled
Mon Nov  2 13:28:17 UTC 2020 INFO: PGHA_PGBACKREST_LOCAL_S3_STORAGE and PGHA_PGBACKREST_INITIALIZE will be ignored if provided
Mon Nov  2 13:28:17 UTC 2020 INFO: Defaults have been set for the following postgres-ha auto-configuration env vars: PGHA_DEFAULT_CONFIG, PGHA_BASE_BOOTSTRAP_CONFIG, PGHA_BASE_PG_CONFIG
Mon Nov  2 13:28:17 UTC 2020 INFO: Defaults have been set for the following postgres-ha env vars: PGHA_PATRONI_PORT
Mon Nov  2 13:28:17 UTC 2020 INFO: Defaults have been set for the following Patroni env vars: PATRONI_NAME, PATRONI_RESTAPI_LISTEN, PATRONI_RESTAPI_CONNECT_ADDRESS, PATRONI_POSTGRESQL_LISTEN, PATRONI_POSTGRESQL_CONNECT_ADDRESS
Mon Nov  2 13:28:17 UTC 2020 INFO: Setting postgres-ha configuration for database user credentials
Mon Nov  2 13:28:17 UTC 2020 INFO: Setting 'pguser' credentials using file system
Mon Nov  2 13:28:17 UTC 2020 INFO: Setting 'superuser' credentials using file system
Mon Nov  2 13:28:17 UTC 2020 INFO: Setting 'replicator' credentials using file system
ls: cannot access /pgdata/cluster-dc2: No such file or directory
Mon Nov  2 13:28:17 UTC 2020 INFO: Applying base bootstrap config to postgres-ha configuration
Mon Nov  2 13:28:17 UTC 2020 INFO: Applying base postgres config to postgres-ha configuration
Mon Nov  2 13:28:17 UTC 2020 INFO: Applying pgbackrest config to postgres-ha configuration
Mon Nov  2 13:28:17 UTC 2020 INFO: PGDATA directory is empty on node identifed as Primary
Mon Nov  2 13:28:17 UTC 2020 INFO: initdb configuration will be applied to intitilize a new database
Mon Nov  2 13:28:17 UTC 2020 INFO: Applying configuration to bootstrap a standby cluster
Mon Nov  2 13:28:17 UTC 2020 INFO: Applying standard (non-TLS) remote connection configuration to pg_hba.conf
Mon Nov  2 13:28:17 UTC 2020 INFO: Custom postgres-ha configuration file not detected
Mon Nov  2 13:28:17 UTC 2020 INFO: Finished building postgres-ha configuration file '/tmp/postgres-ha-bootstrap.yaml'
Mon Nov  2 13:28:17 UTC 2020 INFO: postgres-ha pre-bootstrap complete!  The following configuration will be utilized to initialize
******************************
postgres-ha (PGHA) env vars:
******************************
PGHA_PGBACKREST_S3_VERIFY_TLS=true
PGHA_DEFAULT_CONFIG=true
PGHA_REPLICA_REINIT_ON_START_FAIL=true
PGHA_PGBACKREST_LOCAL_S3_STORAGE=false
PGHA_PGBACKREST=true
PGHA_TLS_ENABLED=false
PGHA_STANDBY=true
PGHA_PATRONI_PORT=8009
PGHA_TLS_ONLY=false
PGHA_USER=postgres
PGHA_PG_PORT=5432
PGHA_CRUNCHYADM=true
PGHA_BASE_BOOTSTRAP_CONFIG=true
PGHA_DATABASE=cluster-db
PGHA_BASE_PG_CONFIG=true
PGHA_SYNC_REPLICATION=false
PGHA_INIT=true
******************************
Patroni env vars:
******************************
PATRONI_LOG_LEVEL=INFO
PATRONI_KUBERNETES_SCOPE_LABEL=crunchy-pgha-scope
PATRONI_SCOPE=cluster-dc2
PATRONI_KUBERNETES_NAMESPACE=cluster-postgres
PATRONI_POSTGRESQL_DATA_DIR=/pgdata/cluster-dc2
PATRONI_POSTGRESQL_LISTEN=0.0.0.0:5432
PATRONI_RESTAPI_LISTEN=0.0.0.0:8009
PATRONI_KUBERNETES_LABELS={vendor: "crunchydata"}
PATRONI_POSTGRESQL_CONNECT_ADDRESS=10.42.7.10:5432
PATRONI_RESTAPI_CONNECT_ADDRESS=10.42.7.10:8009
PATRONI_NAME=cluster-dc2-5fb8c9565-cqskp
******************************
Patroni bootstrap method: initdb
******************************
Patroni configuration file:
******************************
bootstrap:
  method: initdb
  pgbackrest_init:
    command: '/opt/cpm/bin/pgbackrest/pgbackrest-create-replica.sh primary'
    keep_existing_recovery_conf: true
  existing_init:
    command: '/opt/cpm/bin/bootstrap/create-from-existing.sh'
    keep_existing_recovery_conf: true
  dcs:
    postgresql:
      parameters:
        unix_socket_directories: /tmp,/crunchyadm
        wal_level: logical
        archive_mode: on
        archive_command: 'source /opt/cpm/bin/pgbackrest/pgbackrest-set-env.sh &&
          pgbackrest archive-push "%p"'
        archive_timeout: 60
        log_directory: pg_log
        shared_buffers: 128MB
        temp_buffers: 8MB
        log_min_duration_statement: 60000
        log_statement: none
        work_mem: 4MB
        max_wal_senders: 6
        shared_preload_libraries: pgaudit.so,pg_stat_statements.so,pgnodemx.so
      use_slots: false
      recovery_conf:
        restore_command: 'source /opt/cpm/bin/pgbackrest/pgbackrest-set-env.sh &&
          pgbackrest archive-get %f "%p"'
      use_pg_rewind: true
    standby_cluster:
      create_replica_methods:
      - pgbackrest_standby
      restore_command: 'source /opt/cpm/bin/pgbackrest/pgbackrest-set-env.sh && pgbackrest
        archive-get %f "%p"'
  post_bootstrap: /opt/cpm/bin/bootstrap/post-bootstrap.sh
  initdb:
  - encoding: UTF8
  - data-checksums
postgresql:
  use_unix_socket: true
  pgpass: /tmp/.pgpass
  create_replica_methods:
  - pgbackrest
  - basebackup
  pgbackrest:
    command: '/opt/cpm/bin/pgbackrest/pgbackrest-create-replica.sh replica'
    keep_data: true
    no_params: true
  pgbackrest_standby:
    command: '/opt/cpm/bin/pgbackrest/pgbackrest-create-replica.sh standby'
    keep_data: true
    no_params: true
    no_master: 1
  remove_data_directory_on_rewind_failure: true
  callbacks:
    on_role_change: /opt/cpm/bin/callbacks/pgha-on-role-change.sh
  pg_hba:
  - local all postgres peer
  - local all crunchyadm peer
  - host replication primaryuser 0.0.0.0/0 md5
  - host all primaryuser 0.0.0.0/0 reject
  - host all all 0.0.0.0/0 md5
Mon Nov  2 13:28:17 UTC 2020 INFO: Applying SSHD..
Mon Nov  2 13:28:17 UTC 2020 INFO: Checking for SSH Host Keys in /sshd..
Mon Nov  2 13:28:17 UTC 2020 INFO: Checking for authorized_keys in /sshd
Mon Nov  2 13:28:17 UTC 2020 INFO: Checking for sshd_config in /sshd
Mon Nov  2 13:28:17 UTC 2020 INFO: setting up .ssh directory
Mon Nov  2 13:28:17 UTC 2020 INFO: Starting SSHD..
Mon Nov  2 13:28:17 UTC 2020 INFO: Starting background process to monitor Patroni initization and restart the database if needed
Mon Nov  2 13:28:17 UTC 2020 INFO: Initializing cluster bootstrap with command: '/usr/local/bin/patroni /tmp/postgres-ha-bootstrap.yaml'
Mon Nov  2 13:28:17 UTC 2020 INFO: Running Patroni as PID 1
2020-11-02 13:28:17,670 INFO: No PostgreSQL configuration items changed, nothing to reload.
2020-11-02 13:28:17,674 INFO: Lock owner: None; I am cluster-dc2-5fb8c9565-cqskp
2020-11-02 13:28:17,791 INFO: trying to bootstrap a new standby leader
Mon Nov  2 13:28:17 UTC 2020 INFO: Empty PGDATA dir found for standby, a non-delta restore will be peformed
2020-11-02 13:28:28,174 INFO: Lock owner: None; I am cluster-dc2-5fb8c9565-cqskp
2020-11-02 13:28:28,174 INFO: not healthy enough for leader race
2020-11-02 13:28:28,233 INFO: bootstrap_standby_leader in progress
2020-11-02 13:28:38,174 INFO: Lock owner: None; I am cluster-dc2-5fb8c9565-cqskp
2020-11-02 13:28:38,174 INFO: not healthy enough for leader race
2020-11-02 13:28:38,174 INFO: bootstrap_standby_leader in progress
Mon Nov  2 13:28:44 UTC 2020 INFO: standby pgBackRest restore complete
2020-11-02 13:28:44,059 INFO: replica has been created using pgbackrest_standby
2020-11-02 13:28:44,060 INFO: bootstrapped clone from remote master None
2020-11-02 13:28:44,067 WARNING: Removing enum parameter=recovery_target from the config due to the invalid value=
2020-11-02 13:28:44.250 UTC [295] LOG:  pgaudit extension initialized
2020-11-02 13:28:44.251 UTC [295] LOG:  starting PostgreSQL 12.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2020-11-02 13:28:44.251 UTC [295] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2020-11-02 13:28:44.256 UTC [295] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2020-11-02 13:28:44,259 INFO: postmaster pid=295
2020-11-02 13:28:44.261 UTC [295] LOG:  listening on Unix socket "/crunchyadm/.s.PGSQL.5432"
2020-11-02 13:28:44.275 UTC [295] LOG:  redirecting log output to logging collector process
2020-11-02 13:28:44.275 UTC [295] HINT:  Future log output will appear in directory "pg_log".
/tmp:5432 - rejecting connections
/tmp:5432 - rejecting connections
/tmp:5432 - accepting connections
2020-11-02 13:28:45,303 WARNING: Could not activate Linux watchdog device: "Can't open watchdog device: [Errno 2] No such file or directory: '/dev/watchdog'"
2020-11-02 13:28:45,307 INFO: Reaped pid=327, exit status=0
2020-11-02 13:28:45,405 INFO: establishing a new patroni connection to the postgres cluster
2020-11-02 13:28:45,532 INFO: initialized a new cluster
Mon Nov  2 13:28:45 UTC 2020 INFO: PGHA_INIT is 'true', waiting to initialize as primary
2020-11-02 13:28:55,803 INFO: Lock owner: cluster-dc2-5fb8c9565-cqskp; I am cluster-dc2-5fb8c9565-cqskp
2020-11-02 13:28:55,913 INFO: no action.  i am the standby leader with the lock
2020-11-02 13:29:05,803 INFO: Lock owner: cluster-dc2-5fb8c9565-cqskp; I am cluster-dc2-5fb8c9565-cqskp
2020-11-02 13:29:05,856 INFO: no action.  i am the standby leader with the lock
2020-11-02 13:29:15,803 INFO: Lock owner: cluster-dc2-5fb8c9565-cqskp; I am cluster-dc2-5fb8c9565-cqskp
2020-11-02 13:29:15,857 INFO: no action.  i am the standby leader with the lock
2020-11-02 13:29:25,803 INFO: Lock owner: cluster-dc2-5fb8c9565-cqskp; I am cluster-dc2-5fb8c9565-cqskp
2020-11-02 13:29:25,856 INFO: no action.  i am the standby leader with the lock
2020-11-02 13:29:35,803 INFO: Lock owner: cluster-dc2-5fb8c9565-cqskp; I am cluster-dc2-5fb8c9565-cqskp
2020-11-02 13:29:35,857 INFO: no action.  i am the standby leader with the lock
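The log ends with the pod holding the leader lock as standby leader after printing `PGHA_INIT is 'true', waiting to initialize as primary`, and no later postgres-ha message appears in the excerpt, while the readiness probe keeps failing. A sketch for checking both by hand, assuming `patronictl` is installed next to `/usr/local/bin/patroni` in the image and that `/tmp/postgres-ha-bootstrap.yaml` (the file Patroni was started with) is still present; `run_readiness_probe` and `patroni_members` are hypothetical helper names:

```shell
# Sketch: inspect the standby leader by hand.
# Assumptions: patronictl ships next to /usr/local/bin/patroni in the image,
# and /tmp/postgres-ha-bootstrap.yaml still exists inside the container.
NS=cluster-postgres
POD=cluster-dc2-5fb8c9565-cqskp

# Run the same script the kubelet uses for the readiness probe
# (configured there with timeout=1s) and report its exit status.
run_readiness_probe() {
  if kubectl -n "$NS" exec "$POD" -c database -- /opt/cpm/bin/health/pgha-readiness.sh; then
    status=0
  else
    status=$?
  fi
  echo "readiness exit status: $status"
}

# List the Patroni members of the cluster-dc2 scope as this pod sees them.
patroni_members() {
  kubectl -n "$NS" exec "$POD" -c database -- \
    patronictl -c /tmp/postgres-ha-bootstrap.yaml list
}
```

Comparing the probe's exit status with the member state reported by `patronictl list` helps separate a PostgreSQL problem from a failing container-level readiness check.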
