Which example are you working with?
Postgres active/standby deployment on the same Kubernetes cluster
What is the current behavior?
The standby cluster (cluster-dc2) stays in an Unhealthy state: its readiness probe keeps failing and the pod never becomes Ready.
What is the expected behavior?
Both clusters run with their full set of replicas.
Other information (e.g. detailed explanation, related issues, etc)
Commands used
pgo create cluster cluster-dc1 --node-label=DC=us-east-1a \
--pgbouncer --replica-count=2 \
--pgbackrest-storage-type=local,s3 \
--pgbackrest-s3-key=... \
--pgbackrest-s3-key-secret=... \
--pgbackrest-s3-bucket=... \
--pgbackrest-s3-endpoint=s3.us-east-1.amazonaws.com \
--pgbackrest-s3-region=us-east-1 \
--password-superuser=... \
--password-replication=... \
--password=...
pgo create cluster cluster-dc2 --standby --node-label=DC=us-east-1b \
--pgbouncer --replica-count=2 \
--pgbackrest-storage-type=s3 \
--pgbackrest-s3-key=... \
--pgbackrest-s3-key-secret=... \
--pgbackrest-s3-bucket=... \
--pgbackrest-s3-endpoint=s3.us-east-1.amazonaws.com \
--pgbackrest-s3-region=us-east-1 \
--pgbackrest-repo-path=/backrestrepo/cluster-dc1-backrest-shared-repo \
--secret-from=cluster-dc1
Please tell us about your environment:
- Operating System: rancheros
- Where is this running (Local, Cloud Provider): AWS
- Storage being used (NFS, Hostpath, Gluster, etc): AWS EBS, S3
- PostgreSQL Version: 12.4
- Platform (Docker, Kubernetes, OpenShift): Kubernetes
- Platform Version: RKE rancher 1.19.3
If possible, please run the following kubernetes or OpenShift (oc) commands and provide the result:
Name: cluster-dc2-5fb8c9565-cqskp
Namespace: cluster-postgres
Priority: 0
Node: .../...
Start Time: Mon, 02 Nov 2020 14:28:07 +0100
Labels: NodeLabelKey=DC
NodeLabelValue=us-east-1b
backrest-storage-type=s3
crunchy-pgha-scope=cluster-dc2
crunchy-postgres-exporter=false
deployment-name=cluster-dc2
name=cluster-dc2
pg-cluster=cluster-dc2
pg-cluster-id=f5c3396a-29d5-43f3-a8e8-8094598ab9a5
pg-pod-anti-affinity=
pgo-pg-database=true
pgo-version=4.5.0
pgouser=admin
pod-template-hash=5fb8c9565
role=master
service-name=cluster-dc2
vendor=crunchydata
workflowid=23a013fa-855e-49a3-87ff-957518e6a5c9
Annotations: status:
{"conn_url":"postgres://10.42.7.10:5432/postgres","api_url":"http://10.42.7.10:8009/patroni","state":"running","role":"standby_leader","ve...
Status: Running
IP: 10.42.7.10
IPs:
IP: 10.42.7.10
Controlled By: ReplicaSet/cluster-dc2-5fb8c9565
Containers:
database:
Container ID: docker://3a26efc462f194cb1e7e83b2435f2164053f35ff810b4debf63e6db81a5f4967
Image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-ha:centos7-12.4-4.5.0
Image ID: docker-pullable://registry.developers.crunchydata.com/crunchydata/crunchy-postgres-ha@sha256:8f8c3e0a385f5d5185ea5fff18f1b875665e1639fde90bd40501ea1e211aa44f
Ports: 5432/TCP, 8009/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Mon, 02 Nov 2020 14:28:17 +0100
Ready: False
Restart Count: 0
Requests:
memory: 8Gi
Liveness: exec [/opt/cpm/bin/health/pgha-liveness.sh] delay=30s timeout=10s period=15s #success=1 #failure=3
Readiness: exec [/opt/cpm/bin/health/pgha-readiness.sh] delay=15s timeout=1s period=10s #success=1 #failure=3
Environment:
PGHA_PG_PORT: 5432
PGHA_USER: postgres
PGHA_INIT: <set to the key 'init' of config map 'cluster-dc2-pgha-config'> Optional: false
PATRONI_POSTGRESQL_DATA_DIR: /pgdata/cluster-dc2
PGBACKREST_REPO1_S3_BUCKET: ...
PGBACKREST_REPO1_S3_ENDPOINT: s3.us-east-1.amazonaws.com
PGBACKREST_REPO1_S3_REGION: us-east-1
PGBACKREST_REPO1_S3_KEY: <set to the key 'aws-s3-key' in secret 'cluster-dc2-backrest-repo-config'> Optional: false
PGBACKREST_REPO1_S3_KEY_SECRET: <set to the key 'aws-s3-key-secret' in secret 'cluster-dc2-backrest-repo-config'> Optional: false
PGBACKREST_REPO1_S3_CA_FILE: /sshd/aws-s3-ca.crt
PGBACKREST_REPO1_HOST_CMD: /usr/local/bin/archive-push-s3.sh
PGBACKREST_REPO1_S3_URI_STYLE: host
PGHA_PGBACKREST_S3_VERIFY_TLS: true
PGBACKREST_STANZA: db
PGBACKREST_REPO1_HOST: cluster-dc2-backrest-shared-repo
BACKREST_SKIP_CREATE_STANZA: true
PGHA_PGBACKREST: true
PGBACKREST_REPO1_PATH: /backrestrepo/cluster-dc1-backrest-shared-repo
PGBACKREST_DB_PATH: /pgdata/cluster-dc2
ENABLE_SSHD: true
PGBACKREST_LOG_PATH: /tmp
PGBACKREST_PG1_SOCKET_PATH: /tmp
PGBACKREST_PG1_PORT: 5432
PGBACKREST_REPO1_TYPE: s3
PGHA_PGBACKREST_LOCAL_S3_STORAGE: false
PGHA_DATABASE: cluster-db
PGHA_CRUNCHYADM: true
PGHA_REPLICA_REINIT_ON_START_FAIL: true
PGHA_SYNC_REPLICATION: false
PGHA_TLS_ENABLED: false
PGHA_TLS_ONLY: false
PGHA_STANDBY: true
PATRONI_KUBERNETES_NAMESPACE: cluster-postgres (v1:metadata.namespace)
PATRONI_KUBERNETES_SCOPE_LABEL: crunchy-pgha-scope
PATRONI_SCOPE: (v1:metadata.labels['crunchy-pgha-scope'])
PATRONI_KUBERNETES_LABELS: {vendor: "crunchydata"}
PATRONI_LOG_LEVEL: INFO
PGHOST: /tmp
Mounts:
/crunchyadm from crunchyadm (rw)
/dev/shm from dshm (rw)
/etc/pgbackrest/conf.d from pgbackrest-config (rw)
/etc/podinfo from podinfo (rw)
/pgconf from pgconf-volume (rw)
/pgconf/pgreplicator from primary-volume (rw)
/pgconf/pgsuper from root-volume (rw)
/pgconf/pguser from user-volume (rw)
/pgdata from pgdata (rw)
/recover from recover-volume (rw)
/sshd from sshd (ro)
/var/run/secrets/kubernetes.io/serviceaccount from pgo-pg-token-vlxcs (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
pgdata:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: cluster-dc2
ReadOnly: false
user-volume:
Type: Secret (a volume populated by a Secret)
SecretName: cluster-dc2-cluster-secret
Optional: false
primary-volume:
Type: Secret (a volume populated by a Secret)
SecretName: cluster-dc2-primaryuser-secret
Optional: false
sshd:
Type: Secret (a volume populated by a Secret)
SecretName: cluster-dc2-backrest-repo-config
Optional: false
root-volume:
Type: Secret (a volume populated by a Secret)
SecretName: cluster-dc2-postgres-secret
Optional: false
recover-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
report:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
crunchyadm:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
dshm:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
pgbackrest-config:
Type: Projected (a volume that contains injected data from multiple sources)
ConfigMapName: cluster-dc2-config-backrest
ConfigMapOptional: 0xc00011e47d
SecretName: cluster-dc2-config-backrest
SecretOptionalName: 0xc00011e47e
pgconf-volume:
Type: Projected (a volume that contains injected data from multiple sources)
ConfigMapName: cluster-dc2-pgha-config
ConfigMapOptional: 0xc00011e4b8
podinfo:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
limits.cpu -> cpu_limit
requests.cpu -> cpu_request
limits.memory -> mem_limit
requests.memory -> mem_request
metadata.labels -> labels
metadata.annotations -> annotations
pgo-pg-token-vlxcs:
Type: Secret (a volume populated by a Secret)
SecretName: pgo-pg-token-vlxcs
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> Successfully assigned cluster-postgres/cluster-dc2-5fb8c9565-cqskp to ip-...
Warning FailedScheduling <unknown> pod ed029658-f1bb-4fdd-af71-6d032965f868 is in the cache, so can't be assumed
Normal SuccessfulAttachVolume 24m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-bc9e35cc-18bd-4b00-b150-871837c58874"
Normal Pulled 24m kubelet, ip-... Container image "registry.developers.crunchydata.com/crunchydata/crunchy-postgres-ha:centos7-12.4-4.5.0" already present on machine
Normal Created 24m kubelet, ip-... Created container database
Normal Started 24m kubelet, ip-... Started container database
Warning Unhealthy 4m34s (x120 over 24m) kubelet, ip-... Readiness probe failed:
Mon Nov 2 13:28:17 UTC 2020 INFO: postgres-ha pre-bootstrap starting...
Mon Nov 2 13:28:17 UTC 2020 INFO: pgBackRest auto-config disabled
Mon Nov 2 13:28:17 UTC 2020 INFO: PGHA_PGBACKREST_LOCAL_S3_STORAGE and PGHA_PGBACKREST_INITIALIZE will be ignored if provided
Mon Nov 2 13:28:17 UTC 2020 INFO: Defaults have been set for the following postgres-ha auto-configuration env vars: PGHA_DEFAULT_CONFIG, PGHA_BASE_BOOTSTRAP_CONFIG, PGHA_BASE_PG_CONFIG
Mon Nov 2 13:28:17 UTC 2020 INFO: Defaults have been set for the following postgres-ha env vars: PGHA_PATRONI_PORT
Mon Nov 2 13:28:17 UTC 2020 INFO: Defaults have been set for the following Patroni env vars: PATRONI_NAME, PATRONI_RESTAPI_LISTEN, PATRONI_RESTAPI_CONNECT_ADDRESS, PATRONI_POSTGRESQL_LISTEN, PATRONI_POSTGRESQL_CONNECT_ADDRESS
Mon Nov 2 13:28:17 UTC 2020 INFO: Setting postgres-ha configuration for database user credentials
Mon Nov 2 13:28:17 UTC 2020 INFO: Setting 'pguser' credentials using file system
Mon Nov 2 13:28:17 UTC 2020 INFO: Setting 'superuser' credentials using file system
Mon Nov 2 13:28:17 UTC 2020 INFO: Setting 'replicator' credentials using file system
ls: cannot access /pgdata/cluster-dc2: No such file or directory
Mon Nov 2 13:28:17 UTC 2020 INFO: Applying base bootstrap config to postgres-ha configuration
Mon Nov 2 13:28:17 UTC 2020 INFO: Applying base postgres config to postgres-ha configuration
Mon Nov 2 13:28:17 UTC 2020 INFO: Applying pgbackrest config to postgres-ha configuration
Mon Nov 2 13:28:17 UTC 2020 INFO: PGDATA directory is empty on node identifed as Primary
Mon Nov 2 13:28:17 UTC 2020 INFO: initdb configuration will be applied to intitilize a new database
Mon Nov 2 13:28:17 UTC 2020 INFO: Applying configuration to bootstrap a standby cluster
Mon Nov 2 13:28:17 UTC 2020 INFO: Applying standard (non-TLS) remote connection configuration to pg_hba.conf
Mon Nov 2 13:28:17 UTC 2020 INFO: Custom postgres-ha configuration file not detected
Mon Nov 2 13:28:17 UTC 2020 INFO: Finished building postgres-ha configuration file '/tmp/postgres-ha-bootstrap.yaml'
Mon Nov 2 13:28:17 UTC 2020 INFO: postgres-ha pre-bootstrap complete! The following configuration will be utilized to initialize
******************************
postgres-ha (PGHA) env vars:
******************************
PGHA_PGBACKREST_S3_VERIFY_TLS=true
PGHA_DEFAULT_CONFIG=true
PGHA_REPLICA_REINIT_ON_START_FAIL=true
PGHA_PGBACKREST_LOCAL_S3_STORAGE=false
PGHA_PGBACKREST=true
PGHA_TLS_ENABLED=false
PGHA_STANDBY=true
PGHA_PATRONI_PORT=8009
PGHA_TLS_ONLY=false
PGHA_USER=postgres
PGHA_PG_PORT=5432
PGHA_CRUNCHYADM=true
PGHA_BASE_BOOTSTRAP_CONFIG=true
PGHA_DATABASE=cluster-db
PGHA_BASE_PG_CONFIG=true
PGHA_SYNC_REPLICATION=false
PGHA_INIT=true
******************************
Patroni env vars:
******************************
PATRONI_LOG_LEVEL=INFO
PATRONI_KUBERNETES_SCOPE_LABEL=crunchy-pgha-scope
PATRONI_SCOPE=cluster-dc2
PATRONI_KUBERNETES_NAMESPACE=cluster-postgres
PATRONI_POSTGRESQL_DATA_DIR=/pgdata/cluster-dc2
PATRONI_POSTGRESQL_LISTEN=0.0.0.0:5432
PATRONI_RESTAPI_LISTEN=0.0.0.0:8009
PATRONI_KUBERNETES_LABELS={vendor: "crunchydata"}
PATRONI_POSTGRESQL_CONNECT_ADDRESS=10.42.7.10:5432
PATRONI_RESTAPI_CONNECT_ADDRESS=10.42.7.10:8009
PATRONI_NAME=cluster-dc2-5fb8c9565-cqskp
******************************
Patroni bootstrap method: initdb
******************************
Patroni configuration file:
******************************
bootstrap:
method: initdb
pgbackrest_init:
command: '/opt/cpm/bin/pgbackrest/pgbackrest-create-replica.sh primary'
keep_existing_recovery_conf: true
existing_init:
command: '/opt/cpm/bin/bootstrap/create-from-existing.sh'
keep_existing_recovery_conf: true
dcs:
postgresql:
parameters:
unix_socket_directories: /tmp,/crunchyadm
wal_level: logical
archive_mode: on
archive_command: 'source /opt/cpm/bin/pgbackrest/pgbackrest-set-env.sh &&
pgbackrest archive-push "%p"'
archive_timeout: 60
log_directory: pg_log
shared_buffers: 128MB
temp_buffers: 8MB
log_min_duration_statement: 60000
log_statement: none
work_mem: 4MB
max_wal_senders: 6
shared_preload_libraries: pgaudit.so,pg_stat_statements.so,pgnodemx.so
use_slots: false
recovery_conf:
restore_command: 'source /opt/cpm/bin/pgbackrest/pgbackrest-set-env.sh &&
pgbackrest archive-get %f "%p"'
use_pg_rewind: true
standby_cluster:
create_replica_methods:
- pgbackrest_standby
restore_command: 'source /opt/cpm/bin/pgbackrest/pgbackrest-set-env.sh && pgbackrest
archive-get %f "%p"'
post_bootstrap: /opt/cpm/bin/bootstrap/post-bootstrap.sh
initdb:
- encoding: UTF8
- data-checksums
postgresql:
use_unix_socket: true
pgpass: /tmp/.pgpass
create_replica_methods:
- pgbackrest
- basebackup
pgbackrest:
command: '/opt/cpm/bin/pgbackrest/pgbackrest-create-replica.sh replica'
keep_data: true
no_params: true
pgbackrest_standby:
command: '/opt/cpm/bin/pgbackrest/pgbackrest-create-replica.sh standby'
keep_data: true
no_params: true
no_master: 1
remove_data_directory_on_rewind_failure: true
callbacks:
on_role_change: /opt/cpm/bin/callbacks/pgha-on-role-change.sh
pg_hba:
- local all postgres peer
- local all crunchyadm peer
- host replication primaryuser 0.0.0.0/0 md5
- host all primaryuser 0.0.0.0/0 reject
- host all all 0.0.0.0/0 md5
Mon Nov 2 13:28:17 UTC 2020 INFO: Applying SSHD..
Mon Nov 2 13:28:17 UTC 2020 INFO: Checking for SSH Host Keys in /sshd..
Mon Nov 2 13:28:17 UTC 2020 INFO: Checking for authorized_keys in /sshd
Mon Nov 2 13:28:17 UTC 2020 INFO: Checking for sshd_config in /sshd
Mon Nov 2 13:28:17 UTC 2020 INFO: setting up .ssh directory
Mon Nov 2 13:28:17 UTC 2020 INFO: Starting SSHD..
Mon Nov 2 13:28:17 UTC 2020 INFO: Starting background process to monitor Patroni initization and restart the database if needed
Mon Nov 2 13:28:17 UTC 2020 INFO: Initializing cluster bootstrap with command: '/usr/local/bin/patroni /tmp/postgres-ha-bootstrap.yaml'
Mon Nov 2 13:28:17 UTC 2020 INFO: Running Patroni as PID 1
2020-11-02 13:28:17,670 INFO: No PostgreSQL configuration items changed, nothing to reload.
2020-11-02 13:28:17,674 INFO: Lock owner: None; I am cluster-dc2-5fb8c9565-cqskp
2020-11-02 13:28:17,791 INFO: trying to bootstrap a new standby leader
Mon Nov 2 13:28:17 UTC 2020 INFO: Empty PGDATA dir found for standby, a non-delta restore will be peformed
2020-11-02 13:28:28,174 INFO: Lock owner: None; I am cluster-dc2-5fb8c9565-cqskp
2020-11-02 13:28:28,174 INFO: not healthy enough for leader race
2020-11-02 13:28:28,233 INFO: bootstrap_standby_leader in progress
2020-11-02 13:28:38,174 INFO: Lock owner: None; I am cluster-dc2-5fb8c9565-cqskp
2020-11-02 13:28:38,174 INFO: not healthy enough for leader race
2020-11-02 13:28:38,174 INFO: bootstrap_standby_leader in progress
Mon Nov 2 13:28:44 UTC 2020 INFO: standby pgBackRest restore complete
2020-11-02 13:28:44,059 INFO: replica has been created using pgbackrest_standby
2020-11-02 13:28:44,060 INFO: bootstrapped clone from remote master None
2020-11-02 13:28:44,067 WARNING: Removing enum parameter=recovery_target from the config due to the invalid value=
2020-11-02 13:28:44.250 UTC [295] LOG: pgaudit extension initialized
2020-11-02 13:28:44.251 UTC [295] LOG: starting PostgreSQL 12.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2020-11-02 13:28:44.251 UTC [295] LOG: listening on IPv4 address "0.0.0.0", port 5432
2020-11-02 13:28:44.256 UTC [295] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2020-11-02 13:28:44,259 INFO: postmaster pid=295
2020-11-02 13:28:44.261 UTC [295] LOG: listening on Unix socket "/crunchyadm/.s.PGSQL.5432"
2020-11-02 13:28:44.275 UTC [295] LOG: redirecting log output to logging collector process
2020-11-02 13:28:44.275 UTC [295] HINT: Future log output will appear in directory "pg_log".
/tmp:5432 - rejecting connections
/tmp:5432 - rejecting connections
/tmp:5432 - accepting connections
2020-11-02 13:28:45,303 WARNING: Could not activate Linux watchdog device: "Can't open watchdog device: [Errno 2] No such file or directory: '/dev/watchdog'"
2020-11-02 13:28:45,307 INFO: Reaped pid=327, exit status=0
2020-11-02 13:28:45,405 INFO: establishing a new patroni connection to the postgres cluster
2020-11-02 13:28:45,532 INFO: initialized a new cluster
Mon Nov 2 13:28:45 UTC 2020 INFO: PGHA_INIT is 'true', waiting to initialize as primary
2020-11-02 13:28:55,803 INFO: Lock owner: cluster-dc2-5fb8c9565-cqskp; I am cluster-dc2-5fb8c9565-cqskp
2020-11-02 13:28:55,913 INFO: no action. i am the standby leader with the lock
2020-11-02 13:29:05,803 INFO: Lock owner: cluster-dc2-5fb8c9565-cqskp; I am cluster-dc2-5fb8c9565-cqskp
2020-11-02 13:29:05,856 INFO: no action. i am the standby leader with the lock
2020-11-02 13:29:15,803 INFO: Lock owner: cluster-dc2-5fb8c9565-cqskp; I am cluster-dc2-5fb8c9565-cqskp
2020-11-02 13:29:15,857 INFO: no action. i am the standby leader with the lock
2020-11-02 13:29:25,803 INFO: Lock owner: cluster-dc2-5fb8c9565-cqskp; I am cluster-dc2-5fb8c9565-cqskp
2020-11-02 13:29:25,856 INFO: no action. i am the standby leader with the lock
2020-11-02 13:29:35,803 INFO: Lock owner: cluster-dc2-5fb8c9565-cqskp; I am cluster-dc2-5fb8c9565-cqskp
2020-11-02 13:29:35,857 INFO: no action. i am the standby leader with the lock
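The readiness probe event above carries an empty failure message, so running the probe script by hand may show more. The following is a minimal sketch, not a verified procedure: the namespace, pod name, container name and script path are copied from the describe output above, while the `run` helper and the `DRY_RUN` variable are hypothetical names introduced here so the commands are only printed unless explicitly enabled.

```shell
#!/bin/sh
# Values taken from the `describe pod` output above.
NS=cluster-postgres
POD=cluster-dc2-5fb8c9565-cqskp
CONTAINER=database

# Hypothetical safety switch: with DRY_RUN=1 (the default) commands are only printed.
# Set DRY_RUN=0 to actually execute them against the cluster.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Run the readiness script manually to see its output and exit status,
# which the probe event does not show.
run kubectl exec -n "$NS" "$POD" -c "$CONTAINER" -- /opt/cpm/bin/health/pgha-readiness.sh

# Fetch the most recent container log lines for comparison with the log above.
run kubectl logs -n "$NS" "$POD" -c "$CONTAINER" --tail=50
```

Note that the probe is configured with `timeout=1s`, so a script that succeeds when run by hand but takes longer than one second would still be reported as a failed probe.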