Is there an existing issue already for this bug?
I have read the troubleshooting guide
I am running a supported version of CloudNativePG
Contact Details
wupengxiang14@gmail.com
Version
1.25 (latest patch)
What version of Kubernetes are you using?
1.29
What is your Kubernetes environment?
Self-managed: k3s
How did you install the operator?
Helm
What happened?
I set up a cluster and, after it had been running for a while, found that the disk was full. After freeing up disk space, I deleted the primary instance pod, but it was never recreated.
Running kubectl get pod shows that the cloud-ssd-pg-1 pod no longer exists:
cloud-ssd-pg-2 0/1 Running 2318 (122m ago)
cloud-ssd-pg-3 0/1 Running 2323 (73m ago)
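For reference, the deletion and the check above were done roughly like this (a sketch; the namespace and label selector are assumptions based on the defaults CloudNativePG uses):

```shell
# Delete the primary instance pod after freeing disk space on its node
# (pod name from the cluster status above).
kubectl delete pod cloud-ssd-pg-1

# List the cluster's pods; cloud-ssd-pg-1 is never recreated.
# The cnpg.io/cluster label is the one the operator puts on instance pods.
kubectl get pod -l cnpg.io/cluster=cloud-ssd-pg
```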
The status of the cluster is as follows:
Cluster Summary
Name cloud-ssd-pg
PostgreSQL Image: ghcr.io/cloudnative-pg/postgresql:16.8
Primary instance: cloud-ssd-pg-1
Primary start time: 2025-04-21 16:04:57 +0800 CST (uptime 388h20m42s)
Status: Not enough disk space - Insufficient disk space detected in one or more pods is preventing PostgreSQL from running. Please verify your storage settings. Further information inside .status.instancesReportedState
Instances: 3
Ready instances: 0
Size: container not found
Continuous Backup status
Not configured
Physical backups
Primary instance not found
Streaming Replication status
Primary instance not found
Instances status
Name Current LSN Replication role Status QoS Manager Version Node
---- ----------- ---------------- ------ --- --------------- ----
cloud-ssd-pg-2 - - - InternalError Burstable - 178.10.4.220
cloud-ssd-pg-3 - - - InternalError Burstable - 178.10.4.162
Error(s) extracting status
-----------------------------------
failed to get status by proxying to the pod, you might lack permissions to get pods/proxy: an error on the server ("failed to connect to `user=postgres database=postgres`: /controller/run/.s.PGSQL.5432 (/controller/run): server error: FATAL: the database system is not yet accepting connections (SQLSTATE 57P03)") has prevented the request from succeeding (get pods https:cloud-ssd-pg-2:8000)
failed to get status by proxying to the pod, you might lack permissions to get pods/proxy: an error on the server ("failed to connect to `user=postgres database=postgres`: /controller/run/.s.PGSQL.5432 (/controller/run): server error: FATAL: the database system is starting up (SQLSTATE 57P03)") has prevented the request from succeeding (get pods https:cloud-ssd-pg-3:8000)
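For context, the summary above matches the output of the cnpg kubectl plugin; it can be collected with something like the following (a sketch, assuming the plugin is installed):

```shell
# Reproduce the cluster summary shown above.
kubectl cnpg status cloud-ssd-pg

# The per-instance detail the error message points at lives in the
# cluster's status; .status.instancesReportedState is named above.
kubectl get cluster cloud-ssd-pg -o jsonpath='{.status.instancesReportedState}'
```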
Why is the primary pod not recreated after I deleted it, and how can I recover the cluster? Please help me. Thanks.
Cluster resource
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cloud-ssd-pg
spec:
  description: "cloud-ssd-pg cluster"
  imageName: ghcr.io/cloudnative-pg/postgresql:16.8
  instances: 3
  startDelay: 300
  stopDelay: 300
  failoverDelay: 10
  switchoverDelay: 30
  primaryUpdateStrategy: unsupervised
  postgresql:
    parameters:
      max_parallel_workers: "16"
      max_parallel_maintenance_workers: "4"
      max_parallel_workers_per_gather: "4"
      max_connections: "500"
      shared_buffers: "4GB"
      work_mem: "512MB"
      maintenance_work_mem: "5GB"
      max_worker_processes: "19"
      wal_buffers: "128MB"
      checkpoint_completion_target: "0.9"
      synchronous_commit: "on"
      random_page_cost: "1.1"
      effective_cache_size: "8GB"
      pg_stat_statements.max: "10000"
      pg_stat_statements.track: "all"
      auto_explain.log_min_duration: "10s"
      hnsw.iterative_scan: "strict_order"
      hnsw.ef_search: "400"
      autovacuum: "off"
      ssl_min_protocol_version: "TLSv1.2"
      ssl_max_protocol_version: "TLSv1.3"
      max_wal_size: "80GB"
      min_wal_size: "20GB"
      wal_keep_size: "60GB"
      checkpoint_timeout: "30min"
      idle_session_timeout: "10min"
      idle_in_transaction_session_timeout: "5min"
      reserved_connections: "10"
      superuser_reserved_connections: "5"
      tcp_keepalives_idle: "60"
      tcp_keepalives_interval: "60"
      tcp_keepalives_count: "5"
      tcp_user_timeout: "30000"
      client_connection_check_interval: "10000"
      effective_io_concurrency: "2"
  bootstrap:
    initdb:
      database: app
      owner: app
      secret:
        name: cloud-ssd-pg-user-app
  enableSuperuserAccess: true
  superuserSecret:
    name: cloud-ssd-pg-superuser
  storage:
    storageClass: local-storage
    size: 500Gi
  resources:
    requests:
      memory: "8Gi"
      cpu: "4"
    limits:
      memory: "16Gi"
      cpu: "8"
  affinity:
    enablePodAntiAffinity: true
    topologyKey: kubernetes.io/hostname
    podAntiAffinityType: required
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: cloud-ssd-pg-app
                operator: In
                values:
                  - cluster1
                  - cluster2
                  - cluster3
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: cloud-ssd-pg-app
                operator: In
                values: ["cluster1"]
        - weight: 80
          preference:
            matchExpressions:
              - key: cloud-ssd-pg-app
                operator: In
                values: ["cluster2"]
        - weight: 10
          preference:
            matchExpressions:
              - key: cloud-ssd-pg-app
                operator: In
                values: ["cluster3"]
  monitoring:
    enablePodMonitor: true
Relevant log output
Code of Conduct