[Bug]: After the primary node ran out of disk space and space was freed, the deleted primary instance pod is not recreated #7505

@wupengxiang23

Description

Is there an existing issue already for this bug?

  • I have searched for an existing issue, and could not find anything. I believe this is a new bug.

I have read the troubleshooting guide

  • I have read the troubleshooting guide and I think this is a new bug.

I am running a supported version of CloudNativePG

  • I am running a supported version of CloudNativePG.

Contact Details

wupengxiang14@gmail.com

Version

1.25 (latest patch)

What version of Kubernetes are you using?

1.29

What is your Kubernetes environment?

Self-managed: k3s

How did you install the operator?

Helm

What happened?

I set up a cluster and, after it had been running for a while, found that the disk was full. After freeing disk space, I deleted the primary instance pod and found that it was not recreated.

Running kubectl get pod shows that the cloud-ssd-pg-1 pod is missing:

NAME                                                  READY   STATUS    RESTARTS
cloud-ssd-pg-2                                        0/1     Running   2318 (122m ago)
cloud-ssd-pg-3                                        0/1     Running   2323 (73m ago)
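To dig into why the operator has not recreated the pod, a few read-only commands can help (a sketch, assuming the cluster runs in the current namespace; adjust -n as needed):

```shell
# Status conditions and the per-instance reported state that the
# cluster status output points at.
kubectl get cluster cloud-ssd-pg -o jsonpath='{.status.conditions}'
kubectl get cluster cloud-ssd-pg -o jsonpath='{.status.instancesReportedState}'

# Check whether the PVC for instance 1 still exists and is Bound;
# a missing or Pending PVC can block pod recreation.
kubectl get pvc cloud-ssd-pg-1

# Recent events often explain why a pod was not scheduled or created.
kubectl get events --sort-by=.lastTimestamp | tail -n 30
```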

The status of the cluster is as follows:

Cluster Summary
Name                cloud-ssd-pg
PostgreSQL Image:    ghcr.io/cloudnative-pg/postgresql:16.8
Primary instance:    cloud-ssd-pg-1
Primary start time:  2025-04-21 16:04:57 +0800 CST (uptime 388h20m42s)
Status:              Not enough disk space: Insufficient disk space detected in one or more pods is preventing PostgreSQL from running. Please verify your storage settings. Further information inside .status.instancesReportedState
Instances:           3
Ready instances:     0
Size:                container not found

Continuous Backup status
Not configured

Physical backups
Primary instance not found

Streaming Replication status
Primary instance not found

Instances status
Name            Current LSN  Replication role  Status  QoS            Manager Version  Node
----            -----------  ----------------  ------  ---            ---------------  ----
cloud-ssd-pg-2  -            -                 -       InternalError  Burstable        -  178.10.4.220
cloud-ssd-pg-3  -            -                 -       InternalError  Burstable        -  178.10.4.162


Error(s) extracting status
-----------------------------------
failed to get status by proxying to the pod, you might lack permissions to get pods/proxy: an error on the server ("failed to connect to `user=postgres database=postgres`: /controller/run/.s.PGSQL.5432 (/controller/run): server error: FATAL: the database system is not yet accepting connections (SQLSTATE 57P03)") has prevented the request from succeeding (get pods https:cloud-ssd-pg-2:8000)
failed to get status by proxying to the pod, you might lack permissions to get pods/proxy: an error on the server ("failed to connect to `user=postgres database=postgres`: /controller/run/.s.PGSQL.5432 (/controller/run): server error: FATAL: the database system is starting up (SQLSTATE 57P03)") has prevented the request from succeeding (get pods https:cloud-ssd-pg-3:8000)

Why is the primary pod not recreated after I deleted it? What should I do? Please help me. Thanks.
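One way to confirm whether the disk-space condition has actually cleared is to check the volume from a surviving instance and re-read the cluster status (a sketch; the paths and container name below assume the CloudNativePG defaults, where the main container is named "postgres" and data is mounted under /var/lib/postgresql/data):

```shell
# Verbose cluster status via the cnpg kubectl plugin, including
# the .status.instancesReportedState details.
kubectl cnpg status cloud-ssd-pg --verbose

# Verify the data volume really has free space again, as seen from
# inside one of the remaining instances.
kubectl exec cloud-ssd-pg-2 -c postgres -- df -h /var/lib/postgresql/data
```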

Cluster resource

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cloud-ssd-pg
spec:
  description: "cloud-ssd-pg cluster"
  imageName:  ghcr.io/cloudnative-pg/postgresql:16.8
  instances: 3
  startDelay: 300
  stopDelay: 300
  failoverDelay: 10    
  switchoverDelay: 30 
  primaryUpdateStrategy: unsupervised

  postgresql:
    parameters:
      max_parallel_workers: "16" 
      max_parallel_maintenance_workers: "4"
      max_parallel_workers_per_gather: "4"
      max_connections: "500"
      shared_buffers: "4GB"  
      work_mem: "512MB"
      maintenance_work_mem: "5GB" 
      max_worker_processes: "19"
      wal_buffers: "128MB"
      checkpoint_completion_target: "0.9"
      synchronous_commit: "on"
      random_page_cost: "1.1"
      effective_cache_size: "8GB"
      pg_stat_statements.max: "10000"
      pg_stat_statements.track: "all"
      auto_explain.log_min_duration: "10s"
      hnsw.iterative_scan: "strict_order"
      hnsw.ef_search: "400"
      autovacuum: "off"
      ssl_min_protocol_version: "TLSv1.2"
      ssl_max_protocol_version: "TLSv1.3"
      max_wal_size: "80GB"
      min_wal_size: "20GB"
      wal_keep_size: "60GB"
      checkpoint_timeout: "30min"
      idle_session_timeout: "10min"
      idle_in_transaction_session_timeout: "5min"
      reserved_connections: "10"
      superuser_reserved_connections: "5"
      tcp_keepalives_idle: "60"
      tcp_keepalives_interval: "60"
      tcp_keepalives_count: "5"
      tcp_user_timeout: "30000"
      client_connection_check_interval: "10000"
      effective_io_concurrency: "2" 
  bootstrap:
    initdb:
      database: app
      owner: app
      secret:
        name: cloud-ssd-pg-user-app
  enableSuperuserAccess: true
  superuserSecret:
    name: cloud-ssd-pg-superuser
  storage:
    storageClass: local-storage
    size: 500Gi


  resources:
    requests:
      memory: "8Gi"
      cpu: "4"
    limits:
      memory: "16Gi"
      cpu: "8"

  affinity:
    enablePodAntiAffinity: true
    topologyKey: kubernetes.io/hostname
    podAntiAffinityType: required
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: cloud-ssd-pg-app
                operator: In
                values:
                  - cluster1
                  - cluster2
                  - cluster3
      preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: cloud-ssd-pg-app
                operator: In
                values: ["cluster1"]
          - weight: 80
            preference:
              matchExpressions:
              - key: cloud-ssd-pg-app
                operator: In
                values: ["cluster2"]
          - weight: 10
            preference:
              matchExpressions:
              - key: cloud-ssd-pg-app
                operator: In
                values: ["cluster3"]

  monitoring:
    enablePodMonitor: true
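Since the spec uses storageClass local-storage, it is worth checking whether that class supports volume expansion at all before increasing spec.storage.size; many local-storage provisioners do not (a sketch, assuming the StorageClass name from the manifest above):

```shell
# Print whether the StorageClass allows online expansion.
kubectl get storageclass local-storage \
  -o jsonpath='{.allowVolumeExpansion}{"\n"}'

# If it prints "true", increasing the Cluster's storage.size can grow
# the PVCs in place; otherwise the volumes have to be recreated.
```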

Relevant log output

Code of Conduct

  • I agree to follow this project's Code of Conduct
