Primary & Replicas continuously restarting #3626

vaigau6g · 2023-04-09T13:29:23Z

Overview

The primary and replicas restart continuously under Crunchy Data PostgreSQL Operator on OpenShift 4.10.

Environment

OpenShift: 4.10
OpenShift Data Foundation: odf-operator.v4.10.10
Crunchy Data PostgreSQL Operator: 5.3.0 (pgo-version: 5.3.0)
postgresVersion: 14

Steps to Reproduce

On OpenShift 4.10 with ODF 4.10.10 installed and a local storage cluster configured, install Crunchy Data PostgreSQL Operator 5.3.0.
Create a cluster with replicas.
Start performance testing, or otherwise generate a large number of transactions.
Result: the primary and replicas restart continuously.

Logs

Attached: mdsp-psql-lpc-primary-8csb-0-database.log



vaigau6g · 2023-04-09T14:36:24Z

Exception in the primary and replica pods:

2023-04-09 14:34:01,752 WARNING: Exception happened during processing of request from ::ffff:10.128.12.1:48184
2023-04-09 14:34:01,754 WARNING: Traceback (most recent call last):
  File "/usr/lib64/python3.6/socketserver.py", line 320, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib64/python3.6/socketserver.py", line 669, in process_request
    t.start()
  File "/usr/lib64/python3.6/threading.py", line 867, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
2023-04-09 14:34:01,755 WARNING: Exception happened during processing of request from ::ffff:10.128.12.1:48182
2023-04-09 14:34:01,755 WARNING: Traceback (most recent call last):
  File "/usr/lib64/python3.6/socketserver.py", line 320, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib64/python3.6/socketserver.py", line 669, in process_request
    t.start()
  File "/usr/lib64/python3.6/threading.py", line 867, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
2023-04-09 14:34:10,034 INFO: no action. I am (mdsp-psql-lpc-primary-8csb-0), the leader with the lock
2023-04-09 14:34:20,031 INFO: no action. I am (mdsp-psql-lpc-primary-8csb-0), the leader with the lock
2023-04-09 14:34:21,753 WARNING: Exception happened during processing of request from ::ffff:10.128.12.1:59482
2023-04-09 14:34:21,754 WARNING: Traceback (most recent call last):
  File "/usr/lib64/python3.6/socketserver.py", line 320, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib64/python3.6/socketserver.py", line 669, in process_request
    t.start()
  File "/usr/lib64/python3.6/threading.py", line 867, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
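
The traceback above shows where the failure happens: the server writing these log lines uses Python's socketserver module, which (per the process_request frame) starts a new thread for each incoming request, and t.start() raises "RuntimeError: can't start new thread" when the operating system refuses to create another thread. To gauge how often this happens and which client addresses trigger it, here is a minimal sketch that scans a container log in the format shown here. The function name count_thread_failures is hypothetical, and the parser assumes each traceback follows its "Exception happened during processing of request from ..." line, as it does in this excerpt.

```python
import re
from collections import Counter

# Matches the warning line naming the client whose request failed, e.g.
# "... WARNING: Exception happened during processing of request from ::ffff:10.128.12.1:48184"
REQUEST_RE = re.compile(
    r"Exception happened during processing of request from (?P<addr>\S+):(?P<port>\d+)$"
)
THREAD_ERROR = "RuntimeError: can't start new thread"


def count_thread_failures(log_text: str) -> Counter:
    """Count "can't start new thread" failures per client address (port dropped)."""
    failures = Counter()
    current_addr = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        match = REQUEST_RE.search(line)
        if match:
            # Remember which client the following traceback belongs to.
            current_addr = match.group("addr")
        elif line.startswith(THREAD_ERROR) and current_addr is not None:
            failures[current_addr] += 1
            # Reset so a stray error line is not attributed to a stale client.
            current_addr = None
    return failures
```

Run on the excerpt above, it attributes all three failures to ::ffff:10.128.12.1. Dropping the source port groups repeated requests from the same client, which makes a single frequent caller easy to spot.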

vaigau6g · 2023-04-10T03:52:28Z

pg_hba configuration (output of patronictl show-config):

patronictl show-config
loop_wait: 10
postgresql:
  parameters:
    archive_command: pgbackrest --stanza=db archive-push "%p"
    archive_mode: 'on'
    archive_timeout: 60s
    default_pool_size: '100'
    jit: 'off'
    max_connections: '2500'
    password_encryption: scram-sha-256
    restore_command: pgbackrest --stanza=db archive-get %f "%p"
    shared_buffers: 4GB
    shared_preload_libraries: pgaudit
    ssl: 'on'
    ssl_ca_file: /pgconf/tls/ca.crt
    ssl_cert_file: /pgconf/tls/tls.crt
    ssl_key_file: /pgconf/tls/tls.key
    synchronous_commit: 'on'
    unix_socket_directories: /tmp/postgres
    wal_level: logical
  pg_hba:
  - local all "postgres" peer
  - hostssl replication "_crunchyrepl" all cert
  - hostssl "postgres" "_crunchyrepl" all cert
  - host all "_crunchyrepl" all reject
  - hostssl all "_crunchypgbouncer" all scram-sha-256
  - host all "_crunchypgbouncer" all reject
  - local all all trust
  - host all all all md5
  - hostssl all all all md5
  use_pg_rewind: true
  use_slots: false
synchronous_commit: 'on'
synchronous_mode: true
ttl: 30
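
PostgreSQL evaluates pg_hba records top to bottom and uses the first record whose connection type, database, user, and client address all match; if authentication with that record fails, later records are not tried. The minimal sketch below replays that first-match rule over the pg_hba entries from this config. It is a simplification that assumes only the keywords used here (no address matching, sameuser, +group or @file syntax), and HbaRule, RULES and first_match are hypothetical names.

```python
from typing import NamedTuple, Optional, Tuple


class HbaRule(NamedTuple):
    conn_type: str  # "local", "host" or "hostssl"
    database: str   # "all", "replication" or a database name
    user: str       # "all" or a role name
    method: str     # authentication method, e.g. "peer", "cert", "md5", "reject"


# Entries copied from the pg_hba list in this config; quotes dropped and the
# address column omitted, because every TCP rule there uses "all".
RULES = [
    HbaRule("local", "all", "postgres", "peer"),
    HbaRule("hostssl", "replication", "_crunchyrepl", "cert"),
    HbaRule("hostssl", "postgres", "_crunchyrepl", "cert"),
    HbaRule("host", "all", "_crunchyrepl", "reject"),
    HbaRule("hostssl", "all", "_crunchypgbouncer", "scram-sha-256"),
    HbaRule("host", "all", "_crunchypgbouncer", "reject"),
    HbaRule("local", "all", "all", "trust"),
    HbaRule("host", "all", "all", "md5"),
    HbaRule("hostssl", "all", "all", "md5"),
]


def _type_matches(rule_type: str, local: bool, ssl: bool) -> bool:
    if rule_type == "local":
        return local          # Unix-domain socket connections only
    if local:
        return False          # host/hostssl never match socket connections
    if rule_type == "host":
        return True           # TCP/IP, with or without SSL
    return rule_type == "hostssl" and ssl


def _database_matches(rule_db: str, database: Optional[str]) -> bool:
    # database=None models a physical replication connection.
    if database is None:
        return rule_db == "replication"
    return rule_db == "all" or rule_db == database


def first_match(rules, *, local: bool, ssl: bool,
                database: Optional[str], user: str) -> Optional[Tuple[int, str]]:
    """Return (index, method) of the first matching rule, or None if none match."""
    for index, rule in enumerate(rules):
        if (_type_matches(rule.conn_type, local, ssl)
                and _database_matches(rule.database, database)
                and rule.user in ("all", user)):
            return index, rule.method  # no fall-through: the first match decides
    return None
```

For an application role connecting over TCP with SSL, first_match(RULES, local=False, ssl=True, database="appdb", user="appuser") returns (7, 'md5'): the host all all all md5 entry wins, so the final hostssl all all all md5 entry is never consulted.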

vaigau6g · 2023-04-11T09:07:02Z

The issue is related to application requests to the psql pods. The pod logs the error below and restarts continuously:

Exception happened during processing of request from ::ffff:10.128.10.1:43576
2023-04-11 06:58:03,453 WARNING: Traceback (most recent call last):
  File "/usr/lib64/python3.6/socketserver.py", line 320, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib64/python3.6/socketserver.py", line 669, in process_request
    t.start()
  File "/usr/lib64/python3.6/threading.py", line 867, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread

vaigau6g · 2023-04-13T04:17:06Z

Connecting with the postgres user's credentials through a connection string works.

vaigau6g · 2023-04-13T09:27:33Z

Could this be caused by restoring a database from the older version into a cluster created with the newer PGO version?
Versions before the upgrade:
pgo version = 4.7.3
postgresql version = 13

andrewlecuyer · 2024-06-13T17:32:12Z

Considering this issue is for an older version of CPK that is no longer actively maintained via the Crunchy Developer Program, I am closing it (see the Supported Platforms page for additional information about supported versions of CPK).

For information about upgrading from CPK v4 to v5, please see the upgrade guide:

https://access.crunchydata.com/documentation/postgres-operator/latest/upgrade/v4tov5

And if you still require support for CPK v4.7.3, I recommend reaching out to info@crunchydata.com to discuss your requirements further.

dsessler7 added the triaged label Apr 11, 2023

andrewlecuyer closed this as completed Jun 13, 2024
