Description
Environment
- PostgreSQL version: PostgreSQL 14.1
- PostgREST version: 10.2.0
- Operating system: Ubuntu
Problem
A running instance with db-pool = 60 reports the following problems.
1. The authenticator role exceeds the db-pool connection count.
The query below shows it reaching 71 connections, all of them idle:
select usename as rolname,
       count(*) as total_connections,
       sum(case when state = 'active' then 1 else 0 end) as active_connections,
       sum(case when state = 'idle' then 1 else 0 end) as idle_connections
from pg_stat_activity
group by usename;
    rolname    | total_connections | active_connections | idle_connections
---------------+-------------------+--------------------+------------------
 authenticator |                71 |                  0 |               71
At another point it reported 130 connections, exceeding PostgreSQL's max_connections:
01/Feb/2024:17:43:04 +0000: {"code":"PGRST000","details":"FATAL: remaining connection slots are reserved for non-replication superuser connections\n","hint":null,"message":"Database connection error. Retrying the connection."}
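As a rough illustration of what the query above checks, here is a small Python sketch that tallies already-fetched pg_stat_activity rows per role and compares the authenticator's total against db-pool plus one LISTEN connection. The function names and the row shape (dicts with `usename` and `state` keys) are assumptions for illustration, not part of PostgREST.

```python
from typing import Dict, Iterable, Mapping, Optional

RoleStats = Dict[str, int]


def connections_by_role(rows: Iterable[Mapping[str, Optional[str]]]) -> Dict[Optional[str], RoleStats]:
    """Hypothetical Python equivalent of the GROUP BY usename query above."""
    totals: Dict[Optional[str], RoleStats] = {}
    for row in rows:
        stats = totals.setdefault(row["usename"], {"total": 0, "active": 0, "idle": 0})
        stats["total"] += 1  # count(*)
        if row["state"] == "active":
            stats["active"] += 1
        elif row["state"] == "idle":
            stats["idle"] += 1
    return totals


def exceeds_pool(totals: Mapping[Optional[str], RoleStats], role: str, db_pool: int,
                 listener_connections: int = 1) -> bool:
    """True when a role holds more connections than db-pool plus the LISTEN connection."""
    stats = totals.get(role)
    return stats is not None and stats["total"] > db_pool + listener_connections
```

With the 71 idle authenticator rows from the table above and db-pool = 60, `exceeds_pool(totals, "authenticator", 60)` is true, since 71 is greater than 61.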
2. PostgREST replies with an empty error message.
$ curl localhost:3000/x -i
400 Bad Request
{ code: '', details: null, hint: null, message: '' }
A subsequent request succeeds, though, since PostgREST recovers.
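Because a retried request succeeds, a client can at least detect this specific response. The Python sketch below assumes the body arrives as JSON (for example `{"code":"","details":null,"hint":null,"message":""}`) and that the caller supplies a `send` function returning `(status, body)`. `is_empty_pgrst_error` and `send_with_retry` are hypothetical names, and retrying is only a client-side stopgap, not a fix for the pool problem.

```python
import json
from typing import Callable, Tuple

# Hypothetical: any function that performs the request and returns (status, body).
Send = Callable[[], Tuple[int, str]]


def is_empty_pgrst_error(status: int, body: str) -> bool:
    """Detect an error response whose code and message are both empty strings."""
    if status < 400:
        return False
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return False
    return isinstance(payload, dict) and payload.get("code") == "" and payload.get("message") == ""


def send_with_retry(send: Send, retries: int = 1) -> Tuple[int, str]:
    """Retry only on the empty-error response; any other result is returned as-is."""
    status, body = send()
    for _ in range(retries):
        if not is_empty_pgrst_error(status, body):
            break
        status, body = send()
    return status, body
```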
I wasn't able to find the root cause. This particular instance had just been migrated from IPv4 to IPv6-only networking.
However, it's clear that the hasql pool is not recycling its connections. The empty error message shouldn't happen either.
Proposal
We need more logging around the pool: log whenever the hasql pool reaper runs (ref). That way we can at least tell whether the pool is misbehaving; currently there is no way to know whether the reaper fails.
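To illustrate the kind of tracing proposed, here is a toy idle-connection reaper in Python that logs every pass and logs any failing pass before re-raising. This is not hasql's reaper or API; `ToyPool`, `Conn`, `close_noop`, the injectable `clock` and the log wording are all hypothetical.

```python
import logging
import time
from dataclasses import dataclass, field
from typing import Callable, List

log = logging.getLogger("toy_pool")


@dataclass
class Conn:
    conn_id: int
    last_used: float  # clock() value when the connection was last returned to the pool


def close_noop(conn: Conn) -> None:
    """Placeholder: a real pool would close the database connection here."""


@dataclass
class ToyPool:
    max_idle_seconds: float
    clock: Callable[[], float] = time.monotonic
    close: Callable[[Conn], None] = close_noop
    idle: List[Conn] = field(default_factory=list)

    def reap(self) -> int:
        """One reaper pass: close idle connections older than max_idle_seconds.

        Every pass is logged, and a failing pass is logged before re-raising,
        so a reaper that stops working becomes visible in the logs.
        """
        try:
            now = self.clock()
            keep: List[Conn] = []
            expired: List[Conn] = []
            for conn in self.idle:
                if now - conn.last_used > self.max_idle_seconds:
                    expired.append(conn)
                else:
                    keep.append(conn)
            for conn in expired:
                self.close(conn)
            self.idle = keep
        except Exception:
            log.exception("pool reaper pass failed")
            raise
        log.info("pool reaper pass: closed=%d idle_remaining=%d", len(expired), len(keep))
        return len(expired)
```

Calling `logging.basicConfig(level=logging.INFO)` before `reap()` makes each pass appear as a `pool reaper pass: closed=... idle_remaining=...` line; a pass that never logs, or logs a failure, is exactly the signal the proposal is after.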
Workarounds
A connection limit can be set on the authenticator role so that the db-pool maximum is enforced by the database itself:
alter role authenticator connection limit <db-pool + 1 for listen channel>
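The placeholder works out to db-pool + 1, so 61 for db-pool = 60. That total must also fit within the slots PostgreSQL leaves for ordinary roles, max_connections minus superuser_reserved_connections; running out of those slots is what produces the "remaining connection slots are reserved for non-replication superuser connections" error above. The Python sketch below assumes PostgreSQL's default values (100 and 3); the real ones can be read with `SHOW max_connections;` and `SHOW superuser_reserved_connections;`. The helper names are hypothetical.

```python
def authenticator_connection_limit(db_pool: int, listener_connections: int = 1) -> int:
    """db-pool connections plus one dedicated LISTEN connection."""
    return db_pool + listener_connections


def non_superuser_slots(max_connections: int = 100, superuser_reserved: int = 3) -> int:
    """Slots available to non-superuser roles (defaults are PostgreSQL's defaults)."""
    return max_connections - superuser_reserved


def fits(db_pool: int, other_role_connections: int = 0,
         max_connections: int = 100, superuser_reserved: int = 3) -> bool:
    """Check that PostgREST plus other non-superuser clients stay below the reserved slots."""
    needed = authenticator_connection_limit(db_pool) + other_role_connections
    return needed <= non_superuser_slots(max_connections, superuser_reserved)


def connection_limit_statement(role: str, db_pool: int) -> str:
    return f"alter role {role} connection limit {authenticator_connection_limit(db_pool)};"
```

For example, `connection_limit_statement("authenticator", 60)` yields `alter role authenticator connection limit 61;`, while `fits(60, other_role_connections=40)` is false under the default settings because 101 exceeds 97.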
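An idle timeout can be set in case the pool doesn't recycle idle connections (idle_session_timeout requires PostgreSQL 14 or later and is in milliseconds, so 1800000 is 30 minutes):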
alter role authenticator set idle_session_timeout to 1800000;
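The timeout value is plain unit conversion: without an explicit unit, idle_session_timeout is read as milliseconds, so 30 minutes is 30 × 60 × 1000 = 1800000. A minimal sketch with hypothetical helper names that builds the statement from a number of minutes:

```python
def idle_session_timeout_ms(minutes: int) -> int:
    """Convert minutes to the milliseconds idle_session_timeout expects by default."""
    return minutes * 60 * 1000


def idle_timeout_statement(role: str, minutes: int) -> str:
    return f"alter role {role} set idle_session_timeout to {idle_session_timeout_ms(minutes)};"
```

PostgreSQL also accepts an explicit unit such as `'30min'`. Role-level settings apply to sessions opened after the `alter role`, so existing pool connections keep their old value until they reconnect.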