
fix: Limit concurrent schema cache loads#4643

Open
mkleczek wants to merge 3 commits into PostgREST:main from
mkleczek:use-advisory-locks-to-throttle-schema-cache-loading

Conversation

@mkleczek (Collaborator) commented Feb 10, 2026

DISCLAIMER:
This commit was authored entirely by a human without the assistance of LLMs.

Triggering a schema cache reload immediately upon receipt of a notification by the listener leads to a thundering herd problem in a PostgREST cluster.

This change limits the number of concurrent schema cache loading queries using advisory locks.
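The mechanism can be sketched as follows. This is a simplified Python model, not the PR's Haskell code: each instance picks a random lock id from a small fixed range and takes a lock on it, so at most range-size loads run concurrently; `pg_advisory_xact_lock` is modeled here with in-process mutexes, and all names are illustrative.

```python
import random
import threading

# Simplified model of PostgreSQL advisory locks: one mutex per lock id.
# In the PR, pg_advisory_xact_lock($1) plays this role and the lock is
# released automatically at the end of the transaction.
LOCK_BASE = 50168275   # hardcoded constant shared by all instances
LOCK_COUNT = 10        # distinct lock ids => maximum concurrency
advisory_locks = {LOCK_BASE + i: threading.Lock() for i in range(LOCK_COUNT)}

peak = 0
active = 0
counter_guard = threading.Lock()

def load_schema_cache():
    """One instance reloading its schema cache, throttled by an advisory lock."""
    global peak, active
    lock_id = random.randrange(LOCK_BASE, LOCK_BASE + LOCK_COUNT)
    with advisory_locks[lock_id]:  # blocks while another holder has this id
        with counter_guard:
            active += 1
            peak = max(peak, active)
        # ... run the (expensive) schema cache query here ...
        with counter_guard:
            active -= 1

threads = [threading.Thread(target=load_schema_cache) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert peak <= LOCK_COUNT  # never more than LOCK_COUNT concurrent loads
```

Even with 100 "instances", no more than 10 loads can overlap, because each must hold one of the 10 locks.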

Fixes #4642

@mkleczek mkleczek force-pushed the use-advisory-locks-to-throttle-schema-cache-loading branch 2 times, most recently from 92cf79e to b63457e Compare February 10, 2026 21:51
-- Allow 10 concurrent schema cache loads, guarded by advisory locks.
-- This prevents a thundering herd problem on startup, or when many PostgREST instances receive "reload schema" notifications at the same time
lockId <- getRandomR (50168275::Int64, 50168275 + 10)
let stmt = SQL.Statement "SELECT pg_catalog.pg_advisory_xact_lock($1)" (HE.param $ HE.nonNullable HE.int8) HD.noResult configDbPreparedStatements
Member

These locks would be released automatically at the end of the transaction right? It does look like it would work for #4642.

I guess one drawback is that these advisory locks would run and leave a log trace even if the user never runs into #4642, which is most cases.

WDYT of the solution on #4642 (comment)? Would that be preferable?

Member

I guess one drawback

Also, it's a bit more operational overhead: we would have to recommend setting lock_timeout in addition to statement_timeout to avoid waiting for too long (e.g. on a schema cache load that takes too long due to pg_catalog bloat).

Collaborator Author

I guess one drawback

Also, it's a bit more operational overhead: we would have to recommend setting lock_timeout in addition to statement_timeout to avoid waiting for too long (e.g. on a schema cache load that takes too long due to pg_catalog bloat).

Is it really an issue? If we get a lock timeout we are going to retry anyway.

Collaborator Author
@mkleczek mkleczek Feb 11, 2026

These locks would be released automatically at the end of the transaction right? It does look like it would work for #4642.

Yeah, they are tx scoped.

I guess one drawback is that these advisory locks would run and leave a log trace even if the user will never run into #4642, which are most cases.

Maybe we should introduce a config property that activates it then?

WDYT of the solution on #4642 (comment)? Would that be preferable?

See #4642 (comment)

I think for now, mitigating the thundering herd problem is way more feasible, at least in the short to medium term.

Member

I think for now mitigation of thundering herd problem is way more feasible. At least in the short to medium term.
Maybe we should introduce a config property that activates it then?

Yes, agree... a config sounds good. Should we parametrize the number of locks? Or should we just hardcode it to 10 and expose a boolean config? (Also, how do we know 10 is the right number?)

We would also need to test this; it seems doable to ensure only 10 connections can exist at a time when, say, 20 PostgREST instances with db-pool=1 and PGRST_INTERNAL_SCHEMA_CACHE_QUERY_SLEEP are spawned.

Collaborator Author
@mkleczek mkleczek Feb 20, 2026

Agree, I think 1 is a good number. Given that I think we should avoid a config and just set it.

Hmm... but why? What harm is there in allowing some level of concurrency? Especially since right now there is no limit at all?

@steve-chavez @wolfgangwalther
I think we can actually do better.

How about we adjust the number of locks based on the (estimated) number of nodes connected to the same database?
There are two issues to solve:

  1. How do we estimate the number of cluster nodes?
  2. What should be the algorithm to calculate the number of locks?

Node number estimation

The idea is to estimate that based on:

  • number of active db sessions opened by the same user as session_user
  • number of open connections in the pool

The estimate would be: active_sessions_number / connections_in_the_pool

This assumes the load is spread evenly among cluster members so all nodes should have the same number of open connections.
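The estimation step above can be sketched as follows (an illustrative Python model, not the PR's Haskell implementation; the function name and the clamping to a minimum of one node are assumptions):

```python
def estimate_nodes(active_sessions: int, pool_connections: int) -> int:
    """Estimate cluster size from information available to one node:
    active database sessions opened by the same session_user, divided by
    this node's own open pool connections. Assumes load is spread evenly,
    so every node holds a similar number of connections."""
    if pool_connections <= 0:
        return 1
    return max(1, round(active_sessions / pool_connections))

# e.g. 40 active sessions observed, this node holds 10 pool connections
print(estimate_nodes(40, 10))  # -> 4 estimated nodes
```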

Number of locks calculation

We need a sublinear function and it seems to me a logarithm is a good fit. The number of locks would be round(log2(estimated_number_of_nodes))

That way we can allow concurrent schema loads while protecting against the thundering herd issue in large clusters.

I've committed an implementation of this idea for you to review. If you don't like it we can easily delete the commit. If you think it is OK, we can split it into coherent pieces.
I've added a test that verifies the level of concurrency for various cluster sizes; the results are as follows:

Nodes  Locks
  2      2
  4      3
  6      4
  8      4
 16      5
WDYT?

Member
@steve-chavez steve-chavez Feb 20, 2026

Hmm... but why? What harm is in allowing some level of concurrency? Especially that right now there is no limit at all?
If the limit is so low (ie. 1) I am strongly against forcing it on users without a way to opt-out.

Right, so if only one lock can be taken we would be forcing all scache loads to be sequential. If we consider the case of 100 instances (#4642), then the first 99 have to load before the 100th can even begin, right? And yes, that doesn't look good for the last instance, since it prolongs the time it has a stale schema cache.


Node number estimation
Number of locks calculation

Seems complicated. One simpler idea that occurs to me:

  1. Each postgREST instance has a corresponding LISTEN channel. So we know how many concurrent scache loads will happen.
  2. We can also know the time the latest scache query took, since we have the metric
    schemaCacheQueryTime :: Gauge,

Perhaps we can sample the first scache query time (2) and combined with 1 calculate the right number of locks?

Collaborator Author
@mkleczek mkleczek Feb 20, 2026

Hmm... but why? What harm is in allowing some level of concurrency? Especially that right now there is no limit at all?
If the limit is so low (ie. 1) I am strongly against forcing it on users without a way to opt-out.

Right, so if only one lock can be taken we would be forcing all scache loads to be sequential. If we consider the case of 100 instances (#4642), then the first 99 have to load before the 100th can even begin, right? And yes, that doesn't look good for the last instance, since it prolongs the time it has a stale schema cache.

Node number estimation
Number of locks calculation

Seems complicated. One simpler idea that occurs to me:

  1. Each postgREST instance has a corresponding LISTEN channel. So we know how many concurrent scache loads will happen.

I am afraid I don't understand. How do you want to count the number of Pgrst instances based on LISTEN channel?

  2. We can also know the time the latest scache query took, since we have the metric
    schemaCacheQueryTime :: Gauge,

Perhaps we can sample the first scache query time (2) and combined with 1 calculate the right number of locks?

OK, but what should be the formula to calculate the number of locks?

IMHO, if you already have the estimated number of nodes, calculating log2(number_of_nodes) is as simple as it gets - no additional information required.

The main reason why the proposal in this PR has merit, IMHO, is that all the required information is available locally to the node (i.e. it is only the number of its open connections). So acquiring the lock can be done with a single SELECT query taking a single parameter.

Member

Also I believe this should be a feature instead of a fix.

I think we're looking at two separate issues here:

  1. It's a bug that multiple PostgREST instances just end up as a thundering herd.
  2. There's a performance problem in reloading multiple PostgREST instances at the same time.

We should fix the bug by limiting to 1 concurrent schema cache reloader. We should then discuss how we can best improve performance. This discussion is conflating both.

Collaborator Author

  1. It's a bug that multiple PostgREST instances just end up as a thundering herd.
  2. There's a performance problem in reloading multiple PostgREST instances at the same time.

We should fix the bug by limiting to 1 concurrent schema cache reloader.

Such a "fix" would introduce another (major) bug and hence is not acceptable IMO.

We should then discuss how we can best improve performance. This discussion is conflating both.

Given the above, we must discuss both to come up with the right solution, I'm afraid.

withTxLock <- do
-- Allow 10 concurrent schema cache loads, guarded by advisory locks.
-- This prevents a thundering herd problem on startup, or when many PostgREST instances receive "reload schema" notifications at the same time
lockId <- getRandomR (50168275::Int64, 50168275 + 10)
Member

What's the reasoning behind this magic number?

Collaborator Author

What's the reasoning behind this magic number?

It is just a randomly generated large number. We need something with a low probability of being used by anything other than PostgREST here (but it needs to be a hardcoded constant so that there is no risk of instances using different locks).

@mkleczek mkleczek force-pushed the use-advisory-locks-to-throttle-schema-cache-loading branch 11 times, most recently from c72cfeb to 8b8ef5c Compare February 20, 2026 16:56
@mkleczek mkleczek force-pushed the use-advisory-locks-to-throttle-schema-cache-loading branch from 8b8ef5c to f3e0692 Compare February 25, 2026 19:58
@mkleczek
Collaborator Author

@steve-chavez @wolfgangwalther rebased on top of #4672 to make review easier.

@mkleczek mkleczek force-pushed the use-advisory-locks-to-throttle-schema-cache-loading branch 3 times, most recently from 50f945f to 0d02360 Compare March 4, 2026 10:03
@mkleczek mkleczek force-pushed the use-advisory-locks-to-throttle-schema-cache-loading branch 3 times, most recently from ed98903 to 2017326 Compare March 13, 2026 05:27
DISCLAIMER:
This commit was authored entirely by a human without the assistance of LLMs.

Some helpers are already provided for introspecting metrics (used in the JWT cache tests). This change adds facilities to additionally validate emitted Observation events.
A new Spec module is also implemented, adding basic tests of schema cache reloading - their main goal is to exercise the new infrastructure.
DISCLAIMER:
This commit was authored entirely by a human without the assistance of LLMs.

Right now the metrics observation handler does not track database connections but updates a single Gauge based on HasqlPoolObs events. This is problematic because the Hasql pool reports various connection events in multiple phases. The connection state machine is not simple, and to precisely report the number of connections in various states it is necessary to track their lifecycles.

This change adds a ConnTrack data structure and logic to track database connection lifecycles. At the moment it supports "connected" and "inUse" connection counts precisely. The "pgrst_db_pool_available" metric is implemented on top of ConnTrack instead of a simple Gauge.
DISCLAIMER:
This commit was authored entirely by a human without the assistance of LLMs.

Triggering a schema cache reload immediately upon receipt of a notification by the listener leads to a thundering herd problem in a PostgREST cluster.

This change limits the number of concurrent schema cache loading queries using advisory locks.
@mkleczek mkleczek force-pushed the use-advisory-locks-to-throttle-schema-cache-loading branch from 2017326 to a2876fe Compare March 13, 2026 06:49

Development

Successfully merging this pull request may close these issues.

Thundering herd problem in PostgREST cluster on AWS ECS

3 participants