[postgres] Prevent incidents by detecting idle sessions holding locks#21182

Merged
sethsamuel merged 22 commits into master from nenadnoveljic/pg-long-locks
Sep 17, 2025


@nenadnoveljic nenadnoveljic commented Aug 27, 2025

What does this PR do?

Adds the postgresql.locks.idle_in_transaction_age metric.

Motivation

Locks can sit around quietly for a while — until someone else needs the same data. At that point, blocking kicks in, and while DBM will show you the blocking tree, it’s already too late: the incident has happened.

The idea behind this PR is to spot potentially problematic locks early and avoid those incidents. Imagine someone starts a transaction, updates some rows (which takes a lock), and then gets a call to head out for a beer. They forget to commit, and the lock just sits there. Nothing breaks… until a nightly batch job tries to touch the same data, and suddenly that forgotten lock causes a blocking cascade.

It doesn’t take a bar-goer to cause this, of course. A stuck or abandoned connection from the pool, or a software bug leaving a session idle with locks, can have the same effect. That’s why we’re adding a metric for idle sessions holding locks. The metric includes DB name, relation owner and name, user, app, and PID, so users can create monitors and act before the issue turns into an incident.
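As a rough illustration of how one row of such a result could become a tagged gauge that a monitor can alert on, here is a minimal Python sketch. It is not the integration's actual code: the row keys, the tag names (`db`, `relation_owner`, `relation`, `user`, `app`, `pid`), and the helpers `to_gauge` and `needs_attention` are assumptions made for this example. Only the metric name and the 30-minute example threshold come from this PR description.

```python
from typing import List, Mapping, Tuple

# Metric name taken from the PR description.
METRIC_NAME = "postgresql.locks.idle_in_transaction_age"


def to_gauge(row: Mapping[str, object]) -> Tuple[str, float, List[str]]:
    """Turn one result row into (metric name, age in seconds, tags).

    The row keys and tag names are hypothetical; the PR only states which
    dimensions the metric carries, not how they are spelled.
    """
    tags = [
        f"db:{row['datname']}",
        f"relation_owner:{row['relation_owner']}",
        f"relation:{row['relname']}",
        f"user:{row['usename']}",
        f"app:{row['application_name']}",
        f"pid:{row['pid']}",
    ]
    return METRIC_NAME, float(row["age_seconds"]), tags


def needs_attention(age_seconds: float, threshold_seconds: float = 30 * 60) -> bool:
    """Mimic a monitor condition: flag sessions at or above the threshold."""
    return age_seconds >= threshold_seconds


if __name__ == "__main__":
    example_row = {
        "datname": "orders",
        "relation_owner": "app_owner",
        "relname": "invoices",
        "usename": "alice",
        "application_name": "psql",
        "pid": 4242,
        "age_seconds": 2400,
    }
    name, value, tags = to_gauge(example_row)
    print(name, value, tags, needs_attention(value))
```

Keeping the PID as a tag is what lets an operator go straight from an alert to the session to inspect or terminate.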


Trade-offs

  • Emitting PIDs increases cardinality, so we cap results at 100 rows and only return sessions older than a minute.
  • Postgres doesn’t record when a lock was acquired, so we use the transaction start time instead. The reported age is therefore an upper bound on how long the lock has actually been held. To reduce false positives, we set the age threshold high in the monitor (e.g. 30 minutes). If a transaction has been idle that long and holds locks, it likely warrants attention.
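The two trade-offs above can be sketched as a query builder. This is a hedged illustration, not the query shipped in this PR: the function name `build_idle_lock_query`, its parameters, and the exact joins are assumptions. It relies only on standard catalog views (`pg_locks`, `pg_stat_activity`, `pg_class`, `pg_roles`), uses `xact_start` as the stand-in for lock acquisition time, and applies the one-minute floor and the 100-row cap.

```python
def build_idle_lock_query(min_age_seconds: int = 60, max_rows: int = 100) -> str:
    """Build a query listing idle-in-transaction sessions that hold relation locks.

    Hypothetical helper for illustration. Both arguments are validated as
    non-negative / positive integers before being formatted into the SQL text.
    """
    if not isinstance(min_age_seconds, int) or min_age_seconds < 0:
        raise ValueError("min_age_seconds must be a non-negative integer")
    if not isinstance(max_rows, int) or max_rows <= 0:
        raise ValueError("max_rows must be a positive integer")

    return f"""
SELECT a.pid,
       a.datname,
       a.usename,
       a.application_name,
       c.relname,
       r.rolname AS relation_owner,
       -- Transaction age: an upper bound on how long the lock has been held.
       EXTRACT(EPOCH FROM (clock_timestamp() - a.xact_start)) AS age_seconds
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
JOIN pg_class c ON c.oid = l.relation
JOIN pg_roles r ON r.oid = c.relowner
WHERE a.state = 'idle in transaction'
  AND a.datname = current_database()  -- pg_class is per-database
  AND l.locktype = 'relation'
  AND l.granted
  AND a.xact_start < clock_timestamp() - interval '{min_age_seconds} seconds'
ORDER BY age_seconds DESC
LIMIT {max_rows}
""".strip()


if __name__ == "__main__":
    print(build_idle_lock_query())
```

Ordering by age before applying the limit means that, when the cap is hit, the oldest sessions are the ones kept, and those are the most likely to matter for a high-threshold monitor.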

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

codecov Bot commented Aug 27, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.23%. Comparing base (6005911) to head (b70d5f3).
⚠️ Report is 119 commits behind head on master.


@nenadnoveljic nenadnoveljic force-pushed the nenadnoveljic/pg-long-locks branch from 0bd58a5 to 6bc6c1c August 28, 2025 09:58
@nenadnoveljic nenadnoveljic changed the title from "[postgres][APM RnD week] Monitor idle transactions holding locks" to "[postgres][APM RnD week] Prevent incidents by detecting idle sessions holding locks" Aug 28, 2025
@nenadnoveljic nenadnoveljic changed the title from "[postgres][APM RnD week] Prevent incidents by detecting idle sessions holding locks" to "[postgres] Prevent incidents by detecting idle sessions holding locks" Sep 2, 2025
@nenadnoveljic nenadnoveljic marked this pull request as ready for review September 2, 2025 14:40
@nenadnoveljic nenadnoveljic requested review from a team as code owners September 2, 2025 14:40
Comment thread postgres/changelog.d/21182.added Outdated
Co-authored-by: Eric Weaver <eweaver755@gmail.com>
@sethsamuel sethsamuel added this pull request to the merge queue Sep 17, 2025
Merged via the queue into master with commit a049391 Sep 17, 2025
112 of 115 checks passed
@sethsamuel sethsamuel deleted the nenadnoveljic/pg-long-locks branch September 17, 2025 12:29
3 participants