|
| 1 | +Originally from: [tweet](https://twitter.com/samokhvalov/status/1721799840886387097), [LinkedIn post](). |
| 2 | + |
| 3 | +--- |
| 4 | + |
| 5 | +# How to analyze heavyweight locks, part 2: Lock trees (a.k.a. "lock queues", "wait queues", "blocking chains") |
| 6 | + |
| 7 | +> I post a new PostgreSQL "howto" article every day. Join me in this |
| 8 | +> journey – [subscribe](https://twitter.com/samokhvalov/), provide feedback, share! |
| 9 | +
|
| 10 | +See also [Part 1](0022_how_to_analyze_heavyweight_locks_part_1.md). |
| 11 | + |
| 12 | +Good sources of knowledge: |
| 13 | + |
| 14 | +- [13.3. Explicit Locking](https://postgresql.org/docs/current/explicit-locking.html) – the docs (despite the title, |
| 15 | + it's only about the explicit locking). |
| 16 | +- [PostgreSQL rocks, except when it blocks: Understanding locks (2018)](https://citusdata.com/blog/2018/02/15/when-postgresql-blocks/), |
| 17 | + a blog post by [@marcoslot](https://twitter.com/marcoslot) |
| 18 | +- Egor Rogov's book [PostgreSQL 14 Internals](https://postgrespro.com/community/books/internals), Part III "Locks". |
| 19 | +- [PostgreSQL Lock Conflicts](https://postgres-locks.husseinnasser.com) – a reference-like tool by |
| 20 | + [@hnasr](https://twitter.com/hnasr) to study the relationships between various lock types and what types of locks |
| 21 | + various SQL commands acquire. |
| 22 | + |
| 23 | +When locking issues occur, we usually need to: |
| 24 | + |
| 25 | +1. Understand the nature and the scale of the problem. |
| 26 | +2. Consider terminating the initial "offending" sessions |
| 27 | + – tree roots – to stop the storm ASAP (usually, using `select pg_terminate_backend(<pid>);`). |
| 28 | + |
| 29 | +Here is an advanced query that, in general case, shows the "forest of lock trees" (since there might be several "root" |
| 30 | +sessions, from which multiple "trees" grow): |
| 31 | + |
| 32 | +```sql |
| 33 | +\timing on |
| 34 | +set statement_timeout to '100ms'; |
| 35 | + |
| 36 | +with recursive activity as ( |
| 37 | + select |
| 38 | + pg_blocking_pids(pid) blocked_by, |
| 39 | + *, |
| 40 | + age(clock_timestamp(), xact_start)::interval(0) as tx_age, |
| 41 | + -- "pg_locks.waitstart" – PG14+ only; for older versions: age(clock_timestamp(), state_change) as wait_age |
| 42 | + age(clock_timestamp(), (select max(l.waitstart) from pg_locks l where http://a.pid = http://l.pid))::interval(0) as wait_age |
| 43 | + from pg_stat_activity a |
| 44 | + where state is distinct from 'idle' |
| 45 | +), blockers as ( |
| 46 | + select |
| 47 | + array_agg(distinct c order by c) as pids |
| 48 | + from ( |
| 49 | + select unnest(blocked_by) |
| 50 | + from activity |
| 51 | + ) as dt(c) |
| 52 | +), tree as ( |
| 53 | + select |
| 54 | + activity.*, |
| 55 | + 1 as level, |
| 56 | + http://activity.pid as top_blocker_pid, |
| 57 | + array[http://activity.pid] as path, |
| 58 | + array[http://activity.pid]::int[] as all_blockers_above |
| 59 | + from activity, blockers |
| 60 | + where |
| 61 | + array[pid] <@ blockers.pids |
| 62 | + and blocked_by = '{}'::int[] |
| 63 | + union all |
| 64 | + select |
| 65 | + activity.*, |
| 66 | + tree.level + 1 as level, |
| 67 | + http://tree.top_blocker_pid, |
| 68 | + path || array[http://activity.pid] as path, |
| 69 | + tree.all_blockers_above || array_agg(http://activity.pid) over () as all_blockers_above |
| 70 | + from activity, tree |
| 71 | + where |
| 72 | + not array[http://activity.pid] <@ tree.all_blockers_above |
| 73 | + and activity.blocked_by <> '{}'::int[] |
| 74 | + and activity.blocked_by <@ tree.all_blockers_above |
| 75 | +) |
| 76 | +select |
| 77 | + pid, |
| 78 | + blocked_by, |
| 79 | + case when wait_event_type <> 'Lock' then replace(state, 'idle in transaction', 'idletx') else 'waiting' end as state, |
| 80 | + wait_event_type || ':' || wait_event as wait, |
| 81 | + wait_age, |
| 82 | + tx_age, |
| 83 | + to_char(age(backend_xid), 'FM999,999,999,990') as xid_age, |
| 84 | + to_char(2147483647 - age(backend_xmin), 'FM999,999,999,990') as xmin_ttf, |
| 85 | + datname, |
| 86 | + usename, |
| 87 | + (select count(distinct http://t1.pid) from tree t1 where array[http://tree.pid] <@ t1.path and http://t1.pid <> http://tree.pid) as blkd, |
| 88 | + format( |
| 89 | + '%s %s%s', |
| 90 | + lpad('[' || pid::text || ']', 9, ' '), |
| 91 | + repeat('.', level - 1) || case when level > 1 then ' ' end, |
| 92 | + left(query, 1000) |
| 93 | + ) as query |
| 94 | +from tree |
| 95 | +order by top_blocker_pid, level, pid |
| 96 | + |
| 97 | +\watch 10 |
| 98 | +``` |
| 99 | + |
| 100 | +Notes: |
| 101 | + |
| 102 | +1) It is present in the for ready to be executed in `psql`. For other clients, remove backslash commands; instead |
| 103 | + of `\watch`, use `;`. |
| 104 | + |
| 105 | +2) The function `pg_blocking_pids(...)`, according to the docs, should be used with care: |
| 106 | + |
| 107 | + > Frequent calls to this function could have some impact on database performance, because it needs exclusive access to |
| 108 | + > the lock manager's shared state for a short time. |
| 109 | +
|
| 110 | +It is not recommended to use it in an automated fashion (e.g., putting into monitoring). And this is why we have a low |
| 111 | +value for `statement_timeout` above – as protection. |
| 112 | + |
| 113 | +Example output: |
| 114 | + |
| 115 | + |
| 116 | + |
| 117 | +Notes: |
| 118 | + |
| 119 | +- Two trees with two root sessions – those with PIDs 46015 and 46081. |
| 120 | +- Both are waiting on client (`wait_event_type:wait_event` pair is `Client:ClientRead`), acquired some locks (last |
| 121 | + queries in session 46015 being an `UPDATE`, in session 46081 – `DROP TABLE`) and holding them. |
| 122 | +- The first tree (with root 46015) is bigger (11 blocked sessions) and reached the `height=4` (or the depth, depending |
| 123 | + on the point of view/terminology). This is exactly that an unfortunate situation when an `ALTER TABLE`, attempting to |
| 124 | + modify some table but being blocked by another session, starts blocking any session that tries to work with that |
| 125 | + table – even `SELECT`s (the problem discussed |
| 126 | + in [Zero-downtime Postgres schema migrations need this: lock_timeout and retries](https://postgres.ai/blog/20210923-zero-downtime-postgres-schema-migrations-lock-timeout-and-retries)). |
| 127 | +- While we're analyzing this, the situation might quickly change, so it might make sense to add timestamps or |
| 128 | + intervals (e.g., based on `xact_start`, `state_change` from `pg_stat_acitivty`). Also note, that since the results |
| 129 | + might have inconsistencies – when we read from `pg_stat_statements`, we deal with some dynamic data, not a snapshot, |
| 130 | + so there having some skews in the results is normal, if session states change quickly. Usually, it makes sense to |
| 131 | + analyze several sample results of the query before making conclusions and decisions. |
0 commit comments