Skip to content

Commit 79e427d

Browse files
committed
Day 40
1 parent 1982b11 commit 79e427d

File tree

1 file changed

+153
-0
lines changed

1 file changed

+153
-0
lines changed
Lines changed: 153 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,153 @@
1+
Originally from: [tweet](https://twitter.com/samokhvalov/status/1721036531580973494), [LinkedIn post]().
2+
3+
---
4+
5+
# How to break a database, Part 2: Simulate infamous transaction ID wraparound
6+
7+
> I post a new PostgreSQL "howto" article every day. Join me in this
8+
> journey – [subscribe](https://twitter.com/samokhvalov/), provide feedback, share!
9+
10+
See also [Part 1: How to Corrupt](0039_how_to_break_a_database_part_1_how_to_corrupt.md).
11+
12+
## Straightforward simulation method
13+
14+
The method requires some time but works pretty well. It has been discussed in a couple of places:
15+
16+
- My tweet [A question from Telegram chat](https://twitter.com/samokhvalov/status/1415575072081809409)
17+
- Article
18+
[How to simulate the deadly, infamous, misunderstood & complex ‘Transaction Wraparound Problem’ in PostgreSQL](https://fatdba.com/2021/07/20/how-to-simulate-the-deadly-transaction-wraparound-problem-in-postgresql/)
19+
20+
First, open a "long-running transaction" (with normal XID assigned to it, as it would be a writing transaction) – and
21+
keep the session open (we'll use a named pipe a.k.a. FIFO to "hang" our `psql` session):
22+
23+
```bash
24+
mkfifo dummy
25+
26+
psql -Xc "
27+
set idle_in_transaction_session_timeout = 0;
28+
begin;
29+
select pg_current_xact_id()
30+
" \
31+
-f dummy &
32+
```
33+
34+
Make sure the session is open and in transaction:
35+
36+
```bash
37+
❯ psql -Xc "select state
38+
from pg_stat_activity
39+
where
40+
pid <> pg_backend_pid()
41+
and query ~ 'idle_in_tran'
42+
"
43+
state
44+
---------------------
45+
idle in transaction
46+
(1 row)
47+
```
48+
49+
Now, having an open long-running transaction with assigned XID, we just need to shift transaction ID at high speed – we
50+
can use the same function for it, and `pgbench` with multiple connections:
51+
52+
```bash
53+
pgbench -c8 -j8 -P60 -T36000 -rn \
54+
-f - <<< 'select pg_current_xact_id()'
55+
```
56+
57+
This should move the current XID at very high pace (100-200k TPS). And since we have an open a long-running
58+
transaction, `autovacuum` cannot operate.
59+
60+
While `pgbench` is running, we can observe the state of the database using a monitoring tool that includes XID/multiXID
61+
wraparound checks (every tool must, but not every tool does), or just some snippet – e.g.
62+
from [Managing Transaction ID Exhaustion (Wraparound) in PostgreSQL](https://crunchydata.com/blog/managing-transaction-id-wraparound-in-postgresql).
63+
64+
After a few hours:
65+
66+
```
67+
WARNING: database "nik" must be vacuumed within 39960308 transactions
68+
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
69+
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
70+
```
71+
72+
And if we proceed, then this:
73+
74+
```bash
75+
nik=# create table t3();
76+
ERROR: database is not accepting commands to avoid wraparound data loss in database "nik"
77+
HINT: Stop the postmaster and vacuum that database in single-user mode.
78+
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
79+
```
80+
81+
Good, the harm is done! It's time to release FIFO (this will close our long-lasting idle-in-transaction session):
82+
83+
```bash
84+
exec 3>dummy && exec 3>&-
85+
```
86+
87+
## How to escape – the basics
88+
89+
How to escape? We'll probably discuss this in a separate howto, but let's talk about some basics.
90+
91+
See real-life use cases to learn from the other people's mistakes:
92+
93+
- [Sentry's case](https://blog.sentry.io/transaction-id-wraparound-in-postgres/),
94+
- [Mailchimp's one](https://mailchimp.com/what-we-learned-from-the-recent-mandrill-outage/)
95+
96+
are good lessons.
97+
98+
Escaping from this takes, in general case, takes a lot of time. Traditionally, as the `HINT` above suggests, it is done
99+
in the single-user mode. But I very much like the ideas by
100+
[@PostSQL](https://twitter.com/PostSQL)
101+
from the GCP team – see the talk [Do you vacuum every day?](https://youtube.com/watch?v=JcRi8Z7rkPg) and discussion in
102+
the `pgsql-hackers` mailing list
103+
[We should stop telling users to "vacuum that database in single-user mode"](https://postgresql.org/message-id/flat/CAMT0RQTmRj_Egtmre6fbiMA9E2hM3BsLULiV8W00stwa3URvzA%40mail.gmail.com).
104+
105+
## Alternative method to simulate wraparound
106+
107+
There is another method for simulation which works much faster – no need to wait many hours, we can use `pg_resetwal`'s
108+
option `-x` ([docs](https://postgresql.org/docs/current/app-pgresetwal.html)) to fast-forward the transaction ID. This
109+
method is described in [Transaction ID wraparound: a walk on the wild
110+
side](https://cybertec-postgresql.com/en/transaction-id-wraparound-a-walk-on-the-wild-side/). It's quite interesting
111+
and looks like this:
112+
113+
```bash
114+
pg_ctl stop -D $PGDATA
115+
116+
pg_resetwal -x 2137483648 -D testdb ### 2^31 - 10000000
117+
118+
dd if=/dev/zero of=$PGDATA/pg_xact/07F6 bs=8192 count=15
119+
```
120+
121+
## MultiXact ID wraparound
122+
123+
MultiXact ID is also possible:
124+
[Multixacts and Wraparound](https://postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-MULTIXACT-WRAPAROUND).
125+
This risk is often overlooked. But it exists, especially if you have FKs and/or use
126+
`SELECT ... FOR SHARE` (and `SELECT ... FOR UPDATE`) combined with substransactions).
127+
128+
Some articles to learn about it:
129+
130+
- [A foreign key pathology to avoid](https://thebuild.com/blog/2023/01/18/a-foreign-key-pathology-to-avoid/)
131+
- [Subtransactions considered harmful: Problem 3: unexpected use of Multixact IDs](https://postgres.ai/blog/20210831-postgresql-subtransactions-considered-harmful#problem-3-unexpected-use-of-multixact-ids)
132+
- [Notes on some PostgreSQL implementation details](https://buttondown.email/nelhage/archive/notes-on-some-postgresql-implementation-details/)
133+
134+
Simulation of MultiXact ID wraparound is also possible – for this, we need to create multiple overlapping transactions
135+
that lock the same rows (for example, using explicit `SELECT ... FOR SHARE`).
136+
137+
> 🎯 **TODO:** working example – left for future edits.
138+
139+
Everyone should not only monitor (with alerting) for traditional XID wraparound risks, but also for MultiXID wraparound
140+
risks. And it should be included in the snippets showing how much of the "capacity" is used.
141+
142+
> 🎯 **TODO:** snippet to show both XID w-d and MultiXID w-d, at both DB and table level
143+
> Obviously, multixid wraparrounds are encountered less often in the wild – I don't see people have `datminmxid` and
144+
> `relminmxid` used in the snippets.
145+
> Basic version:
146+
>
147+
> ```sql
148+
> select
149+
> datname,
150+
> age(datfrozenxid),
151+
> mxid_age(datminmxid)
152+
> from pg_database;
153+
> ```

0 commit comments

Comments
 (0)