Originally from: [tweet](https://twitter.com/samokhvalov/status/1720734029207732456), [LinkedIn post]().

---

# How to break a database, Part 1: How to corrupt

> I post a new PostgreSQL "howto" article every day. Join me in this
> journey – [subscribe](https://twitter.com/samokhvalov/), provide feedback, share!

Sometimes, you might want to damage a database – for educational purposes, to simulate failures and learn how to deal
with them, or to test mitigation procedures.

Let's discuss some ways to break things.

<span style="padding: 1ex; background-color: yellow">
⚠️ Don't do it in production unless you're a chaos engineer ⚠️
</span>

## Corruption

There are many types of corruption, and there are some very simple ways to get a corrupted database, for example:

👉 **Modifying system catalogs directly:**

```sql
nik=# create table t1(id int8 primary key, val text);
CREATE TABLE

nik=# delete from pg_attribute where attrelid = 't1'::regclass and attname = 'val';
DELETE 1

nik=# table t1;
ERROR: pg_attribute catalog is missing 1 attribute(s) for relation OID 107006
LINE 1: table t1;
              ^
```

More ways can be found in this article:
[How to corrupt your PostgreSQL database](https://cybertec-postgresql.com/en/how-to-corrupt-your-postgresql-database/).
A couple of interesting methods from there:

- `fsync=off` + `kill -9` to Postgres (or `pg_ctl stop -m immediate`)
- `kill -9` + `pg_resetwal -f`

One useful method is to write to a data file directly using `dd`. This simulates corruption that data checksum
verification can detect
([Day 37: How to enable data checksums without downtime](0037_how_to_enable_data_checksums_without_downtime.md)). The
same approach is also demonstrated in this article:
[pg_healer: repairing Postgres problems automatically](https://endpointdev.com/blog/2016/09/pghealer-repairing-postgres-problems/).
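
Before pointing `dd` at a real data file, the in-place write can be rehearsed on a scratch file. Below is a minimal sketch, assuming a POSIX shell with `dd`, `od`, and `mktemp` available. The temporary file is only a stand-in for a data file. The sketch creates one 8 kB page of zeros (8 kB being the default Postgres block size), overwrites the byte at offset 4000 without truncating the file, and then checks the result. It uses `seek=`, which both GNU `dd` and macOS `dd` accept. With `bs=1`, the offset is counted in bytes.

```bash
set -eu

# Scratch file standing in for a data file (NOT a real Postgres file).
f=$(mktemp)

# One 8192-byte page of zeros.
dd if=/dev/zero of="$f" bs=8192 count=1 2>/dev/null

# Overwrite one byte at offset 4000:
#   bs=1         -> 1-byte blocks, so seek=4000 is a byte offset
#   count=1      -> copy a single block (one byte) from stdin
#   conv=notrunc -> keep the rest of the file instead of truncating it
printf 'BOOoo' | dd of="$f" bs=1 seek=4000 count=1 conv=notrunc 2>/dev/null

# File size must be unchanged (tr strips the padding BSD wc adds).
size=$(wc -c < "$f" | tr -d ' ')
echo "size: $size"

# Read back the byte at offset 4000 as a character.
byte=$(od -A n -t c -j 4000 -N 1 "$f" | tr -d ' ')
echo "byte at 4000: $byte"

rm -f "$f"
```

It should print `size: 8192` and `byte at 4000: B`. Only the first character of the input is written because of `count=1`. Without `conv=notrunc`, `dd` would cut the file off right after the written byte.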

First, make sure data checksums are enabled, then create a table and find out where its data file is located:

```sql
nik=# show data_checksums;
 data_checksums
----------------
 on
(1 row)

nik=# create table t1 as select i from generate_series(1, 10000) i;
SELECT 10000

nik=# select count(*) from t1;
 count
-------
 10000
(1 row)

nik=# select format('%s/%s',
  current_setting('data_directory'),
  pg_relation_filepath('t1'));
                      format
---------------------------------------------------
 /opt/homebrew/var/postgresql@15/base/16384/123388
(1 row)
```

Now, let's write some garbage to this file directly using `dd`, and then restart Postgres to make sure the table's pages
are no longer in the buffer pool. Note that here we use the macOS version of `dd` with its `oseek` option. `seek=4000` is
a more portable spelling that GNU `dd` on Linux accepts as well. GNU `dd` also has `oflag=seek_bytes`, but it is not
needed here because `bs=1` already makes the offset count bytes:

```bash
❯ echo -n "BOOoo" \
    | dd conv=notrunc bs=1 \
    oseek=4000 count=1 \
    of=/opt/homebrew/var/postgresql@15/base/16384/123388
1+0 records in
1+0 records out
1 bytes transferred in 0.000240 secs (4167 bytes/sec)

❯ brew services stop postgresql@15
Stopping `postgresql@15`... (might take a while)
==> Successfully stopped `postgresql@15` (label: homebrew.mxcl.postgresql@15)

❯ brew services start postgresql@15
==> Successfully started `postgresql@15` (label: homebrew.mxcl.postgresql@15)
```

Successfully corrupted – the data checksums mechanism complains about it:

```sql
nik=# table t1;
WARNING: page verification failed, calculated checksum 52379 but expected 35499
ERROR: invalid page in block 0 of relation base/16384/123388
```
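
The error reports block 0 because byte offset 4000 falls inside the first page of the file. To damage a specific page instead, the `dd` target can be derived from the block number. Below is a minimal sketch, assuming the default 8 kB block size and the default 1 GB segment size. Under these defaults, a relation larger than 1 GB is split into files `123388`, `123388.1`, and so on. `block_location` is a hypothetical helper written for this illustration, not a Postgres tool.

```bash
set -eu

# Assumed defaults: BLCKSZ = 8192 bytes, segment = 1 GB = 131072 blocks.
# Builds with non-default block or segment sizes need other constants.
BLCKSZ=8192
BLOCKS_PER_SEG=131072

# Hypothetical helper: print "<file> <byte offset>" for a relation path,
# a block number, and a byte offset inside that page (0..BLCKSZ-1).
block_location() {
  base=$1; block=$2; in_page=$3
  seg=$(( block / BLOCKS_PER_SEG ))
  off=$(( (block % BLOCKS_PER_SEG) * BLCKSZ + in_page ))
  if [ "$seg" -eq 0 ]; then
    file=$base            # first segment has no suffix
  else
    file="$base.$seg"     # later segments: .1, .2, ...
  fi
  echo "$file $off"
}

# Byte 4000 of block 0, the spot overwritten above with oseek=4000.
block_location base/16384/123388 0 4000
# First byte of the first page in the second 1 GB segment.
block_location base/16384/123388 131072 0
# Byte 100 of block 5.
block_location base/16384/123388 5 100
```

The printed offset can then be passed to `dd` as `seek=` (together with `bs=1` and `conv=notrunc`), with the printed file under the data directory as `of=`.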

**🔜 To be continued ...**