digoal
2024-01-07
PostgreSQL , PolarDB , DuckDB , 表空间时间点恢复 , Selectivity Tablespace PITR
PostgreSQL如何支持可选择性表空间(Selectivity Tablespace)备份和时间点(PITR)恢复?
2012年甚至更早就支持了: 《PostgreSQL Selectivity Tablespace PITR - 部分表空间恢复》
前置条件:
- 需要备份全局数据(
$PGDATA
里除base表空间/用户自定义表空间之外的东西.(包括控制文件)), - 需要恢复的表空间,
- 以及所需的wal归档文件.
- 恢复时需要配置跳过错误的数据块(因为其他表空间的数据文件没有备份.)
下面将带你在PostgreSQL 14上做个测试, 如何实现? 使用如下docker环境:
PostgreSQL表空间备份和完全恢复参考另一篇: 《PostgreSQL基于表空间的在线备份和完全恢复》
创建docker容器
docker run -d -it -P --cap-add=SYS_PTRACE --cap-add SYS_ADMIN --privileged=true --name pg --shm-size=1g --entrypoint /bin/bash registry.cn-hangzhou.aliyuncs.com/digoal/opensource_database:pg14_with_exts
进入容器
docker exec -ti pg bash
创建归档目录
su - postgres
mkdir /tmp/arch
配置归档
su - postgres
vi $PGDATA/postgresql.conf
archive_mode = on
archive_command = 'test ! -f /tmp/arch/%f && cp %p /tmp/arch/%f'
启动数据库
pg_ctl start
创建a,b表空间
mkdir /tmp/tbsa
mkdir /tmp/tbsb
postgres@dcdaa877dadd:~/14/pgdata$ psql
psql (14.10 (Debian 14.10-1.pgdg110+1))
Type "help" for help.
postgres=# create tablespace tbsa location '/tmp/tbsa';
CREATE TABLESPACE
postgres=# create tablespace tbsb location '/tmp/tbsb';
CREATE TABLESPACE
创建a,b数据库
postgres=# create database a tablespace tbsa;
CREATE DATABASE
postgres=# create database b tablespace tbsb;
CREATE DATABASE
pgbench a,b数据库, 写入数据, 执行DML压测
postgres@dcdaa877dadd:~/14/pgdata$ pgbench -i -s 10 a
dropping old tables...
NOTICE: table "pgbench_accounts" does not exist, skipping
NOTICE: table "pgbench_branches" does not exist, skipping
NOTICE: table "pgbench_history" does not exist, skipping
NOTICE: table "pgbench_tellers" does not exist, skipping
creating tables...
generating data (client-side)...
1000000 of 1000000 tuples (100%) done (elapsed 1.35 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 2.65 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 1.39 s, vacuum 0.28 s, primary keys 0.96 s).
postgres@dcdaa877dadd:~/14/pgdata$ pgbench -i -s 10 b
dropping old tables...
NOTICE: table "pgbench_accounts" does not exist, skipping
NOTICE: table "pgbench_branches" does not exist, skipping
NOTICE: table "pgbench_history" does not exist, skipping
NOTICE: table "pgbench_tellers" does not exist, skipping
creating tables...
generating data (client-side)...
1000000 of 1000000 tuples (100%) done (elapsed 1.30 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 2.59 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 1.33 s, vacuum 0.28 s, primary keys 0.96 s).
postgres@dcdaa877dadd:~/14/pgdata$ pgbench -M prepared -n -r -P 5 -c 4 -j 4 -T 30 a
pgbench (14.10 (Debian 14.10-1.pgdg110+1))
progress: 5.0 s, 4232.7 tps, lat 0.944 ms stddev 0.138
progress: 10.0 s, 4370.0 tps, lat 0.915 ms stddev 0.119
progress: 15.0 s, 4198.2 tps, lat 0.952 ms stddev 0.198
progress: 20.0 s, 3990.6 tps, lat 1.002 ms stddev 0.178
progress: 25.0 s, 4015.2 tps, lat 0.996 ms stddev 0.153
progress: 30.0 s, 3964.6 tps, lat 1.008 ms stddev 0.173
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: prepared
number of clients: 4
number of threads: 4
duration: 30 s
number of transactions actually processed: 123863
latency average = 0.968 ms
latency stddev = 0.165 ms
initial connection time = 4.305 ms
tps = 4129.040009 (without initial connection time)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
0.000 \set bid random(1, 1 * :scale)
0.000 \set tid random(1, 10 * :scale)
0.000 \set delta random(-5000, 5000)
0.117 BEGIN;
0.150 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.131 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
0.148 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
0.154 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.129 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
0.137 END;
postgres@dcdaa877dadd:~/14/pgdata$ pgbench -M prepared -n -r -P 5 -c 4 -j 4 -T 30 b
pgbench (14.10 (Debian 14.10-1.pgdg110+1))
progress: 5.0 s, 4034.0 tps, lat 0.990 ms stddev 0.185
progress: 10.0 s, 3840.7 tps, lat 1.041 ms stddev 0.167
progress: 15.0 s, 3736.8 tps, lat 1.070 ms stddev 0.253
progress: 20.0 s, 3826.7 tps, lat 1.045 ms stddev 0.206
progress: 25.0 s, 3840.7 tps, lat 1.041 ms stddev 0.178
progress: 30.0 s, 3772.4 tps, lat 1.060 ms stddev 0.171
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: prepared
number of clients: 4
number of threads: 4
duration: 30 s
number of transactions actually processed: 115260
latency average = 1.040 ms
latency stddev = 0.197 ms
initial connection time = 4.525 ms
tps = 3842.482744 (without initial connection time)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
0.000 \set bid random(1, 1 * :scale)
0.000 \set tid random(1, 10 * :scale)
0.000 \set delta random(-5000, 5000)
0.126 BEGIN;
0.161 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.141 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
0.157 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
0.167 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.139 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
0.148 END;
start backup
postgres=# select pg_start_backup(now()::text, true);
pg_start_backup
-----------------
0/17000110
(1 row)
创建备份目录
root@dcdaa877dadd:~# su - postgres
postgres@dcdaa877dadd:~$ mkdir /tmp/backup
拷贝system目录(含控制文件)
postgres@dcdaa877dadd:~$ cp -r $PGDATA /tmp/backup/
拷贝b表空间
postgres@dcdaa877dadd:~$ cp -r /tmp/tbsb /tmp/backup/
拷贝归档(略, 但是请知道:恢复时需要归档)
stop backup
postgres=# select pg_stop_backup();
NOTICE: all required WAL segments have been archived
pg_stop_backup
----------------
0/17000220
(1 row)
stop backup后, 创建c表空间
su - postgres
postgres@dcdaa877dadd:~/14/pgdata$ mkdir /tmp/tbsc
postgres@dcdaa877dadd:~/14/pgdata$ psql
psql (14.10 (Debian 14.10-1.pgdg110+1))
Type "help" for help.
postgres=# create tablespace tbsc location '/tmp/tbsc';
CREATE TABLESPACE
创建c数据库
postgres=# create database c tablespace tbsc;
CREATE DATABASE
pgbench a,b,c 对a,b,c库进行压力测试
pgbench -M prepared -n -r -c 4 -j 4 -T 30 a
pgbench -M prepared -n -r -c 4 -j 4 -T 30 b
pgbench -i -s 10 c
pgbench -M prepared -n -r -c 4 -j 4 -T 30 c
在b数据库中创建表, 新建数据.
psql
\c b postgres
create table test (id int, info text, ts timestamp);
insert into test select generate_series(1,10000), md5(random()::text), clock_timestamp();
insert into test values (0,'test',now());
创建还原点, 记住还原点名称, 后面恢复要用到. 恢复到还原点时, 以上数据应该都在.
postgres=# select md5(now()::text);
md5
----------------------------------
551cfacd27dbd7de0de6da6ccfa9486b
(1 row)
postgres=# select pg_create_restore_point('551cfacd27dbd7de0de6da6ccfa9486b');
pg_create_restore_point
-------------------------
0/2A06E330
(1 row)
在b数据库中创建表, 并进行一些修改. 恢复到还原点时, 以下数据应该都不在.
psql
\c b postgres
insert into test values (-1,'test-1',now());
create table tbl (like test);
insert into tbl select generate_series(1,10000), md5(random()::text), clock_timestamp();
insert into tbl values (0,'test',now());
切换wal日志文件. (确保归档)
b=# checkpoint;
CHECKPOINT
b=# select pg_switch_wal();
pg_switch_wal
---------------
0/2A17D408
(1 row)
b=# checkpoint;
CHECKPOINT
b=# select pg_switch_wal();
pg_switch_wal
---------------
0/2B0000F0
(1 row)
停止原库, 删除集群及所有表空间文件
pg_ctl stop -m fast
rm -rf $PGDATA/*
rm -rf /tmp/tbsa/*
rm -rf /tmp/tbsb/*
rm -rf /tmp/tbsc/*
还原system, b表空间文件
cp -r /tmp/backup/pgdata/* $PGDATA/
cp -r /tmp/backup/tbsb/* /tmp/tbsb/
postgres@dcdaa877dadd:~/14/pgdata$ ll $PGDATA/pg_tblspc
total 12K
lrwxrwxrwx 1 postgres postgres 9 Jan 7 12:29 16385 -> /tmp/tbsb
lrwxrwxrwx 1 postgres postgres 9 Jan 7 12:29 16384 -> /tmp/tbsa
drwx------ 2 postgres postgres 4.0K Jan 7 12:29 .
drwx------ 1 postgres postgres 4.0K Jan 7 12:29 ..
配置recovery.conf (12版本开始, 这个配置已经在postgresql.conf中了)
vi $PGDATA/postgresql.conf
# 注释归档
#archive_mode = on
#archive_command = 'test ! -f /tmp/arch/%f && cp %p /tmp/arch/%f'
# 恢复配置
restore_command = 'cp /tmp/arch/%f %p'
recovery_target_name = '551cfacd27dbd7de0de6da6ccfa9486b'
recovery_target_inclusive = on
recovery_target_timeline = 'latest'
recovery_target_action = 'pause'
# 开启standby模式
hot_standby = on
# 配置 zero_damaged_pages 跳过错误的块
zero_damaged_pages = on
配置恢复模式
touch $PGDATA/standby.signal
启动实例, 开始恢复
pg_ctl start
查看是否已恢复
postgres@dcdaa877dadd:~/14/pgdata$ psql
psql (14.10 (Debian 14.10-1.pgdg110+1))
Type "help" for help.
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+---------+------------+-----------------------
a | postgres | UTF8 | C | en_US.UTF8 |
b | postgres | UTF8 | C | en_US.UTF8 |
c | postgres | UTF8 | C | en_US.UTF8 |
postgres | postgres | UTF8 | C | en_US.UTF8 |
template0 | postgres | UTF8 | C | en_US.UTF8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | C | en_US.UTF8 | =c/postgres +
| | | | | postgres=CTc/postgres
(6 rows)
-- 当前处于恢复模式
b=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
t
(1 row)
-- 已恢复到目标位置, 处于暂停恢复状态
b=# select * from pg_get_wal_replay_pause_state();
pg_get_wal_replay_pause_state
-------------------------------
paused
(1 row)
b=# select pg_is_wal_replay_paused();
pg_is_wal_replay_paused
-------------------------
t
(1 row)
-- 链接到b数据库正常
postgres=# \c b postgres
You are now connected to database "b" as user "postgres".
b=# \dt
List of relations
Schema | Name | Type | Owner
--------+------------------+-------+----------
public | pgbench_accounts | table | postgres
public | pgbench_branches | table | postgres
public | pgbench_history | table | postgres
public | pgbench_tellers | table | postgres
public | test | table | postgres
(5 rows)
-- 经查询, 恢复位置之前的数据都在, 之后的数据不存在.
b=# select * from test where id=0;
id | info | ts
----+------+----------------------------
0 | test | 2024-01-07 12:24:16.015474
(1 row)
b=# select * from test where id=-1;
id | info | ts
----+------+----
(0 rows)
c库是备份后创建的, 所以有完整的wal, c库被正常恢复出来了.
b=# \c c postgres
You are now connected to database "c" as user "postgres".
c=# \dt
List of relations
Schema | Name | Type | Owner
--------+------------------+-------+----------
public | pgbench_accounts | table | postgres
public | pgbench_branches | table | postgres
public | pgbench_history | table | postgres
public | pgbench_tellers | table | postgres
(4 rows)
c=# select count(*) from pgbench_accounts ;
count
---------
1000000
(1 row)
c=# select count(*) from pgbench_branches;
count
-------
10
(1 row)
但是a库是备份前已经存在的, 原来有文件, 恢复过程需要data page+wal, 因为没有备份所以恢复不了a库
c=# \c a postgres
connection to server on socket "/var/lib/postgresql/14/pgdata/.s.PGSQL.1921" failed: FATAL: "pg_tblspc/16384/PG_14_202107181/16386" is not a valid data directory
DETAIL: File "pg_tblspc/16384/PG_14_202107181/16386/PG_VERSION" is missing.
Previous connection kept
《PostgreSQL recovery target introduce》 详细介绍了PITR恢复参数的概念.