Skip to content

Latest commit

 

History

History
502 lines (406 loc) · 16 KB

20240107_01.md

File metadata and controls

502 lines (406 loc) · 16 KB

PostgreSQL如何支持可选择性表空间(Selectivity Tablespace)备份和时间点(PITR)恢复?

作者

digoal

日期

2024-01-07

标签

PostgreSQL , PolarDB , DuckDB , 表空间时间点恢复 , Selectivity Tablespace PITR


背景

PostgreSQL如何支持可选择性表空间(Selectivity Tablespace)备份和时间点(PITR)恢复?

2012年甚至更早就支持了: 《PostgreSQL Selectivity Tablespace PITR - 部分表空间恢复》

前置条件:

  • 需要备份全局数据($PGDATA里除base表空间/用户自定义表空间之外的东西.(包括控制文件)),
  • 需要恢复的表空间,
  • 以及所需的wal归档文件.
  • 恢复时需要配置跳过错误的数据块(因为其他表空间的数据文件没有备份.)

下面将带你在PostgreSQL 14上做个测试, 如何实现? 使用如下docker环境:

PostgreSQL表空间备份和完全恢复参考另一篇: 《PostgreSQL基于表空间的在线备份和完全恢复》

准备模拟 可选择性表空间(Selectivity Tablespace)备份和时间点(PITR)恢复

创建docker容器

docker run -d -it -P --cap-add=SYS_PTRACE --cap-add SYS_ADMIN --privileged=true --name pg --shm-size=1g --entrypoint /bin/bash registry.cn-hangzhou.aliyuncs.com/digoal/opensource_database:pg14_with_exts  

进入容器

docker exec -ti pg bash    

创建归档目录

su - postgres  
  
mkdir /tmp/arch  

配置归档

su - postgres  
  
vi $PGDATA/postgresql.conf  
  
  
archive_mode = on   
archive_command = 'test ! -f /tmp/arch/%f && cp %p /tmp/arch/%f'   

启动数据库

pg_ctl start  

创建a,b表空间

mkdir /tmp/tbsa  
mkdir /tmp/tbsb  
postgres@dcdaa877dadd:~/14/pgdata$ psql  
psql (14.10 (Debian 14.10-1.pgdg110+1))  
Type "help" for help.  
postgres=# create tablespace tbsa location '/tmp/tbsa';  
CREATE TABLESPACE  
postgres=# create tablespace tbsb location '/tmp/tbsb';  
CREATE TABLESPACE  

创建a,b数据库

postgres=# create database a tablespace tbsa;  
CREATE DATABASE  
postgres=# create database b tablespace tbsb;  
CREATE DATABASE  

pgbench a,b数据库, 写入数据, 执行DML压测

postgres@dcdaa877dadd:~/14/pgdata$ pgbench -i -s 10 a  
dropping old tables...  
NOTICE:  table "pgbench_accounts" does not exist, skipping  
NOTICE:  table "pgbench_branches" does not exist, skipping  
NOTICE:  table "pgbench_history" does not exist, skipping  
NOTICE:  table "pgbench_tellers" does not exist, skipping  
creating tables...  
generating data (client-side)...  
1000000 of 1000000 tuples (100%) done (elapsed 1.35 s, remaining 0.00 s)  
vacuuming...  
creating primary keys...  
done in 2.65 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 1.39 s, vacuum 0.28 s, primary keys 0.96 s).  
  
postgres@dcdaa877dadd:~/14/pgdata$ pgbench -i -s 10 b  
dropping old tables...  
NOTICE:  table "pgbench_accounts" does not exist, skipping  
NOTICE:  table "pgbench_branches" does not exist, skipping  
NOTICE:  table "pgbench_history" does not exist, skipping  
NOTICE:  table "pgbench_tellers" does not exist, skipping  
creating tables...  
generating data (client-side)...  
1000000 of 1000000 tuples (100%) done (elapsed 1.30 s, remaining 0.00 s)  
vacuuming...  
creating primary keys...  
done in 2.59 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 1.33 s, vacuum 0.28 s, primary keys 0.96 s).  
postgres@dcdaa877dadd:~/14/pgdata$ pgbench -M prepared -n -r -P 5 -c 4 -j 4 -T 30 a  
pgbench (14.10 (Debian 14.10-1.pgdg110+1))  
progress: 5.0 s, 4232.7 tps, lat 0.944 ms stddev 0.138  
progress: 10.0 s, 4370.0 tps, lat 0.915 ms stddev 0.119  
progress: 15.0 s, 4198.2 tps, lat 0.952 ms stddev 0.198  
progress: 20.0 s, 3990.6 tps, lat 1.002 ms stddev 0.178  
progress: 25.0 s, 4015.2 tps, lat 0.996 ms stddev 0.153  
progress: 30.0 s, 3964.6 tps, lat 1.008 ms stddev 0.173  
transaction type: <builtin: TPC-B (sort of)>  
scaling factor: 10  
query mode: prepared  
number of clients: 4  
number of threads: 4  
duration: 30 s  
number of transactions actually processed: 123863  
latency average = 0.968 ms  
latency stddev = 0.165 ms  
initial connection time = 4.305 ms  
tps = 4129.040009 (without initial connection time)  
statement latencies in milliseconds:  
         0.001  \set aid random(1, 100000 * :scale)  
         0.000  \set bid random(1, 1 * :scale)  
         0.000  \set tid random(1, 10 * :scale)  
         0.000  \set delta random(-5000, 5000)  
         0.117  BEGIN;  
         0.150  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;  
         0.131  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;  
         0.148  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;  
         0.154  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;  
         0.129  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);  
         0.137  END;  
  
postgres@dcdaa877dadd:~/14/pgdata$ pgbench -M prepared -n -r -P 5 -c 4 -j 4 -T 30 b  
pgbench (14.10 (Debian 14.10-1.pgdg110+1))  
progress: 5.0 s, 4034.0 tps, lat 0.990 ms stddev 0.185  
progress: 10.0 s, 3840.7 tps, lat 1.041 ms stddev 0.167  
progress: 15.0 s, 3736.8 tps, lat 1.070 ms stddev 0.253  
progress: 20.0 s, 3826.7 tps, lat 1.045 ms stddev 0.206  
progress: 25.0 s, 3840.7 tps, lat 1.041 ms stddev 0.178  
progress: 30.0 s, 3772.4 tps, lat 1.060 ms stddev 0.171  
transaction type: <builtin: TPC-B (sort of)>  
scaling factor: 10  
query mode: prepared  
number of clients: 4  
number of threads: 4  
duration: 30 s  
number of transactions actually processed: 115260  
latency average = 1.040 ms  
latency stddev = 0.197 ms  
initial connection time = 4.525 ms  
tps = 3842.482744 (without initial connection time)  
statement latencies in milliseconds:  
         0.001  \set aid random(1, 100000 * :scale)  
         0.000  \set bid random(1, 1 * :scale)  
         0.000  \set tid random(1, 10 * :scale)  
         0.000  \set delta random(-5000, 5000)  
         0.126  BEGIN;  
         0.161  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;  
         0.141  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;  
         0.157  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;  
         0.167  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;  
         0.139  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);  
         0.148  END;  

start backup

postgres=# select pg_start_backup(now()::text, true);  
 pg_start_backup   
-----------------  
 0/17000110  
(1 row)  

创建备份目录

root@dcdaa877dadd:~# su - postgres  
postgres@dcdaa877dadd:~$ mkdir /tmp/backup  

拷贝system目录(含控制文件)

postgres@dcdaa877dadd:~$ cp -r $PGDATA /tmp/backup/  

拷贝b表空间

postgres@dcdaa877dadd:~$ cp -r /tmp/tbsb /tmp/backup/  

拷贝归档(略, 但是请知道:恢复时需要归档)

stop backup

postgres=# select pg_stop_backup();  
NOTICE:  all required WAL segments have been archived  
 pg_stop_backup   
----------------  
 0/17000220  
(1 row)  

stop backup后, 创建c表空间

su - postgres  
  
postgres@dcdaa877dadd:~/14/pgdata$ mkdir /tmp/tbsc  
postgres@dcdaa877dadd:~/14/pgdata$ psql  
psql (14.10 (Debian 14.10-1.pgdg110+1))  
Type "help" for help.  
  
postgres=# create tablespace tbsc location '/tmp/tbsc';  
CREATE TABLESPACE  

创建c数据库

postgres=# create database c tablespace tbsc;  
CREATE DATABASE  

pgbench a,b,c 对a,b,c库进行压力测试

pgbench -M prepared -n -r -c 4 -j 4 -T 30 a  
pgbench -M prepared -n -r -c 4 -j 4 -T 30 b  
pgbench -i -s 10 c  
pgbench -M prepared -n -r -c 4 -j 4 -T 30 c  

在b数据库中创建表, 新建数据.

psql  
\c b postgres  
create table test (id int, info text, ts timestamp);  
insert into test select generate_series(1,10000), md5(random()::text), clock_timestamp();  
insert into test values (0,'test',now());  

创建还原点, 记住还原点名称, 后面恢复要用到. 恢复到还原点时, 以上数据应该都在.

postgres=# select md5(now()::text);  
               md5                  
----------------------------------  
 551cfacd27dbd7de0de6da6ccfa9486b  
(1 row)  
  
postgres=# select pg_create_restore_point('551cfacd27dbd7de0de6da6ccfa9486b');  
 pg_create_restore_point   
-------------------------  
 0/2A06E330  
(1 row)  

在b数据库中创建表, 并进行一些修改. 恢复到还原点时, 以下数据应该都不在.

psql  
\c b postgres  
insert into test values (-1,'test-1',now());   
  
create table tbl (like test);  
insert into tbl select generate_series(1,10000), md5(random()::text), clock_timestamp();  
insert into tbl values (0,'test',now());  

切换wal日志文件. (确保归档)

b=# checkpoint;  
CHECKPOINT  
b=# select pg_switch_wal();  
 pg_switch_wal   
---------------  
 0/2A17D408  
(1 row)  
  
b=# checkpoint;  
CHECKPOINT  
b=# select pg_switch_wal();  
 pg_switch_wal   
---------------  
 0/2B0000F0  
(1 row)  

停止原库, 删除集群及所有表空间文件

pg_ctl stop -m fast  
rm -rf $PGDATA/*  
rm -rf /tmp/tbsa/*  
rm -rf /tmp/tbsb/*  
rm -rf /tmp/tbsc/*  

当前备份能不能将b表空间恢复到还原点位置?

还原system, b表空间文件

cp -r /tmp/backup/pgdata/* $PGDATA/  
cp -r /tmp/backup/tbsb/* /tmp/tbsb/  
postgres@dcdaa877dadd:~/14/pgdata$ ll $PGDATA/pg_tblspc  
total 12K  
lrwxrwxrwx 1 postgres postgres    9 Jan  7 12:29 16385 -> /tmp/tbsb  
lrwxrwxrwx 1 postgres postgres    9 Jan  7 12:29 16384 -> /tmp/tbsa  
drwx------ 2 postgres postgres 4.0K Jan  7 12:29 .  
drwx------ 1 postgres postgres 4.0K Jan  7 12:29 ..  

配置recovery.conf (12版本开始, 这个配置已经在postgresql.conf中了)

vi $PGDATA/postgresql.conf  
  
# 注释归档  
#archive_mode = on   
#archive_command = 'test ! -f /tmp/arch/%f && cp %p /tmp/arch/%f'  
  
# 恢复配置  
restore_command = 'cp /tmp/arch/%f %p'  
recovery_target_name = '551cfacd27dbd7de0de6da6ccfa9486b'  
recovery_target_inclusive = on  
recovery_target_timeline = 'latest'  
recovery_target_action = 'pause'  
  
# 开启standby模式  
hot_standby = on  
  
# 配置 zero_damaged_pages 跳过错误的块    
zero_damaged_pages = on  

配置恢复模式

touch $PGDATA/standby.signal  

《PostgreSQL 12 preview - recovery.conf 合并入 postgresql.conf后的HOT STANDBY配置(standby.signal , recovery.signal)》

启动实例, 开始恢复

pg_ctl start  

查看是否已恢复

postgres@dcdaa877dadd:~/14/pgdata$ psql  
psql (14.10 (Debian 14.10-1.pgdg110+1))  
Type "help" for help.  
  
postgres=# \l  
                               List of databases  
   Name    |  Owner   | Encoding | Collate |   Ctype    |   Access privileges     
-----------+----------+----------+---------+------------+-----------------------  
 a         | postgres | UTF8     | C       | en_US.UTF8 |   
 b         | postgres | UTF8     | C       | en_US.UTF8 |   
 c         | postgres | UTF8     | C       | en_US.UTF8 |   
 postgres  | postgres | UTF8     | C       | en_US.UTF8 |   
 template0 | postgres | UTF8     | C       | en_US.UTF8 | =c/postgres          +  
           |          |          |         |            | postgres=CTc/postgres  
 template1 | postgres | UTF8     | C       | en_US.UTF8 | =c/postgres          +  
           |          |          |         |            | postgres=CTc/postgres  
(6 rows)  
  
  
-- 当前处于恢复模式  
b=# select pg_is_in_recovery();  
 pg_is_in_recovery   
-------------------  
 t  
(1 row)  
  
-- 已恢复到目标位置, 处于暂停恢复状态  
b=# select * from pg_get_wal_replay_pause_state();  
 pg_get_wal_replay_pause_state   
-------------------------------  
 paused  
(1 row)  
  
  
b=# select pg_is_wal_replay_paused();  
 pg_is_wal_replay_paused   
-------------------------  
 t  
(1 row)  
  
  
-- 链接到b数据库正常  
postgres=# \c b postgres  
You are now connected to database "b" as user "postgres".  
b=# \dt  
              List of relations  
 Schema |       Name       | Type  |  Owner     
--------+------------------+-------+----------  
 public | pgbench_accounts | table | postgres  
 public | pgbench_branches | table | postgres  
 public | pgbench_history  | table | postgres  
 public | pgbench_tellers  | table | postgres  
 public | test             | table | postgres  
(5 rows)  
  
-- 经查询, 恢复位置之前的数据都在, 之后的数据不存在.   
b=# select * from test where id=0;  
 id | info |             ts               
----+------+----------------------------  
  0 | test | 2024-01-07 12:24:16.015474  
(1 row)  
  
b=# select * from test where id=-1;  
 id | info | ts   
----+------+----  
(0 rows)  

c库是备份后创建的, 所以有完整的wal, c库被正常恢复出来了.

b=# \c c postgres  
You are now connected to database "c" as user "postgres".  
c=# \dt  
              List of relations  
 Schema |       Name       | Type  |  Owner     
--------+------------------+-------+----------  
 public | pgbench_accounts | table | postgres  
 public | pgbench_branches | table | postgres  
 public | pgbench_history  | table | postgres  
 public | pgbench_tellers  | table | postgres  
(4 rows)  
  
c=# select count(*) from pgbench_accounts ;  
  count    
---------  
 1000000  
(1 row)  
  
c=# select count(*) from pgbench_branches;  
 count   
-------  
    10  
(1 row)  

但是a库是备份前已经存在的, 原来有文件, 恢复过程需要data page+wal, 因为没有备份所以恢复不了a库

c=# \c a postgres  
connection to server on socket "/var/lib/postgresql/14/pgdata/.s.PGSQL.1921" failed: FATAL:  "pg_tblspc/16384/PG_14_202107181/16386" is not a valid data directory  
DETAIL:  File "pg_tblspc/16384/PG_14_202107181/16386/PG_VERSION" is missing.  
Previous connection kept  

参考

《PostgreSQL recovery target introduce》 详细介绍了PITR恢复参数的概念.

《PostgreSQL Selectivity Tablespace PITR - 部分表空间恢复》

digoal's wechat