Skip to content

Latest commit

 

History

History
567 lines (386 loc) · 21.9 KB

20160823_04.md

File metadata and controls

567 lines (386 loc) · 21.9 KB

PostgreSQL 最佳实践 - 任意时间点恢复源码分析

作者

digoal

日期

2016-08-23

标签

PostgreSQL , PITR , 热备份 , 恢复 , 在线备份 , 恢复 , 时间点恢复 , 恢复到任意时间点 , 源码


背景

我们知道PostgreSQL是支持任意时间点恢复的,那么背后的原理是什么?

本文将对PG的时间点恢复进行详细的讲解,帮助用户理解。

本文涉及源码参考PostgreSQL 9.2.2版本.

时间点恢复涉及的参数

我们知道PostgreSQL 支持PITR, 基于时间点的恢复. 通过配置recovery.conf可以指定3种恢复目标, 如下 :

recovery_target_name (string)    
This parameter specifies the named restore point, created with pg_create_restore_point() to which recovery will proceed.     
At most one of recovery_target_name, recovery_target_time or recovery_target_xid can be specified. The default is to recover to the end of the WAL log.    
    
recovery_target_time (timestamp)    
This parameter specifies the time stamp up to which recovery will proceed.     
At most one of recovery_target_time, recovery_target_name or recovery_target_xid can be specified. The default is to recover to the end of the WAL log.     
The precise stopping point is also influenced by recovery_target_inclusive.    
    
recovery_target_xid (string)    
This parameter specifies the transaction ID up to which recovery will proceed. Keep in mind that while transaction IDs are assigned sequentially at transaction start, transactions can complete in a different numeric order. The transactions that will be recovered are those that committed before (and optionally including) the specified one.     
At most one of recovery_target_xid, recovery_target_name or recovery_target_time can be specified. The default is to recover to the end of the WAL log.     
The precise stopping point is also influenced by recovery_target_inclusive.    

其中recovery_target_time和recovery_target_xid可以指定recovery_target_inclusive参数, 如下 :

recovery_target_inclusive (boolean)    
Specifies whether we stop just after the specified recovery target (true), or just before the recovery target (false).     
Applies to both recovery_target_time and recovery_target_xid, whichever one is specified for this recovery.     
This indicates whether transactions having exactly the target commit time or ID, respectively, will be included in the recovery.     
Default is true.    

默认为true取自src/backend/access/transam/xlog.c :

static bool recoveryTargetInclusive = true;    

为什么recovery_target_name不能指定recovery_target_inclusive参数?

而recovery_target_time和recovery_target_xid可以指定recovery_target_inclusive参数呢?

恢复截至点源码分析

首先要解释一下, 什么情况下恢复可以截止.

只在三种情况恢复可以截止 :

 COMMIT/ABORT/XLOG_RESTORE_POINT,   

然后这些信息从哪里来呢? 它们都取自XLOG的头数据XLogRecord中的sl_rmid和xl_info :

src/include/access/xlog.h

/*    
 * The overall layout of an XLOG record is:    
 *              Fixed-size header (XLogRecord struct)    
 *              rmgr-specific data    
 *              BkpBlock    
 *              backup block data    
 *              BkpBlock    
 *              backup block data    
 *              ...    
 *    
 * where there can be zero to four backup blocks (as signaled by xl_info flag    
 * bits).  XLogRecord structs always start on MAXALIGN boundaries in the WAL    
 * files, and we round up SizeOfXLogRecord so that the rmgr data is also    
 * guaranteed to begin on a MAXALIGN boundary.  However, no padding is added    
 * to align BkpBlock structs or backup block data.    
 *    
 * NOTE: xl_len counts only the rmgr data, not the XLogRecord header,    
 * and also not any backup blocks.      xl_tot_len counts everything.  Neither    
 * length field is rounded up to an alignment boundary.    
 */    
typedef struct XLogRecord    
{    
        pg_crc32        xl_crc;                 /* CRC for this record */    
        XLogRecPtr      xl_prev;                /* ptr to previous record in log */    
        TransactionId xl_xid;           /* xact id */    
        uint32          xl_tot_len;             /* total len of entire record */    
        uint32          xl_len;                 /* total len of rmgr data */    
        uint8           xl_info;                /* flag bits, see below */    
        RmgrId          xl_rmid;                /* resource manager for this record */    
    
        /* Depending on MAXALIGN, there are either 2 or 6 wasted bytes here */    
    
        /* ACTUAL LOG DATA FOLLOWS AT END OF STRUCT */    
    
} XLogRecord;    

只有在这三个状态下, 恢复允许进入截止判断.

COMMIT/ABORT/XLOG_RESTORE_POINT;

这个逻辑来自recoveryStopsHere函数 :

恢复截止的处理函数recoveryStopsHere中包含了这三个状态的判断, 如下 :

src/backend/access/transam/xlog.c

        /* We only consider stopping at COMMIT, ABORT or RESTORE POINT records */    
        if (record->xl_rmid != RM_XACT_ID && record->xl_rmid != RM_XLOG_ID)    
                return false;    
        record_info = record->xl_info & ~XLR_INFO_MASK;    
        if (record->xl_rmid == RM_XACT_ID && record_info == XLOG_XACT_COMMIT_COMPACT)    
        {    
                xl_xact_commit_compact *recordXactCommitData;    
    
                recordXactCommitData = (xl_xact_commit_compact *) XLogRecGetData(record);    
                recordXtime = recordXactCommitData->xact_time;    
        }    
        else if (record->xl_rmid == RM_XACT_ID && record_info == XLOG_XACT_COMMIT)    
        {    
                xl_xact_commit *recordXactCommitData;    
    
                recordXactCommitData = (xl_xact_commit *) XLogRecGetData(record);    
                recordXtime = recordXactCommitData->xact_time;    
        }    
        else if (record->xl_rmid == RM_XACT_ID && record_info == XLOG_XACT_ABORT)    
        {    
                xl_xact_abort *recordXactAbortData;    
    
                recordXactAbortData = (xl_xact_abort *) XLogRecGetData(record);    
                recordXtime = recordXactAbortData->xact_time;    
        }    
        else if (record->xl_rmid == RM_XLOG_ID && record_info == XLOG_RESTORE_POINT)    
        {    
                xl_restore_point *recordRestorePointData;    
    
                recordRestorePointData = (xl_restore_point *) XLogRecGetData(record);    
                recordXtime = recordRestorePointData->rp_time;    
                strncpy(recordRPName, recordRestorePointData->rp_name, MAXFNAMELEN);    
        }    
        else    
                return false;    

COMMIT和ABORT很好理解, 就是事务结束时状态, RESOTRE POINT的信息则来自XLogRestorePoint函数,

src/backend/access/transam/xlog.c

/*    
 * Write a RESTORE POINT record    
 */    
XLogRecPtr    
XLogRestorePoint(const char *rpName)    
{    
        XLogRecPtr      RecPtr;    
        XLogRecData rdata;    
        xl_restore_point xlrec;    
    
        xlrec.rp_time = GetCurrentTimestamp();    
        strncpy(xlrec.rp_name, rpName, MAXFNAMELEN);    
    
        rdata.buffer = InvalidBuffer;    
        rdata.data = (char *) &xlrec;    
        rdata.len = sizeof(xl_restore_point);    
        rdata.next = NULL;    
    
        RecPtr = XLogInsert(RM_XLOG_ID, XLOG_RESTORE_POINT, &rdata);    
    
        ereport(LOG,    
                        (errmsg("restore point \"%s\" created at %X/%X",    
                                        rpName, RecPtr.xlogid, RecPtr.xrecoff)));    
    
        return RecPtr;    
}    

什么是自定义还原点

在使用PostgreSQL内建的pg_create_restore_point函数创建还原点时用到XLogRestorePoint :

src/backend/access/transam/xlogfuncs.c

/*    
 * pg_create_restore_point: a named point for restore    
 */    
Datum    
pg_create_restore_point(PG_FUNCTION_ARGS)    
{    
        text       *restore_name = PG_GETARG_TEXT_P(0);    
        char       *restore_name_str;    
        XLogRecPtr      restorepoint;    
        char            location[MAXFNAMELEN];    
    
        if (!superuser())    
                ereport(ERROR,    
                                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),    
                                 (errmsg("must be superuser to create a restore point"))));    
    
        if (RecoveryInProgress())    
                ereport(ERROR,    
                                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),    
                                 (errmsg("recovery is in progress"),    
                                  errhint("WAL control functions cannot be executed during recovery."))));    
    
        if (!XLogIsNeeded())    
                ereport(ERROR,    
                                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),    
                         errmsg("WAL level not sufficient for creating a restore point"),    
                                 errhint("wal_level must be set to \"archive\" or \"hot_standby\" at server start.")));    
    
        restore_name_str = text_to_cstring(restore_name);    
    
        if (strlen(restore_name_str) >= MAXFNAMELEN)    
                ereport(ERROR,    
                                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),    
                                 errmsg("value too long for restore point (maximum %d characters)", MAXFNAMELEN - 1)));    
    
        restorepoint = XLogRestorePoint(restore_name_str);    
    
        /*    
         * As a convenience, return the WAL location of the restore point record    
         */    
        snprintf(location, sizeof(location), "%X/%X",    
                         restorepoint.xlogid, restorepoint.xrecoff);    
        PG_RETURN_TEXT_P(cstring_to_text(location));    
}    

经过以上介绍以后, 我们知道recoveryStopsHere开头部分的逻辑决定了PITR恢复可以选择截止在:

1. 事务结束时(COMMIT/ABORT);

2. 或者是用户使用pg_create_restore_point创建的还原点;

recoveryStopsHere接下来的部分针对recovery.conf中的配置, 判断是否截止恢复.

截至点的用法解说

在文章开头我们还提到了3个还原目标(target) :

(recovery_target_xid, recovery_target_time, recovery_target_name)

1. 未设置任何截至目标, 只返回false, 所以不会停止

        /* Do we have a PITR target at all? */    
        if (recoveryTarget == RECOVERY_TARGET_UNSET)    
        {    
                /*    
                 * Save timestamp of latest transaction commit/abort if this is a    
                 * transaction record    
                 */    
                if (record->xl_rmid == RM_XACT_ID)    
                        SetLatestXTime(recordXtime);    
                return false;    
        }    

RECOVERY_TARGET_UNSET 取自

src/include/access/xlog.h

/*    
 * Recovery target type.    
 * Only set during a Point in Time recovery, not when standby_mode = on    
 */    
typedef enum    
{    
        RECOVERY_TARGET_UNSET,    
        RECOVERY_TARGET_XID,    
        RECOVERY_TARGET_TIME,    
        RECOVERY_TARGET_NAME    
} RecoveryTargetType;    

2. recovery_target_xid 与 XLogRecord->xl_xid进行比较.

xid作为恢复目标时, recoveryTargetInclusive只影响日志输出(recoveryStopAfter).

原因是xid是按事务启动顺序分配的, 而不是按事务结束顺序分配. 并且这种target下面截止只可能在COMMIT/ABORT.

所以只要达到这个xid并且状态是commit/abort时, 就返回true.

*includeThis = recoveryTargetInclusive;只影响了日志输出. 而不是包含和不包含的意思.

        if (recoveryTarget == RECOVERY_TARGET_XID)    
        {    
                /*    
                 * There can be only one transaction end record with this exact    
                 * transactionid    
                 *    
                 * when testing for an xid, we MUST test for equality only, since    
                 * transactions are numbered in the order they start, not the order    
                 * they complete. A higher numbered xid will complete before you about    
                 * 50% of the time...    
                 */    
                stopsHere = (record->xl_xid == recoveryTargetXid);    
                if (stopsHere)    
                        *includeThis = recoveryTargetInclusive;    
        }    

日志输出时, 判断recoveryStopAfter :

        if (stopsHere)    
        {    
                recoveryStopXid = record->xl_xid;    
                recoveryStopTime = recordXtime;    
                recoveryStopAfter = *includeThis;    
    
                if (record_info == XLOG_XACT_COMMIT_COMPACT || record_info == XLOG_XACT_COMMIT)    
                {    
                        if (recoveryStopAfter)    
                                ereport(LOG,    
                                                (errmsg("recovery stopping after commit of transaction %u, time %s",    
                                                                recoveryStopXid,    
                                                                timestamptz_to_str(recoveryStopTime))));    
                        else    
                                ereport(LOG,    
                                                (errmsg("recovery stopping before commit of transaction %u, time %s",    
                                                                recoveryStopXid,    
                                                                timestamptz_to_str(recoveryStopTime))));    
                }    
                else if (record_info == XLOG_XACT_ABORT)    
                {    
                        if (recoveryStopAfter)    
                                ereport(LOG,    
                                                (errmsg("recovery stopping after abort of transaction %u, time %s",    
                                                                recoveryStopXid,    
                                                                timestamptz_to_str(recoveryStopTime))));    
                        else    
                                ereport(LOG,    
                                                (errmsg("recovery stopping before abort of transaction %u, time %s",    
                                                                recoveryStopXid,    
                                                                timestamptz_to_str(recoveryStopTime))));    
                }    

3. recovery_target_name 与 XLogRecData->data进行比较.

如果数据库中有多个重复命名的还原点, 遇到第一个则停止.

同时因为还原点的信息写在单独的xlog数据块中, 不是一条transaction record块, 所以也没有包含或不包含的概念, 直接截止.

不需要判断recovery_target_inclusive .

        else if (recoveryTarget == RECOVERY_TARGET_NAME)    
        {    
                /*    
                 * There can be many restore points that share the same name, so we    
                 * stop at the first one    
                 */    
                stopsHere = (strcmp(recordRPName, recoveryTargetName) == 0);    
    
                /*    
                 * Ignore recoveryTargetInclusive because this is not a transaction    
                 * record    
                 */    
                *includeThis = false;    
        }    

4. recovery_target_time 与 xl_xact_commit_compact->xact_time进行比较.

因为在同一个时间点, 可能有多个事务COMMIT/ABORT. 所以recovery_target_inclusive 在这里起到的作用是 :

截止于这个时间点的第一个提交的事务后(包含这个时间点第一个遇到的提交/回滚的事务);

或者截止于这个时间点提交的最后一个事务后(包括这个时间点提交/回滚的所有事务).

        else    
        {    
                /*    
                 * There can be many transactions that share the same commit time, so    
                 * we stop after the last one, if we are inclusive, or stop at the    
                 * first one if we are exclusive    
                 */    
                if (recoveryTargetInclusive)    
                        stopsHere = (recordXtime > recoveryTargetTime);    
                else    
                        stopsHere = (recordXtime >= recoveryTargetTime);    
                if (stopsHere)    
                        *includeThis = false;    
        }    

其中事务结束时间来自这个数据结构 :

src/include/access/xact.h

typedef struct xl_xact_commit_compact    
{    
        TimestampTz xact_time;          /* time of commit */    
        int                     nsubxacts;              /* number of subtransaction XIDs */    
        /* ARRAY OF COMMITTED SUBTRANSACTION XIDs FOLLOWS */    
        TransactionId subxacts[1];      /* VARIABLE LENGTH ARRAY */    
} xl_xact_commit_compact;    

从以上逻辑看到, recoveryTargetInclusive只有当恢复目标是xid或者time时可以指定.

目标是target name时不需要指定.

参考

1. src/include/catalog/pg_control.h

/* XLOG info values for XLOG rmgr */    
#define XLOG_CHECKPOINT_SHUTDOWN                0x00    
#define XLOG_CHECKPOINT_ONLINE                  0x10    
#define XLOG_NOOP                                               0x20    
#define XLOG_NEXTOID                                    0x30    
#define XLOG_SWITCH                                             0x40    
#define XLOG_BACKUP_END                                 0x50    
#define XLOG_PARAMETER_CHANGE                   0x60    
#define XLOG_RESTORE_POINT                              0x70    
#define XLOG_FPW_CHANGE                         0x80    

2. src/include/access/xlog.h

/*    
 * XLOG uses only low 4 bits of xl_info.  High 4 bits may be used by rmgr.    
 */    
#define XLR_INFO_MASK                   0x0F    

3. src/include/access/rmgr.h

/*    
 * Built-in resource managers    
 *    
 * Note: RM_MAX_ID could be as much as 255 without breaking the XLOG file    
 * format, but we keep it small to minimize the size of RmgrTable[].    
 */    
#define RM_XLOG_ID                              0    
#define RM_XACT_ID                              1    
#define RM_SMGR_ID                              2    
#define RM_CLOG_ID                              3    
#define RM_DBASE_ID                             4    
#define RM_TBLSPC_ID                    5    
#define RM_MULTIXACT_ID                 6    
#define RM_RELMAP_ID                    7    
#define RM_STANDBY_ID                   8    
#define RM_HEAP2_ID                             9    
#define RM_HEAP_ID                              10    
#define RM_BTREE_ID                             11    
#define RM_HASH_ID                              12    
#define RM_GIN_ID                               13    
#define RM_GIST_ID                              14    
#define RM_SEQ_ID                               15    
#define RM_SPGIST_ID                    16    

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

digoal's wechat