Pegasus currently supports cold backup and restore functions, but both of them have some disadvantages.
For cold backup, Pegasus supports periodic backup through policies. Users can create a policy with backup-related parameters such as the provider and interval time, and apply this policy to several tables. In addition, Pegasus has supported onetime backup since release 2.3.0.
However, the backup function has the following disadvantages:
Periodic backup cannot start accurately at the configured start time.
When the periodic backup interval is less than 1 day, periodic backup is triggered unexpectedly.
A user-defined provider path is not supported for periodic backup.
Once a backup is started, it cannot be canceled. When a backup fails, it keeps retrying until it succeeds, even across meta server restarts.
The current backup incurs heavy I/O while copying the checkpoint.
The path layout on the provider makes it hard to find a given table's backup.
The backup code is not friendly to read and maintain.
For restore, Pegasus supports two data_version values. Tables created in release 1.x are V0, and tables created in release 2.x are V1. The restore process creates an empty table, then applies the backup checkpoint. This causes a compatibility problem: a release 2.x table cannot apply a V0 checkpoint, which leads to a coredump that makes the cluster unusable. As a result, restore needs to check the table data_version to be robust.
New backup design
The enhanced version of backup, simplified backup v2, will solve all the problems above and provide a simple backup function.
meta server will have a timer to check whether periodic backup should be triggered
for the first triggered backup, the server checks it against start_time, whose format is like "15:00"
for subsequent backups, the server compares the last backup start time with the periodic backup interval
periodic backup is not allowed to be modified, but can be deleted and recreated
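The timer check described above can be sketched roughly as follows. This is only an illustration, not the Pegasus implementation: the function name `should_trigger` and its parameters are hypothetical, and it assumes the first backup fires once the wall clock has reached start_time on the current day.

```python
from datetime import datetime, time, timedelta
from typing import Optional


def should_trigger(now: datetime,
                   start_time: str,
                   interval: timedelta,
                   last_start: Optional[datetime]) -> bool:
    """Decide whether the periodic backup should be triggered at `now`.

    Hypothetical helper: `start_time` uses the "HH:MM" format from the design,
    and `last_start` is None when no backup has been triggered yet.
    """
    if last_start is None:
        # First backup: compare the wall clock against start_time (assumption:
        # "reached or passed start_time today" is enough to trigger).
        hour, minute = (int(part) for part in start_time.split(":"))
        return now.time() >= time(hour, minute)
    # Subsequent backups: compare the last backup start time with the interval.
    return now - last_start >= interval
```

With a 6-hour interval and start_time "15:00", the first backup would trigger at 15:00, and the next one no earlier than 21:00, regardless of how long the first backup took.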
Backup service - manages the backups of all tables in the cluster, including onetime backup and periodic backup. It also exposes the following RPC interfaces to admin-cli and shell:
add table periodic backup policy
query periodic backup policy
disable/enable periodic backup policy
delete periodic backup policy
start onetime backup
query backup (onetime and periodic)
cancel backup (onetime and periodic)
Main flow
when receiving a start backup request, the engine turns its backup status into checkpointing and sends requests to the replica servers
each replica turns its state into checkpointing, and then into checkpointed after the checkpoint is generated successfully
when the status of all partitions is checkpointed, meta turns the status into uploading
each replica turns its state into uploading, and then into succeed after the checkpoint is uploaded successfully; the backup checkpoint directory is deleted after a while
when the status of all partitions is succeed, meta turns the status into succeed and considers the backup successful
if any error happens during the whole process, the backup fails
if a cancel backup request is received, a checkpointing or uploading backup is canceled
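The meta-side transitions in this flow can be summarized as a small state machine. The sketch below is a simplified model under stated assumptions: the status names checkpointing, checkpointed, uploading and succeed come from the flow above, while `FAILED`, `CANCELED`, and the function `next_meta_status` are hypothetical names introduced for illustration.

```python
from enum import Enum
from typing import List


class Status(Enum):
    CHECKPOINTING = "checkpointing"
    CHECKPOINTED = "checkpointed"
    UPLOADING = "uploading"
    SUCCEED = "succeed"
    FAILED = "failed"      # assumed name for the failure state
    CANCELED = "canceled"  # assumed name for the canceled state


TERMINAL = (Status.SUCCEED, Status.FAILED, Status.CANCELED)


def next_meta_status(meta: Status,
                     partitions: List[Status],
                     cancel_requested: bool = False) -> Status:
    """Return the next meta backup status given all partition statuses."""
    if meta in TERMINAL:
        return meta  # a finished backup never changes state again
    if cancel_requested:
        return Status.CANCELED  # checkpointing or uploading backup is canceled
    if any(p is Status.FAILED for p in partitions):
        return Status.FAILED  # any error fails the whole backup
    if (meta is Status.CHECKPOINTING and partitions
            and all(p is Status.CHECKPOINTED for p in partitions)):
        return Status.UPLOADING
    if (meta is Status.UPLOADING and partitions
            and all(p is Status.SUCCEED for p in partitions)):
        return Status.SUCCEED
    return meta  # still waiting for some partitions
```

Modeling failure and cancellation as terminal states reflects the goal that a failed backup no longer retries forever and a running backup can be stopped.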
Components
The meta backup function consists of three parts:
Backup paths
Path on remote storage (zk)
Path on remote backup provider (such as HDFS)
New restore
Restore v2 won't change the design; it just adds a data version check, refactors the code, and stays compatible with the old backup path on the backup provider.
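A data version check of this kind could look like the sketch below. It is an assumption-laden illustration: the function `can_apply_checkpoint` is hypothetical, the versions are modeled as plain integers (0 for V0, 1 for V1), and it simply refuses any mismatch, since the only case stated here is that a V1 table cannot apply a V0 checkpoint.

```python
def can_apply_checkpoint(table_data_version: int,
                         checkpoint_data_version: int) -> bool:
    """Return True only when the checkpoint matches the table's data_version.

    Hypothetical check: rejecting every mismatch is a conservative assumption;
    the known failing case is a V1 table (release 2.x) applying a V0 checkpoint.
    """
    return table_data_version == checkpoint_data_version


def restore(table_data_version: int, checkpoint_data_version: int) -> str:
    """Refuse the restore up front instead of crashing while applying data."""
    if not can_apply_checkpoint(table_data_version, checkpoint_data_version):
        return "rejected: data_version mismatch"
    return "applied"
```

Checking before the checkpoint is applied turns a coredump into an ordinary, reportable error.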
Pull request merge plan
All pull requests will first be merged into the backup-restore-dev branch, and finally into the master branch.