-
Notifications
You must be signed in to change notification settings - Fork 211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hash-Mod sharded backups / restore of replicated tables #639
Comments
root reason why we restore only in one replica in shard is when one replica push ATTACH_PART event to system.replication_queue and second replica can't make ATTACH PART cause part already attached in other replica your idea, looks good as i see maybe better use SELECT |
This change adds a new configuration 'general.sharded_operation' which shards tables for backup across replicas, allowing for a uniform backup and restore call to the server without consideration for table replication state. Fixes Altinity#639
I've made an attempt here - it is kind of heavy.. Please give me a heads up what you think |
This change adds a new configuration 'general.sharded_operation' which shards tables for backup across replicas, allowing for a uniform backup and restore call to the server without consideration for table replication state. Fixes Altinity#639 clickhouse-backup/backup_shard: Use Array for active replicas clickhouse-go v1 does not support clickhouse Map types. Force the Map(String, UInt8) column replica_is_active to a string array for now. clickhouse-backup/backuper: Skip shard assignment for skipped tables Skip shard assignment for skipped tables. Also add the new ShardBackupType "ShardBackupNone", which is assigned to skipped tables clickhouse-backup/backuper: Use b.GetTables for CreateBackup Use b.GetTables for CreateBackup instead of b.ch.GetTables and move b.populateBackupShardField to b.GetTables so as to populate the field for the server API.
This change adds a new configuration 'general.sharded_operation' which shards tables for backup across replicas, allowing for a uniform backup and restore call to the server without consideration for table replication state. Fixes Altinity#639 clickhouse-backup/backup_shard: Use Array for active replicas clickhouse-go v1 does not support clickhouse Map types. Force the Map(String, UInt8) column replica_is_active to a string array for now. clickhouse-backup/backuper: Skip shard assignment for skipped tables Skip shard assignment for skipped tables. Also add the new ShardBackupType "ShardBackupNone", which is assigned to skipped tables clickhouse-backup/backuper: Use b.GetTables for CreateBackup Use b.GetTables for CreateBackup instead of b.ch.GetTables and move b.populateBackupShardField to b.GetTables so as to populate the field for the server API.
This change adds a new configuration 'general.sharded_operation' which shards tables for backup across replicas, allowing for a uniform backup and restore call to the server without consideration for table replication state. Fixes Altinity#639 clickhouse-backup/backup_shard: Use Array for active replicas clickhouse-go v1 does not support clickhouse Map types. Force the Map(String, UInt8) column replica_is_active to a string array for now. clickhouse-backup/backuper: Skip shard assignment for skipped tables Skip shard assignment for skipped tables. Also add the new ShardBackupType "ShardBackupNone", which is assigned to skipped tables clickhouse-backup/backuper: Use b.GetTables for CreateBackup Use b.GetTables for CreateBackup instead of b.ch.GetTables and move b.populateBackupShardField to b.GetTables so as to populate the field for the server API.
This change adds a new configuration 'general.sharded_operation' which shards tables for backup across replicas, allowing for a uniform backup and restore call to the server without consideration for table replication state. Fixes Altinity#639 clickhouse-backup/backup_shard: Use Array for active replicas clickhouse-go v1 does not support clickhouse Map types. Force the Map(String, UInt8) column replica_is_active to a string array for now. clickhouse-backup/backuper: Skip shard assignment for skipped tables Skip shard assignment for skipped tables. Also add the new ShardBackupType "ShardBackupNone", which is assigned to skipped tables clickhouse-backup/backuper: Use b.GetTables for CreateBackup Use b.GetTables for CreateBackup instead of b.ch.GetTables and move b.populateBackupShardField to b.GetTables so as to populate the field for the server API.
This change adds a new configuration 'general.sharded_operation' which shards tables for backup across replicas, allowing for a uniform backup and restore call to the server without consideration for table replication state. Fixes Altinity#639 clickhouse-backup/backup_shard: Use Array for active replicas clickhouse-go v1 does not support clickhouse Map types. Force the Map(String, UInt8) column replica_is_active to a string array for now. clickhouse-backup/backuper: Skip shard assignment for skipped tables Skip shard assignment for skipped tables. Also add the new ShardBackupType "ShardBackupNone", which is assigned to skipped tables clickhouse-backup/backuper: Use b.GetTables for CreateBackup Use b.GetTables for CreateBackup instead of b.ch.GetTables and move b.populateBackupShardField to b.GetTables so as to populate the field for the server API.
This change adds a new configuration 'general.sharded_operation' which shards tables for backup across replicas, allowing for a uniform backup and restore call to the server without consideration for table replication state. Fixes Altinity#639 clickhouse-backup/backup_shard: Use Array for active replicas clickhouse-go v1 does not support clickhouse Map types. Force the Map(String, UInt8) column replica_is_active to a string array for now. clickhouse-backup/backuper: Skip shard assignment for skipped tables Skip shard assignment for skipped tables. Also add the new ShardBackupType "ShardBackupNone", which is assigned to skipped tables clickhouse-backup/backuper: Use b.GetTables for CreateBackup Use b.GetTables for CreateBackup instead of b.ch.GetTables and move b.populateBackupShardField to b.GetTables so as to populate the field for the server API.
This change adds a new configuration 'general.sharded_operation' which shards tables for backup across replicas, allowing for a uniform backup and restore call to the server without consideration for table replication state. Fixes Altinity#639 clickhouse-backup/backup_shard: Use Array for active replicas clickhouse-go v1 does not support clickhouse Map types. Force the Map(String, UInt8) column replica_is_active to a string array for now. clickhouse-backup/backuper: Skip shard assignment for skipped tables Skip shard assignment for skipped tables. Also add the new ShardBackupType "ShardBackupNone", which is assigned to skipped tables clickhouse-backup/backuper: Use b.GetTables for CreateBackup Use b.GetTables for CreateBackup instead of b.ch.GetTables and move b.populateBackupShardField to b.GetTables so as to populate the field for the server API.
I don't like idea of doing this on table level, because tables tends to be very different in size, quite often people have 1 big table and rest is small. Better will be do that on partition level, but it more complex. |
This change adds a new configuration 'general.sharded_operation' which shards tables for backup across replicas, allowing for a uniform backup and restore call to the server without consideration for table replication state. Fixes Altinity#639 clickhouse-backup/backup_shard: Use Array for active replicas clickhouse-go v1 does not support clickhouse Map types. Force the Map(String, UInt8) column replica_is_active to a string array for now. clickhouse-backup/backuper: Skip shard assignment for skipped tables Skip shard assignment for skipped tables. Also add the new ShardBackupType "ShardBackupNone", which is assigned to skipped tables clickhouse-backup/backuper: Use b.GetTables for CreateBackup Use b.GetTables for CreateBackup instead of b.ch.GetTables and move b.populateBackupShardField to b.GetTables so as to populate the field for the server API.
I see, based on consistency concerns from Slach, the current implementation includes multiple different modes for the sharding of the backup operation (individual tables, individual databases, first active replica only). I'd like to keep these modes of operation as I can definitely see how splitting the backup of a database across multiple replicas could cause issues in a star schema situation (missing dimension row for a corresponding fact table entry?). This current commit is sort of big - would it be alright if the first iteration only includes the table / database / first active replica sharded operation mode and then for a later PR (also linked with this issue) will implement the "partition" sharded operation mode? FWIW, I agree with you - our clickhouse setup has wide tables without star schema and the tables don't have some sort of informal foreign key / primary key relationship between each other. Our test backups did show the exact backup duration skew you were describing - as a handful of tables are much bigger than our average. |
This change adds a new configuration 'general.sharded_operation' which shards tables for backup across replicas, allowing for a uniform backup and restore call to the server without consideration for table replication state. Fixes Altinity#639 clickhouse-backup/backup_shard: Use Array for active replicas clickhouse-go v1 does not support clickhouse Map types. Force the Map(String, UInt8) column replica_is_active to a string array for now. clickhouse-backup/backuper: Skip shard assignment for skipped tables Skip shard assignment for skipped tables. Also add the new ShardBackupType "ShardBackupNone", which is assigned to skipped tables clickhouse-backup/backuper: Use b.GetTables for CreateBackup Use b.GetTables for CreateBackup instead of b.ch.GetTables and move b.populateBackupShardField to b.GetTables so as to populate the field for the server API.
This change adds a new configuration 'general.sharded_operation' which shards tables for backup across replicas, allowing for a uniform backup and restore call to the server without consideration for table replication state. Fixes Altinity#639 clickhouse-backup/backup_shard: Use Array for active replicas clickhouse-go v1 does not support clickhouse Map types. Force the Map(String, UInt8) column replica_is_active to a string array for now. clickhouse-backup/backuper: Skip shard assignment for skipped tables Skip shard assignment for skipped tables. Also add the new ShardBackupType "ShardBackupNone", which is assigned to skipped tables clickhouse-backup/backuper: Use b.GetTables for CreateBackup Use b.GetTables for CreateBackup instead of b.ch.GetTables and move b.populateBackupShardField to b.GetTables so as to populate the field for the server API.
This change adds a new configuration 'general.sharded_operation' which shards tables for backup across replicas, allowing for a uniform backup and restore call to the server without consideration for table replication state. Fixes Altinity#639 clickhouse-backup/backup_shard: Use Array for active replicas clickhouse-go v1 does not support clickhouse Map types. Force the Map(String, UInt8) column replica_is_active to a string array for now. clickhouse-backup/backuper: Skip shard assignment for skipped tables Skip shard assignment for skipped tables. Also add the new ShardBackupType "ShardBackupNone", which is assigned to skipped tables clickhouse-backup/backuper: Use b.GetTables for CreateBackup Use b.GetTables for CreateBackup instead of b.ch.GetTables and move b.populateBackupShardField to b.GetTables so as to populate the field for the server API. backup: Addressing changes for adding sharding support Add in different sharded operation modes to give users the ability to specify granularity of sharding
This change adds a new configuration 'general.sharded_operation' which shards tables for backup across replicas, allowing for a uniform backup and restore call to the server without consideration for table replication state. Fixes Altinity#639 clickhouse-backup/backup_shard: Use Array for active replicas clickhouse-go v1 does not support clickhouse Map types. Force the Map(String, UInt8) column replica_is_active to a string array for now. clickhouse-backup/backuper: Skip shard assignment for skipped tables Skip shard assignment for skipped tables. Also add the new ShardBackupType "ShardBackupNone", which is assigned to skipped tables clickhouse-backup/backuper: Use b.GetTables for CreateBackup Use b.GetTables for CreateBackup instead of b.ch.GetTables and move b.populateBackupShardField to b.GetTables so as to populate the field for the server API. backup: Addressing changes for adding sharding support Add in different sharded operation modes to give users the ability to specify granularity of sharding
This change adds a new configuration 'general.sharded_operation' which shards tables for backup across replicas, allowing for a uniform backup and restore call to the server without consideration for table replication state. Fixes Altinity#639 clickhouse-backup/backup_shard: Use Array for active replicas clickhouse-go v1 does not support clickhouse Map types. Force the Map(String, UInt8) column replica_is_active to a string array for now. clickhouse-backup/backuper: Skip shard assignment for skipped tables Skip shard assignment for skipped tables. Also add the new ShardBackupType "ShardBackupNone", which is assigned to skipped tables clickhouse-backup/backuper: Use b.GetTables for CreateBackup Use b.GetTables for CreateBackup instead of b.ch.GetTables and move b.populateBackupShardField to b.GetTables so as to populate the field for the server API. backup: Addressing changes for adding sharding support Add in different sharded operation modes to give users the ability to specify granularity of sharding
This change adds a new configuration 'general.sharded_operation' which shards tables for backup across replicas, allowing for a uniform backup and restore call to the server without consideration for table replication state. Fixes Altinity#639 clickhouse-backup/backup_shard: Use Array for active replicas clickhouse-go v1 does not support clickhouse Map types. Force the Map(String, UInt8) column replica_is_active to a string array for now. clickhouse-backup/backuper: Skip shard assignment for skipped tables Skip shard assignment for skipped tables. Also add the new ShardBackupType "ShardBackupNone", which is assigned to skipped tables clickhouse-backup/backuper: Use b.GetTables for CreateBackup Use b.GetTables for CreateBackup instead of b.ch.GetTables and move b.populateBackupShardField to b.GetTables so as to populate the field for the server API. backup: Addressing changes for adding sharding support Add in different sharded operation modes to give users the ability to specify granularity of sharding
added to changelog: Add `SHARDED_OPERATION_MODE` option, to easy create backup for sharded cluster, available values none (no sharding), table (table granularity), database (database granularity), first-replica (on the lexicographically sorted first active replica), thanks @mskwon, fix [639](#639), fix [648](#648)
Hi,
As I understand it, the way to back up a replicated table is to take the back up from a single replica. To restore this replicated table, the schema must be restored on all of the replicas and then the data must be restored on a single replica - after which the replication will take over to restore all replicas.
I was wondering what you thought of adding a flag which will make clickhouse-backup query system.replicas and then perform a hash of the table name and then doing a mod with the number of active replicas for the table. If the resulting number is the same as the index of the current replica in a sorted list of active replicas for the table, then this table is selected for backup.
I was thinking that this might be nice as then the backup and the restore could be distributed among all of the replicas, all of the unreplicated tables would also be covered (since N%1 == 0). It would also be the case that the schema-only restore of a replica could be replaced with just a full restore. This behavior could be selected with a flag --sharded-backup perhaps.
While it is true that this strategy might not work so well in the case of a network partition, I'd assume that backups will more often than not occur when the installation is in a good state.
The text was updated successfully, but these errors were encountered: