Describe the unexpected behaviour
It appears that the built-in ClickHouse Backup and Restore doesn't write (or restore) a table's metadata version correctly - it gets reset to 0 after a restore:
How to reproduce
ClickHouse version: 24.5.1.1763
First create a table:
CREATE TABLE backup_test_db.base__core__users_sharded
(
`user_id` UInt64,
`shop_id` UInt64,
`first_name` Nullable(String),
`_row_version` DateTime64(3)
)
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/backup_test_db/{shard}/base__core__users_sharded', '{replica}', _row_version)
PRIMARY KEY (shop_id, user_id)
ORDER BY (shop_id, user_id)
SETTINGS index_granularity = 8192, clean_deleted_rows = 'Never'
Then insert some data into it:
insert into backup_test_db.base__core__users_sharded (`user_id`, `shop_id`, `first_name`) VALUES (0, 0, 'Test1')
insert into backup_test_db.base__core__users_sharded (`user_id`, `shop_id`, `first_name`) VALUES (1, 0, 'Test2')
insert into backup_test_db.base__core__users_sharded (`user_id`, `shop_id`, `first_name`) VALUES (2, 0, 'Test3')
We can see in the output below that the metadata_version of the table is 0:
clickhouse :) select * from system.tables where database='backup_test_db';\G
SELECT *
FROM system.tables
WHERE database = 'backup_test_db'
Query id: 7c6057c8-4dd4-48db-8a41-dd3026f8ffba
Row 1:
──────
database: backup_test_db
name: base__core__users_sharded
uuid: c2eecb57-b7a6-4512-9a6f-04c7b3cb7510
engine: ReplicatedReplacingMergeTree
is_temporary: 0
data_paths: ['/var/lib/clickhouse/disks/data/0/store/c2e/c2eecb57-b7a6-4512-9a6f-04c7b3cb7510/','/var/lib/clickhouse/disks/gcs_metadata/store/c2e/c2eecb57-b7a6-4512-9a6f-04c7b3cb7510/']
metadata_path: /var/lib/clickhouse/store/2cc/2cc7b27f-a516-4adb-ae88-da62b0ab8d35/base__core__users_sharded.sql
metadata_modification_time: 2024-07-22 17:26:30
metadata_version: 0
dependencies_database: []
dependencies_table: []
create_table_query: CREATE TABLE backup_test_db.base__core__users_sharded (`user_id` UInt64, `shop_id` UInt64, `first_name` Nullable(String), `_row_version` DateTime64(3)) ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/backup_test_db/{shard}/base__core__users_sharded', '{replica}', _row_version) PRIMARY KEY (shop_id, user_id) ORDER BY (shop_id, user_id) SETTINGS index_granularity = 8192, clean_deleted_rows = 'Never'
engine_full: ReplicatedReplacingMergeTree('/clickhouse/tables/backup_test_db/{shard}/base__core__users_sharded', '{replica}', _row_version) PRIMARY KEY (shop_id, user_id) ORDER BY (shop_id, user_id) SETTINGS index_granularity = 8192, clean_deleted_rows = 'Never'
as_select:
partition_key:
sorting_key: shop_id, user_id
primary_key: shop_id, user_id
sampling_key:
storage_policy: default
total_rows: 3
total_bytes: 1440
total_bytes_uncompressed: 628
parts: 3
active_parts: 3
total_marks: 6
lifetime_rows: ᴺᵁᴸᴸ
lifetime_bytes: ᴺᵁᴸᴸ
comment:
has_own_data: 1
loading_dependencies_database: []
loading_dependencies_table: []
loading_dependent_database: []
loading_dependent_table: []
1 row in set. Elapsed: 0.004 sec.
If we check the metadata_version of one of the parts on disk, it is also 0:
$ cat /var/lib/clickhouse/disks/data/0/store/c2e/c2eecb57-b7a6-4512-9a6f-04c7b3cb7510/all_2_2_0/metadata_version.txt
0
Now we alter the table by adding a new column, which will increment its metadata version:
clickhouse :) ALTER TABLE backup_test_db.base__core__users_sharded ADD COLUMN `last_name` Nullable(String) after `first_name`
clickhouse :) select * from system.tables where database='backup_test_db';\G
SELECT *
FROM system.tables
WHERE database = 'backup_test_db'
Query id: ea4d23d2-65b5-4cbe-bc0d-db96a446a4fb
Row 1:
──────
database: backup_test_db
name: base__core__users_sharded
uuid: c2eecb57-b7a6-4512-9a6f-04c7b3cb7510
engine: ReplicatedReplacingMergeTree
is_temporary: 0
data_paths: ['/var/lib/clickhouse/disks/data/0/store/c2e/c2eecb57-b7a6-4512-9a6f-04c7b3cb7510/','/var/lib/clickhouse/disks/gcs_metadata/store/c2e/c2eecb57-b7a6-4512-9a6f-04c7b3cb7510/']
metadata_path: /var/lib/clickhouse/store/2cc/2cc7b27f-a516-4adb-ae88-da62b0ab8d35/base__core__users_sharded.sql
metadata_modification_time: 2024-07-22 17:43:42
metadata_version: 1
dependencies_database: []
dependencies_table: []
create_table_query: CREATE TABLE backup_test_db.base__core__users_sharded (`user_id` UInt64, `shop_id` UInt64, `first_name` Nullable(String), `last_name` Nullable(String), `_row_version` DateTime64(3)) ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/backup_test_db/{shard}/base__core__users_sharded', '{replica}', _row_version) PRIMARY KEY (shop_id, user_id) ORDER BY (shop_id, user_id) SETTINGS index_granularity = 8192, clean_deleted_rows = 'Never'
engine_full: ReplicatedReplacingMergeTree('/clickhouse/tables/backup_test_db/{shard}/base__core__users_sharded', '{replica}', _row_version) PRIMARY KEY (shop_id, user_id) ORDER BY (shop_id, user_id) SETTINGS index_granularity = 8192, clean_deleted_rows = 'Never'
as_select:
partition_key:
sorting_key: shop_id, user_id
primary_key: shop_id, user_id
sampling_key:
storage_policy: default
total_rows: 3
total_bytes: 1440
total_bytes_uncompressed: 628
parts: 3
active_parts: 3
total_marks: 6
lifetime_rows: ᴺᵁᴸᴸ
lifetime_bytes: ᴺᵁᴸᴸ
comment:
has_own_data: 1
loading_dependencies_database: []
loading_dependencies_table: []
loading_dependent_database: []
loading_dependent_table: []
1 row in set. Elapsed: 0.003 sec.
We add some new data into the table which should generate parts with metadata version 1:
insert into backup_test_db.base__core__users_sharded (`user_id`, `shop_id`, `first_name`, `last_name`) VALUES (3, 0, 'Test4', 'Test4')
insert into backup_test_db.base__core__users_sharded (`user_id`, `shop_id`, `first_name`, `last_name`) VALUES (4, 0, 'Test5', 'Test5')
$ cat /var/lib/clickhouse/disks/data/0/store/c2e/c2eecb57-b7a6-4512-9a6f-04c7b3cb7510/all_4_4_0/metadata_version.txt
1
Now we perform a Backup of the tables on both replicas:
On Replica 0:
BACKUP TABLE backup_test_db.base__core__users_sharded TO Disk('gcs_backups', 'clickhouse-2-0/backup_test_db/full/2024-07-22')
On Replica 1:
BACKUP TABLE backup_test_db.base__core__users_sharded TO Disk('gcs_backups', 'clickhouse-2-1/backup_test_db/full/2024-07-22')
Then drop the tables on both replicas:
DROP TABLE backup_test_db.base__core__users_sharded SYNC
and then restore the tables (full restore on replica 0, structure-only on replica 1):
On Replica 0:
RESTORE TABLE backup_test_db.base__core__users_sharded FROM Disk('gcs_backups', 'clickhouse-2-0/backup_test_db/full/2024-07-22')
On Replica 1:
RESTORE TABLE backup_test_db.base__core__users_sharded FROM Disk('gcs_backups', 'clickhouse-2-1/backup_test_db/full/2024-07-22') SETTINGS structure_only=true
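At this point the table-level metadata_version can be re-checked with the same query as before; per the behaviour described above, it comes back as 0 after the restore even though parts written after the ALTER carry version 1:

```sql
-- Re-check the table's metadata version after the restore.
SELECT metadata_version
FROM system.tables
WHERE database = 'backup_test_db'
  AND name = 'base__core__users_sharded'
```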
Now we insert some data which should trigger a merge after a while:
insert into backup_test_db.base__core__users_sharded (`user_id`, `shop_id`, `first_name`, `last_name`) VALUES (5, 0, 'Test6', 'Test6')
insert into backup_test_db.base__core__users_sharded (`user_id`, `shop_id`, `first_name`, `last_name`) VALUES (6, 0, 'Test7', 'Test7')
insert into backup_test_db.base__core__users_sharded (`user_id`, `shop_id`, `first_name`, `last_name`) VALUES (7, 0, 'Test8', 'Test8')
insert into backup_test_db.base__core__users_sharded (`user_id`, `shop_id`, `first_name`, `last_name`) VALUES (8, 0, 'Test9', 'Test9')
And after a while we see the following messages in the debug logs regarding merges:
2024.07.22 18:38:47.081610 [ 696 ] {} <Debug> 6384a81f-f127-48e5-8b84-70a5b6780cb8::all_0_4_1 (MergeFromLogEntryTask): Source part metadata version 1 is newer then the table metadata version 0. ALTER_METADATA is still in progress.
2024.07.22 18:38:47.084483 [ 697 ] {} <Debug> 6384a81f-f127-48e5-8b84-70a5b6780cb8::all_0_4_1 (MergeFromLogEntryTask): Source part metadata version 1 is newer then the table metadata version 0. ALTER_METADATA is still in progress.
2024.07.22 18:38:47.087278 [ 679 ] {} <Debug> 6384a81f-f127-48e5-8b84-70a5b6780cb8::all_0_4_1 (MergeFromLogEntryTask): Source part metadata version 1 is newer then the table metadata version 0. ALTER_METADATA is still in progress.
2024.07.22 18:38:47.090295 [ 694 ] {} <Debug> 6384a81f-f127-48e5-8b84-70a5b6780cb8::all_0_4_1 (MergeFromLogEntryTask): Source part metadata version 1 is newer then the table metadata version 0. ALTER_METADATA is still in progress.
2024.07.22 18:38:47.093332 [ 709 ] {} <Debug> 6384a81f-f127-48e5-8b84-70a5b6780cb8::all_0_4_1 (MergeFromLogEntryTask): Source part metadata version 1 is newer then the table metadata version 0. ALTER_METADATA is still in progress.
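The stuck merge can also be observed in the `system.replication_queue` table, where the entry keeps being retried (a sketch; exact column availability may vary slightly between versions):

```sql
-- Inspect replication-queue entries that keep failing for the restored table.
SELECT type, new_part_name, num_tries, last_exception
FROM system.replication_queue
WHERE database = 'backup_test_db'
  AND table = 'base__core__users_sharded'
```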
Expected behavior
A Backup and Restore should set the table's metadata version in ZooKeeper/Keeper correctly.
Additional context
Note that we can work around this problem by manually setting the metadata version for each table and replica in ZooKeeper/Keeper, but it takes a while to sort out the correct metadata versions for all the tables and set them:
SET /clickhouse/tables/<database>/<replica>/<table>/metadata_version "1"
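To help sort out the correct values, the per-replica metadata_version nodes can be read from SQL via the `system.zookeeper` table. This is a sketch: the paths below mirror the zk_path from the CREATE TABLE above and assume the `{shard}` macro expands to `0`; `<replica_name>` is a placeholder to be substituted from the first query:

```sql
-- 1) List the replicas registered for the table in Keeper:
SELECT name
FROM system.zookeeper
WHERE path = '/clickhouse/tables/backup_test_db/0/base__core__users_sharded/replicas';

-- 2) For each replica from step 1, read the metadata_version it has stored:
SELECT name, value
FROM system.zookeeper
WHERE path = '/clickhouse/tables/backup_test_db/0/base__core__users_sharded/replicas/<replica_name>'
  AND name = 'metadata_version';
```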