Use case
In ClickHouse users re-balance the existing shards using FETCH, ATTACH in destination, and DETACH in source commands. But the issue is the writes, reads, and merges that happen during the phase of the ATTACH, DETACH process. This will lead to data inconsistencies which is not acceptable. Users also use this process in many use cases of data movement. In these scenarios, users are happy to see an error while accessing that part rather than receiving duplicated data.
Proposed Solution
If we allow a simple command to Disable Partition, it would be helpful to maintain data consistency. If any query hits the partition for reads or writes we can throw an error. This will allow the users to migrate in the background by ensuring data consistency. We can also ignore these partitions in our MergeSelection Algorithm. We will not need any heavy code changes. Each shard just needs to maintain the partitions that are disabled. We can create a system table so that users get visibility of disabled partitions. Both the ENABLE/DISABLE partition commands are replicated across the replicas of the shard.
a sample command can look like
ALTER TABLE table_name [ON CLUSTER cluster] DISABLE|ENABLE PARTITION|PART partition_expr
Describe alternatives you've considered
There is no alternative to do this in clickhouse as of today. This solution can be used in many use cases, like re-balancing, data movement to S3/Azure/HDFS, and others
I can pick up the implementation. I would like to check community interest in supporting these commands. We can first support this in the ReplicatedMergeTree and extend it
Use case
In ClickHouse users re-balance the existing shards using FETCH, ATTACH in destination, and DETACH in source commands. But the issue is the writes, reads, and merges that happen during the phase of the ATTACH, DETACH process. This will lead to data inconsistencies which is not acceptable. Users also use this process in many use cases of data movement. In these scenarios, users are happy to see an error while accessing that part rather than receiving duplicated data.
Proposed Solution
If we allow a simple command to Disable Partition, it would be helpful to maintain data consistency. If any query hits the partition for reads or writes we can throw an error. This will allow the users to migrate in the background by ensuring data consistency. We can also ignore these partitions in our MergeSelection Algorithm. We will not need any heavy code changes. Each shard just needs to maintain the partitions that are disabled. We can create a system table so that users get visibility of disabled partitions. Both the ENABLE/DISABLE partition commands are replicated across the replicas of the shard.
a sample command can look like
ALTER TABLE table_name [ON CLUSTER cluster] DISABLE|ENABLE PARTITION|PART partition_expr
Describe alternatives you've considered
There is no alternative to do this in clickhouse as of today. This solution can be used in many use cases, like re-balancing, data movement to S3/Azure/HDFS, and others
I can pick up the implementation. I would like to check community interest in supporting these commands. We can first support this in the ReplicatedMergeTree and extend it