Support multi zookeeper clusters #15087
In a production environment, hundreds of machines hold thousands of ReplicatedMergeTree tables.
@fastio Good feature. We are also suffering from this problem, there are nearly 20 million znodes in our cluster, and high qps often leads But
@sundy-li Thanks.
True, we can't copy the data of one ZooKeeper cluster to another. If multiple ZooKeeper clusters are supported at the ClickHouse cluster level, we can add a new ZooKeeper cluster and migrate a table's metadata from the old cluster to the new one. While one table's metadata is being migrated, the other tables are not affected.
I agree with you, i.e. the RAFT protocol could be used to implement leader election and data replication for ReplicatedMergeTree.
I guess it would be very useful if somebody wrote a documentation article with the proper steps for moving a table from one ZooKeeper cluster to another.
Hi @sundy-li, to avoid copying data when changing the ZooKeeper cluster, is it possible to use a detach or attach operation?
@qoega Thanks for your reply. Yes you are right.
@Akazz Hi, could you review this pr? Thanks |
Hi team, maybe we have another solution for ClickHouse to support multiple ZooKeeper clusters.
If the ZooKeeper endpoint is not set, we use the ZooKeeper configuration from config.xml. What do you think about this?
It seems we can extend our current ALTER PARTITION primitives to support cross-zookeeper operations. We already support @fastio btw, I feel that adding another configuration section, a.k.a
@fastio I would say that for now (as long as PR #14155 is in ) it is better to integrate with @amosbird's |
@@ -112,7 +112,7 @@ struct Settings;
/** Obsolete settings. Kept for backward compatibility only. */ \
M(UInt64, min_relative_delay_to_yield_leadership, 120, "Obsolete setting, does nothing.", 0) \
M(UInt64, check_delay_period, 60, "Obsolete setting, does nothing.", 0) \
M(String, zookeeper_cluster, "default", "Name of zookeeper cluster.", 0) \
You might want to extract all references to token "default"
and expose it as a globally defined const
True, it's better to make it a global const. I'll fix it.
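A minimal sketch of what the suggested change might look like. The constant name `DEFAULT_ZOOKEEPER_CLUSTER` and both helper functions are hypothetical and do not come from the PR; they only illustrate replacing scattered "default" literals with one shared definition.

```cpp
#include <string>

// Hypothetical name: the PR does not say what the constant will be called.
static constexpr const char * DEFAULT_ZOOKEEPER_CLUSTER = "default";

// Hypothetical helper: an empty setting falls back to the default cluster name.
std::string resolveZooKeeperCluster(const std::string & configured)
{
    if (configured.empty())
        return std::string(DEFAULT_ZOOKEEPER_CLUSTER);
    return configured;
}

// Hypothetical helper: compare against the constant instead of a string literal.
bool isDefaultZooKeeperCluster(const std::string & name)
{
    return name == DEFAULT_ZOOKEEPER_CLUSTER;
}
```

With one definition, renaming the default cluster later would touch a single line rather than every call site.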
@@ -768,6 +771,7 @@ static StoragePtr create(const StorageFactory::Arguments & args)
std::move(storage_settings),
Why do you need a separate field for storing zookeeper_cluster when storage_settings
is available?
It's not necessary, should be removed.
Thanks for your suggestion. |
Thanks for your review. I'll fix the issues soon. |
Work to be continued in #17070 |
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Now it's possible to create different ReplicatedMergeTree tables that use different ZooKeeper clusters.
The auxiliary ZooKeeper configuration (#14155) is reused to support this feature.
We can create a table as follows:
CREATE TABLE t1 (a String) ENGINE = ReplicatedMergeTree('test:/tables/t1', '{replica}') ORDER BY a;
This means that the metadata of table t1 is stored in the ZooKeeper cluster named test, which is configured in
auxiliary_zookeepers.
CREATE TABLE t2 (a String) ENGINE = ReplicatedMergeTree('/tables/t2', '{replica}') ORDER BY a;
This means that the metadata of table t2 is stored in the default ZooKeeper cluster, which is configured in
zookeeper.
The implementation is therefore compatible with the old configuration.
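The path convention in the two examples above can be sketched as follows. This is an illustration under stated assumptions, not the PR's actual code: the function name `splitZooKeeperPath` and the constant `DEFAULT_ZOOKEEPER_NAME` are hypothetical. It assumes a prefix before the first ':' names an auxiliary cluster, and that a path starting with '/' belongs to the default cluster.

```cpp
#include <string>
#include <utility>

// Hypothetical constant: name standing for the cluster configured in <zookeeper>.
static const std::string DEFAULT_ZOOKEEPER_NAME = "default";

// Hypothetical helper: split "name:/path" into {cluster name, path}.
// A path without a prefix is assigned to the default cluster, which keeps
// old-style paths such as "/tables/t2" working unchanged.
std::pair<std::string, std::string> splitZooKeeperPath(const std::string & full_path)
{
    const auto colon = full_path.find(':');
    const auto slash = full_path.find('/');

    // Treat the colon as a separator only if it comes before the first slash,
    // so a colon inside the path itself is not mistaken for a cluster prefix.
    const bool has_prefix = colon != std::string::npos && colon > 0
        && (slash == std::string::npos || colon < slash);

    if (has_prefix)
        return {full_path.substr(0, colon), full_path.substr(colon + 1)};
    return {DEFAULT_ZOOKEEPER_NAME, full_path};
}
```

Under these assumptions, `splitZooKeeperPath("test:/tables/t1")` yields the cluster name `test` with path `/tables/t1`, while `splitZooKeeperPath("/tables/t2")` yields the default cluster with the path unchanged.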
Detailed description / Documentation draft: