Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Use predefined identifier instead of hostname for DDLWorker #57573

Closed
alesapin opened this issue Dec 6, 2023 · 1 comment · Fixed by #57603
Closed

[RFC] Use predefined identifier instead of hostname for DDLWorker #57573

alesapin opened this issue Dec 6, 2023 · 1 comment · Fixed by #57603
Assignees
Labels
comp-dddl Distributed DDL feature unfinished code

Comments

@alesapin
Copy link
Member

alesapin commented Dec 6, 2023

Using hostname as identifier in distributed systems coordination is a error-prone idea. DNS can be unavailable anytime and in this case participants will be unable to identify who is who and who they are. We have an example of such issue in DDLWorker:
https://github.com/ClickHouse/clickhouse/blob/master/src/Interpreters/DDLTask.cpp#L218-L286.

The idea is to introduce identifier for each host participating in cluster in addition to existing fields https://github.com/ClickHouse/clickhouse/blob/master/src/Interpreters/Cluster.h#L93-L109. By default we can use value of <replica>|<shard> macros for this identifier. Other option is sever UUID.

Instead of hostnames https://github.com/ClickHouse/clickhouse/blob/master/src/Interpreters/executeDDLQueryOnCluster.cpp#L116-L124 we will use this unique identifiers and always be able to verify who has to execute which entry without any issues related to DNS.

However this behavior is new and require proper specification of macros values in config. So it should be optional by default.

@alesapin alesapin added unfinished code comp-dddl Distributed DDL feature labels Dec 6, 2023
@evillique
Copy link
Member

evillique commented Dec 6, 2023

To be universal it seems like the proposed solution would require adding <name> to each replica and shard in the cluster definition, <includes_current_node>true</includes_current_node> for each cluster with current node to determine that we are in it, <shard> and <replica> macros must be filled and every cluster must be defined the same way on all the replicas. From the developer side, we would need to add a new version of DDLTask with a new flag at the end on whether we use this optional feature or not and a list of replica|shard names.

What sounds like a much easier solution, we add a macro <hostname>, and if it is specified, we will not try to resolve any of the hostnames, but simply compare it and find out whether we need to execute this query or not. For this, the hostname of the current node needs to be the same in all of the config configurations, but it looks a lot more manageable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp-dddl Distributed DDL feature unfinished code
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants