When a small number of hosts are down, ON CLUSTER DDL will not return a timeout exception #25412

Open
marising opened this issue Jun 17, 2021 · 3 comments

@marising
Contributor

marising commented Jun 17, 2021

Use case
#21574
#23062

When a small number of hosts are down, whether permanently or temporarily, ON CLUSTER DDL should not end with a timeout exception.

Describe the solution you'd like

  1. When the host starts, run a background thread that checks every 60 s and registers the current host in ZooKeeper as an ephemeral node under the path is_active.
    Activate current host as ephemeral node in zookeeper #26269

  2. Set distributed_ddl_output_mode to QUORUM. When DDLQueryStatusInputStream produces its output, check whether all active hosts have executed the query successfully, only inactive hosts remain in unfinished_hosts, and the inactive hosts are a minority:
    2.1. All hosts in unfinished_hosts are inactive.
    2.2. For each shard: (number of inactive replicas) / (total number of replicas in the shard) <= 1/2.
    Add quorum mode to the distributed_ddl_output_mode when executing DDL… #27004
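Step 1 can be sketched as follows, assuming a hypothetical in-memory FakeKeeper in place of a real ZooKeeper client (no real client API is implied). It models the property the proposal relies on: an ephemeral node disappears when the session that created it ends, so the children of is_active are the hosts that are currently alive. The 60-second interval comes from the proposal; every class and function name here is illustrative.

```python
import threading


class FakeKeeper:
    """Hypothetical in-memory stand-in for ZooKeeper/Keeper ephemeral nodes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._nodes = {}  # path -> id of the session that owns the node

    def exists(self, path):
        with self._lock:
            return path in self._nodes

    def create_ephemeral(self, path, session):
        with self._lock:
            if path in self._nodes:
                return False  # already registered
            self._nodes[path] = session
            return True

    def expire_session(self, session):
        # Ephemeral nodes are removed when their owning session ends.
        with self._lock:
            self._nodes = {p: s for p, s in self._nodes.items() if s != session}

    def children(self, parent):
        prefix = parent + "/"
        with self._lock:
            return sorted(p[len(prefix):] for p in self._nodes if p.startswith(prefix))


def ensure_active(keeper, host, session, parent="is_active"):
    """One iteration of the periodic check: re-register the host if its node is gone."""
    path = f"{parent}/{host}"
    if not keeper.exists(path):
        keeper.create_ephemeral(path, session)


def heartbeat_loop(keeper, host, session, stop, interval_s=60.0):
    """Body of the background thread started when the host starts."""
    while not stop.is_set():
        ensure_active(keeper, host, session)
        stop.wait(interval_s)  # returns early once stop is set
```

A real server would also have to open a new session after expiry before re-creating its node; the sketch leaves that out and simply takes the session id as a parameter.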

If both 2.1 and 2.2 are satisfied, do not wait for the full distributed_ddl_task_timeout; directly return the number of successful hosts and the number of failed hosts. Otherwise, wait until the timeout.
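The decision in step 2 can be sketched as below, assuming hosts are plain strings and each shard is given as a list of its replica hosts. quorum_reached checks conditions 2.1 and 2.2 (comparing 2 * inactive with the replica count avoids floating-point division), and wait_for_ddl is a hypothetical polling loop that returns the success and failure counts early once the condition holds and raises a timeout otherwise. poll, DDLTimeout and the other names are illustrative, not ClickHouse's actual DDLQueryStatusInputStream code.

```python
import time


class DDLTimeout(Exception):
    """Raised when distributed_ddl_task_timeout expires (illustrative)."""


def quorum_reached(unfinished_hosts, active_hosts, shards):
    active = set(active_hosts)
    # 2.1: every host that has not finished yet must be inactive.
    if any(host in active for host in unfinished_hosts):
        return False
    # 2.2: in each shard, inactive replicas / total replicas <= 1/2.
    for replicas in shards:
        inactive = sum(1 for host in replicas if host not in active)
        if 2 * inactive > len(replicas):
            return False
    return True


def wait_for_ddl(poll, shards, timeout_s, interval_s=1.0,
                 clock=time.monotonic, sleep=time.sleep):
    """poll() returns (finished, active): finished maps host -> success flag,
    active is the set of hosts that currently hold an is_active node."""
    all_hosts = {host for replicas in shards for host in replicas}
    deadline = clock() + timeout_s
    while True:
        finished, active = poll()
        unfinished = all_hosts - set(finished)
        succeeded = sum(1 for ok in finished.values() if ok)
        failed = len(finished) - succeeded
        if not unfinished or quorum_reached(unfinished, active, shards):
            # Return immediately instead of waiting for the full timeout.
            return succeeded, failed, sorted(unfinished)
        if clock() >= deadline:
            raise DDLTimeout(f"{len(unfinished)} hosts did not finish: {sorted(unfinished)}")
        sleep(interval_s)
```

The clock and sleep parameters are only there so the loop can be driven by a fake clock in tests; a server would use its own waiting mechanism.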

I plan to implement this feature. Can you give me some suggestions? @tavplubix

@marising
Contributor Author

@filimonov @alexey-milovidov Do you have any comments?

@alexey-milovidov
Member

The idea is to simply add a mode: wait for all active servers.
A server is active if it holds an ephemeral node in Keeper.
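The simpler mode suggested here can be sketched as a completion check, assuming hosts are plain strings and the set of active hosts is the set of hosts holding an ephemeral node. Both function names are hypothetical and do not refer to an existing ClickHouse setting value. Unlike the quorum proposal, no per-shard majority check is applied: inactive hosts are just not waited for.

```python
def all_active_finished(all_hosts, finished_hosts, active_hosts):
    """True once every host that currently holds an ephemeral node has finished."""
    waiting_for = set(active_hosts) & set(all_hosts)
    return waiting_for <= set(finished_hosts)


def inactive_unfinished(all_hosts, finished_hosts, active_hosts):
    """Hosts that were not waited for because they are neither finished nor active."""
    return sorted(set(all_hosts) - set(finished_hosts) - set(active_hosts))
```

The second function lists the hosts that the query result could report as skipped.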

@azat
Collaborator

azat commented Nov 14, 2022

It seems that an alternative right now could be to use cluster discovery (#31442).


3 participants