Improvements for failover of Distributed queries #6399
Merged: alexey-milovidov merged 3 commits into ClickHouse:master from Enmk:replica_recovery_interval on Sep 7, 2019
alexey-milovidov changed the title from "Replica failover fixes" to "Improvements for failover of Distributed queries" on Aug 8, 2019
Let's merge with master, because we have fixes for integration and performance tests.
Enmk force-pushed the replica_recovery_interval branch 3 times, most recently from 0408f0e to 77256fd on August 22, 2019 09:20
akuzm reviewed on Aug 22, 2019
akuzm reviewed on Aug 22, 2019
akuzm reviewed on Aug 23, 2019
Enmk force-pushed the replica_recovery_interval branch 5 times, most recently from c807551 to c1c1d1d on August 29, 2019 14:22
* Added a limit on how many errors a replica can accumulate
* Decreased default error halving time to 60 seconds
* Made both configurable via settings
* Showing errors count and estimated recovery time for each replica in system.clusters

* Actually using the replica recovery settings for cluster
* A bit of doc on DBMS_CONNECTION_POOL_WITH_FAILOVER_MAX_ERROR_COUNT
* StorageDistributedDirectoryMonitor using settings for ConnectionPoolWithFailover
* Using SettingSeconds instead of SettingUInt64 for replica_error_decrease_period
Enmk force-pushed the replica_recovery_interval branch from c1c1d1d to f98c488 on September 2, 2019 15:18
akuzm reviewed on Sep 4, 2019
The docs should be updated to describe the new settings and the new fields in system.clusters.
Enmk force-pushed the replica_recovery_interval branch from 12f82af to f248726 on September 4, 2019 15:20
akuzm reviewed on Sep 4, 2019
Docs updated, settings renamed.
Renamed settings, updated docs.
Enmk force-pushed the replica_recovery_interval branch from f248726 to c2fc71b on September 5, 2019 10:36
akuzm approved these changes on Sep 5, 2019
alexey-milovidov approved these changes on Sep 7, 2019
KochetovNicolai added the pr-improvement label (Pull request with some product improvements) on Sep 19, 2019
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
For changelog. Remove if this is non-significant change.
Category (leave one):
Short description (up to few sentences):
Improvements for failover of Distributed queries. Recovery time is shorter; it is now configurable and can be seen in system.clusters.
...
Detailed description (optional):
New settings: replica_error_max_count and replica_error_decrease_period.
New fields in system.clusters: errors_count and estimated_recovery_time.
Example
* errors_count is the number of times this host tried to reach the replica but failed.
* estimated_recovery_time is how many seconds are left until the replica's error count is zeroed and the replica is considered to be back to normal.

Please note that errors_count is updated once per query to the cluster, but estimated_recovery_time is recalculated on demand. So errors_count can be non-zero while estimated_recovery_time is zero, which means that the next query will zero the error count and try to use the replica as if it had no errors.
...
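To make the note about errors_count and estimated_recovery_time concrete, here is a minimal Python sketch of the bookkeeping. It is a toy model, not the ClickHouse implementation: it assumes the stored count is halved (integer shift) once per full decrease period, and the class name, the field names max_count and decrease_period, and the cap value of 1000 are illustrative stand-ins for the settings discussed in this PR. Only the 60-second halving time comes from the commit message.

```python
from dataclasses import dataclass


@dataclass
class ReplicaErrors:
    """Toy model of one replica's error bookkeeping (hypothetical, not ClickHouse code)."""
    max_count: int = 1000      # illustrative cap, stands in for replica_error_max_count
    decrease_period: int = 60  # halving period in seconds, stands in for replica_error_decrease_period
    errors_count: int = 0      # stored value, refreshed only when a query touches the replica
    last_update: int = 0       # time of the last refresh, in seconds

    def _decayed(self, now: int) -> int:
        # Assumption: halve the stored count once per full period since last_update.
        periods = max(0, now - self.last_update) // self.decrease_period
        return self.errors_count >> periods

    def on_query(self, now: int, failed: bool) -> None:
        # Per-query refresh: apply the decay, then record a new failure up to the cap.
        self.errors_count = self._decayed(now)
        self.last_update = now
        if failed:
            self.errors_count = min(self.errors_count + 1, self.max_count)

    def estimated_recovery_time(self, now: int) -> int:
        # Computed on demand from the stored count; the stored count is not modified.
        if self.errors_count == 0:
            return 0
        # Integer halving reaches zero after bit_length() periods.
        zero_at = self.last_update + self.errors_count.bit_length() * self.decrease_period
        return max(0, zero_at - now)


# Usage: three failed queries, then a long quiet interval with no queries.
replica = ReplicaErrors()
for t in (0, 1, 2):
    replica.on_query(t, failed=True)
stale_count = replica.errors_count               # still the value stored at t=2
remaining = replica.estimated_recovery_time(200)  # already zero at t=200
replica.on_query(200, failed=False)               # the next query zeroes the stored count
```

In this model, the stored count stays non-zero during the quiet interval because nothing refreshes it, while the on-demand estimate has already dropped to zero. The first query after that interval applies the pending decay and treats the replica as error-free.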
Fixes #5317