check for active replicas when waiting for commands #8314

javisantana commented Dec 20, 2019

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Bug Fix

Changelog entry (up to few sentences, required except for Non-significant/Documentation categories):
OPTIMIZE TABLE query will not wait for offline replicas to perform the operation.

Detailed description:

This is more of a question than a bug fix. With a Replicated* table, when an OPTIMIZE query is executed, the optimize commands are sent to the replicas, and after that the leader waits for the other replicas to finish the command.

The problem is that when a replica is down for any reason, the command is still sent to that replica (which probably makes sense), but after that the leader waits forever for the answer.

This patch (not tested, it's just to explain the issue) checks whether a replica is active and, if it is not, does not wait for it to finish the command.

Does it make sense?
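
To make the idea concrete, here is a minimal Python sketch of the check described above. It is not the actual ClickHouse C++ code: the `Replica` class, its `is_active` flag, and the `replicas_to_wait_for` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    is_active: bool  # hypothetical flag: True if the replica is currently online


def replicas_to_wait_for(replicas):
    """Return the names of the replicas the leader should block on.

    The command is still sent to every replica; this only decides
    which of them the leader waits for afterwards.
    """
    return [r.name for r in replicas if r.is_active]


replicas = [
    Replica("r1", is_active=True),
    Replica("r2", is_active=False),  # offline: would otherwise block the leader forever
    Replica("r3", is_active=True),
]

print(replicas_to_wait_for(replicas))
```

With this filter the offline replica `r2` still receives the command in its queue, but the leader no longer blocks on it.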

alexey-milovidov commented Dec 22, 2019

For OPTIMIZE, it is 100% ok.

For DROP/DETACH/REPLACE PARTITION or CLEAR COLUMN IN PARTITION it is questionable, because a race condition exists: some replicas may become active in the meantime, so that after this query completes there are active replicas that still have not executed the command. It would then be unclear what semantics the user should expect for replication_alter_partitions_sync = 2.
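
The race can be illustrated with a small deterministic Python simulation. This is only a sketch under assumed names (`run_query_waiting_only_for_active` and the `active`/`executed` fields are hypothetical), not how ClickHouse tracks replica state.

```python
def run_query_waiting_only_for_active(replicas):
    """Simulate a query that waits only for replicas active at check time.

    `replicas` maps a replica name to {"active": bool, "executed": bool}.
    Returns the names of the replicas the leader waited for.
    """
    # The leader takes a snapshot of the replicas that are active right now.
    waited_for = [name for name, state in replicas.items() if state["active"]]
    # Each replica in the snapshot executes the command before the query returns.
    for name in waited_for:
        replicas[name]["executed"] = True
    return waited_for


replicas = {
    "r1": {"active": True, "executed": False},
    "r2": {"active": False, "executed": False},  # offline when the check runs
}

waited_for = run_query_waiting_only_for_active(replicas)

# The race: r2 comes back online right after the leader took its snapshot.
replicas["r2"]["active"] = True

# Active replicas that still have not executed the command.
stale = [n for n, s in replicas.items() if s["active"] and not s["executed"]]
print(waited_for, stale)
```

Here the query has returned, yet `r2` is active and has not applied the command, which is the ambiguous state for a setting that is meant to wait for all replicas.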

javisantana commented Jan 7, 2020

So, would it be OK to just rewrite the PR so that only OPTIMIZE queries check for active replicas?

Javi santana bot
@alexey-milovidov alexey-milovidov merged commit 8d9f85c into ClickHouse:master Jan 9, 2020
26 of 29 checks passed

  • Description check: Changelog entry is not found
  • Integration tests (asan): failed:6, passed:384, error:0
  • Integration tests (release): failed:6, passed:384, error:0
  • ClickHouse build check: 16/16 builds are OK
  • Code coverage: Coverage prepared
  • Compatibility check: Compatibility check passed
  • Functional stateful tests (address): fail:0, passed:97
  • Functional stateful tests (debug): fail:0, passed:97
  • Functional stateful tests (memory): fail:0, passed:97
  • Functional stateful tests (release): fail:0, passed:97
  • Functional stateful tests (release, processors): fail:0, passed:97
  • Functional stateful tests (thread): fail:0, passed:95, skipped:2
  • Functional stateful tests (ubsan): fail:0, passed:97
  • Functional stateless tests (address): fail:0, passed:1516, skipped:2
  • Functional stateless tests (debug): fail:0, passed:1515, skipped:3
  • Functional stateless tests (memory): fail:20, passed:21
  • Functional stateless tests (release): fail:0, passed:1518
  • Functional stateless tests (release, processors): fail:0, passed:1516, skipped:2
  • Functional stateless tests (thread): fail:0, passed:1515, skipped:3
  • Functional stateless tests (ubsan): fail:0, passed:1516, skipped:2
  • Functional stateless tests (unbundled): fail:0, passed:1501, skipped:17
  • Marker check: All checks were started
  • PVS check: New errors 0, total errors 19
  • Performance test: 1495/1498 queries are OK
  • Split build smoke test: Server started and responded
  • Stress test (address): No errors found
  • Stress test (thread): No errors found
  • Style check: Style check passed
  • Unit tests: fail: 0, passed: 5097