Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix possible "The local set of parts of X doesn't look like the set of parts in ZooKeeper" error #30826

Merged

Conversation

azat
Copy link
Collaborator

@azat azat commented Oct 28, 2021

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix possible "The local set of parts of X doesn't look like the set of parts in ZooKeeper" error (if DROP fails during removing znodes from zookeeper)

Detailed description / Documentation draft:
If during removing replica_path from zookeeper, some error occurred
(zookeeper goes away), then it may not remove everything from zookeeper.

And on DETACH/ATTACH (or server restart, like stress tests does in the
analysis from this comment 1), it will trigger an error:

The local set of parts of table test_1.alter_table_4 doesn't look like the set of parts in ZooKeeper:

Fix this, by removing "metadata" at first, and only after this
everything else, this will avoid this error, since on ATTACH such table
will be marked as read-only.

Cc: @alesapin

@azat azat marked this pull request as draft October 29, 2021 02:19
@robot-clickhouse robot-clickhouse added the pr-improvement Pull request with some product improvements label Oct 29, 2021
@alesapin
Copy link
Member

Fix looks reasonable, but I don't understand why tests fail.

@azat azat force-pushed the replicated-mt-fix-sanity-on-failed-drop branch from b1e2e76 to c6dd365 Compare October 29, 2021 15:23
@azat
Copy link
Collaborator Author

azat commented Oct 29, 2021

Fix looks reasonable, but I don't understand why tests fail.

Forget to remove remote_replica_path itself.

@azat azat marked this pull request as ready for review October 29, 2021 15:29
@azat
Copy link
Collaborator Author

azat commented Oct 29, 2021

Functional stateless tests (ubsan) — Timeout, fail: 0, passed: 0

2021-10-30 00:12:21 clickhouse-test: error: unrecognized arguments: --check-zookeeper-session

CIGithubActions / FunctionalStatelessTestDebug (pull_request) Failing after 59m — FunctionalStatelessTestDebug

Traceback (most recent call last):
  File "functional_test_check.py", line 281, in <module>
    state, description, test_results, additional_logs = process_results(result_path, server_log_path)
  File "functional_test_check.py", line 179, in process_results
    with open(status_path, 'r', encoding='utf-8') as status_file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/actions-runner/_work/_temp/stateless_debug/result_path/check_status.tsv'

@azat azat force-pushed the replicated-mt-fix-sanity-on-failed-drop branch 2 times, most recently from 3ff2fa3 to b796aab Compare October 29, 2021 19:51
@azat azat marked this pull request as draft October 30, 2021 21:33
…f parts in ZooKeeper" error

If during removing replica_path from zookeeper, some error occurred
(zookeeper goes away), then it may not remove everything from zookeeper.

And on DETACH/ATTACH (or server restart, like stress tests does in the
analysis from this comment [1]), it will trigger an error:

    The local set of parts of table test_1.alter_table_4 doesn't look like the set of parts in ZooKeeper:

  [1]: ClickHouse#28296 (comment)

Fix this, by removing "metadata" at first, and only after this
everything else, this will avoid this error, since on ATTACH such table
will be marked as read-only.

v2: forget to remove remote_replica_path itself
v3: fix test_drop_replica by adding a check for remote_replica_path existence
@azat azat force-pushed the replicated-mt-fix-sanity-on-failed-drop branch from b796aab to 60a4115 Compare November 1, 2021 06:01
@azat azat marked this pull request as ready for review November 1, 2021 06:02
@alesapin alesapin self-assigned this Nov 2, 2021
@alesapin
Copy link
Member

alesapin commented Nov 2, 2021

No failures related to changes.

@alesapin alesapin merged commit 085cb27 into ClickHouse:master Nov 2, 2021
@azat azat deleted the replicated-mt-fix-sanity-on-failed-drop branch November 2, 2021 07:34
@tavplubix
Copy link
Member

Fix this, by removing "metadata" at first, and only after this
everything else, this will avoid this error, since on ATTACH such table
will be marked as read-only.

It will not be marked as readonly, metadata node will be re-created and table will try to start up:

/// In old tables this node may missing or be empty
String replica_metadata;
const bool replica_metadata_exists = current_zookeeper->tryGet(replica_path + "/metadata", replica_metadata);
if (!replica_metadata_exists || replica_metadata.empty())
{
/// We have to check shared node granularity before we create ours.
other_replicas_fixed_granularity = checkFixedGranularityInZookeeper();
ReplicatedMergeTreeTableMetadata current_metadata(*this, metadata_snapshot);
current_zookeeper->createOrUpdate(replica_path + "/metadata", current_metadata.toString(),
zkutil::CreateMode::Persistent);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-improvement Pull request with some product improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants