Prepare the introduction of more keeper faults #56917
Conversation
This is an automated comment for commit 6cf8c9b with a description of existing statuses. It's updated for the latest CI run ❌
It seems I will need to bring more commits from #56389. As the tests run in parallel, the moment you enable more faults (some ops had some disabled, and the refactor got rid of that), they all clash with each other more frequently, so you see the lack of proper recovery more often, e.g. the …
{
    my_storage.enqueuePartForCheck(part_name, MAX_AGE_OF_LOCAL_PART_THAT_WASNT_ADDED_TO_ZOOKEEPER);
});
LOG_TRACE(
@tavplubix Do you mind having a look at these changes, especially checking that rename_part_to_temporary() in the case where the data wasn't written to Keeper (to retry) is safe? Thanks!
It seems it's not safe due to zero-copy locks, and the bug is also present when the block_id was created concurrently by a different replica.
LGTM with comments
@@ -43,11 +44,9 @@ RestoreCoordinationRemote::RestoreCoordinationRemote(
if (my_is_internal)
{
    String alive_node_path = my_zookeeper_path + "/stage/alive|" + my_current_host;
    zk->deleteEphemeralNodeIfContentMatches(alive_node_path, "");
It is worth a good comment. And it would probably be more convenient to hide deleteEphemeralNodeIfContentMatches() inside create() with zkutil::CreateMode::Ephemeral.
I would rather not hide it for now. In some places the ephemeral node is created as part of a transaction, and in other places we might not want to automatically recreate the ephemeral node. It also introduces extra round trips that are not necessary if you are not recovering from an error (one to check, and one to remove or a long wait for it to be removed).
I'll refactor it a bit.
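For context, here is a minimal sketch of the pattern under discussion. The function name mirrors the PR, but the body and the zkutil calls are illustrative assumptions, not the actual implementation: before creating its ephemeral node, a host cleans up a node left over from its own previous session, identified by its content.

/// Sketch only: remove a leftover ephemeral node, but only when its content
/// proves it belonged to us; a node with different content belongs to a live
/// session of another owner and must be left alone.
void deleteEphemeralNodeIfContentMatches(
    zkutil::ZooKeeper & zk,
    const std::string & path,
    const std::string & expected_content)
{
    Coordination::Stat stat;
    std::string content;

    /// Nothing to clean up if the node does not exist.
    if (!zk.tryGet(path, content, &stat))
        return;

    if (content == expected_content)
        zk.remove(path, stat.version); /// The version check guards against races.
}

(The extra round trips mentioned above are visible here: one tryGet() to check, plus one remove(), before the create() can even run.)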
Working on rebasing the PR after the fix for zero-copy locks
This reverts commit e4becc0.
Reimplemented ZK retries to …
If there are no tables left, merge() will keep throwing errors constantly.
Everything green except …
Marking it as ready again after the last changes. cc @devcrafter in case you want to review the latest changes before merging.
Need to rebase on top of #57764, which fixes some of the same problems found and fixed in this PR.
Done
Even if it's not related, let's not ignore it. We can start by using the CI DB to find when it started to fail. And it helps to find the reason quite easily: #57568 (comment)
The new failures appeared with the merge with master, and they look related. I need to investigate more, but I took master's way of improving ZK retries on backup coordination instead of the one in this PR, so most likely the problem is there.
I would love to not have to rebase this again 😉
It'd be nice to understand what has changed since the latest review.
The last 2 rebases:
system disable failpoint replicated_commit_zk_fail_after_op;
system disable failpoint replicated_merge_tree_commit_zk_fail_when_recovering_from_hw_fault;
Optional, but we could check that after disabling the failpoints, we can still insert.
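For reference, this is roughly how such a failpoint is consumed on the C++ side, as a hedged sketch: the fiu_do_on macro comes from the libfiu wrapper ClickHouse uses for failpoints, while the forceFailure() helper on the fault-injection wrapper is an assumption for illustration.

/// Sketch: while the failpoint is enabled (SYSTEM ENABLE FAILPOINT ...),
/// the commit path simulates a Keeper session loss right after the
/// operation, which exercises the recovery logic this PR fixes. Disabling
/// the failpoint (as done above) makes this a no-op, so a subsequent
/// insert should succeed normally.
fiu_do_on(FailPoints::replicated_commit_zk_fail_after_op,
{
    zookeeper->forceFailure(); /// Hypothetical helper name.
});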
{
    if (!isEmpty())
    {
        WriteBufferFromOwnString buf;
Shouldn't we just create a method which does this logging? (It's copy/pasted code.)
I recovered the code by partially reverting the commit that removed it, and kept it as-is for simplicity, but it could be improved/iterated on in the future.
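For illustration, the suggested extraction could look roughly like this (hypothetical class, member, and method names; as noted, the real code keeps the logging inline for now):

#include <IO/Operators.h>
#include <IO/WriteBufferFromString.h>

/// Sketch: dump the pending request paths from one method instead of
/// copy/pasting the WriteBuffer loop at every call site.
void PendingRequests::logPaths(Poco::Logger * log) const
{
    if (isEmpty())
        return;

    WriteBufferFromOwnString buf;
    for (const auto & request : requests)
        buf << request->getPath() << ", ";

    LOG_TRACE(log, "Pending requests: {}", buf.str());
}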
});
/// Independently of how many retries we had left we want to do at least one check of this inner retry
/// at least once so a) we try to verify at least once if metadata was written and b) we set the proper
It's too many "at least once"
@@ -1019,22 +1038,6 @@ std::pair<std::vector<String>, bool> ReplicatedMergeTreeSinkImpl<async_insert>::
{
    zookeeper->setKeeper(storage.getZooKeeper());
I'm not sure about moving this block to the lock and commit stage. Could you explain?
I moved it because the table being in read-only mode only matters when we are going to add a new part/commit to ZK. If we only have checks left (for example, in resolve_duplicate_stage), it doesn't make sense to stop verifying just because the table is stopped, as we can still return a proper response to the client.
A similar thing will happen for quorum, for example (but it's not in this PR, to keep it more concise). It's OK to check for quorum if the write succeeded but the table was set to read-only after that.
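As a rough illustration of that ordering (the stage names and helpers below are hypothetical, not the sink's real structure):

/// Sketch: the read-only check only gates stages that write to Keeper.
void executeStage(Stage stage)
{
    if (stage == Stage::LockAndCommit)
    {
        /// Only committing a brand-new part requires the table to be writable.
        if (storage.is_readonly)
            throw Exception(ErrorCodes::TABLE_IS_READ_ONLY, "Table is in readonly mode");
        commitPartToKeeper();
    }
    else if (stage == Stage::ResolveDuplicate)
    {
        /// No new data is written here: we only verify the outcome of a
        /// previous attempt, which is safe even on a read-only table and
        /// still lets us answer the client accurately.
        checkExistingBlockId();
    }
}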
@Algunenano My comments are mostly minor, except probably #56917 (comment). But if you think it's fine, please feel free to address the comments in a follow-up PR.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Longer details:
Refactors ZooKeeperWithFaultInjection in preparation for introducing it in more places:
It reworks how part_committed_locally_but_zookeeper (old) / resolve_uncertain_commit_stage (less old) worked so that we can keep retrying in case the ZK session failed when committing. Before, the uncertain stage could only be resolved if ZK failed after the write (and in that case everything was good), but it wouldn't recover if the data wasn't written to Keeper. Now it recovers both cases (see the sketch after this list).
Fixes an issue found related to ephemeral node cleanup in backup faults, and changes the name (handleEphemeralNodeExistence -> deleteEphemeralNodeIfContentMatches) to avoid further confusion. I included it in this PR since changing the name and introducing faults is what led to noticing the problem, and it enables fixing it only once (and not twice: once in a different PR and again in the refactor), AFAICS.
Closes #50465, as it removed the special handling of tryMulti() responses.
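A rough sketch of the recovery described in the second bullet (illustrative names only; the real stage machinery also handles renaming the part back to temporary before retrying):

/// Sketch: after the session failed mid-commit we cannot know whether the
/// multi-op was applied, so check Keeper and decide.
bool commitWasApplied(zkutil::ZooKeeper & zk, const std::string & part_znode_path)
{
    if (zk.exists(part_znode_path))
        return true; /// The write landed: report success to the client.

    /// Nothing was written. Before this PR the insert stayed "uncertain"
    /// here; now the commit is retried from scratch.
    return false;
}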
cc @devcrafter