New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Try fix 'some fetches may stuck' #30346
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most complex fix for our replication I've seen.
LogEntryPtr parsed_entry = {}; | ||
}; | ||
|
||
std::vector<QueueEntryInfo> source_queue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need comment with general idea.
String path_created = dynamic_cast<const Coordination::CreateResponse &>(*results.back()).path_created; | ||
log_entry->znode_name = path_created.substr(path_created.find_last_of('/') + 1); | ||
queue.insert(zookeeper, log_entry); | ||
zookeeper->multi(ops); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to hack queue here, because new entry will pulled in pullLogsToQueue
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this change was incorrect, because we cannot pull entries from queue to queue
/// but before we copied its active parts set. In this case we will GET_PART entry in our queue | ||
/// and later will pull the original GET_PART from replication log. | ||
/// It should not cause any issues, but it does not allow to get rid of duplicated entries and add an assertion. | ||
if (created_get_parts.count(part_name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe pass created_get_parts
as argument. Currently it's hard to understand that should_ignore_log_entry
depends on it.
@@ -324,7 +326,8 @@ class ReplicatedMergeTreeQueue | |||
/** Remove the action from the queue with the parts covered by part_name (from ZK and from the RAM). | |||
* And also wait for the completion of their execution, if they are now being executed. | |||
*/ | |||
void removePartProducingOpsInRange(zkutil::ZooKeeperPtr zookeeper, const MergeTreePartInfo & part_info, const ReplicatedMergeTreeLogEntryData & current); | |||
void removePartProducingOpsInRange(zkutil::ZooKeeperPtr zookeeper, const MergeTreePartInfo & part_info, | |||
const std::optional<ReplicatedMergeTreeLogEntryData> & current); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is current? Maybe add a comment?
Functional stateless tests (debug) - race between
|
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Minor improvements in replica cloning and enqueuing fetch for broken parts, that should avoid extremely rare hanging of
GET_PART
entries in replication queue.