Add ability to snapshot relocating primary shards #6139
Conversation
I want to do a deeper review, but how do we handle bw compat here? If a node is from a prev version and reads the "WAITING" state, which it doesn't know about, does it just fail?
Yes, a node with the old version will fail to process the new cluster state if there is a running snapshot that was started while a primary shard participating in the snapshot was relocating. That's why, as I mentioned in the comment, we cannot have snapshots running during a 1.1->1.2 rolling upgrade. We will need to add this to the release notes. Alternatively, I could add a check for the presence of older-version nodes and, if they are present, fall back to the old behavior. That's not going to be a bulletproof solution though, since an older-version node can show up later and will not be able to join the cluster. What do you think?
+1 for best effort, but it might make sense to have this as a separate commit so it's easier to review? I will do a review of this in the meanwhile.
Added version check as a separate commit. |
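The fallback discussed above can be sketched roughly as follows. This is a hypothetical simplification, not the actual Elasticsearch implementation: versions are modeled as plain comparable ints, and the class and method names are made up for illustration; the real code would read node versions from the cluster state.

```java
import java.util.List;

// Hypothetical sketch of the version-check fallback: before emitting the new
// WAITING shard state, scan the cluster's node versions and fall back to the
// old behavior (fail the shard instead of waiting) if any node predates the
// version that introduced WAITING.
public class WaitingStateCompat {

    static final int VERSION_WITH_WAITING_STATE = 1_02_00; // stands in for 1.2.0

    // True only if every node in the cluster understands the WAITING state.
    static boolean clusterSupportsWaitingState(List<Integer> nodeVersions) {
        for (int version : nodeVersions) {
            if (version < VERSION_WITH_WAITING_STATE) {
                return false; // an old node would fail to parse WAITING
            }
        }
        return true;
    }
}
```

As the discussion notes, this is only best effort: an older node that joins after the snapshot has started would still fail.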
} else {
    this.shards = shards;
    HashMap<String, ImmutableList.Builder<ShardId>> waitingIndices = null;
Can we name this differently? It's super hard to differentiate between the member and the local var. I also would love to not use null as the marker, but make it final and check if it is empty. The extra object creation here is not worth the complexity IMO. We might even think of putting this into a separate method for readability, i.e. Map<X,Y> findWaitingIndices()?
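A minimal sketch of the suggested refactoring, using plain JDK collections in place of the ShardId and Guava ImmutableList.Builder types from the codebase; the ShardEntry record and State enum here are stand-ins for illustration only:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the review suggestion: instead of building the waiting-indices
// map inline with a null marker, extract it into a dedicated method that
// always returns a (possibly empty) map.
public class WaitingIndices {

    enum State { STARTED, WAITING }

    record ShardEntry(String index, int shardId, State state) {}

    // Returns index name -> shard ids that are still waiting.
    // An empty map (rather than null) signals "nothing is waiting".
    static Map<String, List<Integer>> findWaitingIndices(List<ShardEntry> shards) {
        Map<String, List<Integer>> waiting = new HashMap<>();
        for (ShardEntry shard : shards) {
            if (shard.state() == State.WAITING) {
                waiting.computeIfAbsent(shard.index(), k -> new ArrayList<>())
                       .add(shard.shardId());
            }
        }
        return waiting;
    }
}
```

Returning an empty map keeps the caller's check as a simple `isEmpty()` instead of a null comparison, at the cost of one extra allocation in the common case.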
So basically, if we snapshot an index that has a primary that is initializing or relocating, we just wait for the shards to start up. I think we should document this behavior and also test that we can abort such a waiting snapshot, just in case it never gets to a state where we are able to snapshot. I wonder if we should also add tests where we have snapshots in multiple states, i.e. chaos snapshotting while indexing and relocating, and just make sure we are not seeing any failures or broken snapshots?
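The abort requirement raised above can be sketched as a small transition rule. The state names below mirror the discussion, not the actual Elasticsearch snapshot enum, and the class is hypothetical:

```java
// Sketch of the abort path discussed above: a snapshot shard in WAITING
// (primary still initializing/relocating) must be abortable so the snapshot
// cannot hang forever if the shard never starts.
public class SnapshotShardState {

    enum State { WAITING, STARTED, ABORTED, DONE, FAILED }

    // Abort is legal from WAITING (snapshot never started for this shard)
    // and STARTED (snapshot in progress); terminal states are left as-is.
    static State abort(State current) {
        switch (current) {
            case WAITING:
            case STARTED:
                return State.ABORTED;
            default:
                return current;
        }
    }
}
```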
@s1monw I've implemented suggested changes. Thanks! |
long start = System.currentTimeMillis();
// Produce chaos for 30 sec or until snapshot is done, whichever comes first
int randomIndices = 0;
while (System.currentTimeMillis() - start < 30000 && !snapshotIsDone("test-repo", "test-snap")) {
Could we make this test somehow operations-based rather than time-based? The problem with time-based tests is that you have a very hard time getting anywhere near reproducibility. I know this is multithreaded anyway, so it won't really reproduce, but we can nicely randomize the num ops per thread, which makes it a bit nicer :)
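The operations-based variant could look roughly like the sketch below. This is a hypothetical simplification: `SnapshotCheck`, `runChaos`, and the repo/snapshot names are stand-ins modeled on the test snippet above, not real test-framework APIs.

```java
// Sketch of the review suggestion: bound the chaos loop by a (randomizable)
// number of operations instead of wall-clock time, so a given seed pins down
// how much work the test performs.
public class ChaosLoop {

    interface SnapshotCheck {
        boolean snapshotIsDone(String repo, String snapshot);
    }

    // Runs up to maxOps chaos iterations, stopping early once the snapshot
    // completes. Returns the number of operations actually performed.
    static int runChaos(int maxOps, SnapshotCheck check, Runnable chaosOp) {
        int ops = 0;
        while (ops < maxOps && !check.snapshotIsDone("test-repo", "test-snap")) {
            chaosOp.run(); // e.g. index a doc, delete an index, relocate a shard
            ops++;
        }
        return ops;
    }
}
```

In a real test, `maxOps` would come from the randomized-testing seed, so a failing run reports a seed that also fixes the operation count.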
FYI - I removed the labels in favor of the issue
LGTM
This change adds a new cluster state that waits for the relocation of a shard to finish before starting the snapshot process. Because this change adds a new snapshot state, pre-1.2.0 nodes will not be able to join a 1.2.0 cluster that is currently running a snapshot/restore operation. Closes elastic#5531
This change adds a new cluster state that waits for the relocation of a shard to finish before starting the snapshot process. Because this change adds a new snapshot state, it will not be possible to do a rolling upgrade from the 1.1 branch to 1.2 while a snapshot is being executed in the cluster.
Closes #5531