Add repair queue #2336
Conversation
storeDetails are mutable; this should probably return a copy instead of a pointer for thread safety.
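The copy-vs-pointer point can be sketched as below. The `storePool` and `storeDetail` types here are illustrative stand-ins, not the actual code in this PR:

```go
package main

import (
	"fmt"
	"sync"
)

// storeDetail and storePool are hypothetical stand-ins for the mutable
// details discussed above.
type storeDetail struct {
	deadCount int
}

type storePool struct {
	mu      sync.Mutex
	details map[int]*storeDetail
}

// getStoreDetail returns a copy of the detail rather than the pointer, so
// the caller holds a consistent snapshot even if the pool later mutates
// the original under its lock.
func (p *storePool) getStoreDetail(storeID int) storeDetail {
	p.mu.Lock()
	defer p.mu.Unlock()
	return *p.details[storeID] // dereference: value copy, not shared pointer
}

func main() {
	p := &storePool{details: map[int]*storeDetail{1: {deadCount: 2}}}
	snapshot := p.getStoreDetail(1)
	p.details[1].deadCount = 5 // later mutation does not affect the copy
	fmt.Println(snapshot.deadCount)
}
```

Returning the value keeps the lock hold time short while avoiding any data race on the caller's side.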
Force-pushed from 2667218 to 06095c5.
Hrm, I'm really not sure about the replica being in both of these queues at once. They'll both be competing for the range descriptor.
Maybe the ReplicateQueue should prune dead replicas.
I've been thinking about this a lot. I'm not too worried about the competition for the range descriptor. It's only going to occur in a 5+ replication system and if we notice that it is failing a lot, then we can optimize against it.
What if we only enqueue it on the replicate queue after all of the replicas have been removed?
Force-pushed from 06095c5 to c510db9.
Force-pushed from 3d799fe to 8b2d985.
0 is an untyped constant; you don't need to wrap it.
Force-pushed from 063e7c5 to 2ea7f39.
@mrtracy I've added the client test and I think this is ready to go. Would you rather I check it in first, then we move the functionality into the uber-replicate queue, or just pull the relevant parts out of it?

As long as it's not breaking anything, I would say go ahead and merge this, and I'll be able to combine it with the unified queue in short order. LGTM.

Excellent. Thanks.
Force-pushed from 2ea7f39 to ed2f276.
Is this a hand-rolled `util.SucceedsWithin`?
Originally, it had more in it. But you're right, it is. I'll convert it.
Actually, it doesn't match.
We can't do exponential backoff here unless an extra gossiping goroutine is added. I'm going to leave it as is.
> We can't do exponential backoff here unless an extra gossiping goroutine is added.

I don't follow. Why?
The store pool relies on store descriptors being gossiped every 10ms. With exponential backoff, the gap between retries could grow past that timeout. We also don't want to gossip them more often, since that would slow the test down as well.
OK. Can you add a comment? Also: `for len(getRangeMetadata(proto.KeyMin, mtc, t)) != 2 {`?
Sure, done. And done.
I think this is ready to go. @tamird, can you take one last look?

LGTM modulo some nits. Thanks for addressing.
Force-pushed from b1f95bb to cff7aed.
Force-pushed from 6bfc07d to 1a4e6fc.
This is the basic structure for the repair queue. Please take a look.
This doesn't yet include the client test where we actually remove a store.