Add repair queue #2336

Merged: BramGruneir merged 7 commits into master from bram/repair on Sep 9, 2015

Conversation

@BramGruneir (Member):

This is the basic structure for the repair queue. Please take a look.
This doesn't yet include the client test where we actually remove a store.

Comment thread on storage/store_pool.go (Outdated)

Contributor:

storeDetails are mutable; this should probably return a copy instead of a pointer for thread safety.

Member Author:

good call, done
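The fix agreed on above (return a copy of the mutable detail rather than a pointer into the pool) might look roughly like this sketch; the `storeDetail` and `storePool` types here are simplified stand-ins for the real ones, not the actual CockroachDB definitions:

```go
package main

import (
	"fmt"
	"sync"
)

// storeDetail is a hypothetical stand-in for the pool's mutable
// per-store bookkeeping.
type storeDetail struct {
	dead bool
}

type storePool struct {
	mu     sync.Mutex
	stores map[int32]*storeDetail
}

// getStoreDetail returns a copy of the detail, so callers can read it
// without holding the pool's lock and without racing concurrent writers.
func (sp *storePool) getStoreDetail(storeID int32) storeDetail {
	sp.mu.Lock()
	defer sp.mu.Unlock()
	if detail, ok := sp.stores[storeID]; ok {
		return *detail // copy, not the shared pointer
	}
	return storeDetail{}
}

func main() {
	sp := &storePool{stores: map[int32]*storeDetail{1: {dead: true}}}
	d := sp.getStoreDetail(1)
	d.dead = false                 // mutating the copy...
	fmt.Println(sp.stores[1].dead) // ...leaves the pool's state untouched: true
}
```

Because the returned struct is a value, a caller that mutates it cannot corrupt the pool's shared state; the lock is only held for the duration of the copy.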

@BramGruneir force-pushed the bram/repair branch 2 times, most recently from 2667218 to 06095c5 on September 2, 2015 at 22:54
Comment thread on storage/repair_queue.go (Outdated)

Contributor:

Hrm, I'm really not sure about the replica being in both of these queues at once. They'll both be competing for the range descriptor.

Maybe the ReplicateQueue should prune dead replicas.

Member Author:

I've been thinking about this a lot. I'm not too worried about contention for the range descriptor: it will only occur with a replication factor of five or more, and if we notice it failing frequently, we can optimize for it then.

Member Author:

What if we only enqueue it on the replicate queue after all of the replicas have been removed?

Comment thread on storage/repair_queue.go (Outdated)

Contributor:

0 is an untyped constant; you don't need to wrap it.

Member Author:

cool

@BramGruneir force-pushed the bram/repair branch 3 times, most recently from 063e7c5 to 2ea7f39 on September 8, 2015 at 20:58
@BramGruneir (Member Author):

@mrtracy I've added the client test and I think this is ready to go. Would you rather I check it in first, then we move the functionality into the uber-replicate queue, or just pull the relevant parts out of it?
I think we should get it in first, since it establishes the unit tests.

@BramGruneir changed the title from "Add repair queue" to "[WIP] Add repair queue" on Sep 8, 2015
@mrtracy (Contributor) commented Sep 8, 2015:

As long as it's not breaking anything, I would say go ahead and merge this, and I'll be able to combine it with the unified queue in short order.

LGTM.

@BramGruneir (Member Author):

Excellent. Thanks.

Comment thread on storage/client_raft_test.go (Outdated)

Contributor:

is this a hand-rolled util.SucceedsWithin?

Member Author:

Originally, it had more in it. But you're right, it is. I'll convert it.

Member Author:

Actually, it doesn't match.
We can't do exponential backoff here unless an extra gossiping goroutine is added. I'm going to leave it as is.

Contributor:

> We can't do exponential backoff here unless an extra gossiping goroutine is added.

I don't follow. Why?

Member Author:

The store pool relies on store descriptors being gossiped every 10ms. If we used exponential backoff, it's possible to get to a point where we're past that timeout between runs. We also don't want to gossip the descriptors too often, as that would slow the test down as well.

Contributor:

OK. Can you add a comment? Also: `for len(getRangeMetadata(proto.KeyMin, mtc, t)) != 2 {`?

Member Author:

Sure, done. And done.

Comment thread on storage/client_raft_test.go (Outdated)

Contributor:

s/go /got /

Member Author:

fixed.

@BramGruneir (Member Author):

I think this is ready to go. @tamird, can you take one last look?

Comment thread on storage/client_raft_test.go (Outdated)

Contributor:

a, e pattern

Member Author:

done
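The "a, e pattern" the reviewer is asking for is the actual/expected comparison idiom: bind the actual value to `a` and the expected value to `e` in the `if` statement, then report both in the failure message. A small illustration (the function and message are hypothetical, not from the PR's test):

```go
package main

import "fmt"

// checkReplicaCount demonstrates the a, e (actual, expected) idiom:
// both values are bound in the if statement's init clause, so the
// comparison and the error message can't drift apart.
func checkReplicaCount(replicas []int) string {
	if a, e := len(replicas), 2; a != e {
		return fmt.Sprintf("wrong replica count: got %d, expected %d", a, e)
	}
	return "ok"
}

func main() {
	fmt.Println(checkReplicaCount([]int{1, 2}))    // ok
	fmt.Println(checkReplicaCount([]int{1, 2, 3})) // wrong replica count: got 3, expected 2
}
```

In a real test the failure branch would call `t.Errorf` instead of returning a string.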

@tamird (Contributor) commented Sep 9, 2015:

LGTM modulo some nits. Thanks for addressing.

BramGruneir added a commit that referenced this pull request Sep 9, 2015
@BramGruneir BramGruneir merged commit 228829c into master Sep 9, 2015
@BramGruneir BramGruneir deleted the bram/repair branch September 9, 2015 18:59
@BramGruneir BramGruneir mentioned this pull request Sep 9, 2015
@jess-edwards jess-edwards mentioned this pull request Sep 9, 2015