Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: add a callback to StorePool for store status updates #12528

Merged
merged 1 commit into from
Jan 9, 2017

Conversation

spencerkimball
Copy link
Member

@spencerkimball spencerkimball commented Dec 21, 2016

This prevents busy purgatory checks for the replicate queue on every
node liveness heartbeat. However, we still do the check on the
periodic node status updates, which happen once a minute by default.

Fixes #12298


This change is Reviewable

@petermattis
Copy link
Collaborator

:lgtm:


Review status: 0 of 3 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.


pkg/storage/store_pool.go, line 266 at r1 (raw file):

	for _, cb := range sp.mu.callbacks {
		cb(detail.desc, status)
	}

Are there any locking concerns with invoking the callbacks while holding StorePool.mu?


Comments from Reviewable

@spencerkimball
Copy link
Member Author

Review status: 0 of 3 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.


pkg/storage/store_pool.go, line 266 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Are there any locking concerns with invoking the callbacks while holding StorePool.mu?

None with the one usage, but I think it's safer to copy the slice of callbacks and invoke them outside of the mutex lock.


Comments from Reviewable

@spencerkimball spencerkimball force-pushed the spencerkimball/replicate-purgatory-check branch from 87886a5 to 3dec4e0 Compare December 21, 2016 20:08
@BramGruneir
Copy link
Member

:lgtm:


Reviewed 2 of 3 files at r1, 2 of 2 files at r2.
Review status: all files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.


Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Dec 21, 2016

:lgtm: though I'm not entirely clear on why we need a new mechanism; this seems exactly equivalent to a narrower gossip callback regex.


Reviewed 2 of 3 files at r1, 2 of 2 files at r2.
Review status: all files reviewed at latest revision, 6 unresolved discussions, some commit checks failed.


pkg/storage/replicate_queue.go, line 120 at r2 (raw file):

	// Register a callback for any store updates.
	if sp := store.cfg.StorePool; sp != nil {
		sp.registerStoreCallback(func(desc *roachpb.StoreDescriptor, status storeStatus) {

desc -> _
status -> _


pkg/storage/replicate_queue.go, line 123 at r2 (raw file):

		pattern := gossip.MakeOrPattern(
			gossip.MakePrefixPattern(gossip.KeyStorePrefix),
			gossip.MakePrefixPattern(gossip.KeyNodeLivenessPrefix),

would removing this line, alone (without the rest of this change), produce the same change in behaviour?


pkg/storage/store_pool.go, line 155 at r2 (raw file):

// callback is invoked when a store's status changes. Clients of
// StorePool can register callbacks via registerStoreCallback().
type callback func(desc *roachpb.StoreDescriptor, status storeStatus)

nit: consider removing the names which aren't descriptive here


pkg/storage/store_pool.go, line 263 at r2 (raw file):

	// Get list of callbacks to invoke with lock held.
	status := detail.status(sp.clock.PhysicalTime(), sp.timeUntilStoreDead, 0, sp.nodeLivenessFn)
	var callbacks []callback

preallocate?


pkg/storage/store_pool.go, line 271 at r2 (raw file):

	// Invoke callbacks without store pool mutex held.
	for _, cb := range callbacks {
		cb(detail.desc, status)

is accessing detail.desc safe to do without the mutex held? seems like the answer is no, and this should be passing storeDesc instead.


Comments from Reviewable

@petermattis
Copy link
Collaborator

Review status: all files reviewed at latest revision, 6 unresolved discussions, some commit checks failed.


pkg/storage/replicate_queue.go, line 123 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

would removing this line, alone (without the rest of this change), produce the same change in behaviour?

I had this same thought as I was driving around doing errands this afternoon. Seems that you want to add a callback to NodeLiveness to get signaled when the liveness for a node changes.


Comments from Reviewable

@spencerkimball spencerkimball force-pushed the spencerkimball/replicate-purgatory-check branch from 3dec4e0 to f178e01 Compare December 26, 2016 19:10
@spencerkimball
Copy link
Member Author

Good point. It is equivalent. I'm adding a callback to node liveness.


Review status: all files reviewed at latest revision, 6 unresolved discussions, some commit checks failed.


pkg/storage/replicate_queue.go, line 120 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

desc -> _
status -> _

N/A


pkg/storage/replicate_queue.go, line 123 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I had this same thought as I was driving around doing errands this afternoon. Seems that you want to add a callback to NodeLiveness to get signaled when the liveness for a node changes.

Done.


pkg/storage/store_pool.go, line 155 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

nit: consider removing the names which aren't descriptive here

N/A


pkg/storage/store_pool.go, line 263 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

preallocate?

N/A


pkg/storage/store_pool.go, line 271 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

is accessing detail.desc safe to do without the mutex held? seems like the answer is no, and this should be passing storeDesc instead.

N/A


Comments from Reviewable

@petermattis
Copy link
Collaborator

Review status: 1 of 4 files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


pkg/storage/node_liveness.go, line 477 at r3 (raw file):

		if exLive, live := exLiveness.isLive(now, offset), liveness.isLive(now, offset); !exLive && live {
			for _, cb := range nl.mu.callbacks {
				cb(liveness.NodeID)

Similar comment as before: you're invoking the callback with nl.mu held. I think the current usage is safe, but this seems prone to deadlock if the callback ever invokes a NodeLiveness method.


Comments from Reviewable

@spencerkimball spencerkimball force-pushed the spencerkimball/replicate-purgatory-check branch from f178e01 to 11ceab9 Compare January 9, 2017 15:24
@spencerkimball
Copy link
Member Author

Review status: 1 of 4 files reviewed at latest revision, 3 unresolved discussions.


pkg/storage/node_liveness.go, line 477 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Similar comment as before: you're invoking the callback with nl.mu held. I think the current usage is safe, but this seems prone to deadlock if the callback ever invokes a NodeLiveness method.

Done.


Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Jan 9, 2017

Reviewed 5 of 5 files at r3, 1 of 1 files at r4.
Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks pending.


pkg/storage/node_liveness.go, line 475 at r3 (raw file):

		// If isLive status is now true, but previously false, invoke any registered callbacks.
		now, offset := nl.clock.Now(), nl.clock.MaxOffset()
		if exLive, live := exLiveness.isLive(now, offset), liveness.isLive(now, offset); !exLive && live {

exLive, live are redundant


Comments from Reviewable

@petermattis
Copy link
Collaborator

Review status: all files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


pkg/storage/node_liveness.go, line 475 at r3 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

exLive, live are redundant

Why?

This would be moderately terser (and clearer) without the variables, though:

if !exLiveness.isLive(now, offset) && liveness.isLive(now, offset) {
  ...
}

Comments from Reviewable

@spencerkimball
Copy link
Member Author

Review status: all files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


pkg/storage/node_liveness.go, line 475 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Why?

This would be moderately terser (and clearer) without the variables, though:

if !exLiveness.isLive(now, offset) && liveness.isLive(now, offset) {
  ...
}

That was just leftover stuff from a previous iteration which invoked the callback on change from live to not-live and vice versa. Removed.


Comments from Reviewable

@spencerkimball spencerkimball force-pushed the spencerkimball/replicate-purgatory-check branch from 11ceab9 to 0806084 Compare January 9, 2017 15:50
@tamird
Copy link
Contributor

tamird commented Jan 9, 2017

Reviewed 1 of 1 files at r5.
Review status: all files reviewed at latest revision, 3 unresolved discussions.


Comments from Reviewable

This is used to prevent busy purgatory checks for the replicate queue
on every node liveness heartbeat. However, we still do the check on the
periodic node status updates, which happen once a minute by default.

Fixes #12298
@spencerkimball spencerkimball force-pushed the spencerkimball/replicate-purgatory-check branch from 0806084 to 47a4612 Compare January 9, 2017 16:41
@spencerkimball spencerkimball merged commit 2136531 into master Jan 9, 2017
@spencerkimball spencerkimball deleted the spencerkimball/replicate-purgatory-check branch January 9, 2017 22:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants