Skip to content

Commit

Permalink
kvserver: reject requests when quiescing
Browse files Browse the repository at this point in the history
This patch makes the Store reject requests once its stopper is
quiescing.

Before this patch, we didn't seem to have good protection against
requests not running after the stopper has been stopped. We've seen this
in some tests, where requests were racing with the engine closing.
Running after the stopper has stopped is generally pretty undefined
behavior, so let's avoid it.
I think the reason why we didn't see a lot of errors from such races is
that we're stopping the gRPC server even before we start quiescing, so
at least for requests coming from remote nodes we had some draining
window.

This is a lighter version of cockroachdb#51566. That patch was trying to run
requests as tasks, so that they properly synchronize with server
shutdown. This patch still allows races between requests that started
evaluating the server started quiescing and server shutdown.

Touches cockroachdb#51544

Release note: None
  • Loading branch information
andreimatei committed Aug 14, 2020
1 parent dd49b04 commit 63ff574
Show file tree
Hide file tree
Showing 2 changed files with 27 additions and 0 deletions.
14 changes: 14 additions & 0 deletions pkg/kv/kvserver/store_send.go
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,20 @@ import (
func (s *Store) Send(
ctx context.Context, ba roachpb.BatchRequest,
) (br *roachpb.BatchResponse, pErr *roachpb.Error) {

// Best effort short-circuit if the server is quiescing. Note that what we
// really want to do here is run the request as a stopper task, so that
// quiescing waits for it to finish, but we're afraid of the performance
// implications. We would also like to use WithCancelOnQuiesce() so that the
// tasks drain faster when the server is shutting down, but that introduces
// problems because all contexts for all requests will be canceled once they
// complete, which propagates to sub-tasks spawned with RunAsyncTask(), and
// most such async tasks don't want to be canceled because of the parent
// request. See discussions in #51566 around all of these.
if s.stopper.IsQuiescing() {
return nil, roachpb.NewError(&roachpb.NodeUnavailableError{})
}

// Attach any log tags from the store to the context (which normally
// comes from gRPC).
ctx = s.AnnotateCtx(ctx)
Expand Down
13 changes: 13 additions & 0 deletions pkg/util/stop/stopper.go
Original file line number Diff line number Diff line change
Expand Up @@ -575,3 +575,16 @@ func (s *Stopper) Quiesce(ctx context.Context) {
t.Stop()
}
}

// Quiescing returns true if the stopper is quiescing.
//
// Generally this method shouldn't be used because it provides no
// synchronization between the caller and quiescing. Generally a caller either
// wants to use RunTaks() which checks for quiescing at the beginning and the
// blocks the stopper stopping on the task, or at least ShouldQuiesce() which
// gives you a channel signaled in the future when quiescing is initiated.
func (s *Stopper) IsQuiescing() bool {
s.mu.Lock()
defer s.mu.Unlock()
return s.mu.quiescing
}

0 comments on commit 63ff574

Please sign in to comment.