Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce timer object creation to reduce gc overhead #4175

Closed
wants to merge 1 commit into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
4 changes: 3 additions & 1 deletion server/raft_transport.go
Expand Up @@ -159,11 +159,13 @@ func (t *rpcTransport) processQueue(nodeID roachpb.NodeID, storeID roachpb.Store
done := make(chan *gorpc.Call, cap(ch))
var req *storage.RaftMessageRequest
protoResp := &storage.RaftMessageResponse{}
raftIdleTimer := time.NewTimer(raftIdleTimeout)
for {
raftIdleTimer.Reset(raftIdleTimeout)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is functionally equivalent. The old code would restart the timer on every iteration. But it looks like this new pattern introduces a small race where the timer can fire just as a request is received. The timer internally sends to raftIdleTimer.C and on the next iteration we call Reset, but that doesn't clear the channel. I'm not familiar enough with this code to know if that race is innocuous.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch; that does seem like an issue. Now that I look closely, there are several other problems with the shutdown-on-idle behavior. A new request could be written to the channel after the decision has been made to shut down the goroutine, and even without any races the goroutine could stop while there are still outstanding requests that will try to write to the done channel.

We have several instances of this pattern where we want a goroutine that goes away when it is idle (here, gossip.infoStore.runCallbacks, and my upcoming work on parallelizing raft execution). I'm going to try to refactor the infoStore pattern into something reusable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reset returns false if the timer had expired (in other words, it returns false if there's a value parked on the channel). Seems easy enough to mitigate:

if !raftIdleTimer.Reset(raftIdleTimeout) {
  <-raftIdleTimer.C
}

select {
case <-t.rpcContext.Stopper.ShouldStop():
return
case <-time.After(raftIdleTimeout):
case <-raftIdleTimer.C:
if log.V(1) {
log.Infof("closing Raft transport to %d due to inactivity", nodeID)
}
Expand Down