Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allocate timers outside of loops to avoid repeat allocations #4367

Merged

Conversation

nvanbenschoten
Copy link
Member

There are currently 8 places in CockroachDB non-test code
that create a time.Timer using time.NewTimer during
every iteration of a loop. #4175 proposed a fix for the worst
instance of this issue within *rpcTransport.processQueue,
which resulted in upwards of 400,000 timers on a single node
inuse at a given time. The second biggest offender of this
issue was in kv.send, which resulted in about 30,000 timers
on a single node inuse at a given time. Together, I diagnosed
that these two issues were responsible for the memory leak
seen in #4346. After making these fixes, it looks like the
issue is gone as memory no longer grows without bound and
memory profiling no longer shows time.NewTimer as the third
largest source of memory allocations.

I've gone ahead and fixed all 8 occurences of this anti-pattern
across our codebase, using the strategy @tamird brought up in
this comment to avoid a race condition between iterations with the
timers. A few of the changes might be a little over-aggressive
as the loops are not as "tight" as the ones causing issues, but
I still think it's important to make this change now and avoid
these issues in the future.

Review on Reviewable

@nvanbenschoten
Copy link
Member Author

It looks like there were issues with the usage of time.Timer in the first commit I pushed, and it looks like Timer.Reset is even more difficult to use "correctly" than we thought. See this discussion, this issue by Dmitry Vyukov, this abandoned CL, and this issue.

I'm currently using Russ Cox's solution from the Google Groups discussion, but things are getting pretty ugly and there doesn't look like there's a good way to avoid the ugliness. It may be time to put this workaround into a timer abstraction, but I hate to abstract the stdlib unless it's absolutely necessary. How do people feel about the current fix, which I believe is race free and operates correctly in all cases (at least in a single goroutine context)?

@nvanbenschoten nvanbenschoten changed the title Allocate timers out of loops to avoid repeat allocations Allocate timers outside of loops to avoid repeat allocations Feb 13, 2016
select {
case act := <-ch:
return &act
case <-time.After(wait):
case <-waitTimer.C:
waitTimerRead = true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is worthwhile to put this pattern in util:

type Timer struct {
  *time.Timer
  Read bool
}

// Reset operates on a value so that Timer can be stack allocated.
func (t Timer) Reset(wait time.Duration) Timer {
  if t.Timer == nil {
    t.Timer = time.NewTimer(wait)
    return t
  }
  if !t.Timer.Reset(wait) && !t.Read {
    <-t.Timer.C
  }
  t.Read = false
  return t
}

Then, to use:

  var timer util.Timer
  for {
    timer = timer.Reset(wait)
    switch {
    case <-timer.C:
      timer.Read = true
      ...
    }
  }

Still irritating that you have to note when the timer channel has been read, but I can't think of anything better right now. This would also avoid creating and then immediately resetting the timer (though I doubt that is any sort of performance bottleneck).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also be calling time.Timer.Stop when the loop exits. Failure to do that prevents the timer memory from being reclaimed until the timer expires.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a new util.Timer type that does exactly what you were describing in your first comment. It's unfortunate to have to wrap the standard library, but I agree it's better than having bug workarounds throughout our code. I've also added deferred Timer.Stop calls where needed. PTAL

@petermattis
Copy link
Collaborator

Nicely debugged. It's irritating that timers are so hard to use efficiently and correctly.

@bdarnell
Copy link
Member

LGTM

It looks like several of these could use a time.Ticker instead of a resetting time.Timer, which would simplify things. I think we only need this pattern when either the duration or the start time is irregular (as in store_pool and raft_transport).

I can see how this would lead to high memory consumption, but I'm not seeing how it becomes a leak that grows over time. Was the problem just that the ceiling was so high that it looked like an unbounded leak?

time.After is clearly the worst for memory use, and this Reset pattern looks like the best (at least when a Ticker cannot be used), but what about the middle ground of time.NewTimer() with a deferred Stop()? That would still create a little garbage on each iteration but would allow the timer objects to be collected without waiting for them to expire. That might be better than this unwieldy pattern for the less-critical instances.

@nvanbenschoten
Copy link
Member Author

@bdarnell I'm not seeing where a time.Ticker could be used as a replacement for a time.Timer. The problem is that a ticker cannot be reset between iterations, so time spent in each iteration would accumulate in the ticker. All of these changes needed timeout-like behavior for each iteration of a loop, where the timeouts were independent from those in other iterations.

In terms of the memory leak terminology, you are correct. I misspoke when I said "without bound", as I was not able to find anything that actually grew completely without bound. Instead, the timer memory footprint grew steadily over time until it reached the upper bound numbers I mentioned before (400,000 and 30,000). I'd be curious to hear if @spencerkimball was seeing a different issue that actually was an unbounded leak, and if so, if we can reproduce it.

@bdarnell
Copy link
Member

Spencer said the processes growing to more than 16GB each so I think there may be something else there.

Switching to a Ticker wouldn't be precisely equivalent since the time spent in each iteration would count against the time, but I think that's fine (and arguably better). For example, in kv/send.go, SendNextTimeout would become the time between the starts of successive attempts rather than the pauses between them; if it takes longer than SendNextTimeout for the first attempt to fail then the second would start immediately.

@tamird
Copy link
Contributor

tamird commented Feb 13, 2016

LGTM


Reviewed 10 of 10 files at r2.
Review status: all files reviewed at latest revision, 20 unresolved discussions.


acceptance/cluster/localcluster.go, line 507 [r2] (raw file):
Nice.


kv/send.go, line 214 [r2] (raw file):
might be worth moving this case to the top since it's relatively small and proximity to the Reset call makes this easier to read.

There are a few other instances that might benefit from the same treatment.


util/timer.go, line 22 [r2] (raw file):
you refer to it as "C" here but "Timer.C" below


util/timer.go, line 29 [r2] (raw file):
s/the timer goes off and //


util/timer.go, line 31 [r2] (raw file):
s/the timer will deadlone next Timer.Reset is called/the next call to Timer.Reset will deadlock/


util/timer.go, line 32 [r2] (raw file):
s/would then look/looks/


util/timer.go, line 35 [r2] (raw file):
add the deferred Stop() and consider mentioning it above.


util/timer.go, line 45 [r2] (raw file):
I don't follow this comment - doesn't Reset have to be called before use no matter what? In fact, won't Timer.C panic on NPE otherwise?


util/timer.go, line 58 [r2] (raw file):
Nice. Another option (which would avoid the return value) is to use **time.Timer, but that loses the embedding benefits.


util/timer.go, line 74 [r2] (raw file):
from succeeding


util/timer_test.go, line 27 [r2] (raw file):
consider extracting this into a constant, in case the test has to be loosened up later.


util/timer_test.go, line 30 [r2] (raw file):
is this needed?


util/timer_test.go, line 53 [r2] (raw file):
you don't want to check the return value? I think only this one differs from the stdlib's behaviour but maybe do it throughout anyway.


util/timer_test.go, line 61 [r2] (raw file):
This is racy. How about

select {
  case <-timer.C:
    t.Fatal("lost the race")
  default:
    timer = timer.Reset(1 * time.Millisecond)
}

util/timer_test.go, line 63 [r2] (raw file):
is this needed?


util/timer_test.go, line 77 [r2] (raw file):
This is racy, but I'm not sure how to mitigate it. Maybe just leave a comment.


util/timer_test.go, line 81 [r2] (raw file):
not needed?


util/timer_test.go, line 95 [r2] (raw file):
what's the point of sleeping here?


util/timer_test.go, line 101 [r2] (raw file):
not needed, right?


Comments from the review on Reviewable.io


func TestTimerStop(t *testing.T) {
var timer Timer
timer = timer.Reset(1 * time.Millisecond)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timeouts this short have been problematic for us in the past; I wouldn't be surprised to occasionally see the call to Stop() delayed by over a millisecond on circleci. We try to avoid test timeouts smaller than 10ms.

@tamird
Copy link
Contributor

tamird commented Feb 13, 2016

Review status: all files reviewed at latest revision, 19 unresolved discussions.


util/timer_test.go, line 61 [r2] (raw file):
Duh, that's still racy. Probably best to increase the timeouts like @bdarnell suggested.


Comments from the review on Reviewable.io

// but requires users of Timer to set Timer.Read to true whenever
// they successfully read from the Timer's channel. Reset operates on
// and returns a value so that Timer can be stack allocated.
func (t Timer) Reset(d time.Duration) Timer {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this take and return a Timer value instead of making the receiver a pointer? You're not actually making anything immutable since this method will be invariably used as timer = timer.Reset() and the assignment to timer.Read after reading from the channel mutates the value directly. It would be easy to forget the assignment when calling this method which would lead to subtle bugs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thought was to avoid having util.Timer allocated on the heap. But I wasn't thinking clearly when I suggested it. The call to defer Timer.Stop() isn't going to work properly with this setup. The one extra allocation is unlikely to be problematic. Best to make all of the Timer methods pointer receivers.

@tamird
Copy link
Contributor

tamird commented Feb 14, 2016

Review status: all files reviewed at latest revision, 20 unresolved discussions.


util/timer.go, line 59 [r2] (raw file):
Another option is to have the Timer include a **time.Timer - that would allow you to keep the value receiver without having to return anything, though you'll have to add a constructor.


Comments from the review on Reviewable.io

@tamird
Copy link
Contributor

tamird commented Feb 14, 2016

Review status: all files reviewed at latest revision, 20 unresolved discussions.


util/timer.go, line 59 [r2] (raw file):
Actually, I guess that isn't much better, since that will still incur an allocation.


Comments from the review on Reviewable.io

@petermattis
Copy link
Collaborator

Review status: all files reviewed at latest revision, 20 unresolved discussions.


util/timer.go, line 59 [r2] (raw file):
I don't see how that would help with the defer timer.Stop() pattern either. In that case, you need timer to be a pointer receiver or else your defer will be pointing to the wrong object.


Comments from the review on Reviewable.io

@nvanbenschoten
Copy link
Member Author

Review status: 1 of 10 files reviewed at latest revision, 20 unresolved discussions.


kv/send.go, line 214 [r2] (raw file):
Done.


util/timer.go, line 22 [r2] (raw file):
Done.


util/timer.go, line 29 [r2] (raw file):
Done.


util/timer.go, line 31 [r2] (raw file):
Done.


util/timer.go, line 32 [r2] (raw file):
Done.


util/timer.go, line 35 [r2] (raw file):
Done.


util/timer.go, line 45 [r2] (raw file):
Removed.


util/timer.go, line 59 [r2] (raw file):
Changed all of the Timer methods to pointer receivers. I actually do think the defer timer.Stop() would work even as with value receivers as long as the inner time.Timer pointer was initialized first instead of being initialized lazily, because its pointer is just passed between struct values. I think @tamird's double indirection approach was targeted at permitting this lazy initialization, which it would do, but like he said, would allocate twice. Either way though, pointer receivers seem like the way to go as they make everything simpler.


util/timer.go, line 74 [r2] (raw file):
Done.


util/timer_test.go, line 27 [r2] (raw file):
Done.


util/timer_test.go, line 30 [r2] (raw file):
Nope, but I added those in later because I figured unit tests should demonstrate proper usage patterns, even if their simplified setups don't require the full pattern. I'd rather be overly verbose here than have examples make an implicit "last read doesn't need to signal" assumption that may be misinterpreted/misused later.


util/timer_test.go, line 41 [r2] (raw file):
Thanks for the heads up. I changed everything to be multiples of a 10ms time step.


util/timer_test.go, line 53 [r2] (raw file):
Done.


util/timer_test.go, line 61 [r2] (raw file):
Done.


util/timer_test.go, line 63 [r2] (raw file):
Nope, but see above.


util/timer_test.go, line 95 [r2] (raw file):
None. Removed.


Comments from the review on Reviewable.io

@tamird
Copy link
Contributor

tamird commented Feb 14, 2016

Reviewed 9 of 9 files at r3.
Review status: all files reviewed at latest revision, 4 unresolved discussions.


Comments from the review on Reviewable.io

@petermattis
Copy link
Collaborator

LGTM


Review status: all files reviewed at latest revision, 4 unresolved discussions.


util/timer.go, line 47 [r3] (raw file):
s/not/no/g


Comments from the review on Reviewable.io

There are currently 8 places in CockroachDB non-test code
that create a `time.Timer` using `time.NewTimer` during
every iteration of a loop. cockroachdb#4175 proposed a fix for the worst
instance of this issue within `*rpcTransport.processQueue`,
which resulted in upwards of **400,000** timers on a single node
inuse at a given time. The second biggest offender of this
issue was in `kv.send`, which resulted in about **30,000** timers
on a single node inuse at a given time. Together, I diagnosed
that these two issues were responsible for the memory "leak"
seen in cockroachdb#4346. After making these fixes, it looks like the
issue is gone as memory no longer grows without bound.

I've gone ahead and fixed all 8 occurences of this anti-pattern
across our codebase, using the strategy @tamird brought up in
[this](https://github.com/cockroachdb/cockroach/pull/4175/files#r52558817)
comment
to avoid a race condition between iterations with the timers.
A few of the changes might be a little over-aggressive as the
loops are not as "tight" as the ones causing issues, but I still
think it's important to make this change now and avoid these
issues in the future.
@nvanbenschoten
Copy link
Member Author

Review status: 9 of 10 files reviewed at latest revision, 4 unresolved discussions.


util/timer.go, line 47 [r3] (raw file):
Done.


Comments from the review on Reviewable.io

nvanbenschoten added a commit that referenced this pull request Feb 15, 2016
Allocate timers outside of loops to avoid repeat allocations
@nvanbenschoten nvanbenschoten merged commit 96381b5 into cockroachdb:master Feb 15, 2016
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/timers branch February 15, 2016 02:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants