Allocate timers outside of loops to avoid repeat allocations #4367
It looks like there were issues with the usage of I'm currently using Russ Cox's solution from the Google Groups discussion, but things are getting pretty ugly and there doesn't look like there's a good way to avoid the ugliness. It may be time to put this workaround into a timer abstraction, but I hate to abstract the stdlib unless it's absolutely necessary. How do people feel about the current fix, which I believe is race free and operates correctly in all cases (at least in a single goroutine context)?
select {
case act := <-ch:
	return &act
case <-time.After(wait):
case <-waitTimer.C:
	waitTimerRead = true
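For context, the hunk above swaps a per-iteration `time.After` for a timer allocated once and reset on each pass. A minimal sketch of that idea follows; `recvWithTimeouts` and `demo` are hypothetical names used for illustration, not code from this PR, and the stop-then-drain step is the commonly documented pattern rather than necessarily the exact one used here.

```go
package main

import (
	"fmt"
	"time"
)

// recvWithTimeouts is a hypothetical helper, not code from this PR. It
// receives up to n values from ch, waiting at most wait for each one, and
// reuses a single timer instead of calling time.After on every iteration.
func recvWithTimeouts(ch <-chan int, n int, wait time.Duration) (got, timeouts int) {
	timer := time.NewTimer(wait)
	defer timer.Stop()
	timerRead := false

	for i := 0; i < n; i++ {
		if i > 0 {
			// Stop the timer and drain a fired-but-unread value so this
			// iteration cannot observe a stale expiry from the last one.
			if !timer.Stop() && !timerRead {
				<-timer.C
			}
			timer.Reset(wait)
			timerRead = false
		}
		select {
		case <-ch:
			got++
		case <-timer.C:
			// Record the read so the next iteration does not try to drain.
			timerRead = true
			timeouts++
		}
	}
	return got, timeouts
}

// demo feeds two buffered values and asks for three, so the third wait
// should end in a timeout.
func demo() (got, timeouts int) {
	ch := make(chan int, 2)
	ch <- 1
	ch <- 2
	return recvWithTimeouts(ch, 3, 50*time.Millisecond)
}

func main() {
	got, timeouts := demo()
	fmt.Printf("received=%d timeouts=%d\n", got, timeouts)
}
```

The `timerRead` flag plays the same role as `waitTimerRead` in the hunk: it records whether the channel has already been emptied, so the drain never blocks on an empty channel.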
I think it is worthwhile to put this pattern in util:
type Timer struct {
	*time.Timer
	Read bool
}

// Reset operates on a value so that Timer can be stack allocated.
func (t Timer) Reset(wait time.Duration) Timer {
	if t.Timer == nil {
		t.Timer = time.NewTimer(wait)
		return t
	}
	if !t.Timer.Reset(wait) && !t.Read {
		<-t.Timer.C
	}
	t.Read = false
	return t
}
Then, to use:
var timer util.Timer
for {
	timer = timer.Reset(wait)
	select {
	case <-timer.C:
		timer.Read = true
		...
	}
}
Still irritating that you have to note when the timer channel has been read, but I can't think of anything better right now. This would also avoid creating and then immediately resetting the timer (though I doubt that is any sort of performance bottleneck).
I think we should also be calling time.Timer.Stop when the loop exits. Failure to do that prevents the timer memory from being reclaimed until the timer expires.
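A small sketch of what stopping on loop exit could look like, assuming a loop that ends when a quit channel closes. `runUntilQuit` and `demo` are hypothetical names, not functions from this PR. The deferred `Stop` reports `true` when it cancels a timer that had not fired yet, which is the case where the timer would otherwise linger until expiry.

```go
package main

import (
	"fmt"
	"time"
)

// runUntilQuit is a hypothetical loop, not code from this PR. It counts
// timer expiries until quit is closed and reports whether the deferred
// Stop cancelled a timer that was still pending.
func runUntilQuit(quit <-chan struct{}, wait time.Duration) (ticks int, stoppedPending bool) {
	timer := time.NewTimer(wait)
	// Stop returns true when it prevents a pending timer from firing.
	defer func() { stoppedPending = timer.Stop() }()

	for {
		select {
		case <-quit:
			return
		case <-timer.C:
			ticks++
			// The channel was just read, so Reset needs no drain here.
			timer.Reset(wait)
		}
	}
}

// demo exits immediately while a one-hour timer is still pending.
func demo() (int, bool) {
	quit := make(chan struct{})
	close(quit)
	return runUntilQuit(quit, time.Hour)
}

func main() {
	ticks, stoppedPending := demo()
	fmt.Printf("ticks=%d stoppedPending=%t\n", ticks, stoppedPending)
}
```

Without the deferred `Stop`, the one-hour timer in `demo` would stay scheduled after the function returns, at least on the Go versions this thread was written against.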
I've added a new util.Timer type that does exactly what you were describing in your first comment. It's unfortunate to have to wrap the standard library, but I agree it's better than having bug workarounds throughout our code. I've also added deferred Timer.Stop calls where needed. PTAL
Nicely debugged. It's irritating that timers are so hard to use efficiently and correctly.
LGTM It looks like several of these could use a I can see how this would lead to high memory consumption, but I'm not seeing how it becomes a leak that grows over time. Was the problem just that the ceiling was so high that it looked like an unbounded leak?
@bdarnell I'm not seeing where a In terms of the memory leak terminology, you are correct. I misspoke when I said "without bound", as I was not able to find anything that actually grew completely without bound. Instead, the timer memory footprint grew steadily over time until it reached the upper bound numbers I mentioned before (400,000 and 30,000). I'd be curious to hear if @spencerkimball was seeing a different issue that actually was an unbounded leak, and if so, if we can reproduce it.
Spencer said the processes were growing to more than 16GB each, so I think there may be something else there. Switching to a
LGTM

Reviewed 10 of 10 files at r2.

kv/send.go, line 214 [r2]: There are a few other instances that might benefit from the same treatment.

util/timer_test.go, line 61 [r2]:
select {
case <-timer.C:
	t.Fatal("lost the race")
default:
	timer = timer.Reset(1 * time.Millisecond)
}
func TestTimerStop(t *testing.T) {
	var timer Timer
	timer = timer.Reset(1 * time.Millisecond)
Timeouts this short have been problematic for us in the past; I wouldn't be surprised to occasionally see the call to Stop() delayed by over a millisecond on circleci. We try to avoid test timeouts smaller than 10ms.
// but requires users of Timer to set Timer.Read to true whenever
// they successfully read from the Timer's channel. Reset operates on
// and returns a value so that Timer can be stack allocated.
func (t Timer) Reset(d time.Duration) Timer {
Why does this take and return a Timer value instead of making the receiver a pointer? You're not actually making anything immutable since this method will be invariably used as timer = timer.Reset() and the assignment to timer.Read after reading from the channel mutates the value directly. It would be easy to forget the assignment when calling this method which would lead to subtle bugs.
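To make the "easy to forget the assignment" concern concrete, here is a stripped-down sketch. `valueTimer` and `pointerTimer` are hypothetical stand-ins that only count resets; they are not the `util.Timer` under review.

```go
package main

import "fmt"

// valueTimer mimics the value-receiver design: Reset works on a copy and
// the caller must assign the result back.
type valueTimer struct{ resets int }

func (t valueTimer) Reset() valueTimer {
	t.resets++
	return t
}

// pointerTimer mimics the pointer-receiver alternative: Reset mutates the
// receiver in place, so there is nothing to forget.
type pointerTimer struct{ resets int }

func (t *pointerTimer) Reset() { t.resets++ }

// demo returns the reset count seen after a forgotten assignment, after a
// correct assignment, and after a pointer-receiver call.
func demo() (forgotten, assigned, viaPointer int) {
	var v valueTimer
	v.Reset() // compiles, but the updated copy is silently discarded
	forgotten = v.resets

	v = v.Reset()
	assigned = v.resets

	var p pointerTimer
	p.Reset()
	viaPointer = p.resets
	return forgotten, assigned, viaPointer
}

func main() {
	fmt.Println(demo())
}
```

The call on the first line of `demo`'s body compiles without complaint from the compiler, which is what makes the value-receiver form risky.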
The thought was to avoid having util.Timer allocated on the heap. But I wasn't thinking clearly when I suggested it. The call to defer Timer.Stop() isn't going to work properly with this setup. The one extra allocation is unlikely to be problematic. Best to make all of the Timer methods pointer receivers.
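A rough sketch of how a pointer-receiver version might look, assuming the same `Read` flag convention discussed above. This is an illustration only and may differ from what landed in util/timer.go; in particular it uses the stop, drain, then reset ordering recommended by the `time` package documentation rather than the reset-first ordering in the earlier suggestion.

```go
package main

import (
	"fmt"
	"time"
)

// Timer is a sketch of a resettable timer wrapper; it is an assumption for
// illustration and not necessarily the type added in this PR.
type Timer struct {
	*time.Timer
	Read bool // callers set this after receiving from C
}

// Reset arms the timer for d, allocating the underlying timer on first use
// and draining a fired-but-unread value on later uses.
func (t *Timer) Reset(d time.Duration) {
	if t.Timer == nil {
		t.Timer = time.NewTimer(d)
		return
	}
	if !t.Timer.Stop() && !t.Read {
		<-t.C
	}
	t.Timer.Reset(d)
	t.Read = false
}

// Stop is safe to defer even if Reset was never called.
func (t *Timer) Stop() bool {
	if t.Timer == nil {
		return false
	}
	return t.Timer.Stop()
}

// demo fires the timer three times, then checks that an unread expiry does
// not leak into the wait that follows the next Reset.
func demo() (fired int, stale bool) {
	var timer Timer
	defer timer.Stop()

	for i := 0; i < 3; i++ {
		timer.Reset(5 * time.Millisecond)
		<-timer.C
		timer.Read = true
		fired++
	}

	timer.Reset(time.Millisecond)
	time.Sleep(20 * time.Millisecond) // let it expire without reading
	timer.Reset(time.Hour)
	select {
	case <-timer.C:
		stale = true
	default:
	}
	return fired, stale
}

func main() {
	fired, stale := demo()
	fmt.Printf("fired=%d stale=%t\n", fired, stale)
}
```

With pointer receivers, `defer timer.Stop()` acts on the same timer the loop resets, which is the problem the value-receiver version ran into.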
Reviewed 9 of 9 files at r3.
LGTM
There are currently 8 places in CockroachDB non-test code that create a `time.Timer` using `time.NewTimer` during every iteration of a loop. cockroachdb#4175 proposed a fix for the worst instance of this issue within `*rpcTransport.processQueue`, which resulted in upwards of **400,000** timers on a single node in use at a given time. The second biggest offender of this issue was in `kv.send`, which resulted in about **30,000** timers on a single node in use at a given time. Together, I diagnosed that these two issues were responsible for the memory "leak" seen in cockroachdb#4346. After making these fixes, it looks like the issue is gone as memory no longer grows without bound. I've gone ahead and fixed all 8 occurrences of this anti-pattern across our codebase, using the strategy @tamird brought up in [this](https://github.com/cockroachdb/cockroach/pull/4175/files#r52558817) comment to avoid a race condition between iterations with the timers. A few of the changes might be a little over-aggressive as the loops are not as "tight" as the ones causing issues, but I still think it's important to make this change now and avoid these issues in the future.
Allocate timers outside of loops to avoid repeat allocations

There are currently 8 places in CockroachDB non-test code
that create a `time.Timer` using `time.NewTimer` during
every iteration of a loop. #4175 proposed a fix for the worst
instance of this issue within `*rpcTransport.processQueue`,
which resulted in upwards of 400,000 timers on a single node
in use at a given time. The second biggest offender of this
issue was in `kv.send`, which resulted in about 30,000 timers
on a single node in use at a given time. Together, I diagnosed
that these two issues were responsible for the memory leak
seen in #4346. After making these fixes, it looks like the
issue is gone as memory no longer grows without bound and
memory profiling no longer shows `time.NewTimer` as the third
largest source of memory allocations.

I've gone ahead and fixed all 8 occurrences of this anti-pattern
across our codebase, using the strategy @tamird brought up in
this comment to avoid a race condition between iterations with the
timers. A few of the changes might be a little over-aggressive
as the loops are not as "tight" as the ones causing issues, but
I still think it's important to make this change now and avoid
these issues in the future.