write: fix test races and enable CI `-race` flag #500

gbbr · 2018-10-22T13:21:51Z

No description provided.

gbbr · 2018-10-22T13:55:42Z

writer/payload.go

@@ -336,7 +336,7 @@ func (s *QueuablePayloadSender) flushQueue() error {
 func (s *QueuablePayloadSender) removeQueuedPayload(e *list.Element) *list.Element {
 	next := e.Next()
 	payload := e.Value.(*Payload)
-	s.currentQueuedSize -= int64(len(payload.Bytes))
+	atomic.AddInt64(&s.currentQueuedSize, -int64(len(payload.Bytes)))


I couldn't find a better way to stop the race here. The "correct" fix would probably be to refactor a bit and improve the package's testing overall, but that seems like a rather large job for this PR, whose scope was simply to add the -race flag to CI. I do plan, however, to improve the testing separately and perhaps help get rid of the syncBarrier channel too.

What's causing the race here? The idea when I implemented this was that currentQueuedSize was only accessed from the sender routine.

EDIT: Oh I see, it's because you started accessing it from the test. Could we remove the atomic and use the syncBarrier on the sender to get around the race? We'll probably also have to either drop the NumQueuedPayloads method or add a lock around s.QueuedPayloads.

AlexJF · 2018-10-22T16:25:04Z

statsd/statsd.go

+
+// Client returns the global StatsClient.
+func Client() StatsClient {
+	mu.RLock()


This is one of those cases where 100% race correctness bugs me. In normal operation you initialize Client during app bootstrap and never set it again so we could access it lock free everywhere.

However, because we call SetClient in tests, we now have to introduce a lock on what was previously a lock-free operation. I'm not sure if the Go compiler is smart enough to realize it can ditch the lock outside of test runs, but I suspect this could take a non-negligible toll on hot and loopy code paths where Client() is called often.

I wonder if there's a different way to appease -race in these cases by triggering some global goroutine sync (or something like C/C++ macros, where we could disable some locks at compile time if we detect we are not compiling for a test run).

Anyway, -race rant mode off 😄
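
For reference, the pattern being discussed looks roughly like the sketch below. Only `Client()` and its `mu.RLock()` call appear in the diff; the `client` variable, the body of `SetClient`, the one-method `StatsClient` interface, and `namedClient` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// StatsClient is reduced to a single method for this sketch.
type StatsClient interface {
	Name() string
}

// namedClient is a hypothetical implementation used only here.
type namedClient string

func (c namedClient) Name() string { return string(c) }

var (
	mu     sync.RWMutex
	client StatsClient = namedClient("default")
)

// Client returns the global StatsClient under a read lock.
func Client() StatsClient {
	mu.RLock()
	defer mu.RUnlock()
	return client
}

// SetClient replaces the global StatsClient under a write lock.
func SetClient(c StatsClient) {
	mu.Lock()
	defer mu.Unlock()
	client = c
}

func main() {
	SetClient(namedClient("mock"))
	fmt.Println(Client().Name())
}
```

The read lock is taken on every call to Client(), in production as well as in tests, which is the overhead the comment above is concerned about.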

I agree that it's not nice to introduce locks here for the sake of tests. I should've spent more time on this. I'll try again tomorrow by using TestMain(m *testing.M) and initializing the mock statsd client only once. That should work and it should eliminate the race.
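
A minimal sketch of the TestMain approach, assuming the mock can be installed once before any test runs. The `mockClient` type, the `installMock` helper, and the one-method `StatsClient` interface are hypothetical; in the repository this would live in a `_test.go` file, and the empty-looking `main` exists only so the sketch compiles on its own.

```go
package main

import (
	"os"
	"testing"
)

// StatsClient is reduced to a single method for this sketch.
type StatsClient interface {
	Count(name string, value int64)
}

// mockClient is a hypothetical test double that discards all metrics.
type mockClient struct{}

func (mockClient) Count(string, int64) {}

// client is written once, before any test goroutine exists, and only
// read afterwards, so no lock is needed.
var client StatsClient

// Client returns the global StatsClient without locking.
func Client() StatsClient { return client }

// installMock sets the global client to the mock.
func installMock() { client = mockClient{} }

// TestMain runs once per test binary, before any Test function.
func TestMain(m *testing.M) {
	installMock()
	os.Exit(m.Run())
}

// main is present only to make the sketch a standalone program.
func main() {
	installMock()
}
```

Because the write happens before `m.Run()` starts any test, later reads from test goroutines are ordered after it and the race detector has nothing to report.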

AlexJF · 2018-10-22T16:30:33Z

writer/multi_writer_test.go

-		assert.Equal(t, "ping2", msg2.(string))
+		assert.ElementsMatch(t,
+			[]string{"ping1", "ping2"},
+			[]string{(<-multi.mch).(string), (<-multi.mch).(string)},


Is execution order guaranteed here?

Yeah, the slice expression is not evaluated until both receives complete. As for ordering, assert.ElementsMatch matches elements regardless of their order.
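
The point can be illustrated with a stdlib-only sketch. `sameElements` is a hypothetical stand-in for an order-insensitive comparison (it is not testify's implementation), and `receiveTwo` mimics two messages arriving on a channel in an unspecified order.

```go
package main

import (
	"fmt"
	"sort"
)

// sameElements reports whether a and b hold the same strings,
// ignoring order. It sorts copies so the inputs are left untouched.
func sameElements(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	x := append([]string(nil), a...)
	y := append([]string(nil), b...)
	sort.Strings(x)
	sort.Strings(y)
	for i := range x {
		if x[i] != y[i] {
			return false
		}
	}
	return true
}

// receiveTwo starts two senders and collects both messages; which one
// arrives first is not deterministic.
func receiveTwo() []string {
	ch := make(chan interface{}, 2)
	go func() { ch <- "ping1" }()
	go func() { ch <- "ping2" }()
	// Both receives complete, left to right, before the slice exists.
	return []string{(<-ch).(string), (<-ch).(string)}
}

func main() {
	fmt.Println(sameElements([]string{"ping1", "ping2"}, receiveTwo()))
}
```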

AlexJF · 2018-10-22T16:38:39Z

writer/payload.go

@@ -336,7 +336,7 @@ func (s *QueuablePayloadSender) flushQueue() error {
 func (s *QueuablePayloadSender) removeQueuedPayload(e *list.Element) *list.Element {
 	next := e.Next()
 	payload := e.Value.(*Payload)
-	s.currentQueuedSize -= int64(len(payload.Bytes))
+	atomic.AddInt64(&s.currentQueuedSize, -int64(len(payload.Bytes)))


What's causing the race here? The idea when I implemented this was that currentQueuedSize was only accessed from the sender routine.

EDIT: Oh I see, it's because you started accessing it from the test. Could we remove the atomic and use the syncBarrier on the sender to get around the race? We'll probably also have to either drop the NumQueuedPayloads method or add a lock around s.QueuedPayloads.

gbbr · 2018-10-23T09:36:15Z

@AlexJF PTAL, I've used TestMain to set up the statsd.Client, which worked. For the other race, I've checked the length of the queue after stopping the sender, which also worked. I reckon that should be fine, no?

AlexJF

LGTM!

gbbr added this to the 6.7.0 milestone Oct 22, 2018

gbbr requested a review from AlexJF October 22, 2018 13:21

gbbr force-pushed the gbbr/races branch from 646e1e6 to 7776b45 Compare October 22, 2018 13:53

gbbr commented Oct 22, 2018

AlexJF reviewed Oct 22, 2018

gbbr force-pushed the gbbr/races branch 2 times, most recently from 3192e40 to cc28b52 Compare October 23, 2018 09:24

writer: remove statsd.Client race in writer tests

863755d

gbbr force-pushed the gbbr/races branch from cc28b52 to 863755d Compare October 23, 2018 09:35

AlexJF approved these changes Oct 23, 2018

gbbr merged commit d915c49 into master Oct 23, 2018

gbbr deleted the gbbr/races branch October 23, 2018 12:31
