Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not append to shared slice #16

Merged
merged 1 commit into from
Aug 10, 2016
Merged

Do not append to shared slice #16

merged 1 commit into from
Aug 10, 2016

Conversation

tummychow
Copy link
Contributor

@tummychow tummychow commented Aug 9, 2016

Simplified example of how this bug works: https://play.golang.org/p/yIxB5LPORx

Detailed repro scenario:

  1. suppose c.Tags has capacity > length (this is quite possible since it's provided by the caller, eg the example at the top of the file c.Tags = append(c.Tags, "us-east-1a") would probably result in capacity>length)
  2. caller 1 enters the format function, passing in a single tag. It reaches line 117 and executes append
  3. since c.Tags has residual capacity, the underlying memory of the c.Tags slice is mutated so that the next element contains caller 1's tag. Caller 1's format invocation receives a slice pointing to the underlying memory of c.Tags, but with length of len(c.Tags) + 1.
  4. before caller 1 can continue any further, caller 2 enters the format function, passing in a single tag. There is no locking around format, so caller 2 also reaches line 117 and executes append.
  5. since c.Tags still has residual capacity, its memory is mutated so that the next element contains caller 2's single tag. Since caller 1 wrote its tag into that same element, its tag has now been overwritten.
  6. caller 1 continues execution and writes its tags to the buffer. Since it's using the same underlying memory as caller 2, it will now write caller 2's tag to its buffer, instead of the tag that was originally passed in.

We detected this bug in Veneur because Veneur happens to have several different metrics that have a part tag, but the range of values for that tag are disjoint for various metric names. Eg veneur.import.response_duration_ns has potential parts of part:merge or part:request. Other metrics, like veneur.flush.duration_ns, report parts of part:json or part:post. We were receiving packets veneur.import.response_duration_ns and part:json, and since our dashboard was splitting the timeseries by part, we noticed those unexpected parts and proceeded to investigate. Wireshark revealed that the offending packet was sent right after another packet that had part:json, and from there, the cause of the bug was clear. We don't know how many of our other go services have been impacted. If the spurious tags weren't being displayed on their dashboards, then they might have just never noticed.

Since c.Tags is shared across all invocations of format, it is unsafe to append concurrently to it. We could lock the function (or invoke it from inside sendMsg which is already locked), but my fix is to simply append to a copy instead, which eliminates all potential sharing between separate invocations of format. (You could append to the caller's tags instead, but if they're sharing that slice between multiple places, then you could end up mutating it unexpectedly.)

cc @gphat @antifuchs @rhwlo

If two calls to this function happen at ~the same time, one of them will
append to the shared c.Tags slice, and then the other one will. If
c.Tags has enough room to avoid being reallocated, then it will be
mutated directly instead, and the second append will overwrite the first
one. We should not be appending to a slice that is scoped to the entire
statsd client.
@gphat
Copy link
Contributor

gphat commented Aug 10, 2016

Great catch, @tummychow!

@remh
Copy link
Contributor

remh commented Aug 10, 2016

Great catch @tummychow that's a great explanation

@remh remh merged commit 94b37b6 into DataDog:master Aug 10, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants