Simplified example of how this bug works: https://play.golang.org/p/yIxB5LPORx
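For readers who cannot open the playground link, here is a minimal, single-goroutine sketch of the same aliasing behavior. It is not a copy of the linked snippet; `withTag` is a hypothetical helper that only mimics the pattern of appending directly to a shared slice that has spare capacity.

```go
package main

import "fmt"

// withTag mimics the unsafe pattern: it appends directly to the shared slice.
// Hypothetical helper for illustration only; it is not the library's code.
func withTag(shared []string, tag string) []string {
	return append(shared, tag)
}

func main() {
	// Shared tags with spare capacity: length 1, capacity 4.
	shared := make([]string, 1, 4)
	shared[0] = "us-east-1a"

	a := withTag(shared, "part:merge") // written into the shared backing array at index 1
	b := withTag(shared, "part:json")  // overwrites that same element

	fmt.Println(a[1], b[1])  // part:json part:json
	fmt.Println(len(shared)) // 1 (the shared slice's length never changes)
}
```

Both results alias the same backing array, so the second `append` silently replaces the first caller's tag. With concurrent callers the same overwrite happens, only nondeterministically.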
Detailed repro scenario:

1. `c.Tags` has capacity > length. This is quite possible since it's provided by the caller; e.g. the example at the top of the file, `c.Tags = append(c.Tags, "us-east-1a")`, would probably result in capacity > length.
2. Caller 1 invokes the `format` function, passing in a single tag. It reaches line 117 and executes `append`. Because `c.Tags` has residual capacity, the underlying memory of the `c.Tags` slice is mutated so that the next element contains caller 1's tag. Caller 1's `format` invocation receives a slice pointing to the underlying memory of `c.Tags`, but with a length of `len(c.Tags) + 1`.
3. Caller 2 invokes the `format` function, passing in a single tag. There is no locking around `format`, so caller 2 also reaches line 117 and executes `append`. Because `c.Tags` still has residual capacity, its memory is mutated so that the next element contains caller 2's single tag. Since caller 1 wrote its tag into that same element, its tag has now been overwritten.

We detected this bug in Veneur because Veneur happens to have several different metrics that have a
`part` tag, but the ranges of values for that tag are disjoint across metric names. E.g. `veneur.import.response_duration_ns` has potential parts of `part:merge` or `part:request`. Other metrics, like `veneur.flush.duration_ns`, report parts of `part:json` or `part:post`. We were receiving packets with `veneur.import.response_duration_ns` and `part:json`, and since our dashboard was splitting the timeseries by `part`, we noticed those unexpected parts and proceeded to investigate. Wireshark revealed that the offending packet was sent right after another packet that had `part:json`, and from there, the cause of the bug was clear. We don't know how many of our other Go services have been impacted. If the spurious tags weren't being displayed on their dashboards, then they might have just never noticed.

Since
`c.Tags` is shared across all invocations of `format`, it is unsafe to append to it concurrently. We could lock the function (or invoke it from inside `sendMsg`, which is already locked), but my fix is to simply append to a copy instead, which eliminates all potential sharing between separate invocations of `format`. (You could append to the caller's `tags` instead, but if they're sharing that slice between multiple places, then you could end up mutating it unexpectedly.)

cc @gphat @antifuchs @rhwlo