New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
contrib: fix span start option races #2418
Conversation
90c2665
to
b99b705
Compare
Signed-off-by: Eliott Bouhana <eliott.bouhana@datadoghq.com>
b99b705
to
8f24d7b
Compare
BenchmarksBenchmark execution time: 2023-12-12 18:29:22 Comparing candidate commit 3c089f2 in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 40 metrics, 1 unstable metrics. |
contrib/gin-gonic/gin/gintrace.go
Outdated
|
||
// Make sure opts is a copy of the span options, scoped to this request, | ||
// to avoid races and leaks to spanOpts or cfg.spanOpts | ||
opts := append([]tracer.StartSpanOption{ | ||
tracer.ResourceName(cfg.resourceNamer(c)), | ||
tracer.Tag(ext.HTTPRoute, c.FullPath()), | ||
httptrace.HeaderTagsFromRequest(c.Request, cfg.headerTags), | ||
}, spanOpts...) | ||
|
||
if !math.IsNaN(cfg.analyticsRate) { | ||
opts = append(opts, tracer.Tag(ext.EventSampleRate, cfg.analyticsRate)) | ||
} | ||
opts = append(opts, tracer.Tag(ext.HTTPRoute, c.FullPath())) | ||
opts = append(opts, httptrace.HeaderTagsFromRequest(c.Request, cfg.headerTags)) | ||
span, ctx := httptrace.StartRequestSpan(c.Request, opts...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most likely actually not an issue here, but I'd want to point out that this change affectes the order of the options, which could affect the end result...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is why a "builder" would be preferable to enforce this too, whereas every contrib is doing its own thing at the moment, and some of them have ordering tests while others don't.
This is a known limitation of the current design that we discussed in the past here #1352
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was difficult to use a builder here, since the builder just ended up being shared as well, which led to the same problem. At least for now, I made a helper method to make those copies more explicit, so I'm hoping that'll help a bit with readability. It doesn't fully prevent this in the future, but it does make it more explicit.
I've gone ahead and made several changes, and fixed additional races. Given how much of the code I've authored, I shouldn't be the one to approve this, so will seek an additional review from someone else on the team. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just a few comments about some unnecessary copies.
Only the GRPC one looks like it could actually have a performance impact on customers. The rest are fine either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found some changes that we need to make. Not all of them need to go in here, but some of these need to be modified.
// This product includes software developed at Datadog (https://www.datadoghq.com/). | ||
// Copyright 2023 Datadog, Inc. | ||
|
||
package options |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I was reading some contribs changed in this PR, I think we could go further to help thanks to some ImmutableOptions
wrapper type.
For example avoiding:
func Middleware(opts ...Options) Handler {
opts = options.Copy(opts) // problem avoided here
opts = append(opts, foo)
return func(w http.ResponseWriter, r *http.Request) {
opts = append(opts, headersTags(r)) // problem not avoided here
}
}
And doing instead:
func Middleware(opts ...Options) Handler {
localOpts = options.Copy(opts) // problem avoided here
localOpts = append(localOpts, foo)
opts := options.ImmutableOptions(localOpts)
return func(w http.ResponseWriter, r *http.Request) {
opts = append(opts, headersTags(r)) // problem not avoided here
}
}
Or having a full wrapper type Options
with an Append
method so it can do more safety checks?
My point is: how can we further improve this API to avoid future mistakes again and maybe add this ability to block any further uses of append
once we know we are done and anything coming later is in a request handler, from concurrent goroutines.
cc @knusbaum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Abstracting the mutations is potentially a good idea, but we would have to be careful to do it in an efficient way so that every Append
doesn't cause a copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should consider it, but let's do so in a follow-up PR. I'd like to keep this PR as minimal as possible given the already large blast radius of changes.
…g/dd-trace-go into eliott.bouhana/fix-span-opts-race
contrib/zenazn/goji.v1/web/goji.go
Outdated
@@ -38,9 +38,7 @@ func Middleware(opts ...Option) func(*web.C, http.Handler) http.Handler { | |||
warnonce sync.Once | |||
) | |||
defaults(&cfg) | |||
localOpts := make([]Option, len(opts)) | |||
copy(localOpts, opts) // make a copy to avoid changes to opts having side effects to the returned router | |||
for _, fn := range localOpts { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need this one anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved.
What does this PR do?
Fix data race from closure on the span options used when starting spans in multiple contribs.
This PR introduces a new package
contrib/internal/options
with a method calledCopy
which provides a newly allocated options slice with a copy of the provides options inside it.Note: This PR does not fix the scenario where a customer keeps a reference to the options they pass to their contrib setup and then alters them later on, which would have the side effect of altering the options in the generated config. That could theoretically create a race condition, but it's not supported behavior. This PR just focused on race conditions caused by our library.
Motivation
This PR originally fixes a data race found on
github.com/go-chi.chi.v5
contrib while running some-race
tests on dd-source. It was then discovered that a lot of contrib's had the same issue. I noticed 2 mechanisms on every contrib when buildings options to start a new span: either we create a new array and append the global options into it (no issue with this mechanism), or we append options directly into thecfg.spanOpts
slice. This second approach could lead to data races in case the array has enough capacity to append in-place the new element appended.Reviewer's Checklist
For Datadog employees:
@DataDog/security-design-and-guidance
.Unsure? Have a question? Request a review!