ddtrace/tracer: reduce lock contention for spans #1775
Conversation
goos: darwin
goarch: arm64
pkg: gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer

                     │ before.txt  │            after.txt             │
                     │   sec/op    │   sec/op     vs base             │
ConcurrentTracing-10   823.9µ ± 2%   589.5µ ± 6%  -28.44% (p=0.000 n=10)

Benchmarks
Benchmark execution time: 2024-05-28 13:51:43
Comparing candidate commit 46f8df3 in PR branch
Found 0 performance improvements and 0 performance regressions! Performance is the same for 46 metrics, 1 unstable metric.
This is less than what I'm reporting in the issue description. I think the reason is that the machine I used has 32 logical cores and our benchmark uses 24. Lock contention gets worse with a higher number of cores.
Will debug the test failure.
Force-pushed from 9f63e5a to e25e8fd.
Tests are green ✅ - Ready for review.
Looks good! Cool to see sync.Pool being used; I don't think I've ever seen it used before.
Yeah, it's probably best to stay away from it in most cases 😅, but I think the reduced lock contention here makes it worth it. Since this is pretty mission critical, I'm also adding @nsrip-dd as a reviewer. We need to ensure that this change doesn't increase the chance of span ID collisions or other problems.
This led me down quite a rabbit hole 😄 The biggest difficulty in reviewing this is that I don't know what the expectations are for the quality of trace/span ID generation. Over what time period do they need to be unique? Over the life of a single process? Across all traffic from an individual customer for some time? Across all customers?
The math/rand RNG truncates its seeds to 32 bits. My biggest concern with this change is that it will increase the likelihood that dd-trace-go apps produce trace/span IDs with the same seed close together. Rather than seeding once at the start of the process, the RNG could be seeded arbitrarily many times over the lifetime of a process (e.g. if it gets garbage collected while it's in the pool).
Maybe we need to investigate using an RNG with a bigger seed space? I did a little poking around and something from the PCG family seems like it might be good. Each seed selects a different, uncorrelated random sequence, and a second value can be used to start at a random spot within the sequence. The Rust authors have some good writing on RNGs as well. But... this is a little out of my depth.
// This is pretty much optimal for avoiding contention.
r := randPool.Get().(*rand.Rand)
// NOTE: TestTextMapPropagator fails if we return r.Uint64() here. Seems like
// span ids are expected to be 64 bit with the first bit being 0?
I'm pretty sure span IDs can be full-size 64-bit unsigned integers. The problem is that tests in textmap_test.go assert that the formatted span IDs match strconv.Itoa(int(childSpanID)) (note the signed, platform-dependent-sized integer), when they should match strconv.FormatUint(childSpanID, 10). I recall the tests actually also fail on 32-bit archs when they shouldn't. So really, the tests should be fixed.
Actually, these tests might be related to past problems processing full 64-bit IDs in languages like Node.js and Java (IIRC).
Not sure if this limitation is still valid.
}
// seedSeq makes sure we don't create two generators with the same seed
// by accident.
return rand.New(rand.NewSource(seed + atomic.AddInt64(&seedSeq, 1)))
Small thing, but I think seedSeq is only needed if we have to fall back to the timestamp. The resolution of the timestamp might be such that two random sources created close together get the same seed. In that case, and assuming time.Now is monotonic, seedSeq would indeed prevent the same seed being used multiple times in the same process. Across processes, all bets are off when using timestamps as a seed, with or without seedSeq.
But if we can successfully use the cryptographic RNG source, I don't think the seed needs to be changed. We should already have as low a chance of collision as we could hope to get.
> time.Now is monotonic, seedSeq would indeed prevent the same seed being used multiple times in the same process. Across processes, all bets are off when using timestamps as a seed, with or without seedSeq.
As I understand it, Pool uses thread-locals underneath - this is nice! Definitely how this rand should be implemented! :)
Since the initialization cost will be amortized, you could add the process ID and a hash of the hostname to make the seed more unique.
Also, to further reduce the chance of collisions, I would recommend a more transformative operation than Add: e.g. a multiply or shift would ensure that seeds created in sequence definitely do not collide. However, overflow will need to be handled to ensure bits are not lost.
For the same reason its best to not modify the crypto acquired seed - it should already be good enough 🤞
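One common way to make the sequence counter "more transformative" than plain addition is to run it through a mixing finalizer such as SplitMix64, so consecutive counters map to well-separated seeds. This is a generic sketch, not code from the PR; the constants are the standard SplitMix64 ones.

```go
package main

import "fmt"

// mixSeed maps a sequence counter to a pseudo-random 64-bit seed.
// Unsigned arithmetic wraps on overflow in Go, so no bits are lost.
func mixSeed(seq uint64) uint64 {
	z := seq + 0x9e3779b97f4a7c15 // golden-ratio increment
	z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9
	z = (z ^ (z >> 27)) * 0x94d049bb133111eb
	return z ^ (z >> 31)
}

func main() {
	// Consecutive counters yield unrelated-looking seeds.
	fmt.Printf("%#016x\n%#016x\n", mixSeed(1), mixSeed(2))
}
```

Because the mix is a bijection on uint64, distinct counters always produce distinct seeds, which is the property plain Add gives you too; the gain is that nearby counters no longer yield nearby seeds.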
> But if we can successfully use the cryptographic RNG source, I don't think the seed needs to be changed. We should already have as low a chance of collision as we could hope to get.
This truncated 32-bit seed is something we've already run into as an issue, and to reduce the likelihood of collisions we added this here: https://github.com/DataDog/dd-trace-go/blob/main/ddtrace/tracer/tracer.go#L520
@felixge is this something you feel is ready to merge? Is there anything my team can help with to get it over the finish line?
@katiehockman maybe 😅. Any reason this is on your radar right now? I could try to make some time tomorrow to re-read the latest comments and see if this is ready from my PoV.
@felixge no particular reason. I'm just trying to clean up the backlog. If we aren't sure about this PR and want to just close it, and re-open it later if/when we feel it's ready, that's another option. Or we can mark it with the
What does this PR do?
Optimizes the concurrent generation of random span IDs using sync.Pool. The old code used a naive mutex approach that seems to have been copied from the stdlib without proper attribution.
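A minimal sketch of the technique the PR describes: pooling *rand.Rand instances with sync.Pool so concurrent goroutines don't contend on a single mutex-guarded source. The helper names (randPool, randUint64, seedSeq) are illustrative; the PR's actual code may differ.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"sync/atomic"
	"time"
)

var seedSeq int64

// randPool hands each caller its own *rand.Rand. sync.Pool keeps
// per-P caches, so Get/Put rarely touch shared state under load.
var randPool = sync.Pool{
	New: func() any {
		// seedSeq de-duplicates seeds for generators created within
		// the same clock tick (see the review discussion on seeding).
		seed := time.Now().UnixNano() + atomic.AddInt64(&seedSeq, 1)
		return rand.New(rand.NewSource(seed))
	},
}

func randUint64() uint64 {
	r := randPool.Get().(*rand.Rand)
	n := r.Uint64()
	randPool.Put(r)
	return n
}

func main() {
	fmt.Println(randUint64())
}
```

The trade-off flagged by the reviewers applies: pooled generators can be garbage collected and re-created many times per process, so each re-creation re-rolls the seed.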
Before:
After:
Motivation
#1774 causes a small regression in performance, so I took a quick look to see what's going on and noticed the high lock contention on span generation.
Describe how to test/QA your changes
Reviewer's Checklist
- Triage milestone is set.