-
Notifications
You must be signed in to change notification settings - Fork 417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tracer: enable stats flushing when Flush() is called #1661
Conversation
Flush() flushed buffered traces, but it did not flush stats. We enabled flushing stats by adding a flush method to the statsdClient interface and calling that method when trace flushing occurred. Fixes #1547
Altered TestFlush to verify that flushSync() flushes stats
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look good, but I just want to confirm with @katiehockman whether DataDog/system-tests#596 refers to statsd metrics or trace stats.
I suspect it's the latter because the stats concentrator doesn't have a public flush method, and isn't flushed in tracer.Flush()
.
Either way, flushing the statsd client is valuable (what the PR currently does), let's just confirm whether flushing the stats concentrator is needed as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good just two small comments! 🥳
ddtrace/tracer/tracer_test.go
Outdated
Name: "http.request", | ||
}, | ||
// Start must be older than latest bucket to get flushed | ||
Start: time.Now().UnixNano() - 3*500000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason to choose 3*500000
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the bucketSize for the stats concentrator here is set to 500000
, setting the start time for the span to be time.Now().UnixNano() - 3*500000
ensures that this span does not belong to the current bucket (which does not get flushed when t.stats.flushAndSend(time.Now(), withoutCurrentBucket)
is called in tracer.go
). If the start time for the span is set to time.Now().UnixNano() - 250000
, for example, this test case will fail.
I chose the bucketSize of 500000
copying these unit tests in stats_test.go
To increase code clarity I can set some variable bucketSize
to 500000 and use that throughout! (Edit: or just use defaultStatsBucketSize
as Katie suggested)
ddtrace/tracer/tracer_test.go
Outdated
tr.flushSync() | ||
assert.Len(t, tw.Flushed(), 1) | ||
assert.Equal(t, ts.flushed, 1) | ||
assert.NotZero(t, transport.Stats()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we assert something more strong than just NotZero
? Or is the result of transport.Stats()
not consistent here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think assert.Len(t, transport.Stats(), 1)
also works! I can make that change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this!!
Since this'll solve the problem that forced the workaround here, would you be up for fixing that next? That would basically involve removing the Stop
function that was added in this PR, and just using Flush there instead, like it was before I added that stopgap. (You can actually verify that this PR is doing what it's meant to do by using those tests 😄 )
Co-authored-by: Katie Hockman <katie@hockman.dev>
Co-authored-by: Katie Hockman <katie@hockman.dev>
Co-authored-by: Andrew Glaude <andrew.glaude@datadoghq.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
aa4942a
What does this PR do?
Flush() flushed buffered traces, but it did not flush stats. We added flushing for both the stats concentrator and the statsdClient when Flush() is called.
Motivation
There were issues in parametric tests because Flush() did not flush stats. Fixes #1547
Describe how to test/QA your changes
Reviewer's Checklist
Triage
milestone is set.