
tracer: enable stats flushing when Flush() is called #1661

Merged: 13 commits, Jan 19, 2023


@lievan commented Jan 13, 2023

What does this PR do?

Flush() flushed buffered traces but did not flush stats. This PR makes Flush() also flush the stats concentrator and the statsdClient.

Motivation

There were issues in parametric tests because Flush() did not flush stats. Fixes #1547

Describe how to test/QA your changes

Reviewer's Checklist

  • If known, an appropriate milestone has been selected; otherwise the Triage milestone is set.
  • Changed code has unit tests for its functionality.
  • If this interacts with the agent in a new way, a system test has been added.

Flush() flushed buffered traces, but it did not flush stats. We enabled flushing stats by adding a flush method to the statsdClient interface and calling that method when trace flushing occurred.

Fixes #1547
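
A minimal sketch of the idea described above, not the actual dd-trace-go code: a flush call that drains every buffer rather than only the trace buffer. The `flusher` interface, the `recorder` type, and the `tracer` struct with its field names are all hypothetical stand-ins invented for illustration.

```go
package main

import "fmt"

// flusher is a hypothetical interface: anything that buffers data and can
// push it out on demand.
type flusher interface {
	Flush() error
}

// recorder is a stand-in buffer used only for illustration; it is not a
// dd-trace-go type. It counts pending and flushed items.
type recorder struct {
	name    string
	pending int
	flushed int
}

// Flush moves everything pending into the flushed count.
func (r *recorder) Flush() error {
	r.flushed += r.pending
	r.pending = 0
	return nil
}

// tracer is a hypothetical tracer holding three independent buffers.
type tracer struct {
	traces *recorder // buffered traces
	stats  *recorder // stands in for the stats concentrator
	statsd *recorder // stands in for the statsd client
}

func newTracer() *tracer {
	return &tracer{
		traces: &recorder{name: "traces"},
		stats:  &recorder{name: "stats"},
		statsd: &recorder{name: "statsd"},
	}
}

// Flush drains all three buffers, stopping at the first error.
func (t *tracer) Flush() error {
	for _, r := range []*recorder{t.traces, t.stats, t.statsd} {
		var f flusher = r
		if err := f.Flush(); err != nil {
			return fmt.Errorf("flushing %s: %w", r.name, err)
		}
	}
	return nil
}

func main() {
	tr := newTracer()
	tr.traces.pending, tr.stats.pending, tr.statsd.pending = 2, 1, 3
	if err := tr.Flush(); err != nil {
		fmt.Println("flush failed:", err)
		return
	}
	fmt.Println(tr.traces.flushed, tr.stats.flushed, tr.statsd.flushed)
}
```

Putting the flush method behind an interface keeps the tracer's `Flush` independent of how each buffer sends its data, which is also what lets a test substitute a counting fake.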
@lievan requested a review from a team January 13, 2023 19:49
@lievan added this to the v1.47.0 milestone Jan 13, 2023
pr-commenter bot commented Jan 13, 2023

Benchmarks

Comparing candidate commit 8d5f2b0 in PR branch evan.li/flush-stats with baseline commit 2dd2f38 in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 6 cases.

@ahmed-mez left a comment

The changes look good, but I just want to confirm with @katiehockman whether DataDog/system-tests#596 refers to statsd metrics or trace stats.
I suspect it's the latter because the stats concentrator doesn't have a public flush method, and isn't flushed in tracer.Flush().

Either way, flushing the statsd client (what the PR currently does) is valuable; let's just confirm whether flushing the stats concentrator is needed as well.

@ajgajg1134 left a comment

Overall looks good just two small comments! 🥳

Name: "http.request",
},
// Start must be older than latest bucket to get flushed
Start: time.Now().UnixNano() - 3*500000,

Any particular reason to choose 3*500000?

@lievan commented Jan 17, 2023

The bucketSize for the stats concentrator here is set to 500000, so setting the span's start time to time.Now().UnixNano() - 3*500000 ensures that the span does not belong to the current bucket, which is not flushed when t.stats.flushAndSend(time.Now(), withoutCurrentBucket) is called in tracer.go. If the start time is set to time.Now().UnixNano() - 250000, for example, this test case will fail.

I chose the bucketSize of 500000 by copying the unit tests in stats_test.go.

To make the code clearer, I can set a bucketSize variable to 500000 and use it throughout! (Edit: or just use defaultStatsBucketSize, as Katie suggested.)
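
The bucket arithmetic behind this reasoning can be sketched as follows. This is an illustration under stated assumptions, not the concentrator's real code: `alignTs` and `flushedWithoutCurrent` are hypothetical helpers, the skip rule (a bucket is held back while its start is within one bucketSize of the flush time) is assumed, and span duration is ignored for simplicity.

```go
package main

import "fmt"

// bucketSize mirrors the 500000 ns value quoted in the comment above.
const bucketSize int64 = 500000

// alignTs rounds a timestamp down to the start of the bucket containing it.
// Hypothetical helper written for this sketch.
func alignTs(ts, size int64) int64 {
	return ts - ts%size
}

// flushedWithoutCurrent reports whether a span starting at spanStart lands in
// a bucket that a flush at time now would send when the current bucket is
// excluded. Assumption: a bucket is skipped while its start is within one
// bucket size of now.
func flushedWithoutCurrent(spanStart, now, size int64) bool {
	return alignTs(spanStart, size) <= now-size
}

func main() {
	// A fixed "now" keeps the example deterministic.
	now := int64(10300000)

	// Three buckets back: always older than the current bucket.
	fmt.Println(flushedWithoutCurrent(now-3*bucketSize, now, bucketSize))

	// Half a bucket back: with this value of now, still in the current bucket.
	fmt.Println(flushedWithoutCurrent(now-250000, now, bucketSize))
}
```

Under these assumptions, whether a span half a bucket in the past is skipped depends on where the flush time falls inside its bucket, whereas a span three buckets back is always old enough.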

tr.flushSync()
assert.Len(t, tw.Flushed(), 1)
assert.Equal(t, ts.flushed, 1)
assert.NotZero(t, transport.Stats())

Can we assert something stronger than just NotZero? Or is the result of transport.Stats() not consistent here?

@lievan replied

I think assert.Len(t, transport.Stats(), 1) also works! I can make that change.

@katiehockman changed the title from "ddtrace/tracer: enable stats flushing when Flush() is called" to "tracer: enable stats flushing when Flush() is called" Jan 17, 2023
@katiehockman left a comment

Thanks for doing this!!

Since this'll solve the problem that forced the workaround here, would you be up for fixing that next? That would basically involve removing the Stop function that was added in this PR, and just using Flush there instead, like it was before I added that stopgap. (You can actually verify that this PR is doing what it's meant to do by using those tests 😄 )

@ahmed-mez previously approved these changes Jan 18, 2023

Thank you!

@ajgajg1134 previously approved these changes Jan 18, 2023

Nice!

@katiehockman previously approved these changes Jan 18, 2023
@lievan merged commit 3992f84 into main Jan 19, 2023
@lievan deleted the evan.li/flush-stats branch January 19, 2023 17:51
Successfully merging this pull request may close these issues.

tracer: Flush doesn't flush stats
4 participants