
re-architect the client #91

Merged (59 commits, Oct 15, 2019)
Conversation

@arbll (Member) commented Jun 20, 2019

What does this PR do?

The main goal of this PR is to fix several behaviors in this client that could result in dropped metrics, increased resource consumption on the agent, and back pressure on the instrumented app. Since big architectural changes were needed to fix most of them, it made sense to fix them all at the same time.
Another goal is to make the client performant out of the box. This includes enabling buffering by default and setting sensible values for high throughput.

  • Makes buffering less bursty by sending buffers as soon as they fill up (previously all buffers were sent at the same time, after 100ms). Fixes Buffered client is inefficient #20.
  • Adds a way to configure the maximum size of a single payload (it was hardcoded to 1432, the optimal size for local UDP). Fixes No way to change OptimalPayloadSize #67.
  • Sets the default maximum payload size to 1432 bytes when using UDP and 8192 bytes when using UDS.
  • Moves the networking to a separate goroutine to avoid blocking the instrumented app and reduce latency (and removes the async UDS client, which was essentially the same thing but only for UDS).
  • Removes all allocations from the formatting logic to avoid putting pressure on the instrumented app's GC.

Architecture

[architecture diagram]

The high-level idea is:
Every time the instrumented app makes a call to the client, we format the metric and store it in the current buffer. When this buffer fills up (or a timeout is reached), it is forwarded to a queue of buffers that the sender writes over UDP or UDS as soon as possible. A new buffer is then taken from a buffer pool and used as the current buffer. When the sender is done with a queued buffer, it puts it back in the buffer pool.

There are two main goals with this design:

  • Avoid any dynamic allocations after client initialization by using free lists
  • Decouple the calls made by the instrumented app from the networking logic (this is particularly important with UDS, which blocks by default, but UDP also takes some time to send and has a few blocking edge cases)

Disclaimer: the architecture takes some inspiration from https://github.com/smira/go-statsd.

Upgrade notes

  • Sending a metric over UDS won't return an error if we fail to forward the datagram to the agent. We made this decision for two main reasons:
    • It made the UDS client blocking by default, which is not desirable
    • The design was flawed when buffering was enabled, as only the call that actually sent the buffer would return an error
  • The buffered option has been removed, as the client is now always buffered. If for some reason you need only one dogstatsd message per payload, you can still set the WithMaxMessagesPerPayload option to 1.
  • The asyncUDS option has been removed, as the networking layer now runs in a separate goroutine.

Benchmarks against master

This was not the main goal of this PR, but we get some nice improvements in raw performance:

Results with the default configuration:

benchmark                      old ns/op     new ns/op     delta
BenchmarkStatsdUDP1-8          1940          145           -92.53%
BenchmarkStatsdUDP10-8         2212          144           -93.49%
BenchmarkStatsdUDP100-8        2234          234           -89.53%
BenchmarkStatsdUDP1000-8       2397          1088          -54.61%
BenchmarkStatsdUDP10000-8      2488          1181          -52.53%
BenchmarkStatsdUDP100000-8     2446          1058          -56.75%
BenchmarkStatsdUDP200000-8     2530          986           -61.03%
BenchmarkStatsdUDS1-8          4811          131           -97.28%
BenchmarkStatsdUDS10-8         6078          141           -97.68%
BenchmarkStatsdUDS100-8        6544          234           -96.42%
BenchmarkStatsdUDS1000-8       7519          1210          -83.91%
BenchmarkStatsdUDS10000-8      7611          1183          -84.46%
BenchmarkStatsdUDS100000-8     7664          981           -87.20%
BenchmarkStatsdUDS200000-8     7854          1005          -87.20%

benchmark                      old allocs     new allocs     delta
BenchmarkStatsdUDP1-8          1              0              -100.00%
BenchmarkStatsdUDP10-8         1              0              -100.00%
BenchmarkStatsdUDP100-8        1              0              -100.00%
BenchmarkStatsdUDP1000-8       1              0              -100.00%
BenchmarkStatsdUDP10000-8      1              0              -100.00%
BenchmarkStatsdUDP100000-8     1              0              -100.00%
BenchmarkStatsdUDP200000-8     1              0              -100.00%
BenchmarkStatsdUDS1-8          9              0              -100.00%
BenchmarkStatsdUDS10-8         9              0              -100.00%
BenchmarkStatsdUDS100-8        9              0              -100.00%
BenchmarkStatsdUDS1000-8       9              0              -100.00%
BenchmarkStatsdUDS10000-8      9              0              -100.00%
BenchmarkStatsdUDS100000-8     9              0              -100.00%
BenchmarkStatsdUDS200000-8     10             0              -100.00%

benchmark                      old bytes     new bytes     delta
BenchmarkStatsdUDP1-8          32            0             -100.00%
BenchmarkStatsdUDP10-8         32            0             -100.00%
BenchmarkStatsdUDP100-8        32            0             -100.00%
BenchmarkStatsdUDP1000-8       32            0             -100.00%
BenchmarkStatsdUDP10000-8      37            0             -100.00%
BenchmarkStatsdUDP100000-8     44            2             -95.45%
BenchmarkStatsdUDP200000-8     53            6             -88.68%
BenchmarkStatsdUDS1-8          584           1             -99.83%
BenchmarkStatsdUDS10-8         584           1             -99.83%
BenchmarkStatsdUDS100-8        584           1             -99.83%
BenchmarkStatsdUDS1000-8       585           1             -99.83%
BenchmarkStatsdUDS10000-8      597           2             -99.66%
BenchmarkStatsdUDS100000-8     622           5             -99.20%
BenchmarkStatsdUDS200000-8     648           7             -98.92%

Results against the "optimal" configuration (MAX_INT elements in the buffer, AsyncUDS):

benchmark                      old ns/op     new ns/op     delta
BenchmarkStatsdUDP1-8          361           145           -59.83%
BenchmarkStatsdUDP10-8         398           144           -63.82%
BenchmarkStatsdUDP100-8        464           234           -49.57%
BenchmarkStatsdUDP1000-8       1395          1088          -22.01%
BenchmarkStatsdUDP10000-8      1573          1181          -24.92%
BenchmarkStatsdUDP100000-8     1609          1058          -34.24%
BenchmarkStatsdUDP200000-8     1612          986           -38.83%
BenchmarkStatsdUDS1-8          344           131           -61.92%
BenchmarkStatsdUDS10-8         385           141           -63.38%
BenchmarkStatsdUDS100-8        425           234           -44.94%
BenchmarkStatsdUDS1000-8       1477          1210          -18.08%
BenchmarkStatsdUDS10000-8      1537          1183          -23.03%
BenchmarkStatsdUDS100000-8     1275          981           -23.06%
BenchmarkStatsdUDS200000-8     1485          1005          -32.32%

benchmark                      old allocs     new allocs     delta
BenchmarkStatsdUDP1-8          1              0              -100.00%
BenchmarkStatsdUDP10-8         1              0              -100.00%
BenchmarkStatsdUDP100-8        1              0              -100.00%
BenchmarkStatsdUDP1000-8       1              0              -100.00%
BenchmarkStatsdUDP10000-8      1              0              -100.00%
BenchmarkStatsdUDP100000-8     1              0              -100.00%
BenchmarkStatsdUDP200000-8     1              0              -100.00%
BenchmarkStatsdUDS1-8          1              0              -100.00%
BenchmarkStatsdUDS10-8         1              0              -100.00%
BenchmarkStatsdUDS100-8        1              0              -100.00%
BenchmarkStatsdUDS1000-8       1              0              -100.00%
BenchmarkStatsdUDS10000-8      1              0              -100.00%
BenchmarkStatsdUDS100000-8     1              0              -100.00%
BenchmarkStatsdUDS200000-8     1              0              -100.00%

benchmark                      old bytes     new bytes     delta
BenchmarkStatsdUDP1-8          65            0             -100.00%
BenchmarkStatsdUDP10-8         65            0             -100.00%
BenchmarkStatsdUDP100-8        65            0             -100.00%
BenchmarkStatsdUDP1000-8       66            0             -100.00%
BenchmarkStatsdUDP10000-8      68            0             -100.00%
BenchmarkStatsdUDP100000-8     73            2             -97.26%
BenchmarkStatsdUDP200000-8     78            6             -92.31%
BenchmarkStatsdUDS1-8          78            1             -98.72%
BenchmarkStatsdUDS10-8         78            1             -98.72%
BenchmarkStatsdUDS100-8        78            1             -98.72%
BenchmarkStatsdUDS1000-8       79            1             -98.73%
BenchmarkStatsdUDS10000-8      80            2             -97.50%
BenchmarkStatsdUDS100000-8     86            5             -94.19%
BenchmarkStatsdUDS200000-8     85            7             -91.76%

arbll and others added 30 commits May 30, 2019 16:27
it was only benchmarking a small chunk of the code and there was some random stuff in there too
…

rely on the interface instead of the implementation.
I removed all the tests that were testing the implementation.
The main goal here is to detach the existing unit tests from the implementation
to be able to change the implementation and still test it with the old unit tests.

A few benchmarks that were not useful were also removed at the same time.
…ement)

setting the precision to 6 appears to be much more expensive, even when formatting floats with more than 6 digits after the decimal point

using 6 digits was introduced in #32
most other open-source projects seem to use -1

the only exception is timing, which will often have a high number of digits after the decimal
point. let's keep it at 6 for that
increase the perf by 10-20%
@KSerrania left a comment

Very nice PR 👍, the new architecture seems neat!

Made a first pass with some comments & nits, will take a look again later.

Resolved review threads on statsd/format.go, statsd/options.go, statsd/service_check.go, and statsd/statsd.go.
@truthbk changed the title from “re-architecture the client” to “re-architect the client” on Jul 19, 2019
@truthbk (Member) left a comment

Whoops, never submitted this out. Just some comments, but this looks very nice.

statsd/buffer.go:

func newStatsdBuffer(maxSize, maxElements int) *statsdBuffer {
	return &statsdBuffer{
		buffer: make([]byte, 0, maxSize*2),
Member:
comment here explaining the *2 would be nice, even if you have to talk a little bit about allocations and internal slice behavior.

}
}

func (b *statsdBuffer) reset() {
Member:

These are not thread-safe.... we'll have to synchronize access to the buffer (or maybe you've dealt with it cleverly somewhere else ;)

Member Author:

That was intentional; this struct is not thread-safe. It's the caller's responsibility to make sure it's not accessed concurrently.

Resolved review thread on statsd/buffer_pool.go.
}

func appendWithoutNewlines(buffer []byte, s string) []byte {
	// fast path for strings without newlines
Member:

Is it really a fast path? Won't it do something really close to what you're doing below anyway? (Just wondering if you'll be traversing the string twice unnecessarily.)

@arbll (Member Author) commented Aug 9, 2019:

This was ported from the old code. I was surprised as well, but IndexByte has an optimized assembly implementation that is indeed faster in benchmarks: https://golang.org/src/internal/bytealg/index_amd64.s

Optimizing for the no-newlines case sounds good to me, as this should be the normal case.

Resolved review thread on statsd/format.go.
statsd/statsd.go:
c.commands = c.commands[cmdsFlushed+1:]
// flush the current buffer. Lock must be held by the caller.
// The flushed buffer is written to the network asynchronously.
func (c *Client) flushLocked() {
Member:

I feel this name is confusing. I'd prefer something like flushUnsafe.

statsd/statsd.go:
return ""
}

func (c *Client) writeMetric(m metric) error {
Member:

Also unsafe, right? I feel like this should be noted.

Member Author:

Added an unsafe suffix.

statsd/statsd.go:
if b != '\n' {
	buf = append(buf, b)
}
if err := c.Flush(); err != nil {
Member:

Should an error while flushing prevent us from closing the sender?

Member Author:

Probably not. Let's do a best-effort flush and ignore the error?

	case b := <-p.pool:
		return b
	default:
		return newStatsdBuffer(p.bufferMaxSize, p.bufferMaxElements)
Member:

I've got mixed feelings about this… if we're using a pool, the expected behavior would be to pull the buffer from the pool; the pool allows setting a bound on the resources used. I'm wondering if we should add a flag to choose between strict pool behavior and dynamically growing the pool. Any thoughts?

Member Author:

I decided to go with this as it allows for some flexibility if you have a use case with sudden bursts of metrics. I see it as somewhat similar to sync.Pool, but with a maximum number of elements that should be kept and no cleanup on GC.

@truthbk (Member) left a comment

⚡️ nice!

Resolved review threads on statsd/event.go, statsd/service_check.go, and statsd/statsd.go.
@truthbk (Member) left a comment

Latest changes make sense to me! 👍

I think we're pretty much ready to go! Let's make sure all 3.x caveats (memory behavior, obsolete options, etc.) are properly documented, and feel free to pull the trigger on this as soon as you're ready.

@arbll (Member Author) commented Oct 14, 2019

@becoded Thanks for the review. I'm guessing the reason behind those changes is to add the NOOP client, right? This PR is already very large; let's move that to #92.

Successfully merging this pull request may close these issues: No way to change OptimalPayloadSize (#67); Buffered client is inefficient (#20).