Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatic buffering and preemptive flushing #146

Merged
merged 21 commits into from
May 26, 2020

Conversation

kbogtob
Copy link
Contributor

@kbogtob kbogtob commented Apr 23, 2020

What does this PR do?

Introducing new breaking changes:
It refactors the batch class into the MessageBuffer class. It removes the usage of the #batch method on the Statsd class as buffering is now automatic and always used. It flushes preemptively when reaching a given message count (max_buffer_pool_size) or 95% of the maximum buffer size (max_buffer_payload_size). If the user wants to flush data, they could call a new method #flush on Statsd. (that method is used a lot in the specs)

Adding new specific integration tests relevant to buffering to test edge cases.

Motivation

Preparation of version 5 road map with automatic buffering, refactored telemetry (buffered as other messages) and async IO in a separate thread.

Notes

We cannot use the optimized buffer size for UDP (1432 bytes) for now because of how telemetry is implemented. It eats part of the buffer size with a worst estimation of the size it could take. But in UDP, with a 1432 byte size, only a few global tags would make the telemetry allocate a size bigger of the buffer than its own size!

I removed this problem for now by increasing the default size of the buffer. I will add another PR after this one is merged to refactor telemetry as a "client" of the buffer instead of it allocating buffer size and using directly the low level Connection class.

Copy link
Contributor

@remeh remeh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High quality addition to the library! Thanks for your work on this.
I've left a few comments, the most notable one being about the telemetry stuff that I think should probably be done before merging this PR.

OK = 0
WARNING = 1
CRITICAL = 2
UNKNOWN = 3

DEFAULT_BUFFER_SIZE = 8 * 1_024
UDP_DEFAULT_BUFFER_SIZE = 8_192
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This value (8kb) won't be optimal on real networks where UDP is transported in an Ethernet packet, having most of the time only 1432 bytes available (when the MTU is 1500 = everywhere). Setting this to 8kb means that there will be IP fragmentation and that there is chances to lose 8kb of metrics.

You've detailed what were the reasons for this bump, maybe we should consider doing the telemetry improvements in a PR targeting the current branch (kbogtob/automatic-buffering), to merge it into this one and then after, to merge this one in master?

module Datadog
class Statsd
class MessageBuffer
PAYLOAD_SIZE_TOLERANCE = 0.05
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the idea but are you sure that it's helping somewhere (against just using 100% of the buffer size)? I'm not against it, I just want to know if there is benefit adding this extra math in the buffer code. (mostly because while we will debug the output of the client, we'll have to keep this in mind)

With a buffer size of 1472, it represents 73 bytes, which could be enough for one more metric.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea was to send messages preemptively when we think that we can't send a new message. You're right that when using a size of 1432, it's quite a lot. Do you think I should remove it?

lib/datadog/statsd.rb Outdated Show resolved Hide resolved
lib/datadog/statsd.rb Outdated Show resolved Hide resolved
lib/datadog/statsd/message_buffer.rb Show resolved Hide resolved
Copy link
Contributor

@remeh remeh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest changes LGTM.

Copy link
Contributor

@albertvaka albertvaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor, but I think batch would be a better name than pool for what we are doing. Feel free to ignore this if you think otherwise. Other than that LGTM.

@kbogtob kbogtob force-pushed the kbogtob/automatic-buffering branch from 8acd4c3 to 5fecfbc Compare May 26, 2020 08:36
@kbogtob kbogtob merged commit 0f1d768 into master May 26, 2020
@kbogtob kbogtob deleted the kbogtob/automatic-buffering branch May 26, 2020 08:43
@iancanderson
Copy link

iancanderson commented Nov 12, 2020

@kbogtob I'm interested in the performance gains from this and #151

Are there any plans to push a new gem release with these PRs? Thanks!

remeh added a commit that referenced this pull request Nov 16, 2020
remeh added a commit that referenced this pull request Nov 16, 2020
remeh added a commit that referenced this pull request Mar 23, 2021
Restores kbogtob performances improvements (#151 and #146)
sample_rate: nil,
disable_telemetry: false,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this was a breaking API change that didn't trigger a changelog entry or major version bump. I suggest at least retro-actively adding a changelog entry, I had to dig around in the source to find this out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @wjessop thanks for your comment and sorry for the inconvenience.

I'll add a note about this in the 4th bullet point of the changelog entry of the 5.0.0 major version here, that the telemetry parameter has been renamed as well 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll go in with: #288

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants