enhancement(dogstatsd): allow autoscaling UDP stream handlers on available parallelism#1624
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
Binary Size Analysis (Agent Data Plane)Target: 4aeb9c7 (baseline) vs 474ee65 (comparison) diff
|
| Module | File Size | Symbols |
|---|---|---|
figment |
+83.99 KiB | 81 |
hyper_util |
-51.07 KiB | 26 |
hyper |
+35.82 KiB | 112 |
otlp_protos::otlp_include::opentelemetry |
-34.86 KiB | 103 |
prost |
+31.38 KiB | 197 |
core |
+16.71 KiB | 1768 |
saluki_components::transforms::dogstatsd_mapper |
-15.40 KiB | 7 |
tokio_rustls |
+11.88 KiB | 11 |
saluki_components::common::datadog |
-11.83 KiB | 37 |
rustls |
-8.94 KiB | 9 |
serde_core |
+8.44 KiB | 72 |
serde_with |
+8.33 KiB | 25 |
http_body_util |
+7.89 KiB | 35 |
h2 |
-7.88 KiB | 124 |
saluki_core::data_model::event |
+6.49 KiB | 23 |
saluki_components::transforms::apm_stats |
-6.12 KiB | 10 |
saluki_context::hash::hash_context_with_seen |
-6.04 KiB | 3 |
rmp |
+5.33 KiB | 21 |
tracing |
-5.33 KiB | 19 |
bytes |
-5.13 KiB | 26 |
Detailed Symbol Changes
FILE SIZE VM SIZE
-------------- --------------
+0.7% +55.1Ki +0.5% +28.3Ki [8798 Others]
[NEW] +18.2Ki [NEW] +18.0Ki _<saluki_components::sources::dogstatsd::DogStatsDConfiguration as saluki_core::components::sources::builder::SourceBuilder>::build::_{{closure}}::h0a04e990e65ed1d6
+283% +15.9Ki +288% +15.9Ki h2::proto::connection::DynConnection<B>::recv_frame::hfe9e3743004c5b69
+62e2% +15.0Ki +37e3% +15.0Ki _<saluki_components::transforms::trace_sampler::TraceSampler as saluki_core::components::transforms::SynchronousTransform>::transform_buffer::h1728b8cbf2bf6b0a
[NEW] +12.4Ki [NEW] +12.2Ki _<figment::value::magic::RelativePathBuf as figment::value::magic::Magic>::deserialize_from::h56147e9779637f82
[NEW] +11.9Ki [NEW] +11.7Ki _<figment::value::magic::Tagged<T> as figment::value::magic::Magic>::deserialize_from::h9997f85ef8cf1fd5
[NEW] +11.0Ki [NEW] +10.9Ki _<core::pin::Pin<P> as core::future::future::Future>::poll::h8bf635c589f5ae5d
[NEW] +10.9Ki [NEW] +10.8Ki saluki_components::sources::otlp::logs::transform::transform_log_record::hefcb9c2341939142
[NEW] +9.74Ki [NEW] +9.58Ki _<figment::value::de::ConfiguredValueDe<I> as serde_core::de::Deserializer>::deserialize_struct::h0a1ad77b53d917cc
+665% +8.73Ki +717% +8.73Ki tokio_rustls::common::Stream<IO,C>::read_io::h02bb9fd672079913
[NEW] +8.45Ki [NEW] +8.28Ki _<serde_with::content::de::ContentDeserializer<E> as serde_core::de::Deserializer>::deserialize_struct::h72e6f84457605ff5
+82% +8.31Ki +83% +8.31Ki h2::server::Connection<T,B>::poll_closed::hb97b7451a1190592
+158% +7.86Ki +162% +7.86Ki saluki_components::transforms::trace_obfuscation::sql::obfuscate_sql_string::h390ed7ef0924ab0a
-62.9% -7.73Ki -63.7% -7.73Ki _<hyper_util::server::conn::auto::Connection<I,S,E> as core::future::future::Future>::poll::h1e8af002a4a1e0d9
[DEL] -8.42Ki [DEL] -8.18Ki saluki_components::transforms::dogstatsd_mapper::_::_<impl serde_core::de::Deserialize for saluki_components::transforms::dogstatsd_mapper::MetricMappingConfig>::deserialize::h0e19828712c22b7b
[DEL] -8.62Ki [DEL] -8.51Ki rustls::conn::ConnectionCore<Data>::process_new_packets::h69c943ffffcd0cb3
-84.9% -11.2Ki -86.1% -11.2Ki _<saluki_components::sources::otlp::logs::translator::OtlpLogsTranslator as core::iter::traits::iterator::Iterator>::next::h3860cbd002a9a1e5
[DEL] -15.4Ki [DEL] -15.1Ki saluki_components::common::datadog::apm::_::_<impl serde_core::de::Deserialize for saluki_components::common::datadog::apm::ApmConfiguration>::deserialize::h1893898e9ea8a5a8
[DEL] -16.1Ki [DEL] -15.8Ki _<saluki_components::sources::dogstatsd::DogStatsDConfiguration as saluki_core::components::sources::builder::SourceBuilder>::build::_{{closure}}::hb64dfc27f47c0320
[DEL] -16.5Ki [DEL] -16.4Ki saluki_components::transforms::trace_sampler::TraceSampler::process_trace::h5195145a1b6aa3ad
[DEL] -19.4Ki [DEL] -19.2Ki _<hyper_util::server::conn::auto::Connection<I,S,E> as core::future::future::Future>::poll::h12ac62a4f3751e08
+0.2% +90.2Ki +0.2% +63.5Ki TOTAL
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6dc20dc80e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Regression Detector (Agent Data Plane)Regression Detector ResultsRun ID: 0ab03884-f062-4665-9c5d-4c6ab7a7dc4e Baseline: 4aeb9c7 Optimization Goals: ✅ No significant changes detected
|
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | otlp_ingest_logs_5mb_memory | memory utilization | +4.52 | [+4.19, +4.84] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_cpu | % cpu utilization | +2.55 | [-1.65, +6.75] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_throughput | ingress throughput | -0.03 | [-0.15, +0.09] | 1 | (metrics) (profiles) (logs) |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | dsd_uds_1mb_3k_contexts_cpu | % cpu utilization | +4.87 | [-52.05, +61.78] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_memory | memory utilization | +4.52 | [+4.19, +4.84] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_cpu | % cpu utilization | +2.55 | [-1.65, +6.75] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_cpu | % cpu utilization | +2.08 | [-3.97, +8.14] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_cpu | % cpu utilization | +1.70 | [-4.29, +7.68] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_cpu | % cpu utilization | +1.28 | [-1.04, +3.60] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_cpu | % cpu utilization | +0.78 | [-1.22, +2.77] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_cpu | % cpu utilization | +0.33 | [-30.53, +31.19] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_memory | memory utilization | +0.24 | [+0.07, +0.40] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_memory | memory utilization | +0.08 | [-0.10, +0.25] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_heavy | memory utilization | +0.04 | [-0.09, +0.17] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_memory | memory utilization | +0.03 | [-0.21, +0.28] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_throughput | ingress throughput | +0.01 | [-0.06, +0.08] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_throughput | ingress throughput | +0.00 | [-0.14, +0.15] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_throughput | ingress throughput | -0.00 | [-0.06, +0.06] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_throughput | ingress throughput | -0.00 | [-0.05, +0.05] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_throughput | ingress throughput | -0.00 | [-0.06, +0.05] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_throughput | ingress throughput | -0.03 | [-0.15, +0.09] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_throughput | ingress throughput | -0.04 | [-0.21, +0.14] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_throughput | ingress throughput | -0.04 | [-0.12, +0.03] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_memory | memory utilization | -0.06 | [-0.22, +0.10] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_memory | memory utilization | -0.11 | [-0.27, +0.05] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_ultraheavy | memory utilization | -0.14 | [-0.27, -0.01] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_low | memory utilization | -0.20 | [-0.37, -0.04] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_memory | memory utilization | -0.25 | [-0.39, -0.11] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_throughput | ingress throughput | -0.25 | [-0.33, -0.18] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_cpu | % cpu utilization | -0.26 | [-1.69, +1.16] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_memory | memory utilization | -0.31 | [-0.45, -0.16] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_memory | memory utilization | -0.74 | [-0.90, -0.59] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_medium | memory utilization | -0.86 | [-1.03, -0.69] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_memory | memory utilization | -1.01 | [-1.15, -0.87] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_idle | memory utilization | -1.12 | [-1.16, -1.09] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_cpu | % cpu utilization | -1.29 | [-60.02, +57.45] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_cpu | % cpu utilization | -1.39 | [-3.60, +0.81] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_throughput | ingress throughput | -2.17 | [-2.28, -2.06] | 1 | (metrics) (profiles) (logs) |
Bounds Checks: ✅ Passed
| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| ✅ | quality_gates_rss_dsd_heavy | memory_usage | 10/10 | 121.57MiB ≤ 140MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_low | memory_usage | 10/10 | 39.49MiB ≤ 50MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_medium | memory_usage | 10/10 | 59.97MiB ≤ 75MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_ultraheavy | memory_usage | 10/10 | 178.38MiB ≤ 200MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_idle | memory_usage | 10/10 | 26.87MiB ≤ 40MiB | (metrics) (profiles) (logs) |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
-
Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
-
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
-
Its configuration does not mark it "erratic".
webern
left a comment
There was a problem hiding this comment.
Cool. Is there a known performance gain or is it more theoretical?
| }; | ||
|
|
||
| /// `dogstatsd_autoscale_udp_listeners` — bind multiple UDP sockets with SO_REUSEPORT for kernel load balancing. | ||
| DOGSTATSD_AUTOSCALE_UDP_LISTENERS = SalukiAnnotation { |
There was a problem hiding this comment.
Is it necessary to make it configurable? To have it on? Would it cause problems.
This is just coming from configuration fatigue 😅
There was a problem hiding this comment.
It would probably be fine, but...
My main hesitation is that, internally, we have very little UDP-based DSD traffic... and it's almost the complete opposite for customers. Whatever insight we derive about its behavior when dogfooding will likely be irrelevant when running in a customer environment.
| /// the number of available vCPUs. | ||
| /// | ||
| /// Returns `None` when autoscaling is disabled, which keeps the legacy single-socket behavior. The platform | ||
| /// gate for `SO_REUSEPORT` lives inside the listener — this method intentionally stays platform-agnostic. |
There was a problem hiding this comment.
👍 for a platform agnostic function
| /// | ||
| /// - Ability to configure `Listener` to emit multiple streams for connectionless address families, allowing for load | ||
| /// balancing. (Only possible for UDP via SO_REUSEPORT, as UDS does not support SO_REUSEPORT.) | ||
| /// On Linux, UDP listeners can be configured to bind multiple sockets to the same address using `SO_REUSEPORT`, |
There was a problem hiding this comment.
Should work for unix/darwin to, right?
There was a problem hiding this comment.
SO_REUSEPORT/SO_REUSEADDR are available on Darwin, but they don't provide any load balancing like they do on Linux.
| fn bind_one(addr: SocketAddr) -> io::Result<TokioUdpSocket> { | ||
| let socket = Socket::new(Domain::for_address(addr), Type::DGRAM, Some(Protocol::UDP))?; | ||
| socket.set_reuse_address(true)?; | ||
| socket.set_reuse_port(true)?; |
There was a problem hiding this comment.
Would it be possible to try this flow with set_reuse_port and fall back to the standard approach if it fails? That seems like it'd be better for targeting than target_os, since IIUC the practical target set here is "linux 3.9+ and darwin?" I'm not sure how much <3.9 is out in the wild but RHEL 7 is only on 3.10 and we have quite a few of those
There was a problem hiding this comment.
"Yes" to having it try to fallibly set the option as a sideband way to detect support, "no" to Darwin, since Darwin doesn't actually do any load balancing between the sockets.
@webern There's definitely an upper bound to how fast we can accept datagrams -- whether UDP or UDS -- with a single task. For connectionless scenarios, you run the risk of dropping data if you can't keep up since there's no backpressure signal to the clients. The Datadog Agent tries to handle this by separating receive from decode, but it pays the price in terms of having to allocate and shuttle large pre-decode buffers between the receive and decode ("pipeline") tasks.. whereas we want to keep all of our benefits of how we do it now (zero-copy decode before |
3d5ba43 to
474ee65
Compare

Summary
This PR adds the ability to "autoscale" the UDP stream handlers in the DogStatsD source based on available parallelism.
Currently, the UDP listener for the DogStatsD source emits a single stream, in the first
acceptcall, and then pends forever after that. It does the same thing for UDS datagram listeners. The reason is simple: these are connectionless transports, so there's really only ever one "stream" we can yield back unlike TCP or UDS streams.However, on Linux platforms, we have a trick available to us for UDP:
SO_REUSEPORT. This socket option allows binding N UDP sockets to the same address/port in a way that the kernel will load balance incoming datagrams across them, letting us exceed the processing capacity of a single task.This PR introduces support for yielding multiple UDP streams from
Listenerand using that in the DogStatsD source through a new configuration setting,dogstatsd_autoscale_udp_listeners. When enabled, the DogStatsD source will configure the UDP listener to emit N UDP streams, where N is calculated from the available parallelism: for every 8 vCPUs, we yield an additional UDP stream up to a maximum of 4. This gives us a small amount of additional parallelism without trying to track too closely to vCPU, such that the smaller count of UDP stream handlers can hopefully maximize their on-CPU time.Change Type
How did you test this PR?
saluki-iochanges.References
DADP-2
Closes #1593