feat(dogstatsd): added replay functionality #1711

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 14 commits into main from lt/dogstatsd-replay on May 28, 2026

@lucastemb
Contributor

What

Adds DogStatsD replay support across the staged replay layers:

  • parses DogStatsD capture files and tagger state
  • exposes replay trigger and stop control endpoints
  • sends replay packets back into ADP over UDS with replay credentials
  • marks replay-injected packets on receive and resolves tags from captured tagger state
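
The receive-side marking in the last two bullets could look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's code: `PeerCredentials`, `PacketOrigin`, `classify_packet`, and the GID value `12345` are hypothetical. This page only shows that a constant named `REPLAY_CREDENTIALS_GID` exists, not its value or how it is compared.

```rust
/// Hypothetical stand-in for the Unix peer credentials attached to a UDS packet.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PeerCredentials {
    pub pid: u32,
    pub uid: u32,
    pub gid: u32,
}

/// Where a received packet came from, as decided on receive.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PacketOrigin {
    /// Regular traffic: tags are resolved against the live workload state.
    Live { pid: u32 },
    /// Replay-injected traffic: tags are resolved from the captured tagger state.
    Replay { captured_pid: u32 },
}

/// Assumed sentinel GID. The real value of `REPLAY_CREDENTIALS_GID` is not shown here.
pub const ASSUMED_REPLAY_GID: u32 = 12345;

/// Marks a packet as replay-injected when its credentials carry the sentinel GID.
pub fn classify_packet(creds: PeerCredentials) -> PacketOrigin {
    if creds.gid == ASSUMED_REPLAY_GID {
        // Assumption: the replay sender puts the PID recorded in the capture
        // file into the credentials it sends.
        PacketOrigin::Replay { captured_pid: creds.pid }
    } else {
        PacketOrigin::Live { pid: creds.pid }
    }
}
```

Deciding the origin once, at receive time, would let later stages branch on `PacketOrigin` instead of re-inspecting credentials.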

Why

Replay should preserve the origin/tag attribution from the capture file instead of resolving replayed traffic against the current live workload state.
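
A minimal sketch of that resolution rule, assuming tags are keyed by PID: replayed packets look only at the captured state and never fall back to the live one. `CapturedTaggerState`, `LiveTaggerState`, and `resolve_tags` are hypothetical names for illustration, not types from this PR.

```rust
use std::collections::HashMap;

/// Hypothetical snapshot of tagger state loaded from a capture file:
/// captured PID -> tags recorded at capture time.
pub struct CapturedTaggerState {
    tags_by_pid: HashMap<u32, Vec<String>>,
}

impl CapturedTaggerState {
    pub fn new(tags_by_pid: HashMap<u32, Vec<String>>) -> Self {
        Self { tags_by_pid }
    }
}

/// Hypothetical view of the current live workload state.
pub struct LiveTaggerState {
    tags_by_pid: HashMap<u32, Vec<String>>,
}

impl LiveTaggerState {
    pub fn new(tags_by_pid: HashMap<u32, Vec<String>>) -> Self {
        Self { tags_by_pid }
    }
}

/// Picks the tag source based on whether the packet was replay-injected.
/// A replayed packet with no captured entry gets no tags rather than live ones.
pub fn resolve_tags<'a>(
    pid: u32,
    is_replay: bool,
    captured: &'a CapturedTaggerState,
    live: &'a LiveTaggerState,
) -> &'a [String] {
    let source = if is_replay {
        &captured.tags_by_pid
    } else {
        &live.tags_by_pid
    };
    source.get(&pid).map(Vec::as_slice).unwrap_or(&[])
}
```

Keeping the two states separate avoids the failure mode described above, where a PID reused by a different live workload would silently re-attribute replayed metrics.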

Validation

  • cargo check -p saluki-components -p saluki-context -p saluki-io --tests

Notes

Opened as draft for human review.

@dd-octo-sts dd-octo-sts Bot added the labels area/core (Core functionality, event model, etc.), area/io (General I/O and networking.), area/components (Sources, transforms, and destinations.), and source/dogstatsd (DogStatsD source.) on May 20, 2026
@lucastemb lucastemb changed the title from "DogStatsD replay packets with captured origin tags" to "feat(dogstatsd): added replay functionality" on May 20, 2026

pr-commenter Bot commented May 21, 2026

Binary Size Analysis (Agent Data Plane)

Baseline: cdcd393 · Comparison: e56b344 · diff
Analysis Configuration: stripped binaries · Pass/Fail Threshold: +5%
Sizes: 37.47 MiB (baseline) vs 37.86 MiB (comparison)
Size Change: +400.18 KiB (+1.04%)

✅ Binary size difference within threshold
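
As a quick check of the arithmetic above, the reported percentage follows from the size delta divided by the baseline. The sketch below is illustrative only; `size_change_percent` and `within_threshold` are hypothetical helpers, and whether the real check uses `<=` or `<` at the threshold is an assumption.

```rust
/// Percent change of a binary's size relative to its baseline.
/// Both arguments are in KiB.
pub fn size_change_percent(baseline_kib: f64, delta_kib: f64) -> f64 {
    delta_kib / baseline_kib * 100.0
}

/// Assumed pass rule: growth up to and including the threshold passes.
pub fn within_threshold(change_percent: f64, threshold_percent: f64) -> bool {
    change_percent <= threshold_percent
}
```

With the reported figures, a baseline of 37.47 MiB is 37.47 × 1024 ≈ 38369 KiB, so a +400.18 KiB delta is about 1.04%, well under the +5% threshold.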

Changes by Module
| Module | File Size | Symbols |
| --- | --- | --- |
| axum | +41.91 KiB | 281 |
| agent_data_plane::cli::dogstatsd | +40.10 KiB | 69 |
| hyper_util | -32.37 KiB | 50 |
| prost | +31.63 KiB | 407 |
| core | +30.20 KiB | 8482 |
| hyper | +21.85 KiB | 262 |
| saluki_components::sources::dogstatsd | +21.26 KiB | 314 |
| figment | -17.43 KiB | 572 |
| [sections] | +17.42 KiB | 8 |
| agent_data_plane::main::_{{closure}} | +15.85 KiB | 2 |
| tokio | +15.43 KiB | 3072 |
| saluki_components::transforms::dogstatsd_mapper | +14.39 KiB | 16 |
| HUF_readDTableX2_wksp | +12.23 KiB | 1 |
| hashbrown | +10.89 KiB | 627 |
| datadog_protos::trace_include::stats | +10.86 KiB | 7 |
| agent_data_plane::internal::env | -10.67 KiB | 195 |
| HUF_decompress4X2_usingDTable_internal_bmi2 | +10.37 KiB | 1 |
| HUF_decompress4X2_usingDTable_internal_default.part.0 | +10.25 KiB | 1 |
| otlp_protos::otlp_include::opentelemetry | -9.81 KiB | 269 |
| ZSTD_decompressSequencesLong_default.constprop.0 | +9.70 KiB | 1 |

Detailed Symbol Changes
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +2.6%  +394Ki  +2.9%  +333Ki    [29567 Others]
  [NEW]  +148Ki  [NEW]  +148Ki    agent_data_plane::cli::run::handle_run_command::_{{closure}}::h1442cdfd80857eb5
  [NEW] +85.2Ki  [NEW] +85.1Ki    agent_data_plane::cli::dogstatsd::handle_dogstatsd_command::_{{closure}}::h2f44e2aae78424b6
  [NEW] +67.1Ki  [NEW] +66.9Ki    agent_data_plane::cli::run::create_topology::_{{closure}}::h683b050b7ce873b5
  [NEW] +66.4Ki  [NEW] +66.2Ki    saluki_core::topology::built::BuiltTopology::spawn::_{{closure}}::hf3da0d4e44f578ec
  [NEW] +58.2Ki  [NEW] +58.1Ki    agent_data_plane::cli::debug::handle_debug_command::_{{closure}}::h2c5c3b71f5087cac
  [NEW] +57.8Ki  [NEW] +57.6Ki    saluki_core::topology::blueprint::TopologyBlueprint::build::_{{closure}}::h737d07b4508c915e
  [NEW] +49.2Ki  [NEW] +48.9Ki    agent_data_plane::main::_{{closure}}::h013f94cec0e3d1f3
  [NEW] +48.7Ki  [NEW] +48.6Ki    core::ops::function::FnOnce::call_once::h63780479e231bd4f
  [NEW] +41.0Ki  [NEW] +40.9Ki    saluki_components::common::datadog::io::run_endpoint_io_loop::_{{closure}}::h4c31dc59c83ba86c
  [NEW] +39.6Ki  [NEW] +39.5Ki    saluki_components::transforms::apm_stats::ApmStats::process_trace::hcf8eb5c623203b12
  [DEL] -46.2Ki  [DEL] -46.1Ki    saluki_components::common::datadog::io::run_endpoint_io_loop::_{{closure}}::h5a52458f89115d09
  [DEL] -48.7Ki  [DEL] -48.6Ki    core::ops::function::FnOnce::call_once::h2381be3aed56de36
  [DEL] -55.8Ki  [DEL] -55.6Ki    agent_data_plane::cli::dogstatsd::handle_dogstatsd_command::_{{closure}}::hf8451dd5d4acbe44
  [DEL] -56.2Ki  [DEL] -56.0Ki    agent_data_plane::internal::env::ADPEnvironmentProvider::from_configuration::_{{closure}}::h63b5ef266069c98b
  [DEL] -56.4Ki  [DEL] -56.2Ki    agent_data_plane::cli::debug::handle_debug_command::_{{closure}}::h3e58b9e6fa0ace2f
  [DEL] -57.5Ki  [DEL] -57.4Ki    saluki_core::topology::blueprint::TopologyBlueprint::build::_{{closure}}::hc82801e20e65589c
  [DEL] -59.2Ki  [DEL] -59.0Ki    agent_data_plane::internal::env::workload::build_collector::_{{closure}}::hd78f2f040f66b612
  [DEL] -66.1Ki  [DEL] -65.9Ki    saluki_core::topology::built::BuiltTopology::spawn::_{{closure}}::h983cedab3a7a5d9e
  [DEL] -66.4Ki  [DEL] -66.2Ki    agent_data_plane::cli::run::create_topology::_{{closure}}::h9fef51344054dc09
  [DEL]  -143Ki  [DEL]  -143Ki    agent_data_plane::cli::run::handle_run_command::_{{closure}}::hcb9ccdc5cdf3b8f5
  +1.0%  +400Ki  +1.1%  +339Ki    TOTAL

pr-commenter Bot commented May 21, 2026

Regression Detector (Agent Data Plane)

Run ID: aa356b29-ac68-400f-baf2-3e9c4cc3badf
Baseline: cdcd3937 · Comparison: e56b3444 · diff

Optimization Goals: ✅ No significant changes detected

Fine details of change detection per experiment (35)

Experiments configured erratic: true are tagged (ignored) and skipped when determining which experiments regressed or improved. Experiments which are detected as erratic at runtime are tagged (erratic) to flag that the run's sample dispersion was high, but their regression / improvement signal still counts.

| experiment | goal | Δ mean % |
| --- | --- | --- |
| otlp_ingest_logs_5mb_cpu (ignored) | cpu | ⚪ +1.57 |
| otlp_ingest_traces_ottl_transform_5mb_cpu (erratic) | cpu | ⚪ +1.21 |
| quality_gates_rss_idle | memory | ⚪ +0.49 |
| dsd_uds_100mb_3k_contexts_cpu (erratic) | cpu | ⚪ +0.48 |
| dsd_uds_500mb_3k_contexts_cpu (erratic) | cpu | ⚪ +0.44 |
| otlp_ingest_metrics_5mb_memory | memory | ⚪ +0.31 |
| dsd_uds_1mb_3k_contexts_memory | memory | ⚪ +0.26 |
| quality_gates_rss_dsd_low | memory | ⚪ +0.26 |
| dsd_uds_512kb_3k_contexts_memory | memory | ⚪ +0.25 |
| otlp_ingest_traces_5mb_memory | memory | ⚪ +0.20 |
| quality_gates_rss_dsd_heavy | memory | ⚪ +0.19 |
| quality_gates_rss_dsd_medium | memory | ⚪ +0.16 |
| otlp_ingest_traces_ottl_filtering_5mb_memory | memory | ⚪ +0.15 |
| otlp_ingest_traces_ottl_filtering_5mb_throughput | throughput | ⚪ -0.06 |
| quality_gates_rss_dsd_ultraheavy | memory | ⚪ +0.02 |
| otlp_ingest_traces_5mb_throughput | throughput | ⚪ -0.01 |
| dsd_uds_512kb_3k_contexts_throughput | throughput | ⚪ -0.00 |
| dsd_uds_1mb_3k_contexts_throughput | throughput | ⚪ -0.00 |
| otlp_ingest_logs_5mb_throughput (ignored) | throughput | ⚪ +0.00 |
| dsd_uds_100mb_3k_contexts_throughput | throughput | ⚪ +0.00 |
| dsd_uds_10mb_3k_contexts_throughput | throughput | ⚪ +0.00 |
| dsd_uds_100mb_3k_contexts_memory | memory | ⚪ -0.01 |
| otlp_ingest_traces_ottl_transform_5mb_throughput | throughput | ⚪ +0.02 |
| otlp_ingest_metrics_5mb_throughput | throughput | ⚪ +0.03 |
| dsd_uds_10mb_3k_contexts_memory | memory | ⚪ -0.05 |
| otlp_ingest_traces_ottl_transform_5mb_memory | memory | ⚪ -0.08 |
| otlp_ingest_traces_ottl_filtering_5mb_cpu (erratic) | cpu | ⚪ -0.14 |
| dsd_uds_500mb_3k_contexts_memory | memory | ⚪ -0.20 |
| dsd_uds_500mb_3k_contexts_throughput | throughput | ⚪ +0.46 |
| dsd_uds_512kb_3k_contexts_cpu (erratic) | cpu | ⚪ -0.48 |
| otlp_ingest_metrics_5mb_cpu (erratic) | cpu | ⚪ -0.91 |
| dsd_uds_10mb_3k_contexts_cpu (erratic) | cpu | ⚪ -1.06 |
| otlp_ingest_traces_5mb_cpu (erratic) | cpu | ⚪ -1.61 |
| dsd_uds_1mb_3k_contexts_cpu (erratic) | cpu | ⚪ -3.38 |
| otlp_ingest_logs_5mb_memory (ignored) | memory | ⚪ -9.94 |

Bounds Checks: ✅ Passed (5)

| experiment | check | replicates | observed |
| --- | --- | --- | --- |
| quality_gates_rss_dsd_heavy | memory_usage | 10/10 | ✅ 123 MiB ≤ 140 MiB |
| quality_gates_rss_dsd_low | memory_usage | 10/10 | ✅ 39.8 MiB ≤ 50 MiB |
| quality_gates_rss_dsd_medium | memory_usage | 10/10 | ✅ 59.9 MiB ≤ 75 MiB |
| quality_gates_rss_dsd_ultraheavy | memory_usage | 10/10 | ✅ 177 MiB ≤ 200 MiB |
| quality_gates_rss_idle | memory_usage | 10/10 | ✅ 26.6 MiB ≤ 40 MiB |

Explanation

A change is flagged as a regression when |Δ mean %| > 5.00% in the regressing direction for its optimization goal AND SMP marks the experiment as a regression (is_regression: true). Improvements use the matching criteria for the improving direction. Experiments configured erratic: true (tagged (ignored)) are skipped outright; experiments detected as erratic at runtime (tagged (erratic)) still count, since that flag describes sample dispersion rather than directional certainty. The Δ mean % cell is colored accordingly: 🟢 = improvement, 🔴 = regression, ⚪ = neutral. Reduction in CPU or memory is an improvement; reduction in ingress throughput is a regression.
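
The rule above can be expressed as a small decision function. This is a sketch of the stated criteria, not the detector's implementation: the `Experiment` struct and `classify` are hypothetical, and the `smp_is_improvement` field is an assumption, since the explanation names only `is_regression`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Goal {
    Cpu,
    Memory,
    Throughput,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Improvement, // 🟢
    Regression,  // 🔴
    Neutral,     // ⚪
}

/// Hypothetical per-experiment inputs to the decision.
pub struct Experiment {
    pub goal: Goal,
    pub delta_mean_percent: f64,
    /// `erratic: true` in the experiment configuration, shown as "(ignored)".
    pub configured_erratic: bool,
    pub smp_is_regression: bool,
    /// Assumed counterpart of `is_regression` for the improving direction.
    pub smp_is_improvement: bool,
}

pub const THRESHOLD_PERCENT: f64 = 5.0;

pub fn classify(e: &Experiment) -> Verdict {
    // Configured-erratic experiments are skipped outright, whatever their delta.
    if e.configured_erratic || e.delta_mean_percent.abs() <= THRESHOLD_PERCENT {
        return Verdict::Neutral;
    }
    // More CPU or memory is worse; less ingress throughput is worse.
    let worse = match e.goal {
        Goal::Cpu | Goal::Memory => e.delta_mean_percent > 0.0,
        Goal::Throughput => e.delta_mean_percent < 0.0,
    };
    if worse && e.smp_is_regression {
        Verdict::Regression
    } else if !worse && e.smp_is_improvement {
        Verdict::Improvement
    } else {
        Verdict::Neutral
    }
}
```

Under this reading, otlp_ingest_logs_5mb_memory at -9.94% stays neutral because it is configured erratic, and dsd_uds_1mb_3k_contexts_cpu at -3.38% stays neutral because it is inside the 5% band.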

@lucastemb lucastemb marked this pull request as ready for review May 21, 2026 16:35
@lucastemb lucastemb requested a review from a team as a code owner May 21, 2026 16:35

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 93a6fa12e4

Comment thread lib/saluki-io/src/net/unix/replay_send.rs Outdated
Comment thread lib/saluki-io/src/net/unix/replay_send.rs Outdated
Comment on lines +394 to +396
struct TopologyControlSurfaces {
    dogstatsd: Option<DogStatsDControlSurface>,
}
Member

I actually really like this concept of grouping together the necessary types to provide a "control surface" over a given pipeline.

One thing we should do as an immediate follow-up PR, I think, is actually unify this such that we also include the DSD Stats stuff in here as well. I think that would mean moving things around so we basically pass back just the API handler, and create_topology actually creates the component configuration and all of that... but it seems doable.

That would also, I believe, let us remove the need for a separate DogStatsDControlPlaneConfiguration type.

Comment thread bin/agent-data-plane/src/cli/run.rs Outdated
Comment thread lib/saluki-context/src/origin.rs Outdated
@dd-octo-sts dd-octo-sts Bot removed the area/core (Core functionality, event model, etc.) label on May 22, 2026
@lucastemb lucastemb requested a review from tobz May 22, 2026 21:08
Comment thread lib/saluki-io/src/net/unix/non_linux.rs Outdated
Comment thread lib/saluki-io/src/net/unix/replay_send.rs Outdated
Comment on lines +8 to +10
use saluki_components::sources::{
    TimestampResolution, TrafficCaptureReader, DEFAULT_REPLAY_LOOPS, REPLAY_CREDENTIALS_GID,
};
Member

This is definitely a code smell to me, but also it feels non-blocking. I'd say we should follow up by seeing if we can reorganize some of the DSD replay code into datadog-agent-commons.

@lucastemb lucastemb requested a review from tobz May 28, 2026 18:02

Labels

  • area/components: Sources, transforms, and destinations.
  • area/io: General I/O and networking.
  • mergequeue-status: done
  • source/dogstatsd: DogStatsD source.

2 participants