Skip to content

feat(trace-utils)!: add dedup convenience to VecMap#2049

Open
yannham wants to merge 3 commits into
mainfrom
yannham/vecmap-as-dedup
Open

feat(trace-utils)!: add dedup convenience to VecMap#2049
yannham wants to merge 3 commits into
mainfrom
yannham/vecmap-as-dedup

Conversation

@yannham
Copy link
Copy Markdown
Contributor

@yannham yannham commented May 28, 2026

What does this PR do?

Follow-up of #2022. This split off some work required in #2043 for deduplication before msgpack encoding as a separate PR.

Motivation

See #2022 for overall motivation. To integrate VecMap, it turns out we can't currently rely on the agent to properly handle maps with deuplicate entries (the "last win" semantics is the current implementation, but not guaranteed). We need to perform deduplication before msgpack encoding. This PR backports the required machinery into VecMap as a separate PR.

Additional Notes

N/A

How to test the change?

Tests included.

@yannham yannham requested a review from paullegranddc May 28, 2026 09:15
@yannham yannham requested review from a team as code owners May 28, 2026 09:15
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 28, 2026

📚 Documentation Check Results

⚠️ 602 documentation warning(s) found

📦 libdd-trace-utils - 602 warning(s)


Updated: 2026-05-28 13:14:23 UTC | Commit: f766209 | missing-docs job results

@github-actions
Copy link
Copy Markdown
Contributor

Clippy Allow Annotation Report

Comparing clippy allow annotations between branches:

  • Base Branch: origin/main
  • PR Branch: origin/yannham/vecmap-as-dedup

Summary by Rule

Rule Base Branch PR Branch Change

Annotation Counts by File

File Base Branch PR Branch Change

Annotation Stats by Crate

Crate Base Branch PR Branch Change
clippy-annotation-reporter 5 5 No change (0%)
datadog-ffe-ffi 1 1 No change (0%)
datadog-ipc 21 21 No change (0%)
datadog-live-debugger 6 6 No change (0%)
datadog-live-debugger-ffi 10 10 No change (0%)
datadog-profiling-replayer 4 4 No change (0%)
datadog-remote-config 3 3 No change (0%)
datadog-sidecar 57 57 No change (0%)
libdd-common 13 13 No change (0%)
libdd-common-ffi 12 12 No change (0%)
libdd-data-pipeline 5 5 No change (0%)
libdd-ddsketch 2 2 No change (0%)
libdd-dogstatsd-client 1 1 No change (0%)
libdd-profiling 13 13 No change (0%)
libdd-telemetry 20 20 No change (0%)
libdd-tinybytes 4 4 No change (0%)
libdd-trace-normalization 2 2 No change (0%)
libdd-trace-obfuscation 3 3 No change (0%)
libdd-trace-stats 1 1 No change (0%)
libdd-trace-utils 13 13 No change (0%)
Total 196 196 No change (0%)

About This Report

This report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 28, 2026

🔒 Cargo Deny Results

⚠️ 4 issue(s) found, showing only errors (advisories, bans, sources)

📦 libdd-trace-utils - 4 error(s)

Show output
error[unsound]: Rand is unsound with a custom logger using `rand::rng()`
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:177:1
    │
177 │ rand 0.8.5 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ unsound advisory detected
    │
    ├ ID: RUSTSEC-2026-0097
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0097
    ├ It has been reported (by @lopopolo) that the `rand` library is [unsound](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#soundness-of-code--of-a-library) (i.e. that safe code using the public API can cause Undefined Behaviour) when all the following conditions are met:
      
      - The `log` and `thread_rng` features are enabled
      - A [custom logger](https://docs.rs/log/latest/log/#implementing-a-logger) is defined
      - The custom logger accesses `rand::rng()` (previously `rand::thread_rng()`) and calls any `TryRng` (previously `RngCore`) methods on `ThreadRng`
      - The `ThreadRng` (attempts to) reseed while called from the custom logger (this happens every 64 kB of generated data)
      - Trace-level logging is enabled or warn-level logging is enabled and the random source (the `getrandom` crate) is unable to provide a new seed
      
      `TryRng` (previously `RngCore`) methods for `ThreadRng` use `unsafe` code to cast `*mut BlockRng<ReseedingCore>` to `&mut BlockRng<ReseedingCore>`. When all the above conditions are met this results in an aliased mutable reference, violating the Stacked Borrows rules. Miri is able to detect this violation in sample code. Since construction of [aliased mutable references is Undefined Behaviour](https://doc.rust-lang.org/stable/nomicon/references.html), the behaviour of optimized builds is hard to predict.
    ├ Announcement: https://github.com/rust-random/rand/pull/1763
    ├ Solution: Upgrade to >=0.10.1 OR <0.10.0, >=0.9.3 OR <0.9.0, >=0.8.6 (try `cargo update -p rand`)
    ├ rand v0.8.5
      ├── (dev) libdd-common v4.1.0
      │   ├── libdd-capabilities-impl v2.0.0
      │   │   └── libdd-trace-utils v5.0.0
      │   │       └── (dev) libdd-trace-utils v5.0.0 (*)
      │   └── libdd-trace-utils v5.0.0 (*)
      ├── (dev) libdd-trace-normalization v2.0.0
      │   └── libdd-trace-utils v5.0.0 (*)
      ├── libdd-trace-utils v5.0.0 (*)
      └── proptest v1.5.0
          └── (dev) libdd-tinybytes v1.1.1
              ├── (dev) libdd-tinybytes v1.1.1 (*)
              └── libdd-trace-utils v5.0.0 (*)

error[vulnerability]: Name constraints for URI names were incorrectly accepted
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:199:1
    │
199 │ rustls-webpki 0.103.10 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
    │
    ├ ID: RUSTSEC-2026-0098
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0098
    ├ Name constraints for URI names were ignored and therefore accepted.
      
      Note this library does not provide an API for asserting URI names, and URI name constraints are otherwise not implemented.  URI name constraints are now rejected unconditionally.
      
      Since name constraints are restrictions on otherwise properly-issued certificates, this bug is reachable only after signature verification and requires misissuance to exploit.
      
      This vulnerability is identified as [GHSA-965h-392x-2mh5](https://github.com/rustls/webpki/security/advisories/GHSA-965h-392x-2mh5). Thank you to @1seal for the report.
    ├ Solution: Upgrade to >=0.103.12, <0.104.0-alpha.1 OR >=0.104.0-alpha.6 (try `cargo update -p rustls-webpki`)
    ├ rustls-webpki v0.103.10
      └── rustls v0.23.37
          ├── hyper-rustls v0.27.7
          │   └── libdd-common v4.1.0
          │       ├── libdd-capabilities-impl v2.0.0
          │       │   └── libdd-trace-utils v5.0.0
          │       │       └── (dev) libdd-trace-utils v5.0.0 (*)
          │       └── libdd-trace-utils v5.0.0 (*)
          ├── libdd-common v4.1.0 (*)
          └── tokio-rustls v0.26.0
              ├── hyper-rustls v0.27.7 (*)
              └── libdd-common v4.1.0 (*)

error[vulnerability]: Name constraints were accepted for certificates asserting a wildcard name
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:199:1
    │
199 │ rustls-webpki 0.103.10 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
    │
    ├ ID: RUSTSEC-2026-0099
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0099
    ├ Permitted subtree name constraints for DNS names were accepted for certificates asserting a wildcard name.
      
      This was incorrect because, given a name constraint of `accept.example.com`, `*.example.com` could feasibly allow a name of `reject.example.com` which is outside the constraint.
      This is very similar to [CVE-2025-61727](https://go.dev/issue/76442).
      
      Since name constraints are restrictions on otherwise properly-issued certificates, this bug is reachable only after signature verification and requires misissuance to exploit.
      
      This vulnerability is identified as [GHSA-xgp8-3hg3-c2mh](https://github.com/rustls/webpki/security/advisories/GHSA-xgp8-3hg3-c2mh). Thank you to @1seal for the report.
    ├ Solution: Upgrade to >=0.103.12, <0.104.0-alpha.1 OR >=0.104.0-alpha.6 (try `cargo update -p rustls-webpki`)
    ├ rustls-webpki v0.103.10
      └── rustls v0.23.37
          ├── hyper-rustls v0.27.7
          │   └── libdd-common v4.1.0
          │       ├── libdd-capabilities-impl v2.0.0
          │       │   └── libdd-trace-utils v5.0.0
          │       │       └── (dev) libdd-trace-utils v5.0.0 (*)
          │       └── libdd-trace-utils v5.0.0 (*)
          ├── libdd-common v4.1.0 (*)
          └── tokio-rustls v0.26.0
              ├── hyper-rustls v0.27.7 (*)
              └── libdd-common v4.1.0 (*)

error[vulnerability]: Reachable panic in certificate revocation list parsing
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:199:1
    │
199 │ rustls-webpki 0.103.10 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
    │
    ├ ID: RUSTSEC-2026-0104
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0104
    ├ A panic was reachable when parsing certificate revocation lists via [`BorrowedCertRevocationList::from_der`]
      or [`OwnedCertRevocationList::from_der`].  This was the result of mishandling a syntactically valid empty
      `BIT STRING` appearing in the `onlySomeReasons` element of a `IssuingDistributionPoint` CRL extension.
      
      This panic is reachable prior to a CRL's signature being verified.
      
      Applications that do not use CRLs are not affected.
      
      Thank you to @tynus3 for the report.
    ├ Solution: Upgrade to >=0.103.13, <0.104.0-alpha.1 OR >=0.104.0-alpha.7 (try `cargo update -p rustls-webpki`)
    ├ rustls-webpki v0.103.10
      └── rustls v0.23.37
          ├── hyper-rustls v0.27.7
          │   └── libdd-common v4.1.0
          │       ├── libdd-capabilities-impl v2.0.0
          │       │   └── libdd-trace-utils v5.0.0
          │       │       └── (dev) libdd-trace-utils v5.0.0 (*)
          │       └── libdd-trace-utils v5.0.0 (*)
          ├── libdd-common v4.1.0 (*)
          └── tokio-rustls v0.26.0
              ├── hyper-rustls v0.27.7 (*)
              └── libdd-common v4.1.0 (*)

advisories FAILED, bans ok, sources ok

Updated: 2026-05-28 13:16:10 UTC | Commit: f766209 | dependency-check job results

@datadog-datadog-prod-us1-2
Copy link
Copy Markdown

datadog-datadog-prod-us1-2 Bot commented May 28, 2026

Pipelines  Tests

Fix all issues with BitsAI

⚠️ Warnings

🚦 1 Pipeline job failed

semver-check | validate   View in Datadog   GitHub Actions

🛟 This job is unlikely to succeed on retry. Please review your pipeline configuration. SEMVER Validation Failed: Major API changes detected without breaking change marker in PR title or footer.

ℹ️ Info

No other issues found (see more)

🧪 All tests passed
❄️ No new flaky tests detected

🎯 Code Coverage (details)
Patch Coverage: 88.66%
Overall Coverage: 72.93% (+0.02%)

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 15e2e0e | Docs | Datadog PR Page | Give us feedback!

Comment thread libdd-trace-utils/src/span/vec_map.rs Outdated
Comment on lines +221 to +222
static WARNED: AtomicBool = AtomicBool::new(false);
if !WARNED.swap(true, Ordering::Relaxed) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the condition is unlikely, wouldn't it be better to do a relaxed atomic load, and a conditional store? (of course, we might get duplicata if used concurrently but I don't think it's that likely or we will get that many if ti happens)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this is the same discussion we had for the dedup flag, and it seems the wisdom is it's actually faster to unconditionally store (store buffer, avoided conditional even if predicted, yadayada). I think the case where it's not true is when the variable is contended (storing needs an exclusive cache line, which might invalidate the cache of other cores, while loading doesn't). But if we don't expect this to actually be accessed concurrently, the situation should be rather similar to deduped (we just have to use atomics to please the compiler)...

@codecov-commenter
Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 88.65979% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.92%. Comparing base (f7d471d) to head (15e2e0e).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2049      +/-   ##
==========================================
+ Coverage   72.90%   72.92%   +0.02%     
==========================================
  Files         460      460              
  Lines       76396    76481      +85     
==========================================
+ Hits        55696    55774      +78     
- Misses      20700    20707       +7     
Components Coverage Δ
libdd-crashtracker 65.22% <ø> (ø)
libdd-crashtracker-ffi 36.82% <ø> (ø)
libdd-alloc 98.77% <ø> (ø)
libdd-data-pipeline 85.60% <ø> (ø)
libdd-data-pipeline-ffi 75.70% <ø> (ø)
libdd-common 79.89% <ø> (ø)
libdd-common-ffi 74.41% <ø> (ø)
libdd-telemetry 73.34% <ø> (ø)
libdd-telemetry-ffi 31.36% <ø> (ø)
libdd-dogstatsd-client 82.64% <ø> (ø)
datadog-ipc 76.22% <ø> (ø)
libdd-profiling 81.69% <ø> (ø)
libdd-profiling-ffi 64.79% <ø> (ø)
libdd-sampling 97.46% <ø> (ø)
datadog-sidecar 29.19% <ø> (ø)
datdog-sidecar-ffi 10.17% <ø> (ø)
spawn-worker 48.86% <ø> (ø)
libdd-tinybytes 93.80% <ø> (ø)
libdd-trace-normalization 81.71% <ø> (ø)
libdd-trace-obfuscation 87.30% <ø> (ø)
libdd-trace-protobuf 68.25% <ø> (ø)
libdd-trace-utils 88.97% <88.65%> (+0.02%) ⬆️
libdd-tracer-flare 86.88% <ø> (ø)
libdd-log 74.83% <ø> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@yannham yannham changed the title feat(trace-utils): add dedup convenience to VecMap feat(trace-utils)!: add dedup convenience to VecMap May 28, 2026
@dd-octo-sts
Copy link
Copy Markdown
Contributor

dd-octo-sts Bot commented May 28, 2026

Artifact Size Benchmark Report

aarch64-alpine-linux-musl
Artifact Baseline Commit Change
/aarch64-alpine-linux-musl/lib/libdatadog_profiling.so 7.63 MB 7.63 MB 0% (0 B) 👌
/aarch64-alpine-linux-musl/lib/libdatadog_profiling.a 82.94 MB 82.94 MB 0% (0 B) 👌
aarch64-unknown-linux-gnu
Artifact Baseline Commit Change
/aarch64-unknown-linux-gnu/lib/libdatadog_profiling.a 94.02 MB 94.02 MB 0% (0 B) 👌
/aarch64-unknown-linux-gnu/lib/libdatadog_profiling.so 10.25 MB 10.25 MB 0% (0 B) 👌
libdatadog-x64-windows
Artifact Baseline Commit Change
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.dll 24.54 MB 24.54 MB 0% (0 B) 👌
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.lib 83.96 KB 83.96 KB 0% (0 B) 👌
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.pdb 178.23 MB 178.25 MB +0% (+16.00 KB) 👌
/libdatadog-x64-windows/debug/static/datadog_profiling_ffi.lib 915.25 MB 915.26 MB +0% (+17.22 KB) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.dll 8.03 MB 8.03 MB 0% (0 B) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.lib 83.96 KB 83.96 KB 0% (0 B) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.pdb 23.77 MB 23.77 MB 0% (0 B) 👌
/libdatadog-x64-windows/release/static/datadog_profiling_ffi.lib 47.43 MB 47.43 MB 0% (0 B) 👌
libdatadog-x86-windows
Artifact Baseline Commit Change
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.dll 21.27 MB 21.27 MB 0% (0 B) 👌
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.lib 85.29 KB 85.29 KB 0% (0 B) 👌
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.pdb 182.23 MB 182.25 MB +0% (+16.00 KB) 👌
/libdatadog-x86-windows/debug/static/datadog_profiling_ffi.lib 908.37 MB 908.39 MB +0% (+16.97 KB) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.dll 6.20 MB 6.20 MB 0% (0 B) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.lib 85.29 KB 85.29 KB 0% (0 B) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.pdb 25.48 MB 25.48 MB 0% (0 B) 👌
/libdatadog-x86-windows/release/static/datadog_profiling_ffi.lib 45.07 MB 45.07 MB 0% (0 B) 👌
x86_64-alpine-linux-musl
Artifact Baseline Commit Change
/x86_64-alpine-linux-musl/lib/libdatadog_profiling.a 73.95 MB 73.95 MB 0% (0 B) 👌
/x86_64-alpine-linux-musl/lib/libdatadog_profiling.so 8.52 MB 8.52 MB 0% (0 B) 👌
x86_64-unknown-linux-gnu
Artifact Baseline Commit Change
/x86_64-unknown-linux-gnu/lib/libdatadog_profiling.a 89.36 MB 89.36 MB 0% (0 B) 👌
/x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so 10.36 MB 10.36 MB 0% (0 B) 👌

Comment on lines +202 to +203
.into_iter()
.collect(),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the 2nd collect to VecMap here?
Wouldn't it be the exact same, with one less copy for the Owned variant to beDedupedVecMap::Owned(HashMap<K, V>)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants