fix(core): use compensated summation for histograms#1666
Conversation
Use modified Neumaier algorithm to calculate sums, counts, and quantiles
for histogram samples.
The naive `sum += value * weight` loop suffers catastrophic cancellation
when the sample stream contains values of wildly different magnitudes.
The classic Kahan/Peters counter-example `{1, +1e100, 1, -1e100}`
evaluates to 0 with naive summation but to the correct 2.0 with the new
algorithm.
Signed-off-by: Mark Kirichenko <mark.kirichenko@datadoghq.com>
Binary Size Analysis (Agent Data Plane)Target: 55bca14 (baseline) vs df02eb3 (comparison) diff
|
| Module | File Size | Symbols |
|---|---|---|
core |
-33.99 KiB | 1451 |
smallvec |
+22.57 KiB | 62 |
figment |
-10.78 KiB | 12 |
saluki_core::data_model::event |
-8.63 KiB | 22 |
[Unmapped] |
-4.36 KiB | 1 |
anon.0cac25fc52ba6f4fc475348a8c66d8e3.39.llvm.4081512695721216103 |
+3.90 KiB | 1 |
anon.8bb1cfcdf181d421e8889bc3626b8144.17.llvm.125851317713716366 |
-3.81 KiB | 1 |
[sections] |
-3.30 KiB | 7 |
anon.bcfbe2edddf7aafb2d5d5e0cc5ffa1e5.16.llvm.12024661772337668300 |
-3.15 KiB | 1 |
hashbrown |
+3.15 KiB | 24 |
anon.0f2fa1d1fad1031510176699744ee20b.644.llvm.3708903403341574001 |
+3.06 KiB | 1 |
papaya |
+2.56 KiB | 11 |
anon.eb51c975e2567ebfca80d7da0abd4cd1.7.llvm.11556325092004374934 |
-2.55 KiB | 1 |
anon.40340ab29c26454e228b882d4ecb70d4.0.llvm.14605026808982526804 |
+2.54 KiB | 1 |
saluki_api::DynamicRoute::http |
+1.99 KiB | 1 |
anon.bdadb00871588c874a55b8b73ce579a9.0.llvm.8997752761958168434 |
-1.93 KiB | 1 |
anon.41b0b8befc3118c3a2b0f17ec06872eb.9.llvm.17862187855904175761 |
+1.93 KiB | 1 |
alloc |
-1.91 KiB | 68 |
serde_core |
+1.88 KiB | 40 |
tokio_util |
-1.84 KiB | 11 |
Detailed Symbol Changes
FILE SIZE VM SIZE
-------------- --------------
[NEW] +6.41Ki [NEW] +6.33Ki matchit::router::Router<T>::insert::h261aa988f7ce6a25
[NEW] +6.14Ki [NEW] +6.05Ki matchit::router::Router<T>::insert::hfe75556626a71a8d
[NEW] +6.07Ki [NEW] +5.99Ki matchit::tree::Node<T>::insert::h889792c8a42776b1
+904% +5.75Ki +10e2% +5.75Ki axum::routing::Router<S>::route::h0531592b945792f9
[NEW] +5.42Ki [NEW] +5.33Ki serde_core::de::MapAccess::next_value::ha8691cfa6c85eadd
+795% +5.40Ki +908% +5.40Ki tokio::runtime::runtime::Runtime::block_on::h59580466f3413de1
+992% +5.27Ki +12e2% +5.27Ki tokio::runtime::runtime::Runtime::block_on::h26cc5fddc7a2e095
[NEW] +3.90Ki [NEW] +16 anon.0cac25fc52ba6f4fc475348a8c66d8e3.39.llvm.4081512695721216103
[NEW] +3.06Ki [NEW] +74 anon.0f2fa1d1fad1031510176699744ee20b.644.llvm.3708903403341574001
[DEL] -2.55Ki [DEL] -80 anon.eb51c975e2567ebfca80d7da0abd4cd1.7.llvm.11556325092004374934
[DEL] -2.89Ki [DEL] -2.77Ki quick_cache::shard::CacheShard<Key,Val,We,B,L,Plh>::insert::haf55a3f4a81ec55a
[DEL] -3.15Ki [DEL] -74 anon.bcfbe2edddf7aafb2d5d5e0cc5ffa1e5.16.llvm.12024661772337668300
[DEL] -3.48Ki [DEL] -2.33Ki _<serde_core::de::impls::<impl serde_core::de::Deserialize for core::time::Duration>::deserialize::DurationVisitor as serde_core::de::Visitor>::visit_map::hed68c5e9e330be6d
[DEL] -3.81Ki [DEL] -16 anon.8bb1cfcdf181d421e8889bc3626b8144.17.llvm.125851317713716366
-51.3% -4.36Ki [ = ] 0 [Unmapped]
[DEL] -5.26Ki [DEL] -5.10Ki _<figment::value::de::ConfiguredValueDe<I> as serde_core::de::Deserializer>::deserialize_struct::hedaa6bc6b4faff3a
[DEL] -6.01Ki [DEL] -5.93Ki matchit::tree::Node<T>::insert::ha63513d725e691c6
[DEL] -6.11Ki [DEL] -6.02Ki matchit::router::Router<T>::insert::h02e8e157485e9a5b
[DEL] -6.17Ki [DEL] -6.06Ki axum::routing::path_router::PathRouter<S,_>::route::h75fd76341dec4a4b
[DEL] -6.39Ki [DEL] -6.31Ki matchit::tree::Node<T>::insert::h2e5068899f601750
-0.6% -37.1Ki -0.6% -25.2Ki [7148 Others]
-0.1% -39.9Ki -0.1% -19.6Ki TOTAL
Regression Detector (Agent Data Plane)Run ID: Optimization Goals: ✅ No significant changes detectedFine details of change detection per experiment (35)Experiments configured
Bounds Checks: ✅ Passed (5)
ExplanationA change is flagged as a regression when |Δ mean %| > 5.00% in the regressing direction for its optimization goal AND SMP marks the experiment as a regression ( |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: df02eb32c4
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| let mut ddsketch = DDSketch::default(); | ||
| for sample in histogram.samples() { | ||
| ddsketch.insert_n(sample.value.into_inner(), sample.weight); | ||
| ddsketch.insert_n(sample.value.into_inner(), sample.weight.0 as u64); |
There was a problem hiding this comment.
Preserve fractional histogram weights when encoding sketches
When this encoder handles a histogram built from a non-integer sample rate such as @0.21, Histogram::insert now stores sample.weight as the raw weight (~4.76), while summary.count() rounds that to the nearest sample count. This cast truncates the same sample to 4 before inserting it into the DDSketch, so encoded histogram payloads undercount fractional-weight samples and can disagree with the aggregate count/sum produced from the same histogram. Convert the raw weight using the same rounding/accounting policy before passing it to insert_n.
Useful? React with 👍 / 👎.
19791e3
into
main
## Summary
Use modified Neumaier algorithm to calculate sums, counts, and quantiles for histogram samples.
The naive `sum += value * weight` loop suffers catastrophic cancellation when the sample stream contains values of wildly different magnitudes. The classic Kahan/Peters counter-example `{1, +1e100, 1, -1e100}` evaluates to 0 with naive summation but to the correct 2.0 with the new algorithm.
## Change Type
- [x] Bug fix
- [ ] New feature
- [ ] Non-functional (chore, refactoring, docs)
- [ ] Performance
## How did you test this PR?
Added unit tests to check correctness.
## References
[Similar PR in Datadog-Agent](DataDog/datadog-agent#49913).
Co-authored-by: mark.kirichenko <mark.kirichenko@datadoghq.com> 19791e3
Summary
Use modified Neumaier algorithm to calculate sums, counts, and quantiles for histogram samples.
The naive
sum += value * weightloop suffers catastrophic cancellation when the sample stream contains values of wildly different magnitudes. The classic Kahan/Peters counter-example{1, +1e100, 1, -1e100}evaluates to 0 with naive summation but to the correct 2.0 with the new algorithm.Change Type
How did you test this PR?
Added unit tests to check correctness.
References
Similar PR in Datadog-Agent.