fix(export): Fix histogram theft bug on interleaved and incomplete histograms#323
fix(export): Fix histogram theft bug on interleaved and incomplete histograms#323bwplotka wants to merge 4 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request refactors the histogram processing logic in google/export/transform.go by replacing buildDistribution with buildDistributions. This change allows the exporter to handle interleaved (ungrouped) histogram samples from sources like Kong by caching and returning multiple completed distributions in a single batch. Comprehensive unit tests have also been added to cover these scenarios. The review feedback suggests clarifying the necessity of the touched slice for deterministic test ordering, pre-allocating the result slice to avoid unnecessary allocations, and cleaning up obsolete TODO comments in the new test cases.
40f5d56 to
bd3aa88
Compare
| // Whether to not emit a sample. | ||
| skip bool | ||
|
|
||
| hash uint64 |
There was a problem hiding this comment.
This is allocating bigger structs, but if we trust AI benchmark it does not yield big difference.
Assessment: #324
|
Given some risks, I wonder if we shouldn't release an unofficial image first for cx to try. |
|
Experimental image with this fix, captured in release-2.53.5-gmp-histcompat branch: |
…theft Add cases to TestSampleBuilder for: - ungrouped (interleaved) histogram samples - ungrouped (interleaved) histogram samples with first group incomplete - ungrouped (interleaved) histogram samples with first group skipped due to new bucket Enforce strict _bucket, _count, _sum ordering for test samples. TAG=agy CONV=8f508481-de1c-4e6b-ad3a-718d089a2fbe
When histogram samples in a scrape batch are ungrouped (interleaved across different series label sets for the same metric name), existing buildDistribution would return as soon as any histogram in the cache completed. This caused histogram distribution theft (attaching completed distributions to the wrong series) and sample loss for other interleaved series. Replace buildDistribution with buildHistograms to consume the entire contiguous block of samples for a histogram metric name. Cache series metadata (hash, proto, lset) on distribution entries and emit all completed distributions in deterministic order when the block ends. Incomplete or skipped histogram distributions are cleanly discarded without being emitted under incorrect series label sets. TAG=agy CONV=ae038258-882b-4fbd-8b3f-15976bc590f2
Signed-off-by: bwplotka <bwplotka@gmail.com>
Signed-off-by: bwplotka <bwplotka@gmail.com>
bd3aa88 to
7ce3854
Compare
|
/prombench |
Summary
When histogram samples in a scrape batch are ungrouped or interleaved across different series label sets for the same metric family (such as from non-compliant sources like Kong/kong#14925), the existing
buildDistributionlogic insampleBuilderreturned prematurely as soon as any single histogram in the cache completed (dist.complete()).This premature return caused two critical bugs:
Key Changes
1. Contiguous Block Processing (
google/export/transform.go)buildDistributionwithbuildDistributions, called fromnext. Instead of aborting on the first completed distribution,buildDistributionsnow consumes the entire contiguous block of samples for a histogram metric family name before emitting results.distributionstruct to store series metadata (hash,proto, andlset). When the block finishes, all completed distributions are built and emitted using their own cached metadata, guaranteeing deterministic ordering and accurate label attribution.+Infbucket) or skipped series are cleanly discarded at the end of the block without being emitted under incorrect label sets or causing sample loss for other series.results []hashedSeriestosampleBuilderto pool and reuse result slices across iterations without extra heap allocations. Added adefercleanup block inbuildDistributionsto return cached distribution objects to the pool (putDistribution) and clear the map after processing each metric family.2. Regression Testing (
google/export/transform_test.go)TestSampleBuilder(referencing internal regression trackingb/516519320):cmpopts.SortSlices) to test assertions when comparing multiple emitted histogram series.Related Issues
b/516519320