fix(metrics): unblock OTLP/JSON histogram, expo, summary ingestion#60261
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
7606959 to
b46e116
Compare
|
| let counter = &metrics[0]; | ||
| let histogram = &metrics[1]; | ||
|
|
||
| eprintln!("counter.data is_some = {}", counter.data.is_some()); | ||
| eprintln!("histogram.data is_some = {}", histogram.data.is_some()); | ||
| if let Some(Data::Histogram(h)) = &histogram.data { | ||
| eprintln!("histogram.data_points.len = {}", h.data_points.len()); | ||
| if let Some(dp) = h.data_points.first() { | ||
| eprintln!( | ||
| "dp.count={} sum={:?} bucket_counts={:?} explicit_bounds={:?}", | ||
| dp.count, dp.sum, dp.bucket_counts, dp.explicit_bounds | ||
| ); | ||
| } | ||
| } |
There was a problem hiding this comment.
Superfluous
eprintln! debug prints in tests
Several tests contain eprintln! calls (e.g. counter.data is_some, histogram.data is_some, dp.count=...) that were clearly left in from development. They produce noise on --nocapture runs and add no assertion value since the asserts below them already capture the failure message. Per the project simplicity rule "has no superfluous parts," these can be removed. The same pattern appears in exponential_histogram_with_string_u64s, summary_with_string_u64s, number_data_point_as_int_string, edge_u64_above_i64_max_round_trips, edge_as_int_signed_boundaries, and histogram_with_unquoted_u64_works.
Prompt To Fix With AI
This is a comment left during a code review.
Path: rust/capture-logs/tests/metrics_test.rs
Line: 46-59
Comment:
**Superfluous `eprintln!` debug prints in tests**
Several tests contain `eprintln!` calls (e.g. `counter.data is_some`, `histogram.data is_some`, `dp.count=...`) that were clearly left in from development. They produce noise on `--nocapture` runs and add no assertion value since the asserts below them already capture the failure message. Per the project simplicity rule "has no superfluous parts," these can be removed. The same pattern appears in `exponential_histogram_with_string_u64s`, `summary_with_string_u64s`, `number_data_point_as_int_string`, `edge_u64_above_i64_max_round_trips`, `edge_as_int_signed_boundaries`, and `histogram_with_unquoted_u64_works`.
How can I resolve this? If you propose a fix, please make it concise.| "bucketCounts":["1"], | ||
| "explicitBounds":[] | ||
| }] | ||
| }} | ||
| ] | ||
| }] | ||
| }] | ||
| }"#; | ||
|
|
||
| let result = parse_otel_metrics_message(&Bytes::from(json)); | ||
| assert!( | ||
| result.is_err(), | ||
| "spec-violating count=\"not-a-number\" must be rejected, not silently dropped" | ||
| ); | ||
| } | ||
|
|
||
| /// Regression for the gap that necessitated the expanded EXPONENTIAL_HISTOGRAM | ||
| /// defaults: client sends a minimal spec-valid expo without any of the upstream- | ||
| /// undeclared-default fields. Without our defaults, this hard-errors and the | ||
| /// metric silently drops. | ||
| #[test] | ||
| fn edge_expo_with_minimal_fields_only() { | ||
| let json = r#"{ | ||
| "resourceMetrics":[{ | ||
| "resource":{"attributes":[]}, | ||
| "scopeMetrics":[{ | ||
| "scope":{"name":"x"}, | ||
| "metrics":[ | ||
| {"name":"minimal.expo","exponentialHistogram":{ | ||
| "aggregationTemporality":2, | ||
| "dataPoints":[{"timeUnixNano":"1700000000000000000"}] | ||
| }} | ||
| ] | ||
| }] | ||
| }] | ||
| }"#; |
There was a problem hiding this comment.
Repeated test structure for
minimal fields scenario
edge_expo_with_minimal_fields_only and edge_summary_with_minimal_fields_only are structurally identical: send a payload with a single metric variant containing one data point with only timeUnixNano, then assert data is not None. The team preference is parameterised tests. These two (and potentially the three *_with_string_u64s variants) could be expressed as a single table-driven test, keeping the payloads in a &[(&str, fn(&Metric) -> bool)] slice and removing the repeated boilerplate.
Context Used: Do not attempt to comment on incorrect alphabetica... (source)
Prompt To Fix With AI
This is a comment left during a code review.
Path: rust/capture-logs/tests/metrics_test.rs
Line: 395-430
Comment:
**Repeated test structure for `minimal fields` scenario**
`edge_expo_with_minimal_fields_only` and `edge_summary_with_minimal_fields_only` are structurally identical: send a payload with a single metric variant containing one data point with only `timeUnixNano`, then assert `data` is not `None`. The team preference is parameterised tests. These two (and potentially the three `*_with_string_u64s` variants) could be expressed as a single table-driven test, keeping the payloads in a `&[(&str, fn(&Metric) -> bool)]` slice and removing the repeated boilerplate.
**Context Used:** Do not attempt to comment on incorrect alphabetica... ([source](https://app.greptile.com/review/custom-context?memory=instruction-0))
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
DanielVisca
left a comment
There was a problem hiding this comment.
Looks good,
when one of these upstream gaps closes the failing canary a good signal. The boundary tests seem thorough too.
Greptile's comments are probably fine to ignore for now.The eprintln only print on test failure (cargo captures stderr by default), and the inner state diagnostics on failure are worthwhile at this point.


Problem
The
opentelemetry-protoRust crate has several upstream deserialization gaps that cause OTLP/JSON metric payloads to silently drop data rather than returning errors:value: {}AnyValue objects fail to deserialize.fixed64/uint64/sfixed64fields (count,zeroCount,asInt,bucketCounts, and timestamp fields) lack thedeserialize_string_to_u64annotation, so the OTLP/JSON spec-canonical string encoding silently producesdata: Noneinstead of a parsed value.ExponentialHistogram,ExponentialHistogramDataPoint,SummaryDataPoint,Buckets, andExemplarall lack#[serde(default)], so any missing non-Optionproto field hard-errors and trips the upstream silencing pattern.The silencing pattern itself —
Metric.databeing#[serde(flatten)] Option<Data>on a#[serde(default)]struct — means any inner deserialization failure is swallowed and the metric is dropped with no log line and no error returned to the client.Changes
patch_otel_jsonis extended with three independently-removable workaround layers, each tagged with aFIXMEreferencing its upstream issue:count,zeroCount,asInt, andbucketCountselements viacoerce_string_to_integer, which triesi64first thenu64to handle both signed and unsigned spec-valid values.timeUnixNanoandstartTimeUnixNanodescendants insideexponentialHistogramandsummaryvariants viacoerce_unix_nano_descendants.ExponentialHistogram,ExponentialHistogramDataPoint,SummaryDataPoint,Buckets, andExemplarviafill_*_defaultsfunctions, preventing hard-errors on minimal but spec-valid payloads.How did you test this code?
A new integration test file
tests/metrics_test.rscovers:u64fields alongside a sum counter (the primary regression case).u64fields (baseline sanity check).u64and timestamp fields.u64fields.NumberDataPoint.asIntas a JSON string.u64::MAXround-trip via theu64fallback path incoerce_string_to_integer.i64::MAX,i64::MIN,0,±1) forasInt.bucketCountsarray.exponentialHistogram: {}variant.Two tests (
edge_negative_value_in_u64_field_should_error,edge_non_numeric_string_in_u64_field_should_error) are marked#[ignore]because the upstream silencing pattern currently prevents them from returning errors. They should be un-ignored once the upstream structure changes.Publish to changelog?
No