fix(metrics): unblock OTLP/JSON histogram, expo, summary ingestion by jonmcwest · Pull Request #60261 · PostHog/posthog

jonmcwest · 2026-05-27T14:53:58Z

Problem

The opentelemetry-proto Rust crate has several upstream deserialization gaps that cause OTLP/JSON metric payloads to silently drop data rather than returning errors:

[upstream-1253] Empty value: {} AnyValue objects fail to deserialize.
[upstream-3328] Several fixed64/uint64/sfixed64 fields (count, zeroCount, asInt, bucketCounts, and timestamp fields) lack the deserialize_string_to_u64 annotation, so the OTLP/JSON spec-canonical string encoding silently produces data: None instead of a parsed value.
[upstream-unreported] ExponentialHistogram, ExponentialHistogramDataPoint, SummaryDataPoint, Buckets, and Exemplar all lack #[serde(default)], so any missing non-Option proto field hard-errors and trips the upstream silencing pattern.

The silencing pattern itself — Metric.data being #[serde(flatten)] Option<Data> on a #[serde(default)] struct — means any inner deserialization failure is swallowed and the metric is dropped with no log line and no error returned to the client.
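
The effect can be modelled without serde. What follows is a std-only sketch, not the crate's code: `ModelMetric`, `parse_count`, `lenient`, and `strict` are hypothetical names, and the `.ok()` call stands in for the role the flattened `Option<Data>` plays upstream.

```rust
// Std-only model of the silencing effect; every name here is hypothetical.
#[derive(Debug, PartialEq)]
struct ModelMetric {
    name: String,
    // Mirrors the shape of `Metric.data`: absent when the inner parse fails.
    data: Option<u64>,
}

// Inner parser that fails on anything that is not a bare unsigned integer.
fn parse_count(raw: &str) -> Result<u64, String> {
    raw.parse::<u64>()
        .map_err(|e| format!("invalid count {:?}: {}", raw, e))
}

// Lenient outer parser: the inner error is discarded, so the caller receives
// a metric with `data: None` and no indication that anything went wrong.
fn lenient(name: &str, raw_count: &str) -> ModelMetric {
    ModelMetric {
        name: name.to_string(),
        data: parse_count(raw_count).ok(),
    }
}

// Strict outer parser: the inner error propagates to the caller.
fn strict(name: &str, raw_count: &str) -> Result<ModelMetric, String> {
    Ok(ModelMetric {
        name: name.to_string(),
        data: Some(parse_count(raw_count)?),
    })
}
```

With the lenient shape, a bad payload and a payload that simply has no data are indistinguishable to the caller, which is why the workarounds patch the JSON before deserialization rather than relying on an error path.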

Changes

patch_otel_json is extended with three independently-removable workaround layers, each tagged with a FIXME referencing its upstream issue:

String→integer coercion for count, zeroCount, asInt, and bucketCounts elements via coerce_string_to_integer, which tries i64 first then u64 to handle both signed and unsigned spec-valid values.
Timestamp coercion for timeUnixNano and startTimeUnixNano descendants inside exponentialHistogram and summary variants via coerce_unix_nano_descendants.
Default injection for all required-by-serde fields in ExponentialHistogram, ExponentialHistogramDataPoint, SummaryDataPoint, Buckets, and Exemplar via fill_*_defaults functions, preventing hard-errors on minimal but spec-valid payloads.
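
The i64-then-u64 order used by the first layer can be sketched as follows. This is a minimal illustration under assumptions: `Coerced` and `coerce_sketch` are hypothetical names, and the sketch works on a bare `&str` rather than on a JSON value as the PR's helper presumably does.

```rust
// Hypothetical result type for the sketch.
#[derive(Debug, PartialEq)]
enum Coerced {
    Signed(i64),
    Unsigned(u64),
}

// Try i64 first so negative values survive, then fall back to u64 for
// values above i64::MAX. Returns None when the string fits neither type.
fn coerce_sketch(s: &str) -> Option<Coerced> {
    if let Ok(v) = s.parse::<i64>() {
        return Some(Coerced::Signed(v));
    }
    s.parse::<u64>().ok().map(Coerced::Unsigned)
}
```

Trying i64 first keeps negative `asInt` values intact, while the u64 fallback covers the upper half of the unsigned range that i64 cannot represent.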

How did you test this code?

A new integration test file tests/metrics_test.rs covers:

Histogram with string-encoded u64 fields alongside a sum counter (the primary regression case).
Histogram with unquoted u64 fields (baseline sanity check).
Exponential histogram with string-encoded u64 and timestamp fields.
Summary with string-encoded u64 fields.
NumberDataPoint.asInt as a JSON string.
u64::MAX round-trip via the u64 fallback path in coerce_string_to_integer.
Signed boundary values (i64::MAX, i64::MIN, 0, ±1) for asInt.
Mixed string/number encoding in the same bucketCounts array.
Minimal (field-sparse) exponential histogram and summary payloads that would previously silently drop.
Empty exponentialHistogram: {} variant.
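
For the minimal-payload cases, default injection amounts to filling absent keys before deserialization. The sketch below assumes a simplified value type; `Val`, `fill_defaults`, and `data_point_defaults` are hypothetical, and the exact set of fields the PR injects is not listed in this description.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for a JSON value.
#[derive(Debug, Clone, PartialEq)]
enum Val {
    Int(i64),
    Str(String),
    List(Vec<Val>),
}

// Insert each default only when the key is absent; values the client sent
// are left untouched.
fn fill_defaults(obj: &mut BTreeMap<String, Val>, defaults: &[(&str, Val)]) {
    for (key, default) in defaults {
        obj.entry(key.to_string())
            .or_insert_with(|| default.clone());
    }
}

// Illustrative defaults for a sparse exponential histogram data point.
// Field names follow OTLP/JSON casing; the selection is an assumption.
fn data_point_defaults() -> Vec<(&'static str, Val)> {
    vec![
        ("count", Val::Int(0)),
        ("zeroCount", Val::Int(0)),
        ("scale", Val::Int(0)),
        ("attributes", Val::List(Vec::new())),
    ]
}
```

A data point that carries only `timeUnixNano` gains the four defaults, while a data point that already has `count` keeps its own value.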

Two tests (edge_negative_value_in_u64_field_should_error, edge_non_numeric_string_in_u64_field_should_error) are marked #[ignore] because the upstream silencing pattern currently prevents them from returning errors. They should be un-ignored once the upstream structure changes.

Publish to changelog?

No


greptile-apps · 2026-05-27T15:21:46Z

Comments Outside Diff (1)

rust/capture-logs/src/service.rs, line 80-114 (link)

New coercions apply to log and trace payloads too

patch_otel_json is shared by parse_otel_message (logs) and parse_otel_traces_message (traces) as well as parse_otel_metrics_message. The new object-level checks for keys count, zeroCount, asInt, and bucketCounts now fire whenever those keys appear anywhere in any OTLP payload type. In practice the risk is very low (these keys don't appear in log/trace schemas), but any future field named count in a log or trace proto would be silently coerced. Worth a comment noting the scope, or alternatively narrowing the coercions to a separate patch_otel_metrics_json function that is only called from parse_otel_metrics_message.
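
The narrowing suggested here could look roughly like the sketch below. It is an illustration under assumptions, not the code in service.rs: `Json`, `Scope`, `coerce`, and `patch` are hypothetical, a scope flag stands in for a separate `patch_otel_metrics_json` function, and only i64 parsing is attempted to keep the example short.

```rust
// Hypothetical minimal JSON tree.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Str(String),
    Int(i64),
    Arr(Vec<Json>),
    Obj(Vec<(String, Json)>),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Scope {
    Metrics,
    Other, // logs and traces
}

// Keys named in the review comment.
const METRIC_INT_KEYS: [&str; 4] = ["count", "zeroCount", "asInt", "bucketCounts"];

// Turn a numeric string (or each element of an array) into an integer.
fn coerce(value: &mut Json) {
    let parsed = match value {
        Json::Str(s) => s.parse::<i64>().ok(),
        Json::Arr(items) => {
            items.iter_mut().for_each(coerce);
            None
        }
        _ => None,
    };
    if let Some(n) = parsed {
        *value = Json::Int(n);
    }
}

// Walk the tree; key-based coercion only fires for metric payloads.
fn patch(value: &mut Json, scope: Scope) {
    match value {
        Json::Obj(fields) => {
            for (key, child) in fields.iter_mut() {
                if scope == Scope::Metrics && METRIC_INT_KEYS.contains(&key.as_str()) {
                    coerce(child);
                } else {
                    patch(child, scope);
                }
            }
        }
        Json::Arr(items) => {
            for item in items.iter_mut() {
                patch(item, scope);
            }
        }
        _ => {}
    }
}
```

With the scope check in place, a `count` key inside a log or trace payload passes through unchanged, which removes the forward-compatibility concern at the cost of one extra parameter.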


greptile-apps · 2026-05-27T15:21:50Z

+    let counter = &metrics[0];
+    let histogram = &metrics[1];
+
+    eprintln!("counter.data is_some = {}", counter.data.is_some());
+    eprintln!("histogram.data is_some = {}", histogram.data.is_some());
+    if let Some(Data::Histogram(h)) = &histogram.data {
+        eprintln!("histogram.data_points.len = {}", h.data_points.len());
+        if let Some(dp) = h.data_points.first() {
+            eprintln!(
+                "dp.count={} sum={:?} bucket_counts={:?} explicit_bounds={:?}",
+                dp.count, dp.sum, dp.bucket_counts, dp.explicit_bounds
+            );
+        }
+    }


Superfluous eprintln! debug prints in tests

Several tests contain eprintln! calls (e.g. counter.data is_some, histogram.data is_some, dp.count=...) that were clearly left in from development. They produce noise on --nocapture runs and add no assertion value since the asserts below them already capture the failure message. Per the project simplicity rule "has no superfluous parts," these can be removed. The same pattern appears in exponential_histogram_with_string_u64s, summary_with_string_u64s, number_data_point_as_int_string, edge_u64_above_i64_max_round_trips, edge_as_int_signed_boundaries, and histogram_with_unquoted_u64_works.
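
One way to keep the diagnostics without unconditional prints is to attach them to the assertion message. The sketch below is an assumption about how that could look; `HistogramPoint`, `describe`, and `assert_count` are hypothetical and only mirror the fields printed in the snippet above.

```rust
// Hypothetical data point with the fields shown in the debug prints.
#[derive(Debug)]
struct HistogramPoint {
    count: u64,
    sum: Option<f64>,
    bucket_counts: Vec<u64>,
    explicit_bounds: Vec<f64>,
}

// Build the diagnostic string in one place so any assertion can reuse it.
fn describe(dp: &HistogramPoint) -> String {
    format!(
        "count={} sum={:?} bucket_counts={:?} explicit_bounds={:?}",
        dp.count, dp.sum, dp.bucket_counts, dp.explicit_bounds
    )
}

// The message arguments are only evaluated when the assertion fails, so a
// passing test does no extra formatting work.
fn assert_count(dp: &HistogramPoint, expected: u64) {
    assert_eq!(dp.count, expected, "unexpected count; state: {}", describe(dp));
}
```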

rust/capture-logs/tests/metrics_test.rs, line 46-59

greptile-apps · 2026-05-27T15:21:51Z

+                "bucketCounts":["1"],
+                "explicitBounds":[]
+              }]
+            }}
+          ]
+        }]
+      }]
+    }"#;
+
+    let result = parse_otel_metrics_message(&Bytes::from(json));
+    assert!(
+        result.is_err(),
+        "spec-violating count=\"not-a-number\" must be rejected, not silently dropped"
+    );
+}
+
+/// Regression for the gap that necessitated the expanded EXPONENTIAL_HISTOGRAM
+/// defaults: client sends a minimal spec-valid expo without any of the upstream-
+/// undeclared-default fields. Without our defaults, this hard-errors and the
+/// metric silently drops.
+#[test]
+fn edge_expo_with_minimal_fields_only() {
+    let json = r#"{
+      "resourceMetrics":[{
+        "resource":{"attributes":[]},
+        "scopeMetrics":[{
+          "scope":{"name":"x"},
+          "metrics":[
+            {"name":"minimal.expo","exponentialHistogram":{
+              "aggregationTemporality":2,
+              "dataPoints":[{"timeUnixNano":"1700000000000000000"}]
+            }}
+          ]
+        }]
+      }]
+    }"#;


Repeated test structure for minimal fields scenario

edge_expo_with_minimal_fields_only and edge_summary_with_minimal_fields_only are structurally identical: send a payload with a single metric variant containing one data point with only timeUnixNano, then assert data is not None. The team preference is parameterised tests. These two (and potentially the three *_with_string_u64s variants) could be expressed as a single table-driven test, keeping the payloads in a &[(&str, fn(&Metric) -> bool)] slice and removing the repeated boilerplate.
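
A table-driven shape along these lines might look like the sketch below. `Metric`, `Data`, and `parse_stub` are hypothetical stand-ins so the example compiles on its own; the real tests would call `parse_otel_metrics_message` and inspect the actual OTLP types.

```rust
// Hypothetical stand-ins for the OTLP types used by the real tests.
#[derive(Debug, PartialEq)]
enum Data {
    ExponentialHistogram,
    Summary,
}

#[derive(Debug)]
struct Metric {
    data: Option<Data>,
}

// Stub parser: picks the variant from the payload text.
fn parse_stub(payload: &str) -> Metric {
    let data = if payload.contains("\"exponentialHistogram\"") {
        Some(Data::ExponentialHistogram)
    } else if payload.contains("\"summary\"") {
        Some(Data::Summary)
    } else {
        None
    };
    Metric { data }
}

fn is_expo(m: &Metric) -> bool {
    m.data == Some(Data::ExponentialHistogram)
}

fn is_summary(m: &Metric) -> bool {
    m.data == Some(Data::Summary)
}

// One row per scenario: the payload and the predicate it must satisfy.
const CASES: &[(&str, fn(&Metric) -> bool)] = &[
    (
        r#"{"name":"minimal.expo","exponentialHistogram":{"dataPoints":[{"timeUnixNano":"1700000000000000000"}]}}"#,
        is_expo,
    ),
    (
        r#"{"name":"minimal.summary","summary":{"dataPoints":[{"timeUnixNano":"1700000000000000000"}]}}"#,
        is_summary,
    ),
];

// Return the indices of failing rows so a test can report all of them.
fn failing_cases(cases: &[(&str, fn(&Metric) -> bool)]) -> Vec<usize> {
    let mut failed = Vec::new();
    for (i, (payload, check)) in cases.iter().enumerate() {
        if !check(&parse_stub(payload)) {
            failed.push(i);
        }
    }
    failed
}
```

A single `#[test]` would then assert that `failing_cases(CASES)` is empty, and adding a scenario becomes one extra row rather than a new function.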

Context Used: Do not attempt to comment on incorrect alphabetica... (source)

rust/capture-logs/tests/metrics_test.rs, line 395-430

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

DanielVisca

Looks good. When one of these upstream gaps closes, the failing canary is a good signal. The boundary tests seem thorough too.

Greptile's comments are probably fine to ignore for now. The eprintln! calls only print on test failure (cargo captures stderr by default), and the inner-state diagnostics on failure are worthwhile at this point.

DanielVisca · 2026-05-27T21:40:20Z

Validated locally and it works so merging!

deployment-status-posthog · 2026-05-27T23:50:08Z

Deploy status

Environment	Status	Deployed At	Workflow
dev	✅ Deployed	2026-05-27 23:50 UTC	Run
prod-us	✅ Deployed	2026-05-28 00:02 UTC	Run
prod-eu	✅ Deployed	2026-05-28 00:04 UTC	Run

cargo fmt

b46e116

jonmcwest force-pushed the 05-27-fix_metrics_unblock_otlp_json_histogram_expo_summary_ingestion branch from 7606959 to b46e116 Compare May 27, 2026 15:07

jonmcwest marked this pull request as ready for review May 27, 2026 15:14

greptile-apps Bot reviewed May 27, 2026


DanielVisca requested review from a team and DanielVisca May 27, 2026 20:53

DanielVisca approved these changes May 27, 2026


DanielVisca merged commit e086842 into master May 27, 2026
194 checks passed

DanielVisca deleted the 05-27-fix_metrics_unblock_otlp_json_histogram_expo_summary_ingestion branch May 27, 2026 21:40

DanielVisca mentioned this pull request May 27, 2026

feat(metrics): extract trace and span IDs from OTel exemplars #60363

Merged
