Tracing: query↔ingest schema asymmetry on ag.metrics.duration.cumulative #4172

@mmabrouk

Summary

The /api/tracing/spans/query and /api/tracing/spans/ingest endpoints are not round-trippable. A customer hit this while trying to migrate traces between two Agenta instances: they queried the spans, resubmitted the output via ingest, and every span was silently dropped.

Which side mutates the data

The ingest side is the one that writes a non-canonical shape. It is not "query returning raw data that ingest rejects" — it is "ingest writes a shape that ingest itself rejects on a second pass."

At api/oss/src/core/tracing/utils/parsing.py:285-290, during ingestion we compute duration from start_time/end_time and overwrite the field:

if raw_span.start_time and raw_span.end_time:
    duration_s = (raw_span.end_time - raw_span.start_time).total_seconds()
    duration_ms = round(duration_s * 1_000, 3)
    if duration_ms is not None:
        ag["metrics"]["duration"] = {"cumulative": duration_ms}  # scalar

This writes duration.cumulative as a scalar (1922.653).

The Pydantic model at sdk/agenta/sdk/models/tracing.py:44, however, declares:

class AgMetricEntryAttributes(BaseModel):
    cumulative: Optional[Metrics] = None  # Metrics = Dict[str, NumericJson]
    incremental: Optional[Metrics] = None

cumulative is supposed to be a dict (that is how costs and tokens are stored: {"total": ..., "prompt": ..., "completion": ...}). So on the next ingest, validation rejects the scalar.
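To make the mismatch concrete, here is a minimal stdlib-only sketch of the shape contract. It is not the actual Pydantic validation: `is_metrics_dict` and `is_valid_metric_entry` are hypothetical helpers, and the sketch assumes NumericJson reduces to plain int/float values for this purpose.

```python
from numbers import Real
from typing import Any


def is_metrics_dict(value: Any) -> bool:
    """True if value looks like Metrics: a dict of string keys to numeric values."""
    return isinstance(value, dict) and all(
        isinstance(k, str) and isinstance(v, Real) and not isinstance(v, bool)
        for k, v in value.items()
    )


def is_valid_metric_entry(entry: dict) -> bool:
    """Hypothetical stand-in for the model check.

    Each of cumulative/incremental must be absent (or None) or a Metrics dict.
    """
    return all(
        entry.get(field) is None or is_metrics_dict(entry[field])
        for field in ("cumulative", "incremental")
    )


scalar_shape = {"cumulative": 1922.653}           # what ingest writes today
dict_shape = {"cumulative": {"total": 1922.653}}  # what the model declares

print(is_valid_metric_entry(scalar_shape))  # False
print(is_valid_metric_entry(dict_shape))    # True
```

The scalar shape fails the same structural check that the dict shape passes, which matches the count: 0 versus count: 1 behaviour in the repro.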

Repro

Against eu.cloud.agenta.ai with a valid API key:

  1. POST /api/tracing/spans/query with {"focus": "trace", "limit": 1} — response contains "duration": {"cumulative": 1922.653} on every span with a non-zero duration.
  2. POST /api/tracing/spans/ingest with {"traces": <response.traces>} (or after rewriting IDs to avoid dedup) — response 202 Accepted, body {"count": 0, "links": []}. Nothing is persisted.

Narrowing test at /tmp/agenta-test/test_metrics.py confirms: submitting a span with metrics.duration.cumulative as a scalar returns count: 0; submitting the same span with metrics.duration.cumulative = {"total": 1922.653} returns count: 1.

Proposed fix

Make ingest write the canonical dict shape at parsing.py:290:

ag["metrics"]["duration"] = {"cumulative": {"total": duration_ms}}

This aligns with the AgMetricEntryAttributes / Metrics = Dict[str, NumericJson] contract and is consistent with how costs and tokens are already stored ({"total": ..., "prompt": ..., "completion": ...}).
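As a rough, self-contained sketch of the fixed computation (the real code lives inline in parsing.py; `duration_metric` is a hypothetical helper name used only for illustration):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def duration_metric(
    start_time: Optional[datetime], end_time: Optional[datetime]
) -> Optional[dict]:
    """Return the duration entry in the canonical dict shape.

    Returns None when either timestamp is missing, mirroring the guard
    in the existing ingest code.
    """
    if not (start_time and end_time):
        return None
    duration_ms = round((end_time - start_time).total_seconds() * 1_000, 3)
    # Dict shape, consistent with how costs and tokens are stored.
    return {"cumulative": {"total": duration_ms}}


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = start + timedelta(milliseconds=1500)
print(duration_metric(start, end))  # {'cumulative': {'total': 1500.0}}
```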

Alternative considered: widen the model to accept Union[Metrics, float] for duration only. Rejected — it makes duration special-cased versus the other metric entries and keeps the inconsistency visible in the API.

Migration for existing data

Existing rows in production have duration.cumulative = <scalar> on disk. Options:

  1. One-off backfill migration rewriting cumulative: <scalar> to cumulative: {"total": <scalar>}.
  2. Accept both shapes on read and normalize on the way out of the query endpoint, then do the backfill lazily.

Option 2 is safer for prod. The query response normalization would live next to _parse_span_into_response in parsing.py.
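A possible shape for the read-side normalization in option 2, as a sketch only. `normalize_duration` is a hypothetical name, and it assumes the function receives the span's `ag` dict with the same `metrics.duration` layout shown in the ingest snippet.

```python
from numbers import Real


def normalize_duration(ag: dict) -> dict:
    """Return ag with metrics.duration.cumulative in dict form.

    Legacy scalar values are wrapped as {"total": scalar} in a new dict;
    the input is never mutated. Anything else is returned unchanged.
    """
    metrics = ag.get("metrics")
    duration = metrics.get("duration") if isinstance(metrics, dict) else None
    if not isinstance(duration, dict):
        return ag
    cumulative = duration.get("cumulative")
    if isinstance(cumulative, Real) and not isinstance(cumulative, bool):
        new_duration = {**duration, "cumulative": {"total": cumulative}}
        return {**ag, "metrics": {**metrics, "duration": new_duration}}
    return ag


legacy = {"metrics": {"duration": {"cumulative": 1922.653}}}
print(normalize_duration(legacy))
# {'metrics': {'duration': {'cumulative': {'total': 1922.653}}}}
```

The function is idempotent, so it can run on every query response both before and after a lazy backfill. The same rewrite could also serve as a client-side workaround before re-ingesting queried spans.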

Related

See companion issue on silent validation failures in parse_spans_from_request — that is what hides this bug from clients today.

Labels: Backend, bug
