Telemetry: span.log() / span.logError() for log-in-trace correlation #925

@EhabY

Part of the Telemetry Phase A rollout.

service.log() and service.logError() emit LogRecord-shaped events with no traceId or parentEventId, so a log call made inside a service.trace(...) callback is uncorrelated with the surrounding span. OTel backends that link logs to traces (Grafana's Trace-to-Logs, Honeycomb's trace events panel, Datadog APM's log-in-trace view) need both trace_id and span_id on the LogRecord to make the link. We currently can't provide either.

Scope

Add log methods on the Span interface, with the same signature as the top-level service.log / logError:

interface Span {
  // ... existing fields
  log(
    eventName: string,
    properties?: Record<string, string>,
    measurements?: Record<string, number>,
  ): void;
  logError(
    eventName: string,
    error: unknown,
    properties?: Record<string, string>,
    measurements?: Record<string, number>,
  ): void;
}

Behavior:

  • Identical to service.log / logError, except the emitted event carries the span's traceId and parentEventId = span.eventId.
  • No durationMs measurement (these are point-in-time logs, not timed operations).
  • Routes through the same #safeEmit so a telemetry failure cannot reach the caller.
  • NOOP_SPAN.log / logError are no-ops (consistent with how phase already behaves on NOOP_SPAN).

Top-level service.log / logError are unchanged: they continue to emit uncorrelated LogRecords.
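A minimal sketch of the behavior listed above, for illustration only. The TelemetryEvent shape, the Sink type, the id generator, and the SketchSpan class are assumptions made up for this example, not the actual service code; the error block here is reduced to a message, whereas the real normalization lives in the service. The private safeEmit method stands in for the #safeEmit path named above.

```typescript
// Hypothetical event shape; the issue only names traceId, parentEventId, and eventId.
interface TelemetryEvent {
  eventId: string;
  eventName: string;
  traceId?: string;
  parentEventId?: string;
  properties: Record<string, string>;
  measurements: Record<string, number>;
  error?: { message: string };
}

type Sink = (event: TelemetryEvent) => void;

// Stand-in id generator for the sketch.
let nextId = 0;
const newId = (): string => `evt-${++nextId}`;

class SketchSpan {
  readonly eventId: string = newId();
  readonly traceId: string;
  private readonly sink: Sink;

  constructor(traceId: string, sink: Sink) {
    this.traceId = traceId;
    this.sink = sink;
  }

  log(
    eventName: string,
    properties: Record<string, string> = {},
    measurements: Record<string, number> = {},
  ): void {
    // Point-in-time log: carries the span's ids, adds no durationMs.
    this.safeEmit({
      eventId: newId(),
      eventName,
      traceId: this.traceId,
      parentEventId: this.eventId,
      properties,
      measurements,
    });
  }

  logError(
    eventName: string,
    error: unknown,
    properties: Record<string, string> = {},
    measurements: Record<string, number> = {},
  ): void {
    const message = error instanceof Error ? error.message : String(error);
    this.safeEmit({
      eventId: newId(),
      eventName,
      traceId: this.traceId,
      parentEventId: this.eventId,
      properties,
      measurements,
      error: { message },
    });
  }

  // A failing sink must never reach the caller.
  private safeEmit(event: TelemetryEvent): void {
    try {
      this.sink(event);
    } catch {
      // Swallowed on purpose: telemetry failures are isolated.
    }
  }
}

// No-op variant for when telemetry is off.
const NOOP_SPAN_SKETCH: Pick<SketchSpan, "log" | "logError"> = {
  log: () => {},
  logError: () => {},
};

const emitted: TelemetryEvent[] = [];
const span = new SketchSpan("trace-1", (event) => {
  emitted.push(event);
});
span.log("cache.miss", { key: "user:42" });
span.logError("fetch.failed", new Error("timeout"));

// A throwing sink does not propagate to the caller.
const failing = new SketchSpan("trace-2", () => {
  throw new Error("sink down");
});
failing.log("still.safe");
NOOP_SPAN_SKETCH.log("ignored");
```

Both emitted events share the span's traceId and point at the span through parentEventId, which is what a backend would need to place the log inside the trace.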

Why explicit, not ambient

OTel SDKs use AsyncLocalStorage to track the "current active span" so that logger.info(...) calls anywhere in code automatically pick up the span context. We could do the same, but:

  • It's invisible to readers — a service.log("foo") call might or might not have trace context depending on the call stack.
  • AsyncLocalStorage propagation breaks across some async boundaries (worker_threads, certain native callbacks, cross-process IPC), leading to silent loss of correlation.
  • The explicit span.log(...) form makes the data contract obvious: if you want correlation, take the Span parameter and call it.

We can revisit ambient context if it becomes necessary, but explicit is the simpler and clearer default.
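To make the explicit form concrete, here is a small sketch of a helper that opts into correlation by accepting the span as a parameter. The CorrelatedLog interface, the resolveWorkspace helper, and the recording stub are hypothetical names invented for this example; in real code the argument would be the Span handed to a service.trace(...) callback.

```typescript
// Narrow, hypothetical view of the Span interface: just the log method proposed here.
interface CorrelatedLog {
  log(
    eventName: string,
    properties?: Record<string, string>,
    measurements?: Record<string, number>,
  ): void;
}

// Hypothetical helper. The signature itself tells the reader that its
// logs are correlated: no span parameter, no trace context.
function resolveWorkspace(span: CorrelatedLog, workspaceId: string): string {
  span.log("workspace.resolve", { workspaceId });
  return `resolved:${workspaceId}`;
}

// Recording stub standing in for a real span.
const recorded: string[] = [];
const recordingSpan: CorrelatedLog = {
  log: (eventName) => {
    recorded.push(eventName);
  },
};

const resolved = resolveWorkspace(recordingSpan, "ws-1");
```

Because the span is an ordinary argument, correlation does not depend on the call stack or on which async boundary the helper runs behind.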

Discriminator at export time

The exporter (#903) currently uses traceId presence to decide whether an event becomes a LogRecord or a Span. With span.log() shipping, that discriminator no longer holds — a logged event in trace context will have a traceId but should still export as a LogRecord, not a Span.

Options to resolve, decided here so #903 can wire the routing correctly:

  • A. Use durationMs presence: Spans always have durationMs (framework-set on time/trace/phase); logs never do. Simple, no schema change. Edge case: a caller manually passing measurements: { durationMs: ... } to service.log would be misclassified.
  • B. Add explicit eventKind: "log" | "span" field: Disambiguates cleanly at the cost of one new field on every event.

Recommend A. The edge case is genuinely weird (passing durationMs to log is misuse), and we can document durationMs as a reserved measurement name if needed. If a real bug surfaces, switch to B and add the explicit eventKind field.
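Option A could be sketched as follows. The ExportableEvent shape and the classifyForExport function are hypothetical names for illustration; the actual routing belongs to the exporter in #903. The sketch assumes durationMs lives in measurements, as the edge case above implies.

```typescript
// Hypothetical minimal event shape for the routing decision.
interface ExportableEvent {
  eventName: string;
  traceId?: string;
  measurements?: Record<string, number>;
}

type ExportKind = "log" | "span";

// Option A: durationMs presence decides the kind; traceId is ignored.
function classifyForExport(event: ExportableEvent): ExportKind {
  return event.measurements?.["durationMs"] !== undefined ? "span" : "log";
}

// span.log() output: has a traceId, no durationMs, so it stays a log.
const spanLogKind = classifyForExport({
  eventName: "cache.miss",
  traceId: "trace-1",
  measurements: { hits: 0 },
});

// Framework-timed operation: durationMs is set, so it exports as a span.
const timedKind = classifyForExport({
  eventName: "fetch",
  traceId: "trace-1",
  measurements: { durationMs: 12 },
});

// The documented edge case: a caller passes durationMs to a plain log
// and the event is misclassified as a span.
const misuseKind = classifyForExport({
  eventName: "manual.timing",
  measurements: { durationMs: 5 },
});
```

The third call shows the cost of A: the misuse is silent, which is why reserving durationMs in the docs (or moving to B) is the fallback.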

Tests

  • span.log emits an event with the span's traceId and parentEventId, no durationMs.
  • span.logError emits an event with traceId, parentEventId, the normalized error block, no durationMs.
  • Top-level service.log / logError continue to emit without traceId (no regression).
  • NOOP_SPAN.log / logError no-op when telemetry is off.
  • Throwing sink does not affect the surrounding trace's return value (existing isolation contract still holds for log events).

Out of scope

  • Ambient context propagation. Tracked here as a future option only.
  • Changes to top-level service.log / logError. They remain trace-context-free.

Why this is needed

Without it, the OTLP/Logs output of the exporter (#903) has no trace_id on log records, so OTel backends cannot link logs to traces in their UIs. Adding span.log() is the producer-side change that completes the loop.

Depends on #900 (the core service this extends).
