
[FEATURE]: Standardize OpenTelemetry trace and metrics protocol for plugins #43

@lucarlig


Summary

Define and implement a standardized framework protocol for OpenTelemetry-style trace context, plugin spans, and plugin metrics.

This is the framework-side counterpart to IBM/cpex-plugins issue #27. Plugin implementations should not invent their own telemetry path, and should not assume that an OpenTelemetry server, collector, or similar external service is present.

The work can start with the immediate span propagation requirement, but the framework contract should be designed as the long-term home for plugin telemetry, so that new metric types can be added later without each plugin defining its own protocol.

Problem

Plugins need a common way to receive trace context from the host and to return telemetry to the host. Today this is not standardized across plugins, which makes tracing and metrics behavior inconsistent and pushes infrastructure assumptions into plugin code.

The framework should own the telemetry contract so every plugin can use the same model for trace IDs, spans, and metrics.

Near-Term Requirement

The first required use case is internal plugin span creation for OpenTelemetry tracing:

  • the host/framework passes parent trace context, including the trace ID, into the plugin
  • the plugin can create an internal span associated with that parent trace
  • the plugin returns the span/trace data through the framework boundary
  • the framework is responsible for exporting/sending that telemetry to OpenTelemetry
  • the plugin remains self-contained and does not connect directly to an OpenTelemetry server, collector, or similar backend
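A minimal Python sketch of the near-term flow, for illustration only. `TraceContext`, `SpanData`, `PluginResult`, `example_plugin`, `InMemoryExporter`, and `run_plugin` are hypothetical names, not an existing framework API; the fields loosely follow the W3C Trace Context and OpenTelemetry span model. The plugin only builds span data, and the framework-side `run_plugin` hands it to an exporter, here an in-memory stand-in for a real OpenTelemetry export path.

```python
import secrets
import time
from dataclasses import dataclass, field


# Hypothetical boundary types, not a finalized protocol.
@dataclass(frozen=True)
class TraceContext:
    trace_id: str        # 32 lowercase hex chars (16 bytes), supplied by the host
    parent_span_id: str  # 16 lowercase hex chars (8 bytes), the host's active span


@dataclass
class SpanData:
    name: str
    trace_id: str
    span_id: str
    parent_span_id: str
    start_ns: int
    end_ns: int = 0
    status: str = "UNSET"  # "UNSET" | "OK" | "ERROR", mirroring OpenTelemetry status codes
    attributes: dict = field(default_factory=dict)


@dataclass
class PluginResult:
    payload: object
    spans: list = field(default_factory=list)


def example_plugin(payload: str, ctx: TraceContext) -> PluginResult:
    """Plugin side: records an internal span under the host's parent span and
    returns it as plain data. No OpenTelemetry SDK or network access is used."""
    span = SpanData(
        name="example_plugin.process",
        trace_id=ctx.trace_id,
        span_id=secrets.token_hex(8),  # new random 8-byte span ID (16 hex chars)
        parent_span_id=ctx.parent_span_id,
        start_ns=time.time_ns(),
    )
    result = payload.upper()  # stand-in for the plugin's real work
    span.end_ns = time.time_ns()
    span.status = "OK"
    span.attributes["payload.length"] = len(payload)
    return PluginResult(payload=result, spans=[span])


class InMemoryExporter:
    """Stand-in for the framework's OpenTelemetry export path; needs no collector."""

    def __init__(self):
        self.exported = []

    def export(self, spans):
        self.exported.extend(spans)


def run_plugin(plugin, payload, ctx: TraceContext, exporter):
    """Framework side: passes context in, collects spans out, exports them."""
    result = plugin(payload, ctx)
    exporter.export(result.spans)
    return result.payload


exporter = InMemoryExporter()
ctx = TraceContext("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
print(run_plugin(example_plugin, "hello", ctx, exporter))  # prints HELLO
```

Because the plugin returns plain data, the same check runs without any collector; a real framework would replace `InMemoryExporter` with its own OpenTelemetry exporter.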

This near-term span flow is required, but it should not be treated as the whole telemetry design. The protocol should leave room for future plugin metrics and richer telemetry data.

Scope

  • define how trace context is passed from the host/runtime into plugins
  • define how plugins return span information back through the framework
  • define the framework responsibility for exporting/sending plugin telemetry to OpenTelemetry
  • define what metric types plugins may report now or in future versions, including names, values, units, labels/attributes, timing data, counters, and error/status fields
  • define whether plugin spans are created by the host, by plugins, or represented as structured span events returned to the host
  • provide language/runtime-neutral structures that Rust, Python, Go, and WASM/plugin-boundary implementations can support
  • keep plugin implementations self-contained and independent from any external OpenTelemetry collector/server
  • document the protocol and expected lifecycle for trace IDs, span IDs, parent context, metrics, and error propagation
  • add conformance tests or fixtures showing the standard behavior across plugin boundaries
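One way to keep the boundary language-neutral is to pass trace context as a W3C Trace Context `traceparent` string and to carry metrics as plain JSON records that Rust, Python, Go, and WASM code can all produce and consume. The sketch below assumes that choice; `parse_traceparent`, `make_metric`, `encode_envelope`, the `schema_version` field, and the three metric kinds are illustrative assumptions, and the parser accepts only `traceparent` version `00`.

```python
import json
import re

# W3C Trace Context traceparent, version 00 only:
# "00-<32 hex trace-id>-<16 hex parent-id>-<2 hex trace-flags>", all lowercase.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

METRIC_KINDS = {"counter", "gauge", "histogram"}  # assumed first-version kinds


def parse_traceparent(header: str):
    """Return (trace_id, parent_span_id, flags) or None for an invalid header."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # The W3C spec treats all-zero trace IDs and parent IDs as invalid.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return trace_id, parent_id, int(flags, 16)


def make_metric(name: str, kind: str, value: float, unit: str = "", attributes=None) -> dict:
    """Build one metric data point as a plain, JSON-serializable record."""
    if kind not in METRIC_KINDS:
        raise ValueError(f"unsupported metric kind: {kind}")
    if kind == "counter" and value < 0:
        raise ValueError("counter increments must be non-negative")
    return {
        "name": name,
        "kind": kind,
        "value": value,
        "unit": unit,  # e.g. "ms" or "By", in the UCUM style OpenTelemetry recommends
        "attributes": dict(attributes or {}),
    }


def encode_envelope(trace_id: str, metrics: list) -> str:
    """Serialize the metrics a plugin hands back across the boundary."""
    return json.dumps(
        {"schema_version": 1, "trace_id": trace_id, "metrics": metrics},
        sort_keys=True,
    )


header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
print(parse_traceparent(header))
latency = make_metric("plugin.duration", "histogram", 12.5, "ms", {"plugin": "example"})
print(encode_envelope("4bf92f3577b34da6a3ce929d0e0e4736", [latency]))
```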

Out of Scope

  • requiring a deployed OpenTelemetry collector/server for framework or plugin tests
  • plugin-specific telemetry implementation in downstream plugin repos
  • application-specific dashboards or backend observability setup

Open Questions

  • Which trace/span fields are mandatory vs optional at the plugin boundary?
  • Should plugins receive an active span context, or only structured trace/parent IDs?
  • Should plugins return completed span data, events, metrics, or both?
  • What metric types should be supported in the first version: counters, gauges, histograms/timers, error counts, latency, payload sizes?
  • How should telemetry failures be handled so they do not fail plugin execution?
  • How should the protocol evolve so future metrics can be added without breaking existing plugins?
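For the failure-handling and evolution questions, one possible approach is sketched below, assuming a JSON envelope with a `schema_version` field. The framework ignores span fields it does not recognize (so newer plugins stay compatible with older hosts), drops envelopes with an unsupported major version, and catches every telemetry error so it is logged rather than raised into the plugin call. `ingest_telemetry`, `KNOWN_SPAN_FIELDS`, the logger name, and the versioning rule are hypothetical.

```python
import json
import logging

logger = logging.getLogger("plugin_telemetry")  # hypothetical logger name

SUPPORTED_SCHEMA_MAJOR = 1
KNOWN_SPAN_FIELDS = {"name", "trace_id", "span_id", "parent_span_id", "start_ns", "end_ns", "status"}


def ingest_telemetry(raw: str, export) -> int:
    """Decode a plugin's telemetry envelope and export the spans it contains.

    Returns the number of spans exported. Never raises: telemetry problems are
    logged and dropped so they cannot fail the plugin invocation."""
    try:
        envelope = json.loads(raw)
        # Assumption: a missing schema_version means version 1; "1.x" is compatible.
        major = int(str(envelope.get("schema_version", "1")).split(".")[0])
        if major != SUPPORTED_SCHEMA_MAJOR:
            logger.warning("dropping telemetry with unsupported schema major %s", major)
            return 0
        spans = []
        for item in envelope.get("spans", []):
            # Keep only fields this framework version understands; extra fields
            # from newer plugins are ignored rather than rejected.
            spans.append({k: v for k, v in item.items() if k in KNOWN_SPAN_FIELDS})
        export(spans)  # nothing is exported if decoding failed above
        return len(spans)
    except Exception:
        logger.exception("plugin telemetry dropped")
        return 0


exported = []
raw = json.dumps({"schema_version": "1.1", "spans": [{"name": "a", "trace_id": "t", "new_field": 1}]})
print(ingest_telemetry(raw, exported.extend))
```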

Acceptance Criteria

  • A documented telemetry protocol exists for passing trace context into plugins and span/metrics data out of plugins.
  • The near-term flow supports plugin-created internal spans using parent trace context supplied by the host/framework.
  • The framework, not each plugin, owns exporting/sending plugin telemetry to OpenTelemetry.
  • The protocol is standardized across all plugin languages/runtimes supported by the framework.
  • The design does not require plugins to know about, connect to, or configure an external OpenTelemetry server or collector.
  • The design leaves room for future plugin metrics beyond the initial span propagation requirement.
  • Tests or conformance fixtures cover trace context propagation and span reporting through the framework boundary.
  • Downstream plugin work, including IBM/cpex-plugins issue #27, can implement against this contract without adding repo-specific telemetry assumptions.
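A conformance fixture could be stored as data and checked identically by each runtime's harness. The sketch below assumes a hypothetical JSON fixture format and a `check_conformance` helper that verifies returned spans inherit the host's trace ID and parent span ID and carry a well-formed span ID; both are illustrative, not part of an existing framework API.

```python
import json
import re

# Hypothetical shared fixture; the same JSON file could be loaded by the Rust,
# Python, Go, and WASM test harnesses.
FIXTURE = json.loads("""
{
  "name": "span_inherits_parent_trace",
  "input_context": {
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
    "parent_span_id": "00f067aa0ba902b7"
  },
  "expect": {"min_spans": 1, "status_in": ["UNSET", "OK", "ERROR"]}
}
""")

HEX16 = re.compile(r"^[0-9a-f]{16}$")


def check_conformance(fixture: dict, returned_spans: list) -> list:
    """Return human-readable violations; an empty list means the plugin passed."""
    ctx, expect = fixture["input_context"], fixture["expect"]
    errors = []
    if len(returned_spans) < expect["min_spans"]:
        errors.append(f"expected at least {expect['min_spans']} span(s), got {len(returned_spans)}")
    for i, span in enumerate(returned_spans):
        if span.get("trace_id") != ctx["trace_id"]:
            errors.append(f"span {i}: trace_id does not match the host context")
        if span.get("parent_span_id") != ctx["parent_span_id"]:
            errors.append(f"span {i}: parent_span_id does not match the host context")
        span_id = str(span.get("span_id", ""))
        if not HEX16.match(span_id) or span_id == "0" * 16:
            errors.append(f"span {i}: span_id must be 16 lowercase hex chars and non-zero")
        if span.get("status") not in expect["status_in"]:
            errors.append(f"span {i}: unexpected status {span.get('status')!r}")
    return errors


good = [{
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
    "parent_span_id": "00f067aa0ba902b7",
    "span_id": "b7ad6b7169203331",
    "status": "OK",
}]
print(check_conformance(FIXTURE, good))  # prints []
```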
