[FEATURE] OpenTelemetry integration — structured tracing, metrics, and logging #197

@ElioNeto

Description

ApexStore exposes basic Prometheus metrics (#149) but has no distributed tracing or correlation IDs. Production deployments need OpenTelemetry tracing to debug latency, identify bottlenecks, and correlate operations across services.

Proposed Implementation

  1. Add the opentelemetry crate as a dependency
  2. Instrument all engine operations with tracing spans (set, get, delete, scan, flush, compact)
  3. Propagate trace context from the HTTP layer through to the storage engine
  4. Export traces via OTLP to backends such as Jaeger, Tempo, or Datadog
  5. Add per-operation latency histograms with configurable buckets
  6. Expose the trace_id in HTTP response headers for debugging
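
To make steps 3 and 6 concrete, here is a minimal std-only sketch of how an incoming W3C `traceparent` header could be parsed and its trace ID echoed back to the client. It is not the opentelemetry crate's API: `TraceContext`, `parse_traceparent`, `response_trace_header`, and the `x-trace-id` header name are all hypothetical names chosen for illustration, and only version `00` of the header is accepted. A real implementation would likely lean on the crate's propagator instead.

```rust
/// Trace context carried by a W3C `traceparent` header
/// (hypothetical type, not part of ApexStore or the opentelemetry crate).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TraceContext {
    pub trace_id: String,  // 32 lowercase hex characters
    pub parent_id: String, // 16 lowercase hex characters
    pub sampled: bool,     // lowest bit of the trace-flags field
}

/// True if `s` is exactly `len` lowercase hex characters.
fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Parse `version-traceid-parentid-flags`. Only version "00" is accepted here.
pub fn parse_traceparent(header: &str) -> Option<TraceContext> {
    let mut parts = header.trim().split('-');
    let version = parts.next()?;
    let trace_id = parts.next()?;
    let parent_id = parts.next()?;
    let flags = parts.next()?;

    // Version 00 has exactly four fields.
    if version != "00" || parts.next().is_some() {
        return None;
    }
    if !is_lower_hex(trace_id, 32) || !is_lower_hex(parent_id, 16) || !is_lower_hex(flags, 2) {
        return None;
    }
    // All-zero IDs are invalid per the W3C Trace Context spec.
    if trace_id.bytes().all(|b| b == b'0') || parent_id.bytes().all(|b| b == b'0') {
        return None;
    }

    let flags = u8::from_str_radix(flags, 16).ok()?;
    Some(TraceContext {
        trace_id: trace_id.to_string(),
        parent_id: parent_id.to_string(),
        sampled: flags & 0x01 == 0x01,
    })
}

/// Build the response header that exposes the trace ID to the caller.
/// The header name `x-trace-id` is an assumption, not an agreed convention.
pub fn response_trace_header(ctx: &TraceContext) -> (&'static str, String) {
    ("x-trace-id", ctx.trace_id.clone())
}
```

The HTTP handler would call `parse_traceparent` on the request header, pass the resulting context down into the engine call so spans share one trace ID, and attach the pair from `response_trace_header` to the response. Requests without a valid header would need a freshly generated trace ID, which this sketch leaves out.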

Impact

  • Required for production SRE / observability
  • Enables latency breakdown (WAL fsync vs memtable vs compaction)
  • Critical for debugging performance issues in production
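
As a rough illustration of the latency breakdown, the sketch below shows one way a per-stage histogram with configurable buckets could look using only the standard library. `LatencyHistogram` and its methods are hypothetical names, not existing ApexStore or OpenTelemetry types; in practice the opentelemetry crate's histogram instrument would probably replace this.

```rust
use std::time::{Duration, Instant};

/// Fixed-bucket latency histogram (hypothetical type for illustration).
/// Bucket upper bounds are inclusive; one extra overflow bucket catches the rest.
pub struct LatencyHistogram {
    bounds: Vec<Duration>, // sorted, de-duplicated upper bounds
    counts: Vec<u64>,      // bounds.len() + 1 entries, last one is overflow
    sum: Duration,         // total recorded time
}

impl LatencyHistogram {
    /// Create a histogram from caller-supplied bucket bounds.
    pub fn new(mut bounds: Vec<Duration>) -> Self {
        bounds.sort();
        bounds.dedup();
        let counts = vec![0; bounds.len() + 1];
        Self { bounds, counts, sum: Duration::ZERO }
    }

    /// Record one observation in the first bucket whose bound is >= `d`.
    pub fn record(&mut self, d: Duration) {
        let idx = self.bounds.partition_point(|b| *b < d);
        self.counts[idx] += 1;
        self.sum += d;
    }

    /// Run `f`, record how long it took, and return its result.
    pub fn time<T>(&mut self, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        self.record(start.elapsed());
        out
    }

    /// Per-bucket counts, in bound order, with the overflow bucket last.
    pub fn counts(&self) -> &[u64] {
        &self.counts
    }

    /// Number of observations recorded so far.
    pub fn total(&self) -> u64 {
        self.counts.iter().sum()
    }

    /// Sum of all recorded durations.
    pub fn sum(&self) -> Duration {
        self.sum
    }
}
```

One histogram per stage (for example WAL fsync, memtable insert, compaction) would let a slow `set` be attributed to a specific stage: wrap each stage in `time(|| ...)` and compare the bucket counts afterwards.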
