Description
ApexStore has basic Prometheus metrics (#149) but no distributed tracing or correlation IDs. Production systems require OpenTelemetry for debugging latency, identifying bottlenecks, and correlating operations across services.
Proposed Implementation
- Add
opentelemetry crate dependency
- Instrument all engine operations with tracing spans (set, get, delete, scan, flush, compact)
- Propagate trace context from HTTP layer through to storage engine
- Export traces via OTLP (Jaeger, Tempo, Datadog)
- Add per-operation latency histograms with configurable buckets
- Expose trace_id in HTTP response headers for debugging
Impact
- Required for production SRE / observability
- Enables latency breakdown (WAL fsync vs memtable vs compaction)
- Critical for debugging performance issues in production
Labels
Description
ApexStore has basic Prometheus metrics (#149) but no distributed tracing or correlation IDs. Production systems require OpenTelemetry for debugging latency, identifying bottlenecks, and correlating operations across services.
Proposed Implementation
opentelemetrycrate dependencyImpact
Labels