Summary
Provide observability: expose Prometheus-style metrics for event processing, agent loop health, processing latency, error counts, and DLQ size.
What to do
- Add a GET /metrics endpoint (for example, via prom-client) that exposes the following counters, gauges, and histograms:
  - events_processed_total (counter)
  - events_failed_total (counter)
  - event_processing_duration_seconds (histogram)
  - dlq_size (gauge)
  - event_cursor_lag (gauge)
  - agent_loop_heartbeat_timestamp (gauge)
- Instrument startEventListener(), event processing pipeline, DLQ retries, and agent loop with metrics.
- Add documentation on key metrics and recommended alerting thresholds.
- Optionally add sample Grafana dashboard panels in docs.
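
The metrics listed above could be sketched roughly as follows. This is a dependency-free illustration of the Prometheus text exposition format, not the project's implementation; in practice prom-client would likely replace the hand-rolled classes. The `Counter`, `Gauge`, and `Histogram` classes, the `recordEvent` and `renderMetrics` helpers, the help strings, and the bucket boundaries are all assumptions made for this example.

```typescript
// Minimal sketch of the proposed metrics, rendered in Prometheus text format.
// All class and helper names here are hypothetical, not existing project code.

interface Metric {
  render(): string;
}

class Counter implements Metric {
  private value = 0;
  constructor(readonly name: string, readonly help: string) {}
  inc(by = 1): void {
    this.value += by;
  }
  render(): string {
    return `# HELP ${this.name} ${this.help}\n# TYPE ${this.name} counter\n${this.name} ${this.value}\n`;
  }
}

class Gauge implements Metric {
  private value = 0;
  constructor(readonly name: string, readonly help: string) {}
  set(value: number): void {
    this.value = value;
  }
  render(): string {
    return `# HELP ${this.name} ${this.help}\n# TYPE ${this.name} gauge\n${this.name} ${this.value}\n`;
  }
}

class Histogram implements Metric {
  private readonly buckets: { le: number; count: number }[];
  private sum = 0;
  private count = 0;
  constructor(readonly name: string, readonly help: string, upperBounds: number[]) {
    this.buckets = upperBounds.map((le) => ({ le, count: 0 }));
  }
  observe(value: number): void {
    this.sum += value;
    this.count += 1;
    // Buckets are cumulative: every bucket whose bound covers the value is incremented.
    for (const bucket of this.buckets) {
      if (value <= bucket.le) bucket.count += 1;
    }
  }
  render(): string {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} histogram`];
    for (const bucket of this.buckets) {
      lines.push(`${this.name}_bucket{le="${bucket.le}"} ${bucket.count}`);
    }
    lines.push(`${this.name}_bucket{le="+Inf"} ${this.count}`);
    lines.push(`${this.name}_sum ${this.sum}`);
    lines.push(`${this.name}_count ${this.count}`);
    return lines.join("\n") + "\n";
  }
}

const eventsProcessed = new Counter("events_processed_total", "Events processed successfully");
const eventsFailed = new Counter("events_failed_total", "Events that failed processing");
// Bucket boundaries (in seconds) are placeholders; tune them to observed latencies.
const processingDuration = new Histogram(
  "event_processing_duration_seconds",
  "Time spent processing one event",
  [0.05, 0.25, 1, 5],
);
const dlqSize = new Gauge("dlq_size", "Events currently in the dead-letter queue");
const cursorLag = new Gauge("event_cursor_lag", "Events between the cursor and the latest event");
const heartbeat = new Gauge("agent_loop_heartbeat_timestamp", "Unix time of the last agent loop tick");

// Hypothetical helper the event pipeline would call once per processed event.
function recordEvent(durationSeconds: number, succeeded: boolean): void {
  (succeeded ? eventsProcessed : eventsFailed).inc();
  processingDuration.observe(durationSeconds);
}

// Body for the GET /metrics response (served as text/plain).
function renderMetrics(): string {
  const all: Metric[] = [eventsProcessed, eventsFailed, processingDuration, dlqSize, cursorLag, heartbeat];
  return all.map((metric) => metric.render()).join("");
}
```

Under these assumptions, the event pipeline would call `recordEvent(elapsedSeconds, ok)` after each event, the DLQ retry path would call `dlqSize.set(...)`, and the agent loop would call `heartbeat.set(Date.now() / 1000)` on every tick, so an alert can fire when the timestamp goes stale.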
Acceptance criteria
- /metrics endpoint returns Prometheus-format metrics.
- Key metrics are collected and documented.
- Example alerts documented (e.g., dlq_size > 50, event_cursor_lag > X).