Add operational metrics endpoint #12

Merged
sebastientaggart merged 1 commit into main from
feature/operational-metrics
Apr 7, 2026

@sebastientaggart
Member

Adds a lightweight Metrics counter module and a new unauthenticated GET /metrics endpoint that gives operational visibility into the service.

What changed

  • New src/deckhand/metrics.py holding in-memory counters: events total, actions (total/success/failure), signals (total + by name), and started_at for rate calculation.
  • EventBus, ActionRegistry, SignalRegistry, and Orchestrator accept an optional Metrics instance via constructor injection and increment counters at the appropriate call sites.
  • main.py instantiates Metrics in lifespan and wires it into the orchestrator and registries.
  • New GET /metrics endpoint (unauthenticated, like /health) returns a snapshot including uptime, events/s, action counters, signal counters, websocket client count, agent status distribution, and state store entry count.
  • Test coverage in tests/test_bridge.py verifies counters increment after triggering actions.
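
For readers without the diff at hand, the counter module and the optional constructor injection described above could look roughly like the sketch below. This is not the code in src/deckhand/metrics.py: the field names, the `snapshot` keys, the `now` parameter, and the `SketchEventBus` class are all assumptions made for illustration, and the sketch leaves out the websocket, agent status, and state store figures that the real endpoint reports.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Metrics:
    """Hypothetical in-memory counters; names are illustrative only."""

    started_at: float = field(default_factory=time.monotonic)
    events_total: int = 0
    actions_total: int = 0
    actions_success: int = 0
    actions_failure: int = 0
    signals_total: int = 0
    signals_by_name: Dict[str, int] = field(default_factory=dict)

    def record_event(self) -> None:
        self.events_total += 1

    def record_action(self, success: bool) -> None:
        self.actions_total += 1
        if success:
            self.actions_success += 1
        else:
            self.actions_failure += 1

    def record_signal(self, name: str) -> None:
        self.signals_total += 1
        # Read-modify-write: safe on a single event loop, not thread-safe.
        self.signals_by_name[name] = self.signals_by_name.get(name, 0) + 1

    def snapshot(self, now: Optional[float] = None) -> dict:
        """Return a plain dict suitable for a JSON response."""
        now = time.monotonic() if now is None else now
        uptime = max(now - self.started_at, 0.0)
        rate = self.events_total / uptime if uptime > 0 else 0.0
        return {
            "uptime_seconds": uptime,
            "events_per_second": rate,
            "actions": {
                "total": self.actions_total,
                "success": self.actions_success,
                "failure": self.actions_failure,
            },
            "signals": {
                "total": self.signals_total,
                "by_name": dict(self.signals_by_name),
            },
        }


class SketchEventBus:
    """Stand-in for a component that accepts an optional Metrics instance."""

    def __init__(self, metrics: Optional[Metrics] = None) -> None:
        self._metrics = metrics

    def publish(self, event: dict) -> None:
        # Existing callers that pass no metrics keep working unchanged.
        if self._metrics is not None:
            self._metrics.record_event()
```

Defaulting `metrics` to `None` is what would let existing call sites construct the component without arguments, and passing `now` into `snapshot` keeps the rate calculation easy to test.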

Closes #7

@sebastientaggart linked an issue on Apr 7, 2026 that may be closed by this pull request
@sebastientaggart
Member Author

Review Summary

Verdict: APPROVE

Findings

  • [NOTE] SignalRegistry.handle only records the metric on success (after await handler(payload)), whereas ActionRegistry.run records both success and failure. Intentional asymmetry? Consider recording signal failures too for consistency, or document the difference.
  • [NOTE] metrics.py comment claims "increments are atomic at the bytecode level for simple ints" — true for CPython under the GIL for single ops, but dict.get(...)+1 then assignment in record_signal is not atomic. Fine here since the service is single-event-loop, but the justification in the docstring is slightly misleading.
  • [NOTE] In test_metrics_endpoint, reaching into client._transport.app to build a second client is a bit fragile; a dedicated unauth fixture would be cleaner. Non-blocking.
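
To make the first two notes concrete, here is one way signal handling could record failures as well as successes. It is a sketch under assumptions, not the merged code: `SignalCounters`, `SketchSignalRegistry`, the `register` method, and the `success` keyword are hypothetical names, and the real `SignalRegistry.handle` may look up and invoke handlers differently.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

Handler = Callable[[dict], Awaitable[None]]


class SignalCounters:
    """Hypothetical signal counters with a failure count added."""

    def __init__(self) -> None:
        self.total = 0
        self.failure = 0
        self.by_name: Dict[str, int] = {}

    def record_signal(self, name: str, success: bool = True) -> None:
        self.total += 1
        # get() + 1 + assignment is several steps, so it is not atomic.
        # It is safe here only because nothing awaits between the steps
        # and all callers share one event loop.
        self.by_name[name] = self.by_name.get(name, 0) + 1
        if not success:
            self.failure += 1


class SketchSignalRegistry:
    """Stand-in registry that records a metric on both outcomes."""

    def __init__(self, metrics: Optional[SignalCounters] = None) -> None:
        self._handlers: Dict[str, Handler] = {}
        self._metrics = metrics

    def register(self, name: str, handler: Handler) -> None:
        self._handlers[name] = handler

    async def handle(self, name: str, payload: dict) -> None:
        handler = self._handlers[name]
        success = False
        try:
            await handler(payload)
            success = True
        finally:
            # Runs whether the handler returned or raised; the exception,
            # if any, still propagates to the caller.
            if self._metrics is not None:
                self._metrics.record_signal(name, success=success)
```

If the counters were ever updated from more than one thread, the dictionary update would need a lock; on a single event loop the comment in `record_signal` states the actual guarantee more accurately than an appeal to bytecode atomicity.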

No correctness, security, or platform-compliance issues found. Existing test fixtures and docs that construct Orchestrator()/ActionRegistry(orchestrator)/SignalRegistry() remain compatible thanks to the new optional metrics parameter.

@sebastientaggart merged commit 7766f1f into main on Apr 7, 2026

Development

Successfully merging this pull request may close these issues.

Add basic operational metrics

1 participant