Add /metrics endpoint and probe-based /health #24

@Loule95450

Context

The only health surface is GET /health, which returns a static { status: "ok" } (src/routes/health.ts). There are no counters for login failures, no histograms for OPAQUE latency, and no gauges for sessions, devices, or users by status.

Problem / Observation

Self-hosters who run keyfount alongside Prometheus/Grafana, or who use Posthog/Loki for visibility, have nothing to scrape. Their only options are grepping logs (already redacted to remove bodies) and querying SQLite directly. Operators won't notice a brute-force campaign in progress, a growing registration backlog, or creeping request latency.

/health itself is a static OK: it doesn't actually probe the database, so an instance with a locked DB returns 200 to a healthcheck and only fails when a real request comes in.
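
A rough sketch of what a probe-based check could look like, using the { status, db_ok, uptime_s } shape from the proposed approach. Everything else here is an assumption for illustration: the DbProbe type and checkHealth are hypothetical names, "degraded" is a made-up status value, and returning 503 on a failed probe is one possible choice, not something decided in this issue. The probe is injected so the logic does not depend on a particular SQLite driver.

```typescript
// Hypothetical probe: any function that throws when the database is unusable,
// e.g. a thin wrapper that runs `SELECT 1` on the real SQLite handle.
type DbProbe = () => void;

interface HealthReport {
  status: "ok" | "degraded"; // "degraded" is an assumed value, not from the issue
  db_ok: boolean;
  uptime_s: number;
}

interface HealthResult {
  httpStatus: 200 | 503; // 503 on probe failure is an assumption
  body: HealthReport;
}

function checkHealth(
  probe: DbProbe,
  startedAtMs: number,
  nowMs: number = Date.now(),
): HealthResult {
  let dbOk = true;
  try {
    probe();
  } catch {
    // A locked or unreachable DB surfaces here instead of on the first real request.
    dbOk = false;
  }
  return {
    httpStatus: dbOk ? 200 : 503,
    body: {
      status: dbOk ? "ok" : "degraded",
      db_ok: dbOk,
      uptime_s: Math.floor((nowMs - startedAtMs) / 1000),
    },
  };
}
```

The route handler would then only need to call checkHealth with the real probe and the process start time, and serialize the result.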

Proposed approach

  • Make /health probe the DB (e.g. SELECT 1) and return { status, db_ok, uptime_s }.
  • Add /metrics (Prometheus exposition format), gated behind the admin instance, exposing counters login_total{result=success|failure|rate_limited}, register_total{status=pending|approved|rejected}, events_appended_total, and bytes_stored_total{user="…"} (or just a total), plus a histogram request_duration_seconds{route}.
  • Decide whether metrics live on the admin port (private) or behind a separate METRICS_TOKEN for scraping from outside.
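
To make the counter part of this concrete, here is a minimal, dependency-free sketch of a registry that renders counters in the Prometheus text exposition format. CounterRegistry and escapeLabelValue are hypothetical names, and a real implementation might well use an existing client library instead. Histograms and the admin-port vs. METRICS_TOKEN gating are deliberately left out.

```typescript
type Labels = Record<string, string>;

// Escape a label value per the Prometheus text format:
// backslash, double quote, and newline must be escaped.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Hypothetical in-memory counter registry (illustration only).
class CounterRegistry {
  // metric name -> (serialized label set -> current value)
  private readonly series = new Map<string, Map<string, number>>();

  inc(name: string, labels: Labels = {}, by: number = 1): void {
    const key = CounterRegistry.labelKey(labels);
    const byLabels = this.series.get(name) ?? new Map<string, number>();
    byLabels.set(key, (byLabels.get(key) ?? 0) + by);
    this.series.set(name, byLabels);
  }

  // Render all counters as Prometheus text exposition lines.
  render(): string {
    const lines: string[] = [];
    this.series.forEach((byLabels, name) => {
      lines.push(`# TYPE ${name} counter`);
      byLabels.forEach((value, key) => {
        lines.push(key === "" ? `${name} ${value}` : `${name}{${key}} ${value}`);
      });
    });
    return lines.join("\n") + "\n";
  }

  // Sort label names so the same label set always maps to the same series.
  private static labelKey(labels: Labels): string {
    return Object.keys(labels)
      .sort()
      .map((k) => `${k}="${escapeLabelValue(String(labels[k]))}"`)
      .join(",");
  }
}

// Usage: count login outcomes with the label values proposed above.
const registry = new CounterRegistry();
registry.inc("login_total", { result: "failure" });
registry.inc("login_total", { result: "success" });
registry.inc("events_appended_total");
```

Whichever gating option is chosen, the /metrics handler would return registry.render() with a text/plain content type.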

Acceptance criteria

  • /health probes the DB
  • /metrics endpoint returns Prometheus exposition format
  • Metrics documented in docs/self-host.md
  • Test asserts metric incrementing on a login failure
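
For the last criterion, the test could assert on the counter delta around a failed login rather than on an absolute value, so it stays valid regardless of what ran before it. The sketch below is self-contained and uses stand-ins only: loginTotal, recordLogin, and fakeLogin are hypothetical and do not reflect the real OPAQUE login route or the project's test runner.

```typescript
type LoginResult = "success" | "failure" | "rate_limited";

// Hypothetical stand-in for the login_total{result=...} counter.
const loginTotal = new Map<LoginResult, number>();

function recordLogin(result: LoginResult): void {
  loginTotal.set(result, (loginTotal.get(result) ?? 0) + 1);
}

// Stand-in for the real login route: accepts exactly one fixed credential
// and records the outcome, as the real handler would be expected to do.
function fakeLogin(user: string, secret: string): LoginResult {
  const result: LoginResult =
    user === "alice" && secret === "correct" ? "success" : "failure";
  recordLogin(result);
  return result;
}

function testLoginFailureIncrementsCounter(): void {
  const failuresBefore = loginTotal.get("failure") ?? 0;
  const successesBefore = loginTotal.get("success") ?? 0;

  const result = fakeLogin("alice", "wrong");

  if (result !== "failure") {
    throw new Error(`expected a failed login, got ${result}`);
  }
  if ((loginTotal.get("failure") ?? 0) !== failuresBefore + 1) {
    throw new Error("login_total{result=failure} did not increase by 1");
  }
  if ((loginTotal.get("success") ?? 0) !== successesBefore) {
    throw new Error("login_total{result=success} changed on a failed login");
  }
}

testLoginFailureIncrementsCounter();
```

In the real test, fakeLogin would be replaced by a request against the login route and the map lookup by a read of the actual metric.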

Metadata

Assignees

No one assigned

    Labels

    P2 (Medium priority), enhancement (New feature or request), ops (Operations / deployment / observability)
