[Gastown] PR 22: Observability #228

@jrf0110

Parent: #204 | Phase 4: Hardening

Revised: Added container process logging and container-specific metrics.

Goal

Structured logging, real-time event streaming, and alerting for the Gastown system.

Components

Structured Logging

  • Structured logging in Gastown worker (Sentry integration)
  • Log all DO RPC calls with timing
  • Container process logs forwarded to Workers Logs
  • Agent lifecycle events (start, stop, crash, restart)
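
As a rough illustration of "log all DO RPC calls with timing", the sketch below wraps an async call and emits one JSON log record per call. The names `withRpcTiming`, `RpcLogRecord`, and `LogSink`, and every field in the record, are hypothetical and not taken from the Gastown codebase; the default sink just prints JSON lines, on the assumption that Workers Logs ingests `console.log` output.

```typescript
// Hypothetical log record shape; field names are assumptions, not from the issue.
interface RpcLogRecord {
  level: "info" | "error";
  event: "do_rpc";
  method: string;
  durationMs: number;
  ok: boolean;
  error?: string;
}

type LogSink = (record: RpcLogRecord) => void;

// Default sink: one JSON object per line.
const consoleSink: LogSink = (record) => console.log(JSON.stringify(record));

// Wraps an async RPC call, measures its duration, and logs success or failure.
async function withRpcTiming<T>(
  method: string,
  call: () => Promise<T>,
  sink: LogSink = consoleSink,
  now: () => number = Date.now,
): Promise<T> {
  const start = now();
  try {
    const result = await call();
    sink({ level: "info", event: "do_rpc", method, durationMs: now() - start, ok: true });
    return result;
  } catch (err) {
    sink({
      level: "error",
      event: "do_rpc",
      method,
      durationMs: now() - start,
      ok: false,
      error: err instanceof Error ? err.message : String(err),
    });
    throw err; // Re-throw so callers still see the failure.
  }
}
```

The sink and clock are injectable so the wrapper can be unit tested, and so a different sink (for example one that also reports errors to Sentry) could be swapped in without touching call sites.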

Event Streaming

  • Bead event stream for real-time dashboard (DO → WebSocket or SSE)
  • Agent status change events
  • Convoy progress events
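
If SSE is chosen over WebSocket, each event could be serialized as a standard Server-Sent Events frame. The sketch below is a minimal example under that assumption; the `StreamEvent` shape, the event type names, and `toSseFrame` are hypothetical, not taken from the issue.

```typescript
// Hypothetical event envelope covering the three event kinds listed above.
interface StreamEvent {
  id: number;
  type: "bead_updated" | "agent_status_changed" | "convoy_progress";
  payload: Record<string, unknown>;
}

// Serializes one event as an SSE frame: id, event name, a single data line,
// then a blank line that terminates the frame. JSON.stringify never emits raw
// newlines, so the payload always fits on one data line.
function toSseFrame(e: StreamEvent): string {
  return `id: ${e.id}\nevent: ${e.type}\ndata: ${JSON.stringify(e.payload)}\n\n`;
}
```

Including an `id` line lets a reconnecting dashboard send `Last-Event-ID`, so the server can resume from where the client left off, assuming the DO keeps recent events around.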

Alerting

  • GUPP violations (agent stale for configurable threshold)
  • Escalation rate spikes
  • Review queue depth exceeding threshold
  • Agent restart loops (same agent restarting N times in M minutes)
  • Container OOM events
  • Container cold start frequency
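
The restart-loop rule ("same agent restarting N times in M minutes") can be sketched as a sliding-window counter. This is an in-memory illustration only; the class name `RestartLoopDetector` and its parameters are hypothetical, and a real implementation would presumably persist timestamps in DO storage.

```typescript
// Tracks restart timestamps per agent and flags a restart loop when an agent
// has restarted at least `maxRestarts` times within the last `windowMs`.
class RestartLoopDetector {
  private readonly restarts = new Map<string, number[]>();
  private readonly maxRestarts: number;
  private readonly windowMs: number;

  constructor(maxRestarts: number, windowMs: number) {
    this.maxRestarts = maxRestarts;
    this.windowMs = windowMs;
  }

  // Records a restart at `atMs` and returns true if the alert should fire.
  recordRestart(agentId: string, atMs: number): boolean {
    const cutoff = atMs - this.windowMs;
    // Drop timestamps that have fallen out of the window.
    const recent = (this.restarts.get(agentId) ?? []).filter((t) => t > cutoff);
    recent.push(atMs);
    this.restarts.set(agentId, recent);
    return recent.length >= this.maxRestarts;
  }
}
```

With `new RestartLoopDetector(3, 60_000)`, the third restart of the same agent inside one minute returns `true`; restarts by other agents are counted separately.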

Usage Metrics

  • Beads per day (by type, rig, town)
  • Agents per day (by role, rig)
  • LLM cost per bead
  • Average bead completion time
  • Container uptime and resource utilization
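
To make the bead metrics concrete, here is a small sketch of how beads per type, LLM cost per bead, and average completion time might be derived from raw bead records. The `BeadRecord` fields and the `summarizeUsage` helper are assumptions for illustration; the actual bead schema is not described in this issue.

```typescript
// Hypothetical raw record; the real bead schema may differ.
interface BeadRecord {
  type: string;
  createdAtMs: number;
  completedAtMs?: number; // undefined while the bead is still open
  llmCostUsd: number;
}

interface UsageSummary {
  beadsByType: Record<string, number>;
  avgCostUsdPerBead: number;
  avgCompletionMs: number | null; // null when no bead has completed yet
}

function summarizeUsage(beads: BeadRecord[]): UsageSummary {
  const beadsByType: Record<string, number> = {};
  let totalCost = 0;
  let totalCompletionMs = 0;
  let completed = 0;
  for (const b of beads) {
    beadsByType[b.type] = (beadsByType[b.type] ?? 0) + 1;
    totalCost += b.llmCostUsd;
    if (b.completedAtMs !== undefined) {
      totalCompletionMs += b.completedAtMs - b.createdAtMs;
      completed += 1;
    }
  }
  return {
    beadsByType,
    avgCostUsdPerBead: beads.length > 0 ? totalCost / beads.length : 0,
    avgCompletionMs: completed > 0 ? totalCompletionMs / completed : null,
  };
}
```

Completion time is averaged only over completed beads, so open beads do not skew the figure.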

Dependencies

  • All Phase 1–3 PRs

Tech

  • Cloudflare Workers Analytics Engine (use it as much as possible; err on the side of over-reporting data to Analytics Engine)
  • Cloudflare Workers Logs
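
Writing to Analytics Engine from a Worker goes through the dataset binding's `writeDataPoint` method, which accepts `indexes`, `blobs`, and `doubles`. The sketch below shows one possible layout for a bead-completion data point; the `AnalyticsDataset` interface is a local stand-in for the real binding type, and the event shape, field order, and `recordBeadCompleted` helper are assumptions.

```typescript
// Minimal local stand-in for the Analytics Engine binding (only the method used here).
interface AnalyticsDataset {
  writeDataPoint(point: { indexes?: string[]; blobs?: string[]; doubles?: number[] }): void;
}

// Hypothetical event; field names are not taken from the issue.
interface BeadCompleted {
  town: string;
  rig: string;
  beadType: string;
  durationMs: number;
  llmCostUsd: number;
}

// Maps the event onto a data point: the town as the sampling index,
// string dimensions as blobs, numeric measurements as doubles.
function recordBeadCompleted(dataset: AnalyticsDataset, e: BeadCompleted): void {
  dataset.writeDataPoint({
    indexes: [e.town],
    blobs: ["bead_completed", e.rig, e.beadType],
    doubles: [e.durationMs, e.llmCostUsd],
  });
}
```

Because blobs and doubles are positional, the column order chosen here would need to stay stable, or be versioned, for dashboard queries to remain valid.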

Acceptance Criteria

  • Structured logging with Sentry integration
  • Container process logs in Workers Logs
  • Real-time bead event stream
  • Configurable alerting rules
  • Usage metrics collection and dashboard
  • Container health metrics (uptime, OOM events, cold starts)
