Add Asynq queue metrics (depth, active, lag, retries, DLQ) #172

@tayebmokni

Summary

Expose the Asynq queue metrics defined in docs/10-observability.md §5.3: queue depth, active jobs, job duration, retries, failures, DLQ size, and processing lag. These metrics drive the worker HPA (#81) and the §12 SLO alerts on processing lag and DLQ growth.

Design reference

  • docs/10-observability.md §5.3 (Background jobs section)
  • docs/12-jobs-cron.md §11.3 (Metrics)

Acceptance criteria

  • gonext_asynq_queue_depth{queue}, gonext_asynq_active_jobs{queue} gauges
  • gonext_asynq_job_duration_seconds{task_type} histogram
  • gonext_asynq_job_retries_total{task_type} counter
  • gonext_asynq_job_failed_total{task_type, kind} counter (kind: error/panic/timeout)
  • gonext_asynq_dlq_size gauge
  • gonext_asynq_processing_lag_seconds{queue} gauge — age of oldest pending job
  • gonext_jobs_enqueued_total{type,queue}, gonext_jobs_idempotency_skips_total{type}, gonext_jobs_unique_conflicts_total{type} counters
  • task_type cardinality bounded (~30) — enforced by registry (Public-form CSRF tokens (HMAC + anon-cookie binding) #176)
  • Tests verify metrics emitted on enqueue, success, failure, retry

Dependencies

#150, #176

Complexity

M

Metadata

Labels

  • area:jobs (Background jobs, cron)
  • area:observability (Logs, metrics, traces, RUM)
  • phase:P1-cms-core (Phase 1: CMS Core)
  • priority:P1 (Important: should land in phase)
  • skill:go (Go programming)
  • type:feat (New feature or implementation task)