Release 2593

@github-actions github-actions released this 03 Aug 07:22
fbf836e

Trello card

Trello-632

Context

Our delayed job worker occasionally hangs or crashes in production, which stops candidate emails from going out. We have monitoring intended to catch this, but it does not fire reliably: it alerts on the pending job count, and because GSE sometimes bulk-enqueues a batch, the alert threshold has to be set high.

The purpose of this job is simply to provide a 'heartbeat' counter metric that fires from a delayed job on a consistent schedule. If the metric does not increment over a short period of time (a couple of minutes), we can infer that the worker has crashed. We should then get a near-instant notification when our worker fails.
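
As a rough illustration of the idea, the sketch below shows a job that bumps a counter and a check that treats a missing beat as a stalled worker. It is not the code in this pull request: `HeartbeatCounter`, `HeartbeatJob`, `worker_stalled?` and the 120-second threshold are all hypothetical names and values, and the in-memory counter stands in for whatever metrics client the app actually uses.

```ruby
# Hypothetical in-memory stand-in for a metrics counter. A real app would
# call its metrics client here instead.
class HeartbeatCounter
  attr_reader :count, :last_beat_at

  def initialize
    @count = 0
    @last_beat_at = nil
  end

  # Record one beat and remember when it happened.
  def increment(now: Time.now)
    @count += 1
    @last_beat_at = now
  end
end

# Job body (assumed shape): the worker is expected to run `perform`
# on a fixed schedule, so each successful run proves the worker is alive.
class HeartbeatJob
  def initialize(counter)
    @counter = counter
  end

  def perform(now: Time.now)
    @counter.increment(now: now)
  end
end

# Alert rule (assumed): presume the worker is down if no beat has been
# recorded within `threshold` seconds, or if none has ever been recorded.
def worker_stalled?(counter, now: Time.now, threshold: 120)
  return true if counter.last_beat_at.nil?

  (now - counter.last_beat_at) > threshold
end
```

In practice the staleness check would live in the monitoring system as an alert on the counter's rate rather than in application code; it is shown here only to make the inference explicit.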

Changes proposed in this pull request

  • Add a heartbeat scheduled job

Guidance to review