Release 2369 · DFE-Digital/schools-experience

Trello card

Context

The Prometheus metrics work fine in the main application, but not all of the DelayedJob metrics are working.

Although we run the metrics server and the DelayedJob command in the same process, under the hood daemonize forks a new process. The default Prometheus store keeps its values in the memory of a single process, so metrics recorded in the forked worker never reach the metrics server.

Currently the only alternative is to use a DirectFileStore, which works across forked processes. Sidekiq doesn't have this issue because you can run the metrics server on each Sidekiq server, but I can't find a way of doing the equivalent with DelayedJob (it doesn't have a hook for configuring a server in the pool).
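
To illustrate the forking problem described above, here is a minimal sketch using only the Ruby standard library. It is not how the Prometheus client is implemented: the `counts_after_fork` helper, the `jobs_total` name and the plain Hash standing in for an in-memory store are all assumptions made for illustration. It only shows that a value changed in a forked child is invisible to the parent unless it is written somewhere both processes can read, such as a file.

```ruby
require "tmpdir"

# Hypothetical helper: increments a counter in a forked child, once in
# memory and once in a file, then reports what the parent can see.
# Requires a platform that supports fork (Linux, macOS).
def counts_after_fork
  memory_store = Hash.new(0) # stand-in for an in-memory metrics store

  Dir.mktmpdir do |dir|
    path = File.join(dir, "jobs_total")
    File.write(path, "0")

    pid = fork do
      # The child gets its own copy of memory_store, so this change
      # never reaches the parent process.
      memory_store[:jobs_total] += 1

      # A file on disk is shared, so the parent can read this change.
      File.write(path, (File.read(path).to_i + 1).to_s)
    end
    Process.wait(pid)

    [memory_store[:jobs_total], File.read(path).to_i]
  end
end

memory_count, file_count = counts_after_fork
puts "in-memory count seen by parent: #{memory_count}"
puts "file-backed count seen by parent: #{file_count}"
```

The parent sees 0 for the in-memory count and 1 for the file-backed count, which mirrors why a store backed by files can aggregate metrics from the forked DelayedJob worker.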

This also increases the disk quota, as we'll be writing metrics to disk; we were sitting at ~75% disk usage, so this should give us plenty of headroom.

Changes proposed in this pull request

Switch to DirectFileStore for Prometheus metrics
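
As a rough sketch of what this switch looks like with the prometheus-client gem (the initializer location and the directory path are assumptions for illustration, not taken from this pull request):

```ruby
# config/initializers/prometheus.rb (hypothetical location)
require "prometheus/client"
require "prometheus/client/data_stores/direct_file_store"

# The directory below is an assumed example path. It must be writable
# by both the metrics server and the forked DelayedJob worker.
Prometheus::Client.config.data_store =
  Prometheus::Client::DataStores::DirectFileStore.new(
    dir: "/tmp/prometheus_direct_file_store"
  )
```

The store has to be configured before any metrics are registered, since each metric picks up the configured store when it is created.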

Guidance to review

I've manually checked this works in review by booting a delayed-job worker for my review app, SSHing onto it and curling the metrics endpoint.

When we used the DirectFileStore in GiT we noticed the CPU usage steadily increase until it maxed out; we're going to have to monitor this to see if it's still a problem. We can monitor the CPU here and disk usage here.

Relevant docs on Prometheus client built-in stores: https://github.com/prometheus/client_ruby#built-in-stores
