Release 2666

@github-actions github-actions released this 09 Sep 11:44
b812447

Trello card

Trello-641

Context

DelayedJob is not working out well for us at the moment: the workers have decent job runtime, but for reasons unknown the throughput is awful. We also experience intermittent worker failures, and it's not clear why, though they do appear to be database related (the connection may be dropping and not always re-establishing). On top of that, the Yabeda integration is not well supported and we had to fork it to prevent PII leaking into our logs.

We want to migrate to Sidekiq to hopefully resolve most if not all of these issues.

Changes proposed in this pull request

  • Add Sidekiq with metrics

Add Sidekiq as a job queue adapter, as well as yabeda-sidekiq to expose metrics to Prometheus.

  • Add additional instance for Sidekiq jobs

Add an additional instance for Sidekiq job workers; initially running a single worker in review for testing.

  • Enable Sidekiq web UI

Enable the web UI in non-production environments, reusing the existing basic auth credentials for simplicity.

  • Switch dev/staging to Sidekiq

I've tested this in review and it seems to work well; the next step is to roll it out to staging so we can check the analytics are coming through and give it a stress test.
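As a rough illustration of the "Enable Sidekiq web UI" change above, the gating and credential check could be sketched in plain Ruby as follows. The helper names, the `/sidekiq` mount path and the environment variable names are assumptions for illustration only; they are not taken from this PR.

```ruby
require "digest"

# Hypothetical helper: the PR only says "non-production environments",
# so anything other than "production" is treated as allowed here.
def sidekiq_web_enabled?(env)
  env.to_s != "production"
end

# Compare two strings without leaking where they differ. Hashing first
# gives both sides the same length before the byte-wise comparison.
def secure_equal?(given, expected)
  given_digest = Digest::SHA256.digest(given.to_s)
  expected_digest = Digest::SHA256.digest(expected.to_s)
  diff = 0
  given_digest.bytes.zip(expected_digest.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# Hypothetical helper: true only when both username and password match.
# Uses & rather than && so that both comparisons always run.
def sidekiq_web_credentials_match?(user, pass, expected_user:, expected_pass:)
  secure_equal?(user, expected_user) & secure_equal?(pass, expected_pass)
end

# In config/routes.rb this might be wired up roughly like so
# (ENV names and mount path are assumptions):
#
#   require "sidekiq/web"
#   if sidekiq_web_enabled?(Rails.env)
#     Sidekiq::Web.use(Rack::Auth::Basic) do |user, pass|
#       sidekiq_web_credentials_match?(
#         user, pass,
#         expected_user: ENV["BASIC_AUTH_USER"],
#         expected_pass: ENV["BASIC_AUTH_PASSWORD"]
#       )
#     end
#     mount Sidekiq::Web, at: "/sidekiq"
#   end
```

Keeping the check in small helpers like these makes it easy to unit test without booting Rails or Sidekiq.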

Guidance to review

This PR will run Sidekiq in dev/staging initially so we can give it a thorough stress test before switching over in production. We should also check our Redis instance can handle the increased workload.
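One way to picture the staged rollout described above is a small per-environment adapter switch. This is only a sketch: the helper name and the exact environment names are assumptions, and the PR may well set the adapter directly in each environment file instead.

```ruby
# Environments that use Sidekiq during the trial. The names are assumed
# from "dev/staging" in this PR and may not match the app's real ones.
SIDEKIQ_ENVIRONMENTS = %w[development staging].freeze

# Hypothetical helper: pick the Active Job adapter for an environment.
# Everything not listed above (including production) stays on DelayedJob
# until the stress test is complete.
def queue_adapter_for(env)
  SIDEKIQ_ENVIRONMENTS.include?(env.to_s) ? :sidekiq : :delayed_job
end

# In config/application.rb this might be used as:
#
#   config.active_job.queue_adapter = queue_adapter_for(Rails.env)
```

Switching production over later would then be a one-line change to the list, which also keeps the rollback simple.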

I've tested in a review app: the Sidekiq worker processed jobs well and served metrics on port 3000, so it should work with our Prometheus instance once the target is added.

I'm not sure if docker-compose is even used any more, but I've updated it to be consistent; I may look at removing it completely in a later PR.