
Fix cc.requests.outstanding.gauge when using puma web server #3841

Conversation

@Samze Samze (Contributor) commented Jun 12, 2024

#1312 introduced `cc.requests.outstanding.gauge`, which holds the counter in memory. With the introduction of Puma there may be multiple processes, so each would emit its own value for this metric, causing the gauge to flip-flop between values. This metric is listed as an important KPI for CAPI scaling: https://docs.cloudfoundry.org/running/managing-cf/scaling-cloud-controller.html#cloud_controller_ng.

When running under Puma, this fix instead uses Redis to back the gauge.

  • Refactor requests_metric to use statsd_updater
  • statsd_updater now contains the StatsD request logic
  • Add a Redis/in-memory store to statsd_updater
  • Add a missing mutex to the counter for thread safety in the in-memory implementation
  • Chose to prefix the entry in Redis with `metrics:`, but open to feedback here

Inspired by 4539e59

An alternative we considered was to read the Prometheus metric and re-emit it to StatsD, but we observed performance degradation, presumably because of the number of disk reads the DirectFileStore needs to aggregate the metric across processes. Redis therefore seemed the best approach.
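
For illustration, a minimal sketch of the two store flavours, assuming hypothetical class names (`InMemoryStore`, `RedisStore`) and a simplified interface; the actual store used by statsd_updater in this PR may differ:

```ruby
require 'redis'

# In-memory store: only meaningful for a single-process server (e.g. thin).
# A mutex guards the counter so concurrent request threads do not race.
class InMemoryStore
  def initialize
    @mutex = Mutex.new
    @counter = 0
  end

  def increment
    @mutex.synchronize { @counter += 1 }
  end

  def decrement
    @mutex.synchronize { @counter -= 1 }
  end

  def value
    @mutex.synchronize { @counter }
  end
end

# Redis-backed store: all Puma workers share one counter, so every process
# reads and emits the same gauge value instead of its own per-process count.
class RedisStore
  KEY = 'metrics:cc.requests.outstanding.gauge'.freeze # prefix chosen in this PR

  def initialize(redis)
    @redis = redis
  end

  def increment
    @redis.incr(KEY)
  end

  def decrement
    @redis.decr(KEY)
  end

  def value
    @redis.get(KEY).to_i
  end
end
```

With the Redis-backed store, every worker increments and decrements the same shared counter, so the gauge emitted to StatsD no longer depends on which process happened to handle the request.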

cc @sethboyles / @pivotalgeorge


  • I have reviewed the contributing guide

  • I have viewed, signed, and submitted the Contributor License Agreement

  • I have made this pull request to the main branch

  • I have run all the unit tests using bundle exec rake

  • I have run CF Acceptance Tests

@Samze Samze force-pushed the puma_cc.requests.outstanding.gauge_redis_refactor branch 2 times, most recently from cb4dbaf to 8a280d7 on June 12, 2024 at 21:44
@Samze Samze force-pushed the puma_cc.requests.outstanding.gauge_redis_refactor branch from 8a280d7 to 9b0ed8f on June 12, 2024 at 21:46
@tcdowney tcdowney requested a review from a team on June 13, 2024 at 22:30
@sethboyles (Member) commented:

Code looks good, I'll do acceptance tomorrow

@joaopapereira joaopapereira (Contributor) left a comment

LGTM, my only concern here would be that this store can only save one metric, but looking at the statsd_updater it looks like we only have one type of metric, so it should be fine.

@sethboyles sethboyles merged commit 82db58e into cloudfoundry:main Jun 14, 2024
8 checks passed