I currently have my own implementation of a worker pool for consuming events from the websocket, but I'd like to be able to move towards an "official" implementation.
My main sticking point is that the ConsumerPool lacks any significant observability, in terms of both tracing and metrics.
I think the most important metric would be one that allows the "busyness" of the worker pool to be evaluated. I'll need to know when I need to increase the number of workers. This can probably be tracked in one of two ways:
- Report a gauge metric showing the number of work items currently queued
- Report a gauge metric showing the count of workers in each state (waiting for work vs working)
Either of these allows an operator to evaluate whether the worker pool size needs to be adjusted. I don't have a strong preference between the two options.
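To make the two options concrete, here is a minimal sketch of the bookkeeping involved, using only the standard library. The `pool` type and its methods are hypothetical stand-ins and are not the real ConsumerPool; with prometheus/client_golang, the `queued` and `busy` values could be exposed as gauges.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// pool is a hypothetical worker pool, not the real ConsumerPool.
type pool struct {
	work   chan func()
	queued atomic.Int64 // items submitted but not yet picked up by a worker
	busy   atomic.Int64 // workers currently running an item
	size   int64        // total number of workers
	wg     sync.WaitGroup
}

func newPool(workers, queueLen int) *pool {
	p := &pool{work: make(chan func(), queueLen), size: int64(workers)}
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			for fn := range p.work {
				p.queued.Add(-1) // item left the queue
				p.busy.Add(1)    // worker state: working
				fn()
				p.busy.Add(-1) // worker state: waiting for work
			}
		}()
	}
	return p
}

// submit enqueues an item; it blocks if the queue is full.
func (p *pool) submit(fn func()) {
	p.queued.Add(1)
	p.work <- fn
}

// idle reports how many workers are waiting for work.
func (p *pool) idle() int64 { return p.size - p.busy.Load() }

// shutdown stops accepting work and waits for the workers to drain the queue.
func (p *pool) shutdown() {
	close(p.work)
	p.wg.Wait()
}

func main() {
	p := newPool(2, 4)
	started := make(chan struct{})
	release := make(chan struct{})

	// Occupy both workers with items that block until released.
	for i := 0; i < 2; i++ {
		p.submit(func() { started <- struct{}{}; <-release })
	}
	<-started
	<-started

	// These items sit in the queue because no worker is free.
	for i := 0; i < 3; i++ {
		p.submit(func() {})
	}
	fmt.Printf("queued=%d busy=%d idle=%d\n", p.queued.Load(), p.busy.Load(), p.idle())

	close(release)
	p.shutdown()
	fmt.Printf("queued=%d busy=%d idle=%d\n", p.queued.Load(), p.busy.Load(), p.idle())
}
```

A queue depth that stays above zero, or an idle count that stays at zero, would both suggest that the pool needs more workers.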
Ideally, this metric would use either OpenTelemetry metrics or prometheus/client_golang directly. In my experience so far, OpenTelemetry metrics are still fairly immature, and most folks are comfortable either providing a Prometheus registry for a worker pool to use or having the worker pool register its metrics against the default registry.
Less important, but still interesting, metrics that would be nice to include:
- Work items processed (counter), with labels for outcome (success vs failure) and perhaps type of event (commit vs delete etc)
- Summary or histogram of the duration a worker spends on work items
I'm more than happy to contribute to this, but I'm aware that bandwidth at Bluesky for reviewing work is extremely limited. I just wanted to collate my thoughts on what is stopping me from using the official ConsumerPool implementation at this time.