events.ConsumerPool observability improvements #180

Open
strideynet opened this issue May 22, 2023 · 0 comments
@strideynet
Contributor

I currently have my own implementation of a worker pool for consuming events from the websocket, but I'd like to be able to move towards an "official" implementation.

My main sticking point is that the ConsumerPool lacks any significant observability, in terms of both tracing and metrics.

I think the most important metric would be one that allows the "busyness" of the worker pool to be evaluated. I'll need to know when I need to increase the number of workers. This can probably be tracked in one of two ways:

  • Report a gauge metric showing the number of work items currently queued
  • Report a gauge metric showing the count of workers in each state (waiting for work vs working)

Either of these allows an operator to evaluate whether the worker pool size needs to be adjusted. I don't have a strong preference between the two options.
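To make the two options concrete, here is a minimal, stdlib-only sketch of a pool that tracks both numbers. `InstrumentedPool`, `PoolStats`, and all of their methods are hypothetical names for illustration; this is not the existing `events.ConsumerPool` API. The snapshot returned by `Stats` is what a gauge would read on each scrape (with prometheus/client_golang, a `GaugeFunc` would be one way to do that).

```go
package main

import (
	"sync"
	"sync/atomic"
)

// PoolStats is a point-in-time snapshot of pool load (hypothetical type).
type PoolStats struct {
	Queued int // work items sitting in the queue (option 1)
	Busy   int // workers currently running a job (option 2)
	Idle   int // workers waiting for work (option 2)
}

// InstrumentedPool is an illustrative worker pool, not the real ConsumerPool.
type InstrumentedPool struct {
	work    chan func()
	workers int
	busy    int64 // accessed only through sync/atomic
	wg      sync.WaitGroup
}

// NewInstrumentedPool starts `workers` goroutines reading from a queue
// that buffers up to `queueSize` pending jobs.
func NewInstrumentedPool(workers, queueSize int) *InstrumentedPool {
	p := &InstrumentedPool{work: make(chan func(), queueSize), workers: workers}
	p.wg.Add(workers)
	for i := 0; i < workers; i++ {
		go p.run()
	}
	return p
}

func (p *InstrumentedPool) run() {
	defer p.wg.Done()
	for job := range p.work {
		atomic.AddInt64(&p.busy, 1)
		job() // a panicking job would skew the count; a real pool should recover
		atomic.AddInt64(&p.busy, -1)
	}
}

// Submit enqueues a job, blocking if the queue is full.
func (p *InstrumentedPool) Submit(job func()) { p.work <- job }

// Stats returns the values the two proposed gauges would report.
func (p *InstrumentedPool) Stats() PoolStats {
	busy := int(atomic.LoadInt64(&p.busy))
	return PoolStats{Queued: len(p.work), Busy: busy, Idle: p.workers - busy}
}

// Close stops accepting work and waits for in-flight jobs to finish.
func (p *InstrumentedPool) Close() {
	close(p.work)
	p.wg.Wait()
}
```

A consistently non-zero `Queued`, or an `Idle` count that sits at zero, would both suggest the pool needs more workers.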

Ideally, this metric would either be using OpenTelemetry metrics or just prometheus/client_golang directly. I've found so far that OpenTelemetry metrics are pretty immature and most folks are comfortable with providing a Prometheus registry for a worker pool to use or just having the worker pool register its metrics against the default registry.

Less important but still interesting metrics that would be nice to have:

  • Work items processed (counter), with labels for outcome (success vs failure) and perhaps type of event (commit vs delete etc)
  • Summary or histogram of the duration a worker spends on work items

I'm more than happy to contribute to this, but I'm aware that bandwidth at Bluesky for reviewing work is extremely limited. I just wanted to collate my thoughts on what is stopping me from using the official ConsumerPool implementation at this time.
