Feature to expose BB metrics based user query key #61

Open
mihaigalos opened this issue May 28, 2020 · 6 comments

Comments

@mihaigalos

This is a placeholder for a feature request: a gRPC "GetStats" channel for retrieving statistics about remote execution and the remote Bazel cache.

For remote cache lookups against Buildbarn, one can pass --remote_cache_header on cache get/put/update operations. "GetStats" could take the key used on the cache operation and return statistics on how well the cache is performing for that key (hit count, hit rate in %, etc.).

@EdSchouten
Member

Are you aware that all Buildbarn components expose a huge number of Prometheus metrics? Just point your browser to /metrics on the HTTP endpoint.

Are there any statistics there that you feel are missing?

@mihaigalos
Author

Yes, I am aware. We could not find any relevant data by consuming Prometheus data in Grafana.
The only alternative would be to have BB produce logs which we would then post-process. I like the gRPC approach better.

@mickael-carl
Member

mickael-carl commented May 28, 2020

Which metrics are you missing? It would be better to extend the current metrics set than to introduce another way of gathering metrics, considering Prometheus is pretty much the standard for this use case.

@mihaigalos
Author

mihaigalos commented May 28, 2020

Let me start over, sorry for not making things clear in the first place.

We are also using BB as a Bazel remote cache. Bazel communicates with the frontend, which fetches data from a storage backend that we can scale as needed.

The problem is that there is no way to ask BB how many cache hits it served for a particular job.
Of course we can do this on the bazel client side, but a better approach would be to have it centralized over at BB.

Here's the problem: BB doesn't know which cache entry belongs to which client. Clients may be individual developers and CI jobs.

By using the --remote_cache_header, one can specify a (key,value) which might hold some info as to who the client is. Example: --remote_cache_header=ci_job=foo or --remote_cache_header=dev=mihai.

Essentially, I would like to be able to query BB for the hit rate of the foo CI build job or of the user mihai. Ideally, I would also like to export that data to Prometheus.
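To make the request concrete, here is a minimal sketch of the bookkeeping being asked for: hit/miss counts grouped by whichever identifying header the client sent. Everything in it (`PerClientCacheStats`, `client_id`, the `"unknown"` bucket) is hypothetical and only illustrates the idea; it is not Buildbarn code, and the header keys `ci_job` and `dev` are just the examples from above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CacheStats:
    hits: int = 0
    misses: int = 0

    @property
    def hit_rate(self) -> float:
        """Fraction of lookups that were hits; 0.0 when nothing was recorded."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


class PerClientCacheStats:
    """Hypothetical aggregator keyed by a client-identifying header."""

    def __init__(self, header_keys=("ci_job", "dev")):
        # Header keys that are treated as client identifiers (assumption).
        self._header_keys = tuple(header_keys)
        self._stats = defaultdict(CacheStats)

    def client_id(self, headers: dict) -> str:
        """Return 'key=value' for the first known header, else 'unknown'."""
        for key in self._header_keys:
            if key in headers:
                return f"{key}={headers[key]}"
        return "unknown"

    def record(self, headers: dict, hit: bool) -> None:
        """Count one cache lookup for the client described by the headers."""
        stats = self._stats[self.client_id(headers)]
        if hit:
            stats.hits += 1
        else:
            stats.misses += 1

    def hit_rate(self, client: str) -> float:
        """Hit rate for a client; 0.0 for clients that were never seen."""
        stats = self._stats.get(client)
        return stats.hit_rate if stats is not None else 0.0
```

A caller would invoke `record({"ci_job": "foo"}, hit=True)` on every lookup and later read `hit_rate("ci_job=foo")`.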

@EdSchouten EdSchouten transferred this issue from buildbarn/bb-deployments May 28, 2020
@EdSchouten
Member

What we could do is extend MetricsBlobAccess so that it gains support for exposing metrics based on some user-defined key. That sounds like a fair request. A couple of things to look out for, though:

  • About 5% of all of our CPU cycles currently go to calling .WithLabelValues() on {Histogram,Counter}Vecs and .Observe() on Histograms. A change like this will likely make things even worse. I'm not saying that's a showstopper. It's merely something we need to be conscious of.
  • Allowing the client to pass in arbitrary labels puts us at risk of potentially exposing metrics with a very high cardinality. This is bad from a reliability point of view. We likely want some kind of whitelisting of permitted label values.
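Both concerns can be illustrated with a small sketch: label values are checked against a fixed allowlist, anything else is folded into a single fallback series, and every series is created up front so the per-request path is only a set lookup and an increment. The class name `BoundedLabelCounter` and the `"other"` fallback are assumptions made for illustration; a real change would use the Prometheus client rather than a plain dictionary.

```python
class BoundedLabelCounter:
    """Hypothetical counter whose label values are restricted to an allowlist."""

    def __init__(self, allowed_values, fallback="other"):
        self._allowed = frozenset(allowed_values)
        self._fallback = fallback
        # Pre-create every series, so cardinality is fixed at
        # len(allowed_values) + 1 regardless of what clients send.
        self._counts = {value: 0 for value in self._allowed}
        self._counts[fallback] = 0

    def normalize(self, raw_value: str) -> str:
        """Map a client-supplied value to an allowed one or to the fallback."""
        return raw_value if raw_value in self._allowed else self._fallback

    def inc(self, raw_value: str, amount: int = 1) -> None:
        """Increment the series for the (normalized) label value."""
        self._counts[self.normalize(raw_value)] += amount

    def series(self) -> dict:
        """Return a snapshot of all series and their current counts."""
        return dict(self._counts)
```

With the allowlist `["ci_job=foo", "dev=mihai"]`, a request labelled `dev=eve` is counted under `other`, so the number of exposed series stays at three no matter how many distinct values clients send.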

@mickael-carl
Member

mickael-carl commented May 28, 2020

I like the idea of having an allowlist/denylist for metrics. That is what kube-state-metrics does, for instance, since it also exposes a lot of metrics (see its --metric-allowlist and --metric-denylist flags).
The remaining question is what a sane default would be: expose everything, or only a subset?
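As a rough sketch of how such a filter and its default could behave: the function below exposes everything when no allowlist is configured, and lets the denylist win when both lists match. These semantics, the function name `is_exposed`, and the use of glob patterns are assumptions chosen for illustration; they are not taken from kube-state-metrics or Buildbarn.

```python
from fnmatch import fnmatchcase


def is_exposed(metric_name, allowlist=(), denylist=()):
    """Decide whether a metric should be exported (hypothetical policy).

    - A metric matching any denylist pattern is never exposed.
    - With an empty allowlist, everything else is exposed (the default).
    - With a non-empty allowlist, only matching metrics are exposed.
    """
    if any(fnmatchcase(metric_name, pattern) for pattern in denylist):
        return False
    if not allowlist:
        return True
    return any(fnmatchcase(metric_name, pattern) for pattern in allowlist)
```

Choosing "only a subset" as the default would amount to shipping a non-empty default allowlist instead of an empty one.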

@mihaigalos changed the title from "Feature gRPC get BB statistics" to "Feature to expose BB metrics based user query key" May 28, 2020

3 participants