Improve metrics #3973
There are a lot of open tickets related to metrics improvements and Prometheus. Did you search the existing tickets before opening this one?
Related issues:
Remove duplicate metrics (worker volumes and containers)
Proper labeling of job level metrics
Label all metrics with Concourse name
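As a concrete illustration of the labeling the related issues ask for, here is a minimal, self-contained sketch of the Prometheus text exposition format a fully-labeled build metric could use. The metric and label names are assumptions for illustration, not Concourse's actual names, and a real emitter would use the client_golang library rather than hand-formatting strings:

```go
package main

import (
	"fmt"
	"strings"
)

// formatMetric renders one sample in the Prometheus text exposition
// format. The metric name and label keys passed in are illustrative
// assumptions, not Concourse's real metric names.
func formatMetric(name string, labelKeys []string, labels map[string]string, value float64) string {
	pairs := make([]string, 0, len(labelKeys))
	// Iterate a fixed slice of keys (not the map) so output order is stable.
	for _, k := range labelKeys {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return fmt.Sprintf("%s{%s} %g", name, strings.Join(pairs, ","), value)
}

func main() {
	keys := []string{"team", "pipeline", "job", "status"}
	fmt.Println(formatMetric("concourse_builds_finished", keys, map[string]string{
		"team": "main", "pipeline": "ci", "job": "unit", "status": "succeeded",
	}, 42))
	// → concourse_builds_finished{team="main",pipeline="ci",job="unit",status="succeeded"} 42
}
```

With team, pipeline, and job present on every sample, cross-referencing job performance against worker metrics becomes a label join in PromQL instead of guesswork.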
Also #3958
Beep boop! This issue has been idle for long enough that it's time to check in and see if it's still important. If it is, what is blocking it? Would anyone be interested in submitting a PR or continuing the discussion to help move things forward? If no activity is observed within the next week, this issue will be closed.
Stalebot, sit! Good bot. Please keep this issue open. Metrics are still a bit of a mess and need unification and improvements per-provider.
Hey, I started experimenting with trimming down the emitters to just Prometheus and having it cover more of Concourse's inner workings. Please let me know what you think (in the PR, please 😁)! The tl;dr is that by doing so we could:
I'd be very happy to "hear" your concerns / thoughts on it. Thank you!
Beep boop! This issue has been idle for long enough that it's time to check in and see if it's still important. If it is, what is blocking it? Would anyone be interested in submitting a PR or continuing the discussion to help move things forward? If no activity is observed within the next week, this issue will be closed.
stalebot pls
Beep boop! This issue has been idle for long enough that it's time to check in and see if it's still important. If it is, what is blocking it? Would anyone be interested in submitting a PR or continuing the discussion to help move things forward? If no activity is observed within the next week, this issue will be closed.
Lemme just slap a label on here to calm the stale bot down. :P Seems like this is useful, if only as an aggregator.
There are many things here; maybe we should split them up after discussion.
What challenge are you facing?
We are running multiple Concourse 5.1 installations (soon to be 5.2). Our largest has more than 180 teams, more than 13,000 jobs per week, and more than 160K resource checks/hour. We have peaks of more than 300 jobs/hour. We run 6 ATCs and 41 workers.
We have a difficult time inferring how our Concourse is used by our users. Some existing metrics are missing important labels and it's very difficult to cross-reference performance issues in jobs with workers performance metrics.
We currently use Prometheus to collect metrics.
What would make this better?
We've also seen a correlation between ATC performance and Prometheus: if Prometheus is slow collecting the exported metrics for some reason, ATC memory utilization starts to rise.
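One way to bound the impact of slow scrapes on the ATC, while the underlying cause is investigated, is to set an explicit scrape timeout so a stalled scrape is cut off rather than left holding the exporter's response open. This is a sketch of a standard Prometheus `scrape_config`; the job name and target address are hypothetical, not our actual setup:

```yaml
scrape_configs:
  - job_name: concourse            # hypothetical job name
    scrape_interval: 30s
    scrape_timeout: 10s            # abort slow scrapes instead of letting them pile up
    static_configs:
      - targets: ["atc.example.com:9391"]   # hypothetical ATC metrics endpoint
```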
Are you interested in implementing this yourself?
We would, but have little experience in Go today (and many other things to do).