Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metrics: make metrics easier to use with prometheus #4020

Merged
merged 10 commits into from
May 24, 2022

Conversation

cce
Copy link
Contributor

@cce cce commented May 20, 2022

Summary

Some changes to make it easier to find and use metrics with Prometheus

  • report HELP, TYPE, and zero-value counters for all declared Counters, even if they haven't been incremented yet (makes it easier to find them in Grafana/Prometheus, versus waiting for them to be incremented first)
  • allow tags used by TagCounter to be pre-declared at initialization time, so even zero-valued TagCounter metrics will show up with HELP and TYPE text and will be seen by Prometheus.
  • ensure TagCounter metrics are attributed to a host by handling "parentLabels" when writing Prometheus format
  • add counter TYPE line to each TagCounter, allowing our node_exporter to see TagCounters, because our node_exporter's gateway only handles counter and gauge metrics declared with TYPE.

Test Plan

New test added to assert Prometheus output from WriteMetric for TagCounter and Counter.

@codecov
Copy link

codecov bot commented May 20, 2022

Codecov Report

Merging #4020 (73556ec) into master (85a02f7) will increase coverage by 4.83%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #4020      +/-   ##
==========================================
+ Coverage   49.97%   54.81%   +4.83%     
==========================================
  Files         393      373      -20     
  Lines       68211    47699   -20512     
==========================================
- Hits        34089    26146    -7943     
+ Misses      30417    19333   -11084     
+ Partials     3705     2220    -1485     
Impacted Files Coverage Δ
agreement/gossip/network.go 54.83% <ø> (ø)
agreement/pseudonode.go 70.07% <ø> (ø)
network/wsPeer.go 65.83% <ø> (-0.28%) ⬇️
util/metrics/counter.go 89.13% <100.00%> (+0.62%) ⬆️
util/metrics/tagcounter.go 82.35% <100.00%> (+31.58%) ⬆️
util/metrics/gauge.go 65.38% <0.00%> (-2.57%) ⬇️
ledger/tracker.go 73.39% <0.00%> (-1.29%) ⬇️
network/wsNetwork.go 62.99% <0.00%> (ø)
data/committee/msgp_gen.go
... and 22 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 85a02f7...73556ec. Read the comment docs.

@cce cce requested a review from winder May 20, 2022 18:04
@cce cce changed the title metrics: make TagCounter metrics easier to use with prometheus metrics: make metrics easier to use with prometheus May 20, 2022
@cce cce requested a review from algorandskiy May 20, 2022 18:22
algonautshant
algonautshant previously approved these changes May 20, 2022
Copy link
Contributor

@algonautshant algonautshant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

buf.WriteRune('\n')
name = tc.Name + "_" + tag
}
buf.WriteString("# HELP ")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the value of adding HELP here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the HELP line provides a description of the metric for some UIs & backends use ... for example Grafana 6.6+ shows it alongside the metric name when you are browsing metrics https://grafana.com/blog/2020/06/15/how-we-made-working-with-prometheus-easier-with-metric-metadata-in-grafanas-explore-view/

algorandskiy
algorandskiy previously approved these changes May 20, 2022
util/metrics/counter.go Show resolved Hide resolved
brianolson
brianolson previously approved these changes May 23, 2022
@brianolson
Copy link
Contributor

Ok, but now you point out that .Deregister() is a good idea, doesn't that mean that really none of the tests should be using any globals anyways?

@cce
Copy link
Contributor Author

cce commented May 23, 2022

Yes, I didn't want to go that far into it because most of the other metrics tests start up a MetricService and wait for it to report metrics, which requires Register and the reporter accessing registered metrics through the defaultRegister global. But I suppose if you know your metric test is is not running in parallel with any others, these other tests could assign over the global, or refactor some metrics methods to make it easier to work with non-globals.

brianolson
brianolson previously approved these changes May 23, 2022
Copy link
Contributor

@brianolson brianolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked a little closer and I see that counter_test.go tests are kinda doing integration, but tagcounter_test.go could be isolated easily

-       tc := NewTagCounter("tc", "wat")
+       tc := TagCounter{Name: "tc", Description: "wat"}

but, at least the counter_test.go don't call t.Parallel() so they plan on being run serially and isolating the use of the global that way.

@algorandskiy
Copy link
Contributor

Why deleted segments. Not used but might be useful on some point...

@brianolson
Copy link
Contributor

Segment is unused, kinda like some profiling/trace patterns I've seen elsewhere, but we don't have any infrastructure for collecting or processing this data. I think when the time comes that we need something like this we should build what we actually need then. "Oh look, this exists, let's use it" is misleading in this case.

@cce
Copy link
Contributor Author

cce commented May 24, 2022

I talked with @brianolson and @winder about it and we figured we could bring something conceptually similar back some day in the future if we need it. I think if we were to reimplement it, we could have a sort of timing histogram system that didn't rely on metrics.Counter, perhaps.

@cce cce merged commit 280102c into algorand:master May 24, 2022
@cce cce deleted the fix-tagcounter-prometheus branch May 24, 2022 21:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants