metrics: collect and report Go runtime.metrics #4041

Merged
merged 10 commits into from
May 26, 2022

Contributor

@cce cce commented May 24, 2022

Summary

This adds a util/metrics integration with Go's built-in runtime/metrics package, which provides metrics about memory, GC, and scheduling. It is enabled only when the new config flag EnableRuntimeMetrics is set to true.
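For readers unfamiliar with runtime/metrics, here is a minimal standalone sketch of how that package is queried. It is not the code from this PR; the helper name readUint64 is hypothetical, and only metrics.All, metrics.Read, and the Sample/Value types come from the standard library.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// readUint64 (hypothetical helper) samples a single runtime metric by name.
// It reports false when the name is unknown to this Go version or when the
// metric is not a uint64 (for example, a histogram).
func readUint64(name string) (uint64, bool) {
	sample := []metrics.Sample{{Name: name}}
	metrics.Read(sample)
	if sample[0].Value.Kind() != metrics.KindUint64 {
		return 0, false
	}
	return sample[0].Value.Uint64(), true
}

func main() {
	// metrics.All lists every metric the running Go version supports.
	for _, d := range metrics.All() {
		kind := "gauge"
		if d.Cumulative {
			kind = "counter"
		}
		fmt.Printf("%-50s %s\n", d.Name, kind)
	}

	if n, ok := readUint64("/sched/goroutines:goroutines"); ok {
		fmt.Println("live goroutines:", n)
	}
}
```

The set of names returned by metrics.All depends on the Go version, so a collector built this way picks up new runtime metrics without code changes.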

Test Plan

A new test asserts that the metrics are created and formatted properly. The Prometheus output looks like this:

# HELP go_gc_cycles_automatic_gc_cycles Count of completed GC cycles generated by the Go runtime.
# TYPE go_gc_cycles_automatic_gc_cycles counter
go_gc_cycles_automatic_gc_cycles 11
# HELP go_gc_cycles_forced_gc_cycles Count of completed GC cycles forced by the application.
# TYPE go_gc_cycles_forced_gc_cycles counter
go_gc_cycles_forced_gc_cycles 0
# HELP go_gc_cycles_total_gc_cycles Count of all completed GC cycles.
# TYPE go_gc_cycles_total_gc_cycles counter
go_gc_cycles_total_gc_cycles 11
# HELP go_gc_heap_allocs_bytes Cumulative sum of memory allocated to the heap by the application.
# TYPE go_gc_heap_allocs_bytes counter
go_gc_heap_allocs_bytes 797388696
# HELP go_gc_heap_allocs_objects Cumulative count of heap allocations triggered by the application. Note that this does not include tiny objects as defined by /gc/heap/tiny/allocs:objects, only tiny blocks.
# TYPE go_gc_heap_allocs_objects counter
go_gc_heap_allocs_objects 2146881
# HELP go_gc_heap_frees_bytes Cumulative sum of heap memory freed by the garbage collector.
# TYPE go_gc_heap_frees_bytes counter
go_gc_heap_frees_bytes 540502888
# HELP go_gc_heap_frees_objects Cumulative count of heap allocations whose storage was freed by the garbage collector. Note that this does not include tiny objects as defined by /gc/heap/tiny/allocs:objects, only tiny blocks.
# TYPE go_gc_heap_frees_objects counter
go_gc_heap_frees_objects 1490695
# HELP go_gc_heap_goal_bytes Heap size target for the end of the GC cycle.
# TYPE go_gc_heap_goal_bytes gauge
go_gc_heap_goal_bytes 279787424
# HELP go_gc_heap_objects_objects Number of objects, live or unswept, occupying heap memory.
# TYPE go_gc_heap_objects_objects gauge
go_gc_heap_objects_objects 656186
# HELP go_gc_heap_tiny_allocs_objects Count of small allocations that are packed together into blocks. These allocations are counted separately from other allocations because each individual allocation is not tracked by the runtime, only their block. Each block is already accounted for in allocs-by-size and frees-by-size.
# TYPE go_gc_heap_tiny_allocs_objects counter
go_gc_heap_tiny_allocs_objects 385544
# HELP go_memory_classes_heap_free_bytes Memory that is completely free and eligible to be returned to the underlying system, but has not been. This metric is the runtime's estimate of free address space that is backed by physical memory.
# TYPE go_memory_classes_heap_free_bytes gauge
go_memory_classes_heap_free_bytes 1212416
# HELP go_memory_classes_heap_objects_bytes Memory occupied by live objects and dead objects that have not yet been marked free by the garbage collector.
# TYPE go_memory_classes_heap_objects_bytes gauge
go_memory_classes_heap_objects_bytes 256885808
# HELP go_memory_classes_heap_released_bytes Memory that is completely free and has been returned to the underlying system. This metric is the runtime's estimate of free address space that is still mapped into the process, but is not backed by physical memory.
# TYPE go_memory_classes_heap_released_bytes gauge
go_memory_classes_heap_released_bytes 1966080
# HELP go_memory_classes_heap_stacks_bytes Memory allocated from the heap that is reserved for stack space, whether or not it is currently in-use.
# TYPE go_memory_classes_heap_stacks_bytes gauge
go_memory_classes_heap_stacks_bytes 4784128
# HELP go_memory_classes_heap_unused_bytes Memory that is reserved for heap objects but is not currently used to hold heap objects.
# TYPE go_memory_classes_heap_unused_bytes gauge
go_memory_classes_heap_unused_bytes 3587024
# HELP go_memory_classes_metadata_mcache_free_bytes Memory that is reserved for runtime mcache structures, but not in-use.
# TYPE go_memory_classes_metadata_mcache_free_bytes gauge
go_memory_classes_metadata_mcache_free_bytes 6784
# HELP go_memory_classes_metadata_mcache_inuse_bytes Memory that is occupied by runtime mcache structures that are currently being used.
# TYPE go_memory_classes_metadata_mcache_inuse_bytes gauge
go_memory_classes_metadata_mcache_inuse_bytes 9600
# HELP go_memory_classes_metadata_mspan_free_bytes Memory that is reserved for runtime mspan structures, but not in-use.
# TYPE go_memory_classes_metadata_mspan_free_bytes gauge
go_memory_classes_metadata_mspan_free_bytes 26824
# HELP go_memory_classes_metadata_mspan_inuse_bytes Memory that is occupied by runtime mspan structures that are currently being used.
# TYPE go_memory_classes_metadata_mspan_inuse_bytes gauge
go_memory_classes_metadata_mspan_inuse_bytes 1513272
# HELP go_memory_classes_metadata_other_bytes Memory that is reserved for or used to hold runtime metadata.
# TYPE go_memory_classes_metadata_other_bytes gauge
go_memory_classes_metadata_other_bytes 13469856
# HELP go_memory_classes_os_stacks_bytes Stack memory allocated by the underlying operating system.
# TYPE go_memory_classes_os_stacks_bytes gauge
go_memory_classes_os_stacks_bytes 0
# HELP go_memory_classes_other_bytes Memory used by execution trace buffers, structures for debugging the runtime, finalizer and profiler specials, and more.
# TYPE go_memory_classes_other_bytes gauge
go_memory_classes_other_bytes 1534960
# HELP go_memory_classes_profiling_buckets_bytes Memory that is used by the stack trace hash map used for profiling.
# TYPE go_memory_classes_profiling_buckets_bytes gauge
go_memory_classes_profiling_buckets_bytes 1586088
# HELP go_memory_classes_total_bytes All memory mapped by the Go runtime into the current process as read-write. Note that this does not include memory mapped by code called via cgo or via the syscall package. Sum of all metrics in /memory/classes.
# TYPE go_memory_classes_total_bytes gauge
go_memory_classes_total_bytes 286582840
# HELP go_sched_goroutines_goroutines Count of live goroutines.
# TYPE go_sched_goroutines_goroutines gauge
go_sched_goroutines_goroutines 136
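The output above suggests a simple mapping from runtime/metrics names (such as /gc/cycles/forced:gc-cycles) to Prometheus names. The sketch below reproduces that mapping as inferred from the sample output only; it is an assumption, not the PR's implementation, and promName and formatMetric are hypothetical names. The cumulative argument stands in for the Cumulative field of metrics.Description.

```go
package main

import (
	"fmt"
	"strings"
)

// nameReplacer turns the separators used by runtime/metrics names into
// underscores. The rule is inferred from the sample output, not from the PR.
var nameReplacer = strings.NewReplacer("/", "_", ":", "_", "-", "_")

// promName converts e.g. "/gc/heap/allocs:bytes" to "go_gc_heap_allocs_bytes".
func promName(runtimeName string) string {
	return "go" + nameReplacer.Replace(runtimeName)
}

// formatMetric renders one metric in the Prometheus text format.
// Cumulative metrics are reported as counters, all others as gauges.
func formatMetric(runtimeName, help string, cumulative bool, value uint64) string {
	name := promName(runtimeName)
	typ := "gauge"
	if cumulative {
		typ = "counter"
	}
	var sb strings.Builder
	fmt.Fprintf(&sb, "# HELP %s %s\n", name, help)
	fmt.Fprintf(&sb, "# TYPE %s %s\n", name, typ)
	fmt.Fprintf(&sb, "%s %d\n", name, value)
	return sb.String()
}

func main() {
	fmt.Print(formatMetric(
		"/gc/cycles/forced:gc-cycles",
		"Count of completed GC cycles forced by the application.",
		true, 0,
	))
}
```

Running this prints the same three lines shown for go_gc_cycles_forced_gc_cycles in the sample output.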

@@ -5,7 +5,7 @@
 "TelemetryURI": "{{TelemetryURI}}",
 "EnableMetrics": false,
 "MetricsURI": "{{MetricsURI}}",
-"ConfigJSONOverride": "{ \"TxPoolExponentialIncreaseFactor\": 1, \"DNSBootstrapID\": \"<network>.algodev.network\", \"DeadlockDetection\": -1, \"PeerPingPeriodSeconds\": 30, \"BaseLoggerDebugLevel\": 4, \"EnableProfiler\": true, \"CadaverSizeTarget\": 0 }",
+"ConfigJSONOverride": "{ \"TxPoolExponentialIncreaseFactor\": 1, \"DNSBootstrapID\": \"<network>.algodev.network\", \"DeadlockDetection\": -1, \"PeerPingPeriodSeconds\": 30, \"BaseLoggerDebugLevel\": 4, \"EnableProfiler\": true, \"EnableRuntimeMetrics\": true, \"CadaverSizeTarget\": 0 }",

Contributor

How do we regenerate all these node.json files?

Contributor Author

The makefiles use netgoal generate -t net to create them. EnableRuntimeMetrics is now enabled by default in netgoal, see https://github.com/algorand/go-algorand/pull/4041/files#diff-04beb8fc68bf6283bb3a6a9c66f42da27b56f2acea297027cf4658cffcbae369R249
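To illustrate how the ConfigJSONOverride string in the node.json diff carries the new flag, here is a small sketch that decodes such an override and checks EnableRuntimeMetrics. The struct localOverride and the function runtimeMetricsEnabled are hypothetical and cover only two of the fields; the real config type in the repository is larger and is not shown in this conversation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// localOverride is a hypothetical subset of the node configuration.
// Fields not listed here are ignored by encoding/json during decoding.
type localOverride struct {
	EnableProfiler       bool
	EnableRuntimeMetrics bool
}

// runtimeMetricsEnabled reports whether an override JSON string turns on
// runtime metrics. A missing flag decodes to false.
func runtimeMetricsEnabled(override string) (bool, error) {
	var cfg localOverride
	if err := json.Unmarshal([]byte(override), &cfg); err != nil {
		return false, err
	}
	return cfg.EnableRuntimeMetrics, nil
}

func main() {
	before := `{ "DeadlockDetection": -1, "EnableProfiler": true, "CadaverSizeTarget": 0 }`
	after := `{ "DeadlockDetection": -1, "EnableProfiler": true, "EnableRuntimeMetrics": true, "CadaverSizeTarget": 0 }`

	for _, override := range []string{before, after} {
		enabled, err := runtimeMetricsEnabled(override)
		if err != nil {
			fmt.Println("invalid override:", err)
			continue
		}
		fmt.Println("runtime metrics enabled:", enabled)
	}
}
```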

codecov bot commented May 24, 2022

Codecov Report

Merging #4041 (bb395b5) into master (280102c) will increase coverage by 0.03%.
The diff coverage is 80.00%.

@@            Coverage Diff             @@
##           master    #4041      +/-   ##
==========================================
+ Coverage   54.43%   54.47%   +0.03%     
==========================================
  Files         390      391       +1     
  Lines       48549    48594      +45     
==========================================
+ Hits        26429    26471      +42     
- Misses      19897    19901       +4     
+ Partials     2223     2222       -1     
| Impacted Files | Coverage Δ |
| --- | --- |
| config/localTemplate.go | 42.85% <ø> (ø) |
| daemon/algod/server.go | 5.00% <0.00%> (-0.07%) ⬇️ |
| netdeploy/networkTemplate.go | 29.56% <0.00%> (-0.26%) ⬇️ |
| util/metrics/runtime.go | 85.36% <85.36%> (ø) |
| util/metrics/registryCommon.go | 100.00% <100.00%> (ø) |
| network/netprio.go | 69.56% <0.00%> (-8.70%) ⬇️ |
| ledger/blockqueue.go | 82.18% <0.00%> (-2.88%) ⬇️ |
| network/wsNetwork.go | 62.99% <0.00%> (+0.19%) ⬆️ |
| network/wsPeer.go | 71.66% <0.00%> (+0.27%) ⬆️ |
| catchup/service.go | 68.88% <0.00%> (+0.74%) ⬆️ |
| ... and 3 more | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 280102c...bb395b5.

}

// AddMetric does not add runtime metrics to the map used for heartbeat metrics.
func (rm *RuntimeMetrics) AddMetric(_ map[string]float64) {}

Contributor

At least add a TODO: WRITEME comment or something? I get that this isn't the primary way we get data back from metrics, but, see

cmd/algod/main.go:348 	metrics.DefaultRegistry().AddMetrics(values)

Contributor Author

I was intentionally leaving it out of the heartbeat message. The reasoning is that we only want this for performance testing, and it's not very useful to collect at 10-minute resolution with heartbeat messages. I hadn't considered whether we would want some of these metrics from real nodes reporting telemetry.

Contributor

Given the flag for turning these metrics on, if they're on they should be on everywhere?

Contributor

Chris is right, we probably shouldn't write these to the telemetry server. I also kinda agree with Brian on principle. The interface here isn't clear that AddMetric is only used for heartbeat. On the other hand, what are the odds that we attempt to use this feature for something besides heartbeat?

Contributor Author

I added comments to the interface to explain what each method is for, and implemented AddMetric
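As a rough illustration of what a non-empty AddMetric could look like, the sketch below samples a fixed list of runtime metrics and copies the scalar values into the heartbeat map. Only the AddMetric(map[string]float64) signature comes from the diff quoted in this thread; the heartbeatSource interface, the runtimeGauges type, its constructor, and the choice to use raw runtime names as map keys are assumptions made for this example.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// heartbeatSource is a hypothetical one-method interface. AddMetric adds the
// current value of each metric to the map sent with heartbeat messages.
type heartbeatSource interface {
	AddMetric(values map[string]float64)
}

// runtimeGauges is a hypothetical stand-in for the PR's RuntimeMetrics type.
type runtimeGauges struct {
	samples []metrics.Sample
}

// newRuntimeGauges prepares one reusable sample slot per metric name.
func newRuntimeGauges(names ...string) *runtimeGauges {
	rg := &runtimeGauges{samples: make([]metrics.Sample, len(names))}
	for i, n := range names {
		rg.samples[i].Name = n
	}
	return rg
}

// AddMetric reads the runtime and stores scalar values under their runtime
// names. Histograms and names unknown to this Go version are skipped.
func (rg *runtimeGauges) AddMetric(values map[string]float64) {
	metrics.Read(rg.samples)
	for _, s := range rg.samples {
		switch s.Value.Kind() {
		case metrics.KindUint64:
			values[s.Name] = float64(s.Value.Uint64())
		case metrics.KindFloat64:
			values[s.Name] = s.Value.Float64()
		}
	}
}

// Compile-time check that runtimeGauges satisfies the interface.
var _ heartbeatSource = (*runtimeGauges)(nil)

func main() {
	values := map[string]float64{}
	rg := newRuntimeGauges("/sched/goroutines:goroutines", "/gc/heap/objects:objects")
	rg.AddMetric(values)
	fmt.Println(values)
}
```

Keeping the method cheap matters less here than in the Prometheus path, since the thread notes that heartbeat messages are collected at roughly 10-minute resolution.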

@cce cce requested a review from algonautshant May 26, 2022 14:19
brianolson previously approved these changes May 26, 2022
4 participants