This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

add prometheus exporter plugin that exposes key nodeos metrics #9902

Open
matthewdarwin opened this issue Jan 13, 2021 · 6 comments

Comments

@matthewdarwin

matthewdarwin commented Jan 13, 2021

nodeos should have a Prometheus exporter plugin, running on a separate port, that exposes key metrics for monitoring.

Metrics such as:

  • what is returned from /v1/chain/get_info (head block number, lib)
  • unapplied transaction queue sizes
  • blacklisted transactions size
  • subjective billing sizes
  • scheduled transaction size
  • number of forks by producer
  • number of unapplied blocks by producer
  • number of dropped blocks by producer
  • number of missed blocks (missed in a round) by producer
  • number of missing producers (missed 12 blocks in a round) by producer
  • number of double production (more than 12 blocks in a round) by producer
  • average (and last) block arrival time by producer
  • number of transactions per block by producer
  • amount of blockchain cpu used per block by producer
  • number of bytes per block by producer
  • number of bad actors which have exceeded their transaction limit (as per the changes introduced in nodeos 2.0.9)
  • number of clients connected (inbound, outbound, failed)
  • number of api (failed) requests/sec (by request type)
  • uptime
  • cpu usage by thread
  • disk space used, by volume (blocks, ship, state, trace)
  • disk space available
  • rocksdb related
  • blockvault related (produced block, vs did not produce because other nodeos "won")
  • replay status (when starting from snapshot and replaying blocks)
  • etc.

Where possible, attribute events to a specific producer (e.g., when counting dropped blocks, specify which producer's blocks were dropped).
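
As a rough illustration of per-producer attribution, here is a minimal Python sketch that renders a labelled counter in the Prometheus text exposition format. The metric name, help text, and event list are hypothetical, and this is not how nodeos itself would implement a plugin (which would be in C++); it only shows the shape of the output being requested.

```python
from collections import Counter


def escape_label_value(value: str) -> str:
    # The text format requires escaping backslash, double quote and newline
    # inside label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str, label: str, counts: dict) -> str:
    # One HELP line, one TYPE line, then one sample per label value.
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for key in sorted(counts):
        lines.append(f'{name}{{{label}="{escape_label_value(key)}"}} {counts[key]}')
    return "\n".join(lines) + "\n"


# Hypothetical stream of dropped-block events, each attributed to a producer.
dropped = Counter()
for producer in ["big.one", "eosnationftw", "big.one"]:
    dropped[producer] += 1

print(render_counter(
    "nodeos_dropped_block",
    "Number of dropped_block events by producer",
    "producer",
    dropped,
))
```

Keeping the producer as a label rather than part of the metric name lets a single query aggregate or filter across all producers.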

@matthewdarwin
Author

I have added more examples since this enhancement was originally created.

@heifner
Contributor

heifner commented Feb 10, 2021

How does this relate to #9996?

@matthewdarwin
Author

Well, my ideal solution is a Prometheus exporter for all the metrics, so that no log scraping is needed.

However, if developing a Prometheus exporter plugin is not on the radar, then generate a log message instead and I'll scrape the log (as I do today).

So really there are 2 things: 1) generate useful metrics for debugging and 2) log them and/or expose via prometheus exporter plugin.
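
To make the two halves concrete, here is a small Python sketch of the log-scraping route: one function turns log lines into a per-producer counter, and a standard-library HTTP handler exposes the result on `/metrics`. The log message pattern is invented for this example (real nodeos log messages differ), and the metric name, `MetricsHandler`, and `make_server` are likewise hypothetical.

```python
import re
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical log format, made up for this sketch; adjust to the real messages.
DROPPED_RE = re.compile(r"dropped block .* produced by (?P<producer>[a-z1-5.]+)")

dropped_by_producer = Counter()


def scrape_line(line: str) -> None:
    # Part 1: derive a metric from a log message.
    match = DROPPED_RE.search(line)
    if match:
        dropped_by_producer[match.group("producer")] += 1


def render() -> str:
    # Part 2: render the collected counters in the Prometheus text format.
    lines = [
        "# HELP nodeos_dropped_block Number of dropped_block events by producer",
        "# TYPE nodeos_dropped_block counter",
    ]
    for producer, count in sorted(dropped_by_producer.items()):
        lines.append(f'nodeos_dropped_block{{producer="{producer}"}} {count}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int) -> HTTPServer:
    # Bind to localhost only; the port is chosen by the caller.
    return HTTPServer(("127.0.0.1", port), MetricsHandler)
```

Calling `make_server(9100).serve_forever()` (the port number is arbitrary) would serve the metrics for Prometheus to scrape. A native plugin would make the scraping step unnecessary, since the counters could be updated directly where the events occur.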

@matthewdarwin
Author

For example, my log scraper currently generates something like this:

# HELP nodeos_avg_block_time Last round nodeos_avg_block_time by producer
# TYPE nodeos_avg_block_time gauge
nodeos_avg_block_time{producer="atticlabeosb"} 67.67
nodeos_avg_block_time{producer="big.one"} 275.75
nodeos_avg_block_time{producer="binancestake"} 300.08
nodeos_avg_block_time{producer="bitfinexeos1"} 209.08
nodeos_avg_block_time{producer="blockpooleos"} 180.58
nodeos_avg_block_time{producer="eoscannonchn"} -198.42
nodeos_avg_block_time{producer="eosdotwikibp"} 421.58
nodeos_avg_block_time{producer="eoseouldotio"} 147.17
nodeos_avg_block_time{producer="eosflytomars"} 94.75
nodeos_avg_block_time{producer="eoshuobipool"} 226.92
nodeos_avg_block_time{producer="eosinfstones"} 101.67
nodeos_avg_block_time{producer="eosiosg11111"} 148.92
nodeos_avg_block_time{producer="eoslambdacom"} 115.17
nodeos_avg_block_time{producer="eoslaomaocom"} 103.75
nodeos_avg_block_time{producer="eosnationftw"} -212.17
nodeos_avg_block_time{producer="eosrapidprod"} -215.25
nodeos_avg_block_time{producer="hashfineosio"} -80.83
nodeos_avg_block_time{producer="newdex.bp"} 164.92
nodeos_avg_block_time{producer="okcapitalbp1"} -50.33
nodeos_avg_block_time{producer="starteosiobp"} 264.83
nodeos_avg_block_time{producer="whaleex.com"} -120.83
nodeos_avg_block_time{producer="zbeosbp11111"} -146.50
# HELP nodeos_block_not_applied Number of block_not_applied events
# TYPE nodeos_block_not_applied counter
nodeos_block_not_applied 0
# HELP nodeos_connections connections nodeos_connections by type
# TYPE nodeos_connections gauge
# HELP nodeos_cpu CPU nodeos_cpu by thread
# TYPE nodeos_cpu gauge
nodeos_cpu{thread="nodeos"} 0
nodeos_cpu{thread="oc-monitor"} 0
nodeos_cpu{thread="oc-trampoline"} 0
# HELP nodeos_cpu_throttled Number of cpu_throttled events
# TYPE nodeos_cpu_throttled counter
nodeos_cpu_throttled 0
# HELP nodeos_dfuse_one_block_retry Number of dfuse_one_block_retry events
# TYPE nodeos_dfuse_one_block_retry counter
nodeos_dfuse_one_block_retry 0
# HELP nodeos_dfuse_truncated_action_trace Number of dfuse_truncated_action_trace events
# TYPE nodeos_dfuse_truncated_action_trace counter
nodeos_dfuse_truncated_action_trace 0
# HELP nodeos_disk Disk space nodeos_disk by volume
# TYPE nodeos_disk gauge
nodeos_disk{volume="blocks"} 2661407797470
nodeos_disk{volume="state"} 22562096450
nodeos_disk{volume="state-history"} 7198076386642
nodeos_disk{volume="traces"} 0
# HELP nodeos_double_produce Number of double_produce events by producer
# TYPE nodeos_double_produce counter
# HELP nodeos_dropped_block Number of dropped_block events
# TYPE nodeos_dropped_block counter
nodeos_dropped_block 0
# HELP nodeos_fork Number of fork events
# TYPE nodeos_fork counter
nodeos_fork 81
# HELP nodeos_last_block_time Last round nodeos_last_block_time by producer
# TYPE nodeos_last_block_time gauge
nodeos_last_block_time{producer="atticlabeosb"} -144
nodeos_last_block_time{producer="big.one"} 14
nodeos_last_block_time{producer="binancestake"} -91
nodeos_last_block_time{producer="bitfinexeos1"} -97
nodeos_last_block_time{producer="blockpooleos"} 19
nodeos_last_block_time{producer="eoscannonchn"} -383
nodeos_last_block_time{producer="eosdotwikibp"} -431
nodeos_last_block_time{producer="eoseouldotio"} -312
nodeos_last_block_time{producer="eosflytomars"} -30
nodeos_last_block_time{producer="eoshuobipool"} -245
nodeos_last_block_time{producer="eosinfstones"} -252
nodeos_last_block_time{producer="eosiosg11111"} -244
nodeos_last_block_time{producer="eoslambdacom"} -206
nodeos_last_block_time{producer="eoslaomaocom"} -253
nodeos_last_block_time{producer="eosnationftw"} -323
nodeos_last_block_time{producer="eosrapidprod"} -365
nodeos_last_block_time{producer="hashfineosio"} -168
nodeos_last_block_time{producer="newdex.bp"} -101
nodeos_last_block_time{producer="okcapitalbp1"} -282
nodeos_last_block_time{producer="starteosiobp"} -188
nodeos_last_block_time{producer="whaleex.com"} -274
nodeos_last_block_time{producer="zbeosbp11111"} -251
# HELP nodeos_late_block Number of late_block events by producer
# TYPE nodeos_late_block counter
nodeos_late_block{producer="big.one"} 1
nodeos_late_block{producer="binancestake"} 1
nodeos_late_block{producer="bitfinexeos1"} 1
# HELP nodeos_low_transaction Number of low_transaction events by producer
# TYPE nodeos_low_transaction counter
nodeos_low_transaction{producer="atticlabeosb"} 101
nodeos_low_transaction{producer="big.one"} 68
nodeos_low_transaction{producer="binancestake"} 34
nodeos_low_transaction{producer="bitfinexeos1"} 13
nodeos_low_transaction{producer="blockpooleos"} 7
nodeos_low_transaction{producer="eoscannonchn"} 1256
nodeos_low_transaction{producer="eosdotwikibp"} 2
nodeos_low_transaction{producer="eoseouldotio"} 3
nodeos_low_transaction{producer="eosflytomars"} 424
nodeos_low_transaction{producer="eoshuobipool"} 150
nodeos_low_transaction{producer="eosinfstones"} 124
nodeos_low_transaction{producer="eosiosg11111"} 127
nodeos_low_transaction{producer="eoslambdacom"} 80
nodeos_low_transaction{producer="eoslaomaocom"} 73
nodeos_low_transaction{producer="eosnationftw"} 33
nodeos_low_transaction{producer="eosrapidprod"} 23
nodeos_low_transaction{producer="hashfineosio"} 20
nodeos_low_transaction{producer="newdex.bp"} 27
nodeos_low_transaction{producer="okcapitalbp1"} 28
nodeos_low_transaction{producer="starteosiobp"} 142
nodeos_low_transaction{producer="whaleex.com"} 78
nodeos_low_transaction{producer="zbeosbp11111"} 101
# HELP nodeos_missing_block Number of missing_block events by producer
# TYPE nodeos_missing_block counter
nodeos_missing_block{producer="atticlabeosb"} 1
nodeos_missing_block{producer="blockpooleos"} 1
nodeos_missing_block{producer="eosdotwikibp"} 1
# HELP nodeos_missing_producer Number of missing_producer events by producer
# TYPE nodeos_missing_producer counter
nodeos_missing_producer{producer="eoslambdacom"} 9
# HELP nodeos_ram RAM nodeos_ram by thread
# TYPE nodeos_ram gauge
nodeos_ram{thread="nodeos"} 6609040
nodeos_ram{thread="oc-monitor"} 12776
nodeos_ram{thread="oc-trampoline"} 3608
# HELP nodeos_schedule_change Number of schedule_change events
# TYPE nodeos_schedule_change counter
nodeos_schedule_change 9
# HELP nodeos_total_transactions Last round nodeos_total_transactions by producer
# TYPE nodeos_total_transactions gauge
nodeos_total_transactions{producer="atticlabeosb"} 243
nodeos_total_transactions{producer="big.one"} 283
nodeos_total_transactions{producer="binancestake"} 232
nodeos_total_transactions{producer="bitfinexeos1"} 256
nodeos_total_transactions{producer="blockpooleos"} 278
nodeos_total_transactions{producer="eoscannonchn"} 77
nodeos_total_transactions{producer="eosdotwikibp"} 352
nodeos_total_transactions{producer="eoseouldotio"} 300
nodeos_total_transactions{producer="eosflytomars"} 216
nodeos_total_transactions{producer="eoshuobipool"} 298
nodeos_total_transactions{producer="eosinfstones"} 363
nodeos_total_transactions{producer="eosiosg11111"} 257
nodeos_total_transactions{producer="eoslambdacom"} 242
nodeos_total_transactions{producer="eoslaomaocom"} 297
nodeos_total_transactions{producer="eosnationftw"} 286
nodeos_total_transactions{producer="eosrapidprod"} 204
nodeos_total_transactions{producer="hashfineosio"} 215
nodeos_total_transactions{producer="newdex.bp"} 299
nodeos_total_transactions{producer="okcapitalbp1"} 221
nodeos_total_transactions{producer="starteosiobp"} 239
nodeos_total_transactions{producer="whaleex.com"} 177
nodeos_total_transactions{producer="zbeosbp11111"} 155
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 651.23
# HELP process_cpu_system_seconds_total Total system CPU time spent in seconds
# TYPE process_cpu_system_seconds_total counter
process_cpu_system_seconds_total 130.04
# HELP process_cpu_user_seconds_total Total user CPU time spent in seconds
# TYPE process_cpu_user_seconds_total counter
process_cpu_user_seconds_total 521.19
# HELP process_max_fds Maximum number of allowed file handles
# TYPE process_max_fds gauge
process_max_fds 1024
# HELP process_open_fds Number of open file handles
# TYPE process_open_fds gauge
process_open_fds 11
# HELP process_resident_memory_bytes Resident memory size in bytes
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 116568064
# HELP process_start_time_seconds Unix epoch time the process started at
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1612804202.92
# HELP process_virtual_memory_bytes Virtual memory size in bytes
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 944009216
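
For anyone who wants to post-process output like the above, here is a minimal Python sketch of a parser for this text format. It is deliberately simplified: it assumes one sample per line, no escaped quotes inside label values, and no trailing timestamps, so it is not a full implementation of the exposition format. The function name `parse_samples` is made up for this example; the sample lines are copied from the output above.

```python
import re

# name, optional {labels}, then a single value at end of line.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text: str) -> list:
    """Return a list of (metric name, labels dict, float value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the # HELP / # TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # Not a line this simplified parser understands.
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


text = """\
# HELP nodeos_avg_block_time Last round nodeos_avg_block_time by producer
# TYPE nodeos_avg_block_time gauge
nodeos_avg_block_time{producer="atticlabeosb"} 67.67
nodeos_avg_block_time{producer="eoscannonchn"} -198.42
nodeos_fork 81
"""

for name, labels, value in parse_samples(text):
    print(name, labels, value)
```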

@eosusa

eosusa commented Feb 13, 2021

+1

We also pull data from the nodes via get_info and by scraping the configs, supported_apis, etc.:

nodeos_abi_serial_max{nodeoschain="EOS"} 2000
# HELP nodeos_blocklog_first Metric read from /opt/metrics/nodestats.prom
# TYPE nodeos_blocklog_first untyped
nodeos_blocklog_first{nodeoschain="EOS"} 1.62606473e+08
# HELP nodeos_blocklog_last Metric read from /opt/metrics/nodestats.prom
# TYPE nodeos_blocklog_last untyped
nodeos_blocklog_last{nodeoschain="EOS"} 1.68074435e+08
# HELP nodeos_console_log Metric read from /opt/metrics/nodestats.prom
# TYPE nodeos_console_log untyped
nodeos_console_log{nodeoschain="EOS"} 1
# HELP nodeos_get_accounts_by_authorizers Metric read from /opt/metrics/nodestats.prom
# TYPE nodeos_get_accounts_by_authorizers untyped
nodeos_get_accounts_by_authorizers{nodeoschain="EOS"} 1
# HELP nodeos_headblock Metric read from /opt/metrics/nodestats.prom
# TYPE nodeos_headblock untyped
nodeos_headblock{nodeoschain="EOS"} 1.68074763e+08
# HELP nodeos_headtime Metric read from /opt/metrics/nodestats.prom
# TYPE nodeos_headtime untyped
nodeos_headtime{nodeoschain="EOS"} 1.613189101e+09
# HELP nodeos_http_max_resp Metric read from /opt/metrics/nodestats.prom
# TYPE nodeos_http_max_resp untyped
nodeos_http_max_resp{nodeoschain="EOS"} 100
# HELP nodeos_lastblockoffset Metric read from /opt/metrics/nodestats.prom
# TYPE nodeos_lastblockoffset untyped
nodeos_lastblockoffset{nodeoschain="EOS"} -300000
# HELP nodeos_lib Metric read from /opt/metrics/nodestats.prom
# TYPE nodeos_lib untyped
nodeos_lib{nodeoschain="EOS"} 1.68074435e+08
# HELP nodeos_oc_enable Metric read from /opt/metrics/nodestats.prom
# TYPE nodeos_oc_enable untyped
nodeos_oc_enable{nodeoschain="EOS"} 0
# HELP nodeos_plugin_http Metric read from /opt/metrics/nodestats.prom
# TYPE nodeos_plugin_http untyped
nodeos_plugin_http{nodeoschain="EOS"} 1
# HELP nodeos_plugin_net Metric read from /opt/metrics/nodestats.prom
# TYPE nodeos_plugin_net untyped
nodeos_plugin_net{nodeoschain="EOS"} 0
# HELP nodeos_plugin_prod Metric read from /opt/metrics/nodestats.prom
# TYPE nodeos_plugin_prod untyped
nodeos_plugin_prod{nodeoschain="EOS"} 0
# HELP nodeos_plugin_prod_api Metric read from /opt/metrics/nodestats.prom
# TYPE nodeos_plugin_prod_api untyped
nodeos_plugin_prod_api{nodeoschain="EOS"} 0
# HELP nodeos_plugin_ship Metric read from /opt/metrics/nodestats.prom
# TYPE nodeos_plugin_ship untyped
nodeos_plugin_ship{nodeoschain="EOS"} 1
# HELP nodeos_runtime Metric read from /opt/metrics/nodestats.prom
# TYPE nodeos_runtime untyped
nodeos_runtime{nodeoschain="EOS",runtime="eos-vm-jit"} 1
# HELP nodeos_shiplog_chainstate_first Metric read from /opt/metrics/nodestats.prom
# TYPE nodeos_shiplog_chainstate_first untyped
nodeos_shiplog_chainstate_first{nodeoschain="EOS"} 1.62606473e+08
# HELP nodeos_shiplog_trace_first Metric read from /opt/metrics/nodestats.prom
# TYPE nodeos_shiplog_trace_first untyped
nodeos_shiplog_trace_first{nodeoschain="EOS"} 1.62606473e+08
# HELP nodeos_version Metric read from /opt/metrics/nodestats.prom
# TYPE nodeos_version untyped
nodeos_version{nodeoschain="EOS",version="v2.0.9"} 1
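
As a sketch of the get_info part of this approach, the following Python function converts a `/v1/chain/get_info` response (already decoded from JSON) into a few of the metrics listed above. The function name is hypothetical, the input dict is a hand-written sample using values from the output above rather than a live response, and it assumes `head_block_time` is a UTC timestamp with fractional seconds and no zone suffix.

```python
from datetime import datetime, timezone


def get_info_to_metrics(info: dict, chain: str) -> str:
    # Interpret head_block_time as UTC and convert it to Unix epoch seconds.
    head_time = datetime.strptime(
        info["head_block_time"], "%Y-%m-%dT%H:%M:%S.%f"
    ).replace(tzinfo=timezone.utc)
    values = {
        "nodeos_headblock": info["head_block_num"],
        "nodeos_lib": info["last_irreversible_block_num"],
        "nodeos_headtime": int(head_time.timestamp()),
    }
    lines = []
    for name, value in values.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f'{name}{{nodeoschain="{chain}"}} {value}')
    return "\n".join(lines) + "\n"


# Hand-written sample input, not fetched from a real node.
sample_info = {
    "head_block_num": 168074763,
    "last_irreversible_block_num": 168074435,
    "head_block_time": "2021-02-13T04:05:01.000",
}

print(get_info_to_metrics(sample_info, "EOS"))
```

Declaring the type as `gauge` is a choice made here; the output above reports these metrics as `untyped` because it is read back from a file.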

@kj4ezj
Contributor

kj4ezj commented Mar 2, 2021

This would be incredibly helpful to Automation, and no doubt to the community at large, since Prometheus is a common mechanism for collecting metrics.
