This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Add more debugging information about blocks produced #8477

Open
matthewdarwin opened this issue Jan 20, 2020 · 3 comments

Comments

@matthewdarwin

To ease troubleshooting when blocks get lost, it would be very helpful to log more information about the size of each block in nodeos standard output.

Today it looks like this:

Received block 2804890f34cb6cd3... #101125076 @ 2020-01-20T16:11:58.500 signed by eoseouldotio [trxs: 8, lib: 101124757, conf: 0, latency: 265 ms]

Request is to add things like:

  • size of the block (number of bytes) as seen by the producer in question
  • amount of CPU used by the producer in question

If computing this information takes additional CPU, then it should probably be controlled by a configuration or logging option.
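
For illustration, an extended line might look something like this (the size and cpu fields are hypothetical, not an existing nodeos format):

Received block 2804890f34cb6cd3... #101125076 @ 2020-01-20T16:11:58.500 signed by eoseouldotio [trxs: 8, size: 9216 bytes, cpu: 12000 us, lib: 101124757, conf: 0, latency: 265 ms]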

@matthewdarwin matthewdarwin changed the title Add mored debugging information about blocks produced Add more debugging information about blocks produced Jan 29, 2020
@matthewdarwin
Copy link
Author

The rationale for this request is to understand whether latency is correlated with block size, or whether blocks are taking a long time for some other reason (e.g. poor peering).

@matthewdarwin
Author

matthewdarwin commented Jan 29, 2020

I hacked nodeos to add logging, and came up with these results as a test:

19:21:36 nodeos >> zbeosbp11111: 91 ms => 452 ms, 54 trx in 12 blk 9.1 Kb/blk
19:23:43 nodeos >> zbeosbp11111: 88 ms => 837 ms, 171 trx in 10 blk 31.6 Kb/blk
19:25:48 nodeos >> zbeosbp11111: -50 ms => 478 ms, 101 trx in 12 blk 18.5 Kb/blk
19:27:54 nodeos >> zbeosbp11111: -66 ms => 505 ms, 114 trx in 12 blk 18.6 Kb/blk
19:30:00 nodeos >> zbeosbp11111: -100 ms => 393 ms, 88 trx in 12 blk 17.1 Kb/blk
19:32:06 nodeos >> zbeosbp11111: -160 ms => 802 ms, 104 trx in 11 blk 16.2 Kb/blk
19:34:19 nodeos >> zbeosbp11111: -32 ms => 420 ms, 96 trx in 11 blk 16.7 Kb/blk
19:36:24 nodeos >> zbeosbp11111: -123 ms => 483 ms, 131 trx in 12 blk 19.0 Kb/blk
19:38:24 nodeos >> zbeosbp11111: -93 ms => 434 ms, 88 trx in 11 blk 17.0 Kb/blk
19:40:30 nodeos >> zbeosbp11111: -133 ms => 474 ms, 77 trx in 10 blk 14.8 Kb/blk
19:42:42 nodeos >> zbeosbp11111: -147 ms => 563 ms, 67 trx in 11 blk 9.9 Kb/blk
19:44:48 nodeos >> zbeosbp11111: -61 ms => 411 ms, 97 trx in 12 blk 14.8 Kb/blk
19:46:48 nodeos >> zbeosbp11111: 133 ms => 478 ms, 108 trx in 12 blk 24.5 Kb/blk
227.8/13 = 17.5 Kb/blk

19:22:54 nodeos >> eosnationftw: 219 ms => 1567 ms, 80 trx in 12 blk 29.6 Kb/blk
19:25:00 nodeos >> eosnationftw: 206 ms => 1147 ms, 68 trx in 12 blk 23.3 Kb/blk
19:27:06 nodeos >> eosnationftw: -184 ms => 1114 ms, 166 trx in 12 blk 37.9 Kb/blk
19:29:12 nodeos >> eosnationftw: -154 ms => 1078 ms, 141 trx in 12 blk 28.6 Kb/blk
19:31:18 nodeos >> eosnationftw: 102 ms => 1110 ms, 57 trx in 12 blk 19.3 Kb/blk
19:33:24 nodeos >> eosnationftw: 53 ms => 1139 ms, 51 trx in 12 blk 18.0 Kb/blk
19:35:30 nodeos >> eosnationftw: 52 ms => 1644 ms, 186 trx in 12 blk 34.0 Kb/blk
19:37:36 nodeos >> eosnationftw: -234 ms => 1245 ms, 141 trx in 12 blk 32.9 Kb/blk
19:39:42 nodeos >> eosnationftw: -216 ms => 1171 ms, 138 trx in 12 blk 31.8 Kb/blk
19:41:48 nodeos >> eosnationftw: 111 ms => 1108 ms, 71 trx in 12 blk 24.2 Kb/blk
19:43:54 nodeos >> eosnationftw: -199 ms => 1188 ms, 146 trx in 12 blk 34.9 Kb/blk
19:46:00 nodeos >> eosnationftw: -225 ms => 1191 ms, 142 trx in 12 blk 28.0 Kb/blk
342.5/12 = 28.5 Kb/blk

So zbeosbp11111's blocks get to the reporting point faster than eosnationftw's, but they're also about a third smaller.

An initial analysis of the logs might suggest that eosnationftw needs to improve peering to get blocks out faster, but the block sizes are actually quite different, so maybe that should be looked into instead.
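
As a rough illustration of how these summary lines could be analyzed, here is a minimal Python sketch that parses the custom format above and reports average latency next to average block size per producer. The line format is the hacked one shown above, not standard nodeos output:

# Parse "HH:MM:SS nodeos >> <bp>: <offset> ms => <latency> ms, <trx> trx in <blk> blk <kb> Kb/blk"
import re
from collections import defaultdict

LINE_RE = re.compile(
    r"^\d\d:\d\d:\d\d nodeos >> (?P<bp>\w+): "
    r"(?P<offset>-?\d+) ms => (?P<latency>\d+) ms, "
    r"(?P<trx>\d+) trx in (?P<blk>\d+) blk (?P<kb>[\d.]+) Kb/blk$"
)

def summarize(lines):
    """Return {producer: (avg latency in ms, avg Kb/blk)}."""
    stats = defaultdict(lambda: {"latency": [], "kb": []})
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            stats[m["bp"]]["latency"].append(int(m["latency"]))
            stats[m["bp"]]["kb"].append(float(m["kb"]))
    return {bp: (sum(s["latency"]) / len(s["latency"]),
                 sum(s["kb"]) / len(s["kb"]))
            for bp, s in stats.items()}

if __name__ == "__main__":
    sample = [
        "19:21:36 nodeos >> zbeosbp11111: 91 ms => 452 ms, 54 trx in 12 blk 9.1 Kb/blk",
        "19:22:54 nodeos >> eosnationftw: 219 ms => 1567 ms, 80 trx in 12 blk 29.6 Kb/blk",
    ]
    for bp, (lat, kb) in summarize(sample).items():
        print(f"{bp}: avg latency {lat:.0f} ms, avg size {kb:.1f} Kb/blk")

Run over the full set of lines above, this reproduces the hand-computed averages (17.5 Kb/blk for zbeosbp11111 versus 28.5 Kb/blk for eosnationftw).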

@ank-everstake

I suppose the RPC must be extended to expose more valuable metrics; it's not so convenient to parse logs. And when I say "expose", I mean it's time to add Prometheus metrics support as the current de facto standard.
Some metrics we currently lack:

  • current peer count, as seen in the 2.0.1 log
  • latency from each bp, as a label
  • trx count in each block, per bp, as a label
  • info about produced / received blocks (last produced block timestamp)

Also, I see that the get_schedule cleos call isn't available as an RPC call, but it obviously should be.
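
As a sketch of what this could look like today, here is a minimal sidecar exporter in Python using the prometheus_client package: it follows nodeos output on stdin and publishes per-producer gauges with the producer name as a label. The metric names and port are illustrative, since nodeos has no native Prometheus support:

# Sidecar sketch: pipe nodeos output into this script, e.g.
#   tail -F nodeos.log | python3 exporter.py
# then scrape http://localhost:9670/metrics with Prometheus.
import re
import sys

from prometheus_client import Gauge, start_http_server

# Matches the stock "Received block" line quoted at the top of this issue.
RECV_RE = re.compile(
    r"Received block \w+\.\.\. #(?P<num>\d+) @ \S+ signed by (?P<bp>\w+) "
    r"\[trxs: (?P<trxs>\d+), lib: \d+, conf: \d+, latency: (?P<latency>-?\d+) ms\]"
)

BLOCK_LATENCY = Gauge("nodeos_block_latency_ms",
                      "Latency of the last received block", ["producer"])
BLOCK_TRXS = Gauge("nodeos_block_trxs",
                   "Transactions in the last received block", ["producer"])

def main():
    start_http_server(9670)  # illustrative port for the /metrics endpoint
    for line in sys.stdin:
        m = RECV_RE.search(line)
        if m:
            BLOCK_LATENCY.labels(producer=m["bp"]).set(int(m["latency"]))
            BLOCK_TRXS.labels(producer=m["bp"]).set(int(m["trxs"]))

if __name__ == "__main__":
    main()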
