qa: rotating node test on v0.38.x #883

Merged 8 commits on May 24, 2023
150 changes: 136 additions & 14 deletions docs/qa/CometBFT-QA-38.md
range of ABCI++ functionality (ABCI 2.0), and several other improvements
described in the
[CHANGELOG](https://github.com/cometbft/cometbft/blob/v0.38.0-alpha.2/CHANGELOG.md).

## Issues discovered

* (critical, fixed) [\#539] and [\#546] - This bug causes the proposer to crash in
`PrepareProposal` because it does not have vote extensions when it should.
It happens mainly when the proposer is catching up.
* (critical, fixed) [\#562] - Several bugs in the metrics-related
logic caused panics when the testnets were started.

## 200 Node Testnet

As in other iterations of our QA process, we have used a 200-node network as
testbed, plus nodes to introduce load and collect metrics.
saturation point is beyond the diagonal defined by `r=200,c=2` and `r=100,c=4`.
|       | c=1   | c=2       | c=4       |
| :---- | ----: | --------: | --------: |
| r=100 | 8900  | 17800     | **35600** |
| r=200 | 17800 | **35600** | 38660     |
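As a rough sanity check, the totals in the table are consistent with each of `c` connections sustaining `r` tx/s over an experiment window of roughly 89 s. The following is a minimal sketch, not part of the QA tooling; the 89 s duration and the mapping of table cells to `(r, c)` pairs are assumptions inferred from the table values, not stated in this section:

```python
# Expected vs. observed transactions per load configuration, assuming each of
# c connections submits r tx/s for an experiment window of ~89 s (an
# assumption inferred from the table values).
DURATION_S = 89

def expected_txs(r: int, c: int, duration: int = DURATION_S) -> int:
    """Upper bound on the number of transactions the load runner can inject."""
    return r * c * duration

# Observed totals (assumed cell layout); a large shortfall vs. the expected
# value, as for r=200, c=4, signals that the saturation point has been passed.
observed = {
    (100, 1): 8900, (100, 2): 17800, (100, 4): 35600,
    (200, 1): 17800, (200, 2): 35600, (200, 4): 38660,
}
for (r, c), seen in observed.items():
    print(f"r={r} c={c}: expected {expected_txs(r, c)}, observed {seen}")
```

Under these assumptions, every configuration except `r=200,c=4` processes essentially all injected transactions, matching the claim that saturation lies beyond the `r=200,c=2` / `r=100,c=4` diagonal.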

### Latencies

The following figure plots the latencies of the experiment carried out with the
configuration `c=1,r=400`.
from this small experiment, we can say that the latencies measured on the two
versions are equivalent, or at least that the version under test is not worse
than the baseline.

### Prometheus Metrics on the Chosen Experiment

This section further examines key metrics extracted from Prometheus data for
the chosen experiment with configuration `c=1,r=400`.

#### Mempool Size

The mempool size, a count of the number of transactions in the mempool, was
shown to be stable and homogeneous at all full nodes. It did not exhibit any
The behavior is similar to that observed in the baseline, presented next.
![mempool-avg-baseline](img37/200nodes_cmt037/avg_mempool_size.png)
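These plots are produced by the `mempool_size` queries in `scripts/qa/reporting/prometheus_plotter.py`; the two query strings used there can be sketched as:

```python
# Query strings behind the mempool plots, mirroring the ones in
# scripts/qa/reporting/prometheus_plotter.py: the per-node mempool size,
# and the average mempool size over all nodes.
fork = 'cometbft'  # or 'tendermint' for the baseline
per_node_query = fork + '_mempool_size'
avg_query = 'avg(' + fork + '_mempool_size)'
print(per_node_query)  # cometbft_mempool_size
print(avg_query)       # avg(cometbft_mempool_size)
```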


#### Peers

The number of peers was stable at all nodes. It was higher for the seed nodes
(around 140) than for the rest (between 20 and 70 for most nodes). The red
than 50 peers is due to [\#9548].
![peers](img37/200nodes_cmt037/peers.png)


#### Consensus Rounds per Height

Most heights took just one round, that is, round 0, but some nodes needed to
advance to round 1.
The following specific run of the baseline required some nodes to reach round 1.
![rounds](img37/200nodes_cmt037/rounds.png)


#### Blocks Produced per Minute, Transactions Processed per Minute

The following plot shows the rate at which blocks were created, from the point
of view of each node. That is, it shows when each node learned that a new block
similar to above.
![total-txs-baseline](img37/200nodes_cmt037/total_txs_rate.png)


#### Memory Resident Set Size

The following graph shows the Resident Set Size of all monitored processes, with
maximum memory usage of 1.6GB, slightly lower than the baseline shown after.

![rss](img38/200nodes/memory.png)

The memory of all processes went down as the load was removed, showing no signs
of unconstrained growth.


#### CPU utilization

##### Comparison to baseline

The best metric from Prometheus to gauge CPU utilization in a Unix machine is
`load1`, as it usually appears in the [output of
The baseline had similar behavior.

![load1-baseline](img37/200nodes_cmt037/cpu.png)

##### Impact of vote extension signature verification

It is important to notice that the baseline (`v0.37.x`) does not implement vote extensions,
whereas the version under test (`v0.38.0-alpha.2`) _does_ implement them, and they are
has no performance impact in the current version of the system: the bottlenecks
Thus, we should focus on optimizing other parts of the system: the ones that cause the current
bottlenecks (mempool gossip duplication, leaner proposal structure, optimized consensus gossip).

### Test Results

The comparison against the baseline results shows that both scenarios had similar
numbers and are therefore equivalent.
commit versions used in the experiments.

| Scenario | Date | Version | Result |
| -------- | ---------- | ---------------------------------------------------------- | ------ |
| 200-node | 2023-05-21 | v0.38.0-alpha.2 (1f524d12996204f8fd9d41aa5aca215f80f06f5e) | Pass |


## Rotating Node Testnet

We use `c=1,r=400` as load, which can be considered a safe workload, as it was close to (but below)
the saturation point in the 200-node testnet. This testnet has fewer nodes (10 validators and 25 full nodes).

Importantly, the baseline considered in this section is `v0.37.0-alpha.2` (Tendermint Core),
which is **different** from the one used in the [previous section](#200-node-testnet).
The reason is that this testnet was not re-tested for `v0.37.0-alpha.3` (CometBFT),
since it was not deemed necessary.

Unlike in the baseline tests, the version of CometBFT used for these tests is _not_ affected by [\#9539],
which was fixed right after the rotating testnet was run for `v0.37`.
As a result, the load introduced in this iteration of the test is higher, as transactions do not get rejected.

### Latencies

The plot of all latencies can be seen here.

![rotating-all-latencies](img38/rotating/rotating_latencies.png)

It is similar to the baseline.

![rotating-all-latencies-bl](img37/200nodes_tm037/v037_rotating_latencies.png)

The average increase of about 1 second with respect to the baseline is due to the higher
transaction load produced (remember the baseline was affected by [\#9539], whereby most transactions
produced were rejected by `CheckTx`).

### Prometheus Metrics

The set of metrics shown here roughly match those shown on the baseline (`v0.37`) for the same experiment.
We also show the baseline results for comparison.

#### Blocks and Transactions per minute

The following plot shows the blocks produced per minute.

![rotating-heights](img38/rotating/rotating_block_rate.png)

This is similar to the baseline, shown below.

![rotating-heights-bl](img37/rotating/rotating_block_rate.png)

The following plot shows only the heights reported by ephemeral nodes, both when they were blocksyncing
and when they were running consensus.
The second plot is the baseline plot for comparison. The baseline lacks the heights when the nodes were
blocksyncing as that metric was implemented afterwards.

![rotating-heights-ephe](img38/rotating/rotating_eph_heights.png)

![rotating-heights-ephe-bl](img37/rotating/rotating_eph_heights.png)
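The ephemeral-node plot above is obtained by combining two metrics with PromQL's `or` operator: the consensus height while a node runs consensus, falling back to the blocksync height while it is catching up, as done in `scripts/qa/reporting/prometheus_plotter.py`:

```python
# PromQL for the heights of ephemeral nodes, mirroring prometheus_plotter.py:
# `or` returns samples of the blocksync height for nodes where the
# consensus-height series is absent (i.e., nodes still block-syncing).
fork = 'cometbft'
eph_heights_query = (
    fork + '_consensus_height{job=~"ephemeral.*"} or '
    + fork + '_blocksync_latest_block_height{job=~"ephemeral.*"}'
)
print(eph_heights_query)
```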

We see that heights follow a similar pattern in both plots: they grow in length as the experiment advances.

The following plot shows the transactions processed per minute.

![rotating-total-txs](img38/rotating/rotating_txs_rate.png)

For comparison, this is the baseline plot.

![rotating-total-txs-bl](img37/rotating/rotating_txs_rate.png)

We can see the rate is much lower in the baseline plot.
The reason is that the baseline was affected by [\#9539], whereby `CheckTx` rejected most transactions
produced by the load runner.

#### Peers

The plot below shows the evolution of the number of peers throughout the experiment.

![rotating-peers](img38/rotating/rotating_peers.png)

This is the baseline plot, for comparison.

![rotating-peers-bl](img37/rotating/rotating_peers.png)

The plotted values and their evolution are comparable in both plots.

For further details on these plots, see [this section](./TMCore-QA-34.md#peers-1).

#### Memory Resident Set Size

The average Resident Set Size (RSS) over all processes is notably larger on `v0.38.0-alpha.2` than on the baseline.
The reason, again, is that `CheckTx` rejected most transactions submitted on the baseline,
and therefore the overall transaction load was lower on the baseline.
This is consistent with the difference seen in the transaction rate plots
in the [previous section](#blocks-and-transactions-per-minute).

![rotating-rss-avg](img38/rotating/rotating_avg_memory.png)

![rotating-rss-avg-bl](img37/rotating/rotating_avg_memory.png)

#### CPU utilization

The plots show metric `load1` for all nodes for `v0.38.0-alpha.2` and for the baseline.

![rotating-load1](img38/rotating/rotating_cpu.png)

![rotating-load1-bl](img37/rotating/rotating_cpu.png)

In both cases, it stays under 5 most of the time, which is considered normal load.
The load is higher on `v0.38.0-alpha.2` on average because of the larger
number of transactions processed per minute compared to the baseline.

### Test Result

| Scenario | Date | Version | Result |
| -------- | ---------- | ---------------------------------------------------------- | ------ |
| Rotating | 2023-05-23 | v0.38.0-alpha.2 (e9abb116e29beb830cf111b824c8e2174d538838) | Pass |

[\#9539]: https://github.com/tendermint/tendermint/issues/9539
[\#9548]: https://github.com/tendermint/tendermint/issues/9548
[\#539]: https://github.com/cometbft/cometbft/issues/539
[\#546]: https://github.com/cometbft/cometbft/issues/546
[\#562]: https://github.com/cometbft/cometbft/issues/562
Binary file added docs/qa/img37/rotating/rotating_avg_memory.png
Binary file added docs/qa/img37/rotating/rotating_cpu.png
Binary file added docs/qa/img37/rotating/rotating_eph_heights.png
Binary file added docs/qa/img37/rotating/rotating_peers.png
Binary file added docs/qa/img37/rotating/rotating_txs_rate.png
Binary file added docs/qa/img38/rotating/rotating_avg_memory.png
Binary file added docs/qa/img38/rotating/rotating_cpu.png
Binary file added docs/qa/img38/rotating/rotating_latencies.png
Binary file added docs/qa/img38/rotating/rotating_peers.png
Binary file added docs/qa/img38/rotating/rotating_txs_rate.png
3 changes: 2 additions & 1 deletion scripts/qa/reporting/README.md
python3 latency_plotter.py /path/to/csv/files/raw.csv
2. Tweak the script to your needs
1. Adjust the time window
2. Select the right fork
3. Select the right test case
4. Tweak/add/remove metrics
3. Run the script as follows
```bash
# Do the following while ensuring that the virtual environment is activated (see
```
46 changes: 35 additions & 11 deletions scripts/qa/reporting/prometheus_plotter.py

import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.dates as md

import numpy as np
import pandas as pd

from prometheus_pandas import query

#release = 'v0.37.0-alpha.2'
release = 'v0.38.0-alpha.2'
path = os.path.join('imgs')
prometheus = query.Prometheus('http://localhost:9090')

# Time window
#window_size = dict(seconds=150) #CMT 0.37.x-alpha3
#window_size = dict(seconds=126) #TM v0.37 (200 nodes) baseline
#window_size = dict(hours=1, minutes=28, seconds=25) #TM v0.37.0-alpha.2 (rotating)
#window_size = dict(seconds=130) #homogeneous
#window_size = dict(seconds=127) #baseline
#window_size = dict(seconds=115) #CMT v0.38.0-alpha.2 (200 nodes)
window_size = dict(hours=1, minutes=46) #CMT v0.38.0-alpha.2 (rotating)


ext_window_size = dict(seconds=145)

# Use the time provided by latency_plotter for the selected experiment.
Expand All @@ -32,10 +38,12 @@
#left_end = '2023-02-14T15:18:00Z' #cmt1 tm1
#left_end = '2023-02-07T18:07:00Z' #homogeneous
#left_end = '2022-10-13T19:41:23Z' #baseline
#left_end = '2023-02-22T18:56:29Z' #CMT v0.37.x-alpha3
#left_end = '2022-10-13T15:57:50Z' #TM v0.37 (200 nodes) baseline
#left_end = '2023-03-20T19:45:35Z' #feature/abci++vef merged with main (7d8c9d426)
#left_end = '2023-05-22T09:39:20Z' #CMT v0.38.0-alpha.2 - 200 nodes
#left_end = '2022-10-10T15:47:15Z' #TM v0.37.0-alpha.2 - rotating
left_end = '2023-05-23T08:09:50Z' #CMT v0.38.0-alpha.2 - rotating

right_end = pd.to_datetime(left_end) + pd.Timedelta(**window_size)
time_window = (left_end, right_end.strftime('%Y-%m-%dT%H:%M:%SZ'))
fork='cometbft'
#fork='tendermint'

# Do prometheus queries, depending on the test case
queries200Nodes = [
(( fork + '_mempool_size', time_window[0], time_window[1], '1s'), 'mempool_size', dict(ylabel='TXs', xlabel='time (s)', title='Mempool Size', legend=False, figsize=(10,6), grid=True, kind='area',stacked=True), False),
(( fork + '_p2p_peers', time_window[0], time_window[1], '1s'), 'peers', dict(ylabel='# Peers', xlabel='time (s)', title='Peers', legend=False, figsize=(10,6), grid=True), True),
(( 'avg(' + fork + '_mempool_size)', time_window[0], time_window[1], '1s'), 'avg_mempool_size', dict(ylabel='TXs', xlabel='time (s)', title='Average Mempool Size', legend=False, figsize=(10,6), grid=True), False),
(( 'rate(' + fork + '_consensus_total_txs[20s])*60', ext_time_window[0], ext_time_window[1], '1s'), 'total_txs_rate', dict(ylabel='TXs/min', xlabel='time (s)', title='Rate of transaction processing', legend=False, figsize=(10,6), grid=True), True),
]

queriesRotating = [
(( 'rate(' + fork + '_consensus_height[20s])*60', time_window[0], time_window[1], '1s'), 'rotating_block_rate', dict(ylabel='blocks/min', xlabel='time', title='Rate of Block Creation', legend=False, figsize=(10,6), grid=True), False),
(( 'rate(' + fork + '_consensus_total_txs[20s])*60', time_window[0], time_window[1], '1s'), 'rotating_txs_rate', dict(ylabel='TXs/min', xlabel='time', title='Rate of Transaction processing', legend=False, figsize=(10,6), grid=True), False),
(( fork + '_consensus_height{job=~"ephemeral.*"} or ' + fork + '_blocksync_latest_block_height{job=~"ephemeral.*"}',
time_window[0], time_window[1], '1s'), 'rotating_eph_heights', dict(ylabel='height', xlabel='time', title='Heights of Ephemeral Nodes', legend=False, figsize=(10,6), grid=True), False),
(( fork + '_p2p_peers', time_window[0], time_window[1], '1s'), 'rotating_peers', dict(ylabel='# peers', xlabel='time', title='Peers', legend=False, figsize=(10,6), grid=True), False),
(( 'avg(process_resident_memory_bytes)', time_window[0], time_window[1], '1s'), 'rotating_avg_memory', dict(ylabel='memory (bytes)', xlabel='time', title='Average Memory Usage', legend=False, figsize=(10,6), grid=True), False),
(( 'node_load1', time_window[0], time_window[1], '1s'), 'rotating_cpu', dict(ylabel='load', xlabel='time', title='Node Load', legend=False, figsize=(10,6), grid=True), False),
]

#queries = queries200Nodes
queries = queriesRotating


for (query, file_name, pandas_params, plot_average) in queries:
print(query)

data_frame = prometheus.query_range(*query)
#Tweak the x ticks
data_frame = data_frame.set_index(md.date2num(data_frame.index))


pandas_params["title"] += " - " + release
ax = data_frame.plot(**pandas_params)
if plot_average:
average = data_frame.mean(axis=1)
data_frame['__average__'] = average
pandas_params['lw'] = 8
pandas_params['style'] = ['--']
pandas_params['color'] = ['red']
ax = data_frame['__average__'].plot(**pandas_params)

ax.xaxis.set_major_formatter(md.DateFormatter('%H:%M:%S'))
plt.savefig(os.path.join(path, file_name + '.png'))
plt.plot()
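The `(left_end, right_end)` window the script passes to each range query is derived from the experiment start time plus a duration; that computation can be sketched in isolation:

```python
# Sketch of the time-window computation used by prometheus_plotter.py:
# the window start (`left_end`, as reported by latency_plotter for the
# selected experiment) plus a duration yields the (start, end) pair
# passed to Prometheus range queries.
import pandas as pd

left_end = '2023-05-23T08:09:50Z'        # CMT v0.38.0-alpha.2 - rotating
window_size = dict(hours=1, minutes=46)

right_end = pd.to_datetime(left_end) + pd.Timedelta(**window_size)
time_window = (left_end, right_end.strftime('%Y-%m-%dT%H:%M:%SZ'))
print(time_window)  # ('2023-05-23T08:09:50Z', '2023-05-23T09:55:50Z')
```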
