Benchmark transaction processing times #3627

Closed
MaciejBaj opened this issue May 15, 2019 · 3 comments

@MaciejBaj
Member

commented May 15, 2019

Expected behavior

After the objective "Improve transaction processing efficiency" is completed, benchmark the processing time of all transaction types and compare the results with Lisk Core 1.6.0.

Which version(s) does this affect? (Environment, OS, etc...)

2.0

@limiaspasdaniel

Member

commented May 27, 2019

These tests were performed by executing the type 0 (transfer) stress tests from the lisk-core-qa repository, varying the stressAmount variable (which corresponds to the "Tx sent" column) so that tx_sent / tx_block is always 20.
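As a rough illustration of how stressAmount relates to the table columns (a sketch only; only the stressAmount variable comes from lisk-core-qa, the helper below is hypothetical):

```js
// Hypothetical helper: pick stressAmount so that tx_sent / tx_per_block
// stays fixed at 20 for every run, matching the "Tx sent" column.
const RATIO_TX_SENT_TO_TX_PER_BLOCK = 20;

const stressAmountFor = txPerBlock => txPerBlock * RATIO_TX_SENT_TO_TX_PER_BLOCK;

console.log(stressAmountFor(25));   // 500
console.log(stressAmountFor(250));  // 5000
console.log(stressAmountFor(5000)); // 100000
```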

Machine: MacBook Pro 15-inch (2018), 16 GB RAM, 2.2 GHz Intel Core i7
Network: devnet (local)
Versions: 1.5, 2.0
I couldn't perform the tests on 1.6 because New Relic is unable to instrument it for some reason; I will look into this in the future.

Transaction pool settings used during all v2.0 tests, kept constant for consistency:

  • maxTransactionsPerQueue: 5000 (maximum)
  • receivedTransactionsLimitPerProcessing: MAX_TRANSACTIONS_PER_BLOCK*
  • validatedTransactionsLimitPerProcessing: MAX_TRANSACTIONS_PER_BLOCK
  • verifiedTransactionsLimitPerProcessing: MAX_TRANSACTIONS_PER_BLOCK
  • pendingTransactionsProcessingLimit: MAX_TRANSACTIONS_PER_BLOCK

*MAX_TRANSACTIONS_PER_BLOCK corresponds to the value in the second column (Tx/block).
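For reference, the settings above expressed as a JavaScript object (a sketch only; the exact config file and nesting in Lisk Core 2.0 are not shown here, and only the key names and values come from the list above):

```js
// Transaction pool limits used for all v2.0 runs; MAX_TRANSACTIONS_PER_BLOCK
// is set to the "Tx/block" value of the run being benchmarked.
const MAX_TRANSACTIONS_PER_BLOCK = 25; // e.g. 25, 100, 250, 500, 1000, 5000

const transactionPoolSettings = {
	maxTransactionsPerQueue: 5000, // maximum allowed
	receivedTransactionsLimitPerProcessing: MAX_TRANSACTIONS_PER_BLOCK,
	validatedTransactionsLimitPerProcessing: MAX_TRANSACTIONS_PER_BLOCK,
	verifiedTransactionsLimitPerProcessing: MAX_TRANSACTIONS_PER_BLOCK,
	pendingTransactionsProcessingLimit: MAX_TRANSACTIONS_PER_BLOCK,
};
```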

Results on 2.0

delegatesNextForge timings (ms) are shown in the last four columns (min / max / avg / SD*):

| Tx sent | Tx/block | Tx/s | Slot time (s) | Version | #Slots missed | #Forged blocks | #Full blocks | min | max | avg | SD* |
|---------|----------|------|---------------|---------|---------------|----------------|--------------|-------|-------|------|------|
| 500 | 25 | 2.5 | 10 | 2.0 | 0 | 20 | 20 | 0.473 | 1360 | 58.2 | 179 |
| 2000 | 100 | 10 | 10 | 2.0 | 0 | 20 | 20 | 0.143 | 6020 | 411 | 968 |
| 5000 | 250 | 25 | 10 | 2.0 | 2 | 20 | 20 | 0.113 | 16900 | 794 | 3420 |
| 10000 | 500 | 50 | 10 | 2.0 | 20 | 20 | 19 | 0.118 | 35900 | 1030 | 4630 |
| 20000 | 1000 | 100 | 10 | 2.0 | 11 | 4 | 2 | 0.517 | 45000 | 266 | 3040 |
| 100000 | 5000 | 500 | 10 | 2.0 | 2 | 4 | 0 | 0.460 | 74000 | 414 | 4620 |

*Standard Deviation

Results on 1.5

delegatesNextForge timings (ms) are shown in the last four columns (min / max / avg / SD*):

| Tx sent | Tx/block | Tx/s | Slot time (s) | Version | #Slots missed | #Forged blocks | #Full blocks | min | max | avg | SD* |
|---------|----------|------|---------------|---------|---------------|----------------|--------------|-------|-------|------|------|
| 500 | 25 | 2.5 | 10 | 1.5 | 0 | 20 | 20 | 0.348 | 931 | 94.3 | 245 |
| 2000 | 100 | 10 | 10 | 1.5 | 0 | 20 | 20 | 0.319 | 8750 | 486 | 1220 |
| 5000 | 250 | 25 | 10 | 1.5 | 8 | 20 | 20 | 0.297 | 21900 | 754 | 1460 |
| 10000 | 500 | 50 | 10 | 1.5 | 20 | 11 | 11 | 0.371 | 44900 | 593 | 4050 |
| 20000 | 1000 | 100 | 10 | 1.5 | 4 | 5 | 4 | 0.35 | 62800 | 400 | 4470 |
| 100000 | 5000 | 500 | 10 | 1.5 | - | 0 | - | 0.342 | 53.5 | 6.93 | 8.04 |

On 1.5, results were similar when transactions per block were <= 250; however, blocks with over 250 transactions performed much worse, missing slots more often and dropping a greater number of transactions than in 2.0. With 5000 tx/block the application did not manage to fill a single block.

Here's a comparison of the block table between 1.5 and 2.0:

1.5:

version timestamp height numberOfTransactions
1 94837200 11 250
1 94837230 12 250
1 94837250 13 250
1 94837270 14 250
1 94837280 15 250
1 94837300 16 250
1 94837310 17 250
1 94837320 18 250
1 94837330 19 250
1 94837350 20 250
1 94837360 21 250
1 94837370 22 250
1 94837380 23 250
1 94837400 24 250
1 94837410 25 250
1 94837420 26 250
1 94837430 27 250
1 94837450 28 250
1 94837460 29 250
1 94837470 30 250

2.0:

version timestamp height numberOfTransactions
1 94839580 31 250
1 94839590 32 250
1 94839610 33 250
1 94839630 34 250
1 94839640 35 250
1 94839650 36 250
1 94839660 37 250
1 94839670 38 250
1 94839680 39 250
1 94839690 40 250
1 94839700 41 250
1 94839710 42 250
1 94839720 43 250
1 94839730 44 250
1 94839740 45 250
1 94839750 46 250
1 94839760 47 250
1 94839770 48 250
1 94839780 49 250
1 94839790 50 250

You can observe that the number of missed slots in 1.5 is 8, whereas in 2.0 it drops to 2, which gives us room for further optimization to reduce it consistently to 0. Of course, these results come from this particular test only, but repeated tests have produced similar numbers, suggesting that 2.0 is more stable than 1.5 when forging blocks with 250 transactions, although it is still unreliable.

I have observed the following behaviors when comparing the two versions:

  • delegatesNextForge is faster on 2.0, outperforming 1.5 in all four indicators (min, max, avg, SD).
  • When tx/block <= 100, 1.5 and 2.0 perform very similarly: blocks are forged, fully filled, and slots are not missed.
  • When tx/block > 100:
    • 2.0 performs better than 1.5, but it is still unstable (it misses slots and blocks).
    • 2.0 is able to forge noticeably more blocks with 500 transactions than 1.5.
    • 2.0 is even able to forge some blocks with 1000 transactions each.
  • Both versions perform stably when tx/block == 100 (~10 tx/s).

While working with @usman and reading the New Relic reports and manual logs, we came to the conclusion that the new transaction pool was not a bottleneck; however, we noticed that on full blocks with 250 transactions or more, Postgres activity was spiking. New Relic reported that a triply nested SELECT query on the mem_accounts2delegates table was being called repeatedly, causing the slowdown. This query is executed when the extended flag of the AccountStore.cache method is set to true.

Query:

    SELECT
        "address",
        ENCODE("publicKey", 'hex') AS "publicKey",
        ENCODE("secondPublicKey", 'hex') AS "secondPublicKey",
        "username",
        "isDelegate"::int::boolean,
        "secondSignature"::int::boolean,
        "balance",
        "asset",
        "multimin" AS "multiMin",
        "multilifetime" AS "multiLifetime",
        "nameexist"::int::boolean AS "nameExist",
        "missedBlocks",
        "producedBlocks",
        "rank",
        "fees",
        "rewards",
        "vote",
        CASE
            WHEN "producedBlocks" + "missedBlocks" = 0 THEN 0
            ELSE ROUND((("producedBlocks"::float / ("producedBlocks" + "missedBlocks")) * 100.0)::numeric, 2)::float
        END AS productivity,
        (SELECT array_agg("dependentId") FROM mem_accounts2delegates WHERE "accountId" = mem_accounts.address) AS "votedDelegatesPublicKeys",
        (SELECT array_agg("dependentId") FROM mem_accounts2multisignatures WHERE "accountId" = mem_accounts.address) AS "membersPublicKeys"
    FROM mem_accounts
    WHERE ("address" = '16313739661670634666L') OR ("address" = '13457048459696651162L')
    ORDER BY "balance" ASC, "address" ASC
    LIMIT 101 OFFSET 0

By setting the flag to false for Type 0 transactions, performance increases significantly.
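A minimal sketch of the idea (the (filter, options) signature of AccountStore.cache and the surrounding call site are assumptions made for illustration; only the method name and the extended flag come from this comment):

```js
// Hypothetical call site: when processing only type 0 (transfer) transactions,
// skip the extended account fields so the nested SELECTs above are avoided.
const TRANSFER = 0;

const cacheAccountsFor = async (accountStore, transactions) => {
	const addresses = transactions.map(tx => tx.senderId);
	const onlyTransfers = transactions.every(tx => tx.type === TRANSFER);
	// extended: true triggers the votedDelegatesPublicKeys / membersPublicKeys
	// sub-selects; transfers do not need that data.
	await accountStore.cache({ address_in: addresses }, { extended: !onlyTransfers });
};
```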

Further to do:

  • Investigate why New Relic is not able to track web and non-web transactions on 1.6; it works fine on 1.5.
  • Run these tests multiple times for each block size and transaction type on 1.6 and 2.0 on an isolated machine, possibly in parallel to speed things up. Average the results and compute the standard deviation for each one to identify spikes (see the sketch after this list).
  • Send transactions using RPC instead of the HTTP API and compare the results.
  • Try running the HTTP API as a child process and compare the results.
  • Investigate Postgres activity for the above query on big blocks.
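A small helper for the repeated-runs item above (plain JavaScript, no project-specific APIs; the sample values below are hypothetical):

```js
// Mean and sample standard deviation over repeated benchmark runs,
// e.g. the avg delegatesNextForge (ms) collected for one block size.
const mean = values => values.reduce((sum, v) => sum + v, 0) / values.length;

const sampleStdDev = values => {
	const m = mean(values);
	const variance =
		values.reduce((sum, v) => sum + (v - m) ** 2, 0) / (values.length - 1);
	return Math.sqrt(variance);
};

const runs = [58.2, 61.4, 55.9, 60.1, 57.3]; // hypothetical repeated runs
console.log(mean(runs).toFixed(1));         // 58.6
console.log(sampleStdDev(runs).toFixed(1)); // 2.2
```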
@nazarhussain

Contributor

commented May 29, 2019

@limiaspasdaniel It's great that you looked into this issue in such detail. I would suggest a few points if you iterate on this once again.

For benchmarking any indicator of the network:

  1. Please use a Linux machine rather than a MacBook, as that is the environment most users are running.
  2. Always use a clean machine, where you know exactly what is running on the system. On a development machine you never know when Spotify hogs the CPU or when the system is doing file indexing, so your results could be misleading.
  3. Always use the binary builds and not the source code, because in the binary build we optimize PostgreSQL specifically for our blockchain use case. So if you face any PostgreSQL issue in development, try to replicate it on a binary build first.

Most importantly, document the exact steps you used to come up with the final numbers, so anyone in the team or the community can repeat them and compare for themselves.

You may think this is just an internal comparison, but once you put those numbers together, people within the team and publicly will inevitably talk about them, as TPS is a vital indicator of any blockchain's performance. Many future decisions will also be influenced by these numbers, so it is better to produce them very accurately, in the right environment and with the right configurations.

@limiaspasdaniel

Member

commented May 30, 2019

Thanks for your input, @nazarhussain. I will keep this in mind for the next iteration.

@shuse2 shuse2 closed this Jun 3, 2019

Version 2.0.0 automation moved this from To do to Done Jun 3, 2019
