[Bench] Run benchmark using Batch Transaction #953

Merged
merged 2 commits into from
Mar 20, 2022

lxfind
Contributor

@lxfind lxfind commented Mar 19, 2022

Added a --batch-size option that specifies the size of a batch. If it is 1, we send normal Single transactions.
Renamed --num-accounts to --num-transactions, since accounts no longer make sense here.
The numbers look too good to be true. Did I miss anything?

xun@XMac fastnft-upstream % cargo run --release --bin=bench -- --num-transactions 100000 --tcp-connections 9 --committee-size 4 --batch-size 100
    Finished release [optimized] target(s) in 0.23s
     Running `target/release/bench --num-transactions 100000 --tcp-connections 9 --committee-size 4 --batch-size 100`
2022-03-19T01:45:39.514621Z  INFO bench: Starting benchmark: TransactionsAndCerts
2022-03-19T01:45:39.514693Z  INFO bench: Preparing accounts.
2022-03-19T01:45:39.515964Z  INFO bench: Open database on path: "/var/folders/59/sj45rzrs6ys39m45wt4wbx580000gn/T/DB_964B3568C20AAA7CC474DFEC5F54ECAF5065466B"
2022-03-19T01:45:39.569313Z  INFO bench: Init Authority.
2022-03-19T01:45:39.585389Z  INFO bench: Generate empty store with Genesis.
2022-03-19T01:45:39.857285Z  INFO bench: Preparing transactions.
2022-03-19T01:45:39.908551Z  INFO sui_network::transport: Listening to TCP traffic on 127.0.0.1:9555
2022-03-19T01:45:40.910623Z  INFO bench: Number of TCP connections: 9
2022-03-19T01:45:40.910697Z  INFO bench: Sending requests.
2022-03-19T01:45:40.911669Z  INFO sui_network::network: Sending TCP requests to 127.0.0.1:9555
2022-03-19T01:45:40.912023Z  INFO sui_network::network: Sending TCP requests to 127.0.0.1:9555
2022-03-19T01:45:40.912115Z  INFO sui_network::network: Sending TCP requests to 127.0.0.1:9555
2022-03-19T01:45:40.912172Z  INFO sui_network::network: Sending TCP requests to 127.0.0.1:9555
2022-03-19T01:45:40.912218Z  INFO sui_network::network: Sending TCP requests to 127.0.0.1:9555
2022-03-19T01:45:40.912723Z  INFO sui_network::network: Sending TCP requests to 127.0.0.1:9555
2022-03-19T01:45:40.912824Z  INFO sui_network::network: Sending TCP requests to 127.0.0.1:9555
2022-03-19T01:45:40.912995Z  INFO sui_network::network: Sending TCP requests to 127.0.0.1:9555
2022-03-19T01:45:40.913090Z  INFO sui_network::network: Sending TCP requests to 127.0.0.1:9555
2022-03-19T01:45:40.913232Z  INFO sui_network::network: Sending TCP requests to 127.0.0.1:9555
2022-03-19T01:45:40.927318Z  INFO sui_network::network: Done sending TCP requests to 127.0.0.1:9555
2022-03-19T01:45:41.530585Z  INFO sui_network::network: Done sending TCP requests to 127.0.0.1:9555
2022-03-19T01:45:41.537289Z  INFO sui_network::network: Done sending TCP requests to 127.0.0.1:9555
2022-03-19T01:45:41.539340Z  INFO sui_network::network: Done sending TCP requests to 127.0.0.1:9555
2022-03-19T01:45:41.547351Z  INFO sui_network::network: Done sending TCP requests to 127.0.0.1:9555
2022-03-19T01:45:41.549084Z  INFO sui_network::network: Done sending TCP requests to 127.0.0.1:9555
2022-03-19T01:45:41.555469Z  INFO sui_network::network: Done sending TCP requests to 127.0.0.1:9555
2022-03-19T01:45:41.564625Z  INFO sui_network::network: Done sending TCP requests to 127.0.0.1:9555
2022-03-19T01:45:41.571089Z  INFO sui_network::network: Done sending TCP requests to 127.0.0.1:9555
2022-03-19T01:45:41.571493Z  INFO sui_network::network: Done sending TCP requests to 127.0.0.1:9555
2022-03-19T01:45:41.571523Z  INFO bench: Received 2000 responses.
2022-03-19T01:45:41.611013Z  WARN bench: Completed benchmark for TransactionsAndCerts
Total time: 660805us, items: 100000, tx/sec: 151330.57407253273
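
The summary line above can be sanity-checked by hand. The sketch below recomputes the throughput from the reported total time, and the response count under the assumption that each batch produces one transaction request and one certificate request (an inference from the `TransactionsAndCerts` benchmark name and the logged 2000 responses, not something stated in this thread). The function names are illustrative and are not taken from `bench.rs`.

```rust
/// Throughput in transactions per second, given the elapsed time in microseconds.
fn tx_per_sec(items: u64, total_time_us: u64) -> f64 {
    items as f64 / (total_time_us as f64 / 1_000_000.0)
}

/// Expected number of responses, ASSUMING one transaction request plus one
/// certificate request per batch (hypothetical model, see the note above).
fn expected_responses(num_transactions: u64, batch_size: u64) -> u64 {
    2 * (num_transactions / batch_size)
}
```

With the values from this run, `tx_per_sec(100_000, 660_805)` reproduces the reported figure of about 151330.57 tx/sec, and `expected_responses(100_000, 100)` gives 2000, matching the log.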

@gdanezis
Collaborator

gdanezis commented Mar 19, 2022

I can confirm that I also see a dramatic speedup (although a less dramatic one on my Linux machine, which is not an M1 laptop):

$ RUST_LOG="debug" cargo run --release --bin=bench -- --num-transactions 400000 --tcp-connections 9 --committee-size 4 --batch-size 100
...
2022-03-19T11:08:31.614927Z DEBUG process_tx{tx_digest=t#a826f32751814df4d985227ba0fd56a878633a121e67fcb91af04decda7e3fe7 tx_kind="Batch"}: sui_core::authority: Checked locks and found mutable objects num_mutable_objects=101
2022-03-19T11:08:31.616170Z DEBUG process_cert{tx_digest=t#a826f32751814df4d985227ba0fd56a878633a121e67fcb91af04decda7e3fe7 tx_kind="Batch"}: sui_core::authority: Read inputs for transaction from DB num_inputs=101
2022-03-19T11:08:31.616556Z DEBUG process_cert{tx_digest=t#a826f32751814df4d985227ba0fd56a878633a121e67fcb91af04decda7e3fe7 tx_kind="Batch"}: sui_core::authority: Finished execution of transaction with status Success { gas_used: 1800, results: [] } gas_used=1800
...
Total time: 8673162us, items: 400000, tx/sec: 46119.28152616082

As compared to:

$ cargo run --release --bin=bench -- --num-transactions 400000 --tcp-connections 9 --committee-size 4 --batch-size 1
Total time: 37697289us, items: 400000, tx/sec: 10610.842599317952
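
For reference, the speedup implied by the two 400000-transaction runs above follows directly from the reported total times. A minimal sketch; the function name `speedup` is illustrative and is not part of the bench code:

```rust
/// Ratio of the baseline run time to the candidate run time.
/// Values above 1.0 mean the candidate finished faster.
fn speedup(baseline_us: u64, candidate_us: u64) -> f64 {
    baseline_us as f64 / candidate_us as f64
}
```

With the two totals above, `speedup(37_697_289, 8_673_162)` comes out at roughly 4.3x for `--batch-size 100` over `--batch-size 1` on this machine.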

I also took the liberty of pushing an additional debug! log line to this PR so that you can see the objects being written in the debug log:

$ RUST_LOG="debug" cargo run --release --bin=bench -- --num-transactions 80 --tcp-connections 4 --committee-size 4 --batch-size 10
...
2022-03-19T11:29:28.761337Z DEBUG process_cert{tx_digest=t#8592361bb3b860125c7710095321c11f3e0a5d7aeb2af7ba706933b57e126b2e tx_kind="Batch"}:db_update_state: sui_core::authority::authority_store: Writing objects: [(0000000000000000000000000000271300000000, SequenceNumber(1)), (000000000000272E000000000000000000000000, SequenceNumber(1)), (000000000000272F000000000000000000000000, SequenceNumber(1)), (0000000000002730000000000000000000000000, SequenceNumber(1)), (0000000000002731000000000000000000000000, SequenceNumber(1)), (0000000000002732000000000000000000000000, SequenceNumber(1)), (0000000000002733000000000000000000000000, SequenceNumber(1)), (0000000000002734000000000000000000000000, SequenceNumber(1)), (0000000000002735000000000000000000000000, SequenceNumber(1)), (0000000000002736000000000000000000000000, SequenceNumber(1)), (0000000000002737000000000000000000000000, SequenceNumber(1))]
...

So it seems they really are being written to disk :) . The only thing left to check is whether an invalid signature is also accepted, to make sure that we do perform the verification (there is no reason we would not, since this PR does not change that path).

@gdanezis
Collaborator

The results from the AWS machine are also interesting:

# 19 March, 2022 -- Benchmarking the batch transaction functionality

# Machine c5.metal, 48 Physical / 96 logical CPUs, 189G Memory
# Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
# AWS cost: $4.08 / h US East (Ohio)
# sui commit 5620cb97d8c22f63e1ecfce8d0ceb04212e720fb

# Using 4 core, move, and batch = 100

$ cargo +nightly run --release --bin=bench -- --num-transactions 480000 --use-move --tcp-connections 4 --db-dir /dataother/db --db-cpus 4 --batch-size 100
> Total time: 12356028us, items: 480000, tx/sec: 38847.435437990265

# Using 8 core, move, and batch = 100

$ cargo +nightly run --release --bin=bench -- --num-transactions 480000 --use-move --tcp-connections 8 --db-dir /dataother/db --db-cpus 4 --batch-size 100
> Total time: 7085610us, items: 480000, tx/sec: 67742.93250686956

# Using 12 core, move, and batch = 100 

(***)
$ cargo +nightly run --release --bin=bench -- --num-transactions 480000 --use-move --tcp-connections 12 --db-dir /dataother/db --db-cpus 4 --batch-size 100
> Total time: 5798002us, items: 480000, tx/sec: 82787.13943182497

# Using 16 cores, move, and batch = 100

$ cargo +nightly run --release --bin=bench -- --num-transactions 480000 --use-move --tcp-connections 16 --db-dir /dataother/db --db-cpus 4 --batch-size 100
> Total time: 6344959us, items: 480000, tx/sec: 75650.6070409596

# Using 24 core, move, and batch = 100

$ cargo +nightly run --release --bin=bench -- --num-transactions 480000 --use-move --tcp-connections 24 --db-dir /dataother/db --db-cpus 4 --batch-size 100
> Total time: 10012650us, items: 480000, tx/sec: 47939.3567137571

# Using 48 core, move, and batch = 100

$ cargo +nightly run --release --bin=bench -- --num-transactions 480000 --use-move --tcp-connections 48 --db-dir /dataother/db --db-cpus 4 --batch-size 100
> Total time: 11828940us, items: 480000, tx/sec: 40578.44574408189

# No move, batch = 100

$ cargo +nightly run --release --bin=bench -- --num-transactions 480000 --tcp-connections 12 --db-dir /dataother/db --db-cpus 4 --batch-size 100
> Total time: 11790216us, items: 480000, tx/sec: 40711.72232976902

$ cargo +nightly run --release --bin=bench -- --num-transactions 480000 --tcp-connections 8 --db-dir /dataother/db --db-cpus 4 --batch-size 100
> Total time: 10878306us, items: 480000, tx/sec: 44124.51718125965

(***)
$ cargo +nightly run --release --bin=bench -- --num-transactions 480000 --tcp-connections 4 --db-dir /dataother/db --db-cpus 4 --batch-size 100
> Total time: 4887605us, items: 480000, tx/sec: 98207.608839094

$ cargo +nightly run --release --bin=bench -- --num-transactions 480000 --tcp-connections 1 --db-dir /dataother/db --db-cpus 4 --batch-size 100
> Total time: 14700361us, items: 480000, tx/sec: 32652.259356079758

I read the above as meaning that with batches we are no longer bottlenecked on CPU as before (see the baseline going up as we throw more CPU at it); instead, it is now lock contention and synchronization around DB access that cost us. The DB itself is not the bottleneck, since it can reach 80K TPS with fewer cores writing; rather, it is the machinery around it that allows multi-threaded access. It is also clear to me that, given this limitation, M1 Macs running macOS handle contention much better than Linux (maybe we should run authorities on BSD? :) ).
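
The shape of that curve is easier to see when the Move runs with batch = 100 are put side by side. The sketch below copies the reported tx/sec figures (rounded to two decimals) and picks out the peak and the throughput per TCP connection. The constant and function names are illustrative only.

```rust
/// (tcp_connections, tx/sec) for the --use-move, --batch-size 100 runs above.
const MOVE_BATCH_100: [(u32, f64); 6] = [
    (4, 38847.44),
    (8, 67742.93),
    (12, 82787.14),
    (16, 75650.61),
    (24, 47939.36),
    (48, 40578.45),
];

/// Returns the run with the highest throughput.
fn peak(runs: &[(u32, f64)]) -> (u32, f64) {
    runs.iter()
        .copied()
        .fold((0, f64::MIN), |best, run| if run.1 > best.1 { run } else { best })
}

/// Throughput divided by the number of TCP connections, per run.
fn per_connection(runs: &[(u32, f64)]) -> Vec<(u32, f64)> {
    runs.iter().map(|&(c, tps)| (c, tps / c as f64)).collect()
}
```

On these numbers the peak is at 12 connections, and per-connection throughput falls from roughly 9700 tx/sec at 4 connections to under 1000 tx/sec at 48, which fits the contention reading above.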

Collaborator

@gdanezis gdanezis left a comment

Impressive.

sui/src/bench.rs (review thread resolved)
@gdanezis
Collaborator

I am also adding here the baseline perf we get for main right now as a comparison:

# Baseline using main (commit 6169655aa13ad2ed92cc0e38d1e8002ecf208f57)

$ cargo +nightly run --release --bin=bench -- --use-move --tcp-connections 12 --db-dir /dataother/db --db-cpus 4 --num-accounts 480000
> Total time: 24637351us, items: 480000, tx/sec: 19482.614019664696

$ cargo +nightly run --release --bin=bench -- --use-move --tcp-connections 24 --db-dir /dataother/db --db-cpus 4 --num-accounts 480000
> Total time: 13756117us, items: 480000, tx/sec: 34893.56771245839

$ cargo +nightly run --release --bin=bench -- --use-move --tcp-connections 48 --db-dir /dataother/db --db-cpus 4 --num-accounts 480000
> Total time: 10425709us, items: 480000, tx/sec: 46040.03430366223

(***)
$ cargo +nightly run --release --bin=bench -- --use-move --tcp-connections 96 --db-dir /dataother/db --db-cpus 4 --num-accounts 480000
> Total time: 9778699us, items: 480000, tx/sec: 49086.284382002144

$ cargo +nightly run --release --bin=bench -- --use-move --tcp-connections 192 --db-dir /dataother/db --db-cpus 4 --num-accounts 480000
> Total time: 9890012us, items: 480000, tx/sec: 48533.81371023614
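
To compare like with like, the sketch below pairs the baseline runs with the batch = 100 Move runs reported earlier at the same `--tcp-connections` values (12, 24 and 48) and computes the batch-to-baseline throughput ratio. The figures are copied from the outputs in this thread, rounded to two decimals; the names are illustrative.

```rust
/// (tcp_connections, tx/sec) with --use-move and --batch-size 100.
const BATCH_100: [(u32, f64); 3] = [(12, 82787.14), (24, 47939.36), (48, 40578.45)];

/// (tcp_connections, tx/sec) with --use-move on main (no batching).
const BASELINE_MAIN: [(u32, f64); 3] = [(12, 19482.61), (24, 34893.57), (48, 46040.03)];

/// Batch throughput divided by baseline throughput, for runs whose
/// connection counts line up position by position.
fn ratios(batch: &[(u32, f64)], baseline: &[(u32, f64)]) -> Vec<(u32, f64)> {
    batch
        .iter()
        .zip(baseline)
        .filter(|(b, m)| b.0 == m.0)
        .map(|(b, m)| (b.0, b.1 / m.1))
        .collect()
}
```

On these numbers batching is roughly 4.2x faster at 12 connections and about 1.4x faster at 24, but slightly slower than the baseline at 48 connections.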

Base automatically changed from batch-tx to main March 19, 2022 19:03
@lxfind lxfind changed the base branch from main to fix-bench-invariant March 19, 2022 19:05
Base automatically changed from fix-bench-invariant to main March 20, 2022 16:07
@lxfind lxfind merged commit 3ac9ae1 into main Mar 20, 2022
@lxfind lxfind deleted the benchmark-batch branch March 20, 2022 17:30