Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fastx linux perf] Authority Store Mutex leads to poor performance on Linux multi-core #158

Closed
gdanezis opened this issue Jan 12, 2022 · 2 comments · Fixed by #160
Closed
Assignees

Comments

@gdanezis
Copy link
Collaborator

gdanezis commented Jan 12, 2022

The authority handlers for order and certificate need to write to the DB if some of the locks exist, and are set to specific values. Right now this is accomplished through a Mutex making the read operation of the locks and the write operation atomic.

However this leads to appalling performance on large multi-core Linux servers, which are our target deployment platform (funnily osx seems to have very different characteristics). For example on a 72 logical core Linux system (AWS c5n.metal) the benchmark performance is about 3K TPS with the Mutex, but with the Mutex removed (UNSAFE!) it goes up to 25K TPS (good but still less than expected in my books -- I was expecting over double that ie ~60K TPS).

Therefore we need a different strategy to ensure that the critical regions in authority store are atomic. Luckily, we foresaw this issue:

  • When processing orders we know exactly which locks (a small number, one per input mutable object) need to be checked.
  • When processing a certificate we may do away with checking locks, and only check if this certificate was already processed by Transaction_digest.
  • Conflicts of locks should be very rare, as they indicate (at some level) equivocation. So we are paying a heavy price for something we do not expect all that often (but we should since we are building a secure system.)
  • Instead of using the database to check locks we can use Atomics in memory, to check locks.

We can have a discussion below on the best approach.

@huitseeker
Copy link
Contributor

https://github.com/rust-rocksdb/rust-rocksdb/blob/522b4967389881749405b5872f46d0b8d0625b81/src/db_options.rs#L805-L823

You almost definitely want to call this function if your system is bottlenecked by RocksDB.

@gdanezis
Copy link
Collaborator Author

https://github.com/rust-rocksdb/rust-rocksdb/blob/522b4967389881749405b5872f46d0b8d0625b81/src/db_options.rs#L805-L823

You almost definitely want to call this function if your system is bottlenecked by RocksDB.

Good call, and indeed now I do call it. However we are bottle necked by our own mutex sadly, rather than rocksdb. The good news is of course that we foresaw this, so I have a solution in #160

@gdanezis gdanezis self-assigned this Jan 12, 2022
mwtian pushed a commit that referenced this issue Sep 12, 2022
mwtian pushed a commit to mwtian/sui that referenced this issue Sep 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants