
Redesign DAO state monitoring to avoid performance degradation #5779

Closed · chimp1984 opened this issue Oct 23, 2021 · 3 comments · Fixed by #5782

Comments

@chimp1984 (Contributor)

After each block, the DAO state monitoring creates a hash from the DAO state (excluding the blocks) plus the latest block (the latest block contains a hash of the previous block, which is enough to ensure the whole chain is correct).
Some of the data, like the spentInfoMap, is already pretty large (about 80k entries), and serialising the data to a byte array via protobuf serialisation and hashing it takes quite some time. This is why parsing even empty blocks takes about 100-200 ms. We could optimise a bit by serialising a big part of the data only if there have been actual changes (if no tx was added, there is usually no change besides the block height and an empty block being added), but that would only reduce the required time by about 50 ms. As we still need to add the block height and the latest block, all the rest needs to be redone anyway (hashing, concatenating with the previous hash, ...).
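For illustration, here is a minimal sketch of that hash chain; SHA-256 stands in for whatever digest the real code uses, and all names are hypothetical rather than Bisq's actual API:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DaoStateHashChain {
    // Hash = H(prevHash || serialized DAO state without blocks || serialized last block).
    // The expensive input is stateWithoutBlocks (~80k spentInfoMap entries serialized
    // via protobuf), which must be re-serialized and re-hashed at every block.
    static byte[] createHash(byte[] prevHash,
                             byte[] stateWithoutBlocks,
                             byte[] lastBlock) throws NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        digest.update(prevHash);           // chains this hash to all earlier blocks
        digest.update(stateWithoutBlocks); // the large, slow-to-serialize part
        digest.update(lastBlock);          // carries the hash of the previous block
        return digest.digest();
    }
}
```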

So I do not see a good solution for fixing that, especially if we do not want to break hash compatibility (if the DAO state hash creation were redesigned, some optimisations might help a bit, but I doubt it would be much better).

Syncing takes about 10-15 min for a user who downloads the app one month after a release (about 4000 blocks). This is a pretty bad user experience.

When deactivating the monitoring, it takes about 4 seconds to parse 4000 blocks!

I would recommend that we add a flag to the preferences, disable DAO state monitoring by default, and let users enable it if wanted. For users who are regularly online the parsing cost is not critical, but for new users or users who have not used the app in a while it's a real problem IMO.

The DAO state is in any case an area that will require some larger refactoring to avoid further scaling issues. In such a process the monitoring could be redesigned as well. Not an easy task though...

Any input/feedback?

@chimp1984 (Contributor, Author)
I found a good compromise.
We can request the missing daoStateHashes from the seed node (we do that anyway for the past 10; I extend that to the number of missing hashes). We add those missing hashes to our list and mark them with a flag indicating that we have not created them ourselves (just informational). At every new block we create the hash ourselves (the performance penalty of 200 ms is not an issue there). As the hash is created from the previous hash plus the current DAO state, we would detect if our local DAO state were out of sync with the network, just as we would if we had created the missing hashes ourselves. We can do that also for the latest block, so we can be sure after sync that our local DAO state is in sync.
This basically eliminates the problems of not having the DAO state monitored. We still leave the toggle in the preferences to enable/disable that kind of "full monitoring" mode. Seed nodes have it enabled by default, and power users can/should do that as well. For casual users there is basically no disadvantage to leaving it disabled, and it removes the high performance degradation at syncing.
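A rough sketch of that idea, with hypothetical types (the real Bisq classes differ), assuming each entry carries the informational isSelfCreated flag described above:

```java
import java.util.ArrayList;
import java.util.List;

public class LiteDaoStateMonitor {
    static class DaoStateBlock {
        final int height;
        final byte[] hash;
        final boolean isSelfCreated; // false if the hash was adopted from a seed node

        DaoStateBlock(int height, byte[] hash, boolean isSelfCreated) {
            this.height = height;
            this.hash = hash;
            this.isSelfCreated = isSelfCreated;
        }
    }

    private final List<DaoStateBlock> hashChain = new ArrayList<>();

    // During initial parsing: adopt the missing hashes received from the seed node.
    void addPeerHash(int height, byte[] peerHash) {
        hashChain.add(new DaoStateBlock(height, peerHash, false));
    }

    // After sync, at each new block: hash ourselves, chained to the previous entry,
    // so any divergence of our local DAO state from the network is still detected.
    void addSelfCreatedHash(int height, byte[] ownHash) {
        hashChain.add(new DaoStateBlock(height, ownHash, true));
    }
}
```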

@chimp1984 chimp1984 changed the title Deactivate DAO state monitoring as it is too slow Redesign DAO state monitoring to avoid performance degradation Oct 25, 2021
chimp1984 added a commit to chimp1984/bisq that referenced this issue Oct 26, 2021
@chimp1984 (Contributor, Author)
The PR #5782 is now ready for review.

Here is a specification of the new behaviour.

There is a new boolean field in PreferencePayload, useFullModeDaoMonitor, which indicates whether the user wants to create the daoStateHashes themselves or take the hashes received from seed nodes during initial BSQ block parsing.

As described above, that dramatically reduces parsing costs. For 4000 blocks (about 1 month) it takes about 10-15 min. with useFullModeDaoMonitor set to true and about 2-4 sec. with it set to false (the default).
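A hedged sketch of how such a flag could gate the per-block hashing, with all class and method names as assumptions rather than the actual PR code:

```java
public class DaoStateMonitoringService {
    private final boolean useFullModeDaoMonitor; // mirrors the new PreferencePayload field
    private boolean initialBlockParsingComplete;

    DaoStateMonitoringService(boolean useFullModeDaoMonitor) {
        this.useFullModeDaoMonitor = useFullModeDaoMonitor;
    }

    void onBlockParsed() {
        // Full mode hashes at every block (~100-200 ms each); lite mode skips hashing
        // during initial parsing and relies on seed node hashes instead.
        if (useFullModeDaoMonitor || initialBlockParsingComplete) {
            createHashFromDaoState();
        }
    }

    void onInitialBlockParsingComplete() {
        initialBlockParsingComplete = true;
        requestHashesFromSeedNodes(); // fill the gap up to the current height
    }

    private void createHashFromDaoState() { /* see the hash chain sketch above */ }

    private void requestHashesFromSeedNodes() { /* network request, omitted here */ }
}
```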

If useFullModeDaoMonitor is disabled (the default for normal users), it behaves as follows:

  • The user gets the missing BSQ blocks from seed nodes. Parsing is super fast (usually < 1 ms). We do not create the daoStateHash at each new block.
  • Once initial BSQ block parsing is done, we request the DaoStateHashes from the seed nodes starting from the last hash we have. The data is about 26 bytes per hash (as we removed the prevHash, it is 50% smaller now).
  • When we receive the hashes, we create our DaoStateBlock from the peer's DaoStateHash and set the isSelfCreated field to false as information that we have not created the hash ourselves.
  • We apply the checks and add the peer to the peersMap.
  • At the last block we instead do the hashing ourselves, using our daoState and the previous hash (from the peer), add the peer to our peersMap, and apply the conflict checks (see the sketch after this list).
  • After that we create a snapshot of our current state.
  • At each new block we now create the hash ourselves. The shortcut with the peers' hashes is only applied during parsing to avoid the performance penalty.
  • The snapshotting then happens after the next snapshot interval (20-block grid).
  • If the user shuts down and starts up again, they might see some of the new blocks, for which we had created the hash ourselves, delivered by the peer again. That is because the new blocks have not been added to the next snapshot yet (that happens only after about 20 blocks have passed). But the important thing is that the hash for the current block is always created by ourselves, so we can verify that the created hash, based on the previous hash from the peer, is not in conflict with the network.
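A minimal sketch of that conflict check, assuming plain byte arrays for the hashes and a hypothetical map from peer address to reported hash (the type and method names are illustrative, not Bisq's actual API):

```java
import java.util.Arrays;
import java.util.Map;

public class ConflictCheck {
    // Counts the peers whose reported hash at a given height differs from our
    // self-created hash; any non-zero result means our DAO state may be out of sync.
    static long countConflicts(byte[] ownHash, Map<String, byte[]> hashByPeerAddress) {
        return hashByPeerAddress.values().stream()
                .filter(peerHash -> !Arrays.equals(ownHash, peerHash))
                .count();
    }
}
```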

If useFullModeDaoMonitor is enabled (the default for headless apps like seed nodes), it behaves as follows (the behaviour has not really changed):

  • The user gets the missing BSQ blocks from the seed nodes (in lite mode; otherwise from bitcoind RPC). Parsing is super fast (usually < 1 ms), but creating the hash at each new block takes about 100-200 ms. This is because the daoState data is already quite large; the spentInfoMap for instance has about 80k items. Several steps accumulate into that cost:
      1. Serializing the daoState excluding the blockchain
      2. Adding the last block
      3. Concatenating the data as bytes with the previous hash
      4. Creating a hash out of that data
    • Some small part could be optimized, but that would improve the overall cost by only 20-30%.
  • Snapshots are created at each snapshot interval (sketched after this list). This is done on background threads but still costs about 1-4 sec., as the daoState is about 150 MB. Writing to disk includes serialisation and the write operation; both are slow for data that large.
  • After initial sync is done, we create our hash at each new block and a new snapshot at each snapshot interval.
  • We request the hashes from the seed nodes and apply the checks to see if we are in sync.
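For illustration, a minimal sketch of the 20-block snapshot grid mentioned above; the constant and helper names are assumptions, not taken from the Bisq sources:

```java
public class SnapshotGrid {
    static final int SNAPSHOT_GRID = 20; // the "20 blocks grid" mentioned above

    // Height of the most recent block that falls on the snapshot grid; blocks past
    // this height are not yet covered by a snapshot and get re-delivered on restart.
    static int lastSnapshotHeight(int chainHeight) {
        return chainHeight - chainHeight % SNAPSHOT_GRID;
    }

    public static void main(String[] args) {
        System.out.println(lastSnapshotHeight(714123)); // prints 714120
    }
}
```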

What are the drawbacks if useFullModeDaoMonitor is disabled?

  • If all nodes have it disabled, potential conflicts might not be discovered, as there would be very few nodes actually creating the hash from their daoState. At least the seed nodes run in enabled mode, but it would be good if developers and power users also had the mode enabled to get a bit more resilience.
  • As long as there are seed nodes with correct hashes, a potential conflict should be discovered. All seed nodes need to have the same hashes; if not, we have to investigate and find the issue.

Any other option to fix that performance problem?

The DAO state scaling problem needs to be addressed at some point. When redesigning that, there might be new options for how to deal with it. But at the moment I don't see any easy solution, and the one suggested here comes with quite low risk while improving the user experience a lot.

@cd2357 (Contributor) commented Oct 28, 2021
Sounds very promising for new and casual users 👍
