Conversation
3cb3f29 to
4fe22bf
Compare
d8a1870 to
19b7365
Compare
|
Access Ops ( against 500 Accounts:
|
bobbinth
left a comment
There was a problem hiding this comment.
Looks good! Thank you! Not a full review, but I left some comments inline.
crates/store/src/accounts/mod.rs
Outdated
| #[derive(Debug, Clone)] | ||
| struct HistoricalOverlay { | ||
| block_number: BlockNumber, | ||
| rev_set: AccountMutationSet, |
There was a problem hiding this comment.
We could probably split AccountMutationSet into individual fields. It could look something like:
struct HistoricalOverlay {
block_number: BlockNumber,
root: Word,
node_mutations: HashMap<NodeIndex, Word>,
account_updates: HashMap<Word, Word>,
}Using hash maps should be more efficient for lookups, but may take more time to construct.
There was a problem hiding this comment.
If we'd use the hashmaps feature of miden-crypto that'd be the case. Do we expect that to be enabled by default?
There was a problem hiding this comment.
I think we can enable the hashmaps feature by default here, but I think this is separate from that. That is, we could use a hashmap (maybe from hashbrown) here without enabling the hashmaps feature. This would still make node looks faster.
But again, we can probably do both.
This is for calling
What are these benchmarks measuring? |
These are the measurments per additional overlay (=block) going back in history @ 500 accounts in the store, creating an opening for one of them. After 0xMiden/protocol#2006 lands I'll migrate to |
eeb3c7a to
2d0081e
Compare
|
Benchmarks (excerpt, @ For 500 accounts present in the account tree, we get the following performance across the
Benchmarks (excerpt, @ |
Reduces noise significantly on testnet, and we don't have many practical error cases _yet_. In the future we should increment it again.
bobbinth
left a comment
There was a problem hiding this comment.
Looks good! Thank you. Not a full review, but I left a few comments inline.
Also, I tried to make this branch work with the latest next from miden-base - but it seems like there are some inconsistencies and code duplication (e.g., AccountTreeBackend).
Regarding performance (based on the numbers in #1292) - it seems like we are taking quite a significant performance hit. Even comparing it with bigger trees (based on 0xMiden/crypto#438 (comment)) it seems like performance degradation for lookups is close to 10x. Maybe using hash maps instead of BTreeMap in HistoricalOverlay will fix this - but if not, we'll need to try to figure out how to fix this in a follow up.
The mental target I have is that lookup for a tree with 100M accounts at depth 20 should be under 50 µs (currently, with a non-historical tree, I believe this is about 20 µs). But again, beyond using hashmaps instead of BTreeMaps - this is not something to solve in this PR.
Another aspect of performance is how the update times change. Given that the extra work is basically just cloning a mutation set, I think it should be negligible - but would be good to confirm.
crates/store/src/historical/mod.rs
Outdated
| /// Trait abstracting operations over different account tree backends. | ||
| pub trait AccountTreeBackend { |
There was a problem hiding this comment.
How is this different from the AccountTreeBackend that we have in miden-base? Do we need both?
There was a problem hiding this comment.
We do, they're different. For better disambiguation I renamed the one here to trait AccountTreeStorage. The miden-base trait AccountTreeBackend we need to parameterize the AccountTree over storage backends aka Smt implementations.
Here we do parameterizer the AccountTreeWithHistory over implementations over a minimal set of primitives required to allow for it to work, i.e. AccountTree<S: AccountTreeBackend>.
The names are far from optimal, my current approach is to call miden-base::AccounTreeBackend ~~~> AccountTreeStorage and minde-node::AccountTreeStorage ~~~> LatestAccountTreeImpl. I don't really like these names either; open to suggestions
crates/store/src/accounts/mod.rs
Outdated
| #[derive(Debug, Clone)] | ||
| struct HistoricalOverlay { | ||
| block_number: BlockNumber, | ||
| rev_set: AccountMutationSet, |
There was a problem hiding this comment.
I think we can enable the hashmaps feature by default here, but I think this is separate from that. That is, we could use a hashmap (maybe from hashbrown) here without enabling the hashmaps feature. This would still make node looks faster.
But again, we can probably do both.
crates/store/src/historical/mod.rs
Outdated
| /// Returns the oldest block still in history. | ||
| pub fn oldest_block_num(&self) -> BlockNumber { |
There was a problem hiding this comment.
AFAICT, this method does not mutate the state - so, it should probably not be in the "public mutators" section.
Similar comment for contains_account_id_prefix() method below.
There was a problem hiding this comment.
We do use it, it wasn't shown as used before since I wanted to hold off on using AccountTreeWithHistory with the remaining integration points.
crates/store/src/state.rs
870- let new_account_id_prefix_is_unique = if account_commitment.is_empty() {
871: Some(!inner.account_tree.contains_account_id_prefix(account_id.prefix()))
872- } else {
873- None
874- };
That is correct and intentional, since we talked about postponing the required schema change to a follow up PR (the second piece of disabling is https://github.com/0xMiden/miden-node/blob/8c1b34e42de3ec29a71d8b7891b36fcdd2c8a178/crates/store/src/state.rs#L944 )
So the intent here is to keep public API as is and de-risk planned demos? We indeed could do this and combine it with the schema change. tl;dr I'll carve out that part from the PR and do follow-up for those pieces combined with the schema change |
Yeah, basically keep the schema the same, but error out if the endpoint is invoked for a historical block. |
…-tree-with-history
…-tree-with-history
bobbinth
left a comment
There was a problem hiding this comment.
Looks good! Thank you! I left a few more comments inline. The main one is about making block_num an optional parameter for the endpoint and, for now, returning an error if it is provided.
Also, would be great to see the latest performance numbers, and I'd love to get one more review either from @sergerad or from @igamigo.
| pub struct AccountTreeWithHistory<S> | ||
| where | ||
| S: AccountTreeStorage, | ||
| { | ||
| /// The current block number (latest state). | ||
| block_number: BlockNumber, | ||
| /// The latest account tree state. | ||
| latest: S, | ||
| /// Historical overlays indexed by block number, storing reversion data. | ||
| overlays: BTreeMap<BlockNumber, HistoricalOverlay>, | ||
| } |
There was a problem hiding this comment.
Maybe not for this PR, but I think it should be possible to define AccountTreeWithHistory as:
pub struct AccountTreeWithHistory<S: AccountTreeBackend> {
block_number: BlockNumber,
latest: AccountTree<S>,
overlays: BTreeMap<BlockNumber, HistoricalOverlay>,
}Because we probably won't have any type besides AccountTree to use here for latest.
And this should allow us to get rid of AccountTreeStorage trait and associated code.
4dcd48e to
8ac3183
Compare
8ac3183 to
503ac11
Compare
|
| Size | Time |
|---|---|
| 1 | 1.31 µs |
| 10 | 1.35 µs |
| 50 | 1.42 µs |
| 100 | 1.50 µs |
| 500 | 1.51 µs |
| 1000 | 1.48 µs |
[Access] Account Tree Historical vs Vanilla (n Accounts = 500)
| Depth | Historical | Vanilla |
|---|---|---|
| 0 | 1.61 µs | 1.51 µs |
| 5 | 8.20 µs | 1.51 µs |
| 10 | 13.95 µs | 1.51 µs |
| 20 | 25.18 µs | 1.51 µs |
| 32 | 39.75 µs | 1.51 µs |
Summary: Historical access adds ballpark 1µs per depth level
Account Tree Historical Access by Depth and Size
| Depth | Size 10 | Size 100 | Size 500 | Size 2500 |
|---|---|---|---|---|
| 0 | 1.40 µs | 1.47 µs | 1.61 µs | 1.55 µs |
| 5 | 7.69 µs | 7.91 µs | 8.20 µs | 8.30 µs |
| 10 | 13.73 µs | 13.49 µs | 13.95 µs | 14.56 µs |
| 20 | 25.51 µs | 24.36 µs | 25.18 µs | 26.88 µs |
| 32 | 40.10 µs | 37.80 µs | 39.75 µs | 40.92 µs |
Summary: Historical access time dominated by depth, size has minimal impact
|
Very nice! Thank you for running these! Comparing #1292 (comment) with #1292 (comment) it seems like w/e we did recently resulted in almost a 10x improvement - e.g., accessing a path at depth 20 now takes ~25 µs vs. ~210 µs before. Is this how you interpret this as well? |
My interpretation is the same. |
| pub block_num: u32, | ||
| /// Block at which we'd like to get this data. If present, must be close to the chain tip. | ||
| #[prost(message, optional, tag = "2")] | ||
| pub block_num: ::core::option::Option<super::blockchain::BlockNumber>, |
There was a problem hiding this comment.
Do we have a corresponding PR in client live or coming up?
There was a problem hiding this comment.
Not yet, CC @igamigo do you want to do it, otherwise I'll do it today/tomorrow
bobbinth
left a comment
There was a problem hiding this comment.
All looks good! Thank you!
igamigo
left a comment
There was a problem hiding this comment.
Looks great! Not the most in-depth review but the critical functionality looks good and so do tests
…27-account-tree-with-history
An alternative to
SmtWithHistory, but uses a similar pattern.Depends on
#2002 in miden-basemergedand #596 in miden-cryptoclosedSee #1227
The idea here in contrast to
SmtWithHistoryis not to change the underpinnings ofAccountTreebut wrap it.The advantage is a minimal API surface to be implemented, i.e. we really ever only need
AccountTree'::open_at(block, account)which is significantly smaller to implement compared to representing a fullSmtat past blockBlockNumber.Impl approach:
Use the latest opening, and override elements in it using the reversion mutation sets (so the inverse of the applied mutation on block number increment / block applied). This allows for traversing the reversion mutation sets if they contain inner nodes and leaf updates for relevant leaves, and as such update the merkle path, and hence opening.
Caveats:
Review approach:
The meat of changes is in
accounts/mod.rsand some additional changes around the database in thestore/*. Ignore the fact that we still don't return the actually correct account data, but that of the latest block, the remaining changes would alter the schema some and bloat the PR a little more, which is already large enough.