-
Notifications
You must be signed in to change notification settings - Fork 144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AMT implementation #197
AMT implementation #197
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See initial comments, will have to re-review as there is a lot in this PR that I need more time understanding.
By the way, I hacked together a generic bound AMT variant I just pushed here: 5b2fbe4 The benefit is type safety because the AMT would be bound on this type across all reads and writes. The disadvantage is that it would have to be explicitly defined when used and would require that type to be an owned value. In this case, the root nodes values would be of the type and only serialized when stored in the database. Benefit is that the values would not require to be heap allocated (very small benefit) but there would be a big difference in the enum variant between a link node and leaf node depending on the type which would definitely be a negative. Don't look too in depth with that commit, just bringing up if you guys think it is a good idea. There is another option which is that I could just bind the type of AMT but only constrain the getters and setters to be of that type, which keeps everything else the same (this is the option I'm thinking of changing to since I don't like switching the implementation until benchmarks exist). |
So to update, I did just add a commit to constrain the AMT values to one type: ca43a60 The only function signature that could change in the future if implemented is the set function, where the value would be moved in instead of passing a reference and cloning. I kept as is because I didn't want to make the bulk set function have to clone everything (only usage as of now) |
07352ae
to
850e923
Compare
Ended up going with the generic and typesafe version because the other way was actually incompatible, so sorry for making changes again but the way it was is going to be janky to serialize and deserialize correctly. Also this fixes #203 because I'm using a fork of multihash where I added this functionality and also checked the cid for AMTs against ones generated through Lotus |
blockchain/blocks/Cargo.toml
Outdated
@@ -10,7 +10,7 @@ crypto = { path = "../../crypto" } | |||
message = { package = "forest_message", path = "../../vm/message" } | |||
clock = { path = "../../node/clock" } | |||
cid = { package = "forest_cid", path = "../../ipld/cid" } | |||
multihash = "0.9.3" | |||
multihash = { git = "https://github.com/austinabell/rust-multihash", rev = "56a8304b1b47697660dba7252d214be9829e137d" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming we can update these back given latest merge
let bz: Vec<u8> = serde_bytes::Deserialize::deserialize(deserializer)?; | ||
|
||
// Get bitmap byte from serialized bytes | ||
let bmap: BitMap = bz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to bind to bmap here? Can't just return?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rust will actually compile each identically, this is just for readability
/// Root of an AMT vector, can be serialized and keeps track of height and count | ||
#[derive(PartialEq, Debug)] | ||
pub(super) struct Root<V> { | ||
pub(super) height: u32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a thought. But these fields can be pub
instead of pub(super)
since your struct is pub(super)
, right? Doesn't matter though, just a though :~)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more explicit for if the Root visibility changes in future. All of this is internal so no need for that <|:)
Summary of changes
Changes introduced in this pull request:
In summary, this is a sharded array, which underlying nodes get cbor serialized and stored to the underlying database. The BlockStore trait here is a wrapper around our DB traits to abstract the need to serialize and generate a cid specifically every root and node.
A node always has 8 items, but can either be a leaf node where all items are values or a link node where the values can be empty, a cid to a node (which can be pulled from the blockstore when needed), or a cached node (heap allocated). To not require every change in the AMT to be persisted to the DB, the AMT should only be flushed (remove cached nodes) when the Cid or the AMT needs to be persisted.
Reference issue to close (if applicable)
Closes #188
Other information and links