Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AMT implementation #197

Merged
merged 47 commits into from
Feb 4, 2020
Merged

AMT implementation #197

merged 47 commits into from
Feb 4, 2020

Conversation

austinabell
Copy link
Contributor

@austinabell austinabell commented Jan 29, 2020

Summary of changes
Changes introduced in this pull request:

  • Implements AMT IPLD structure Implement AMT structure #188
    • Allows it to be used in Added Fetch and Load Methods #196 and other PR/ changes using an AMT
    • Will try to make the code more readible in another PR, wanted to get this in ASAP to be able to be used (Code isn't janky it just isn't very readible)
    • Will open an issue to benchmark this
    • Cids of nodes will not match Lotus, as they are using Blake2b256 which isn't available in rust multihash (but I will fork and PR in other changes)

In summary, this is a sharded array, which underlying nodes get cbor serialized and stored to the underlying database. The BlockStore trait here is a wrapper around our DB traits to abstract the need to serialize and generate a cid specifically every root and node.

A node always has 8 items, but can either be a leaf node where all items are values or a link node where the values can be empty, a cid to a node (which can be pulled from the blockstore when needed), or a cached node (heap allocated). To not require every change in the AMT to be persisted to the DB, the AMT should only be flushed (remove cached nodes) when the Cid or the AMT needs to be persisted.

Reference issue to close (if applicable)

Closes #188

Other information and links

Copy link
Contributor

@dutterbutter dutterbutter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See initial comments, will have to re-review as there is a lot in this PR that I need more time understanding.

ipld/amt/src/amt.rs Outdated Show resolved Hide resolved
ipld/amt/src/amt.rs Show resolved Hide resolved
ipld/amt/src/amt.rs Outdated Show resolved Hide resolved
ipld/amt/tests/amt_tests.rs Outdated Show resolved Hide resolved
ipld/amt/tests/amt_tests.rs Outdated Show resolved Hide resolved
@austinabell
Copy link
Contributor Author

By the way, I hacked together a generic bound AMT variant I just pushed here: 5b2fbe4

The benefit is type safety because the AMT would be bound on this type across all reads and writes. The disadvantage is that it would have to be explicitly defined when used and would require that type to be an owned value. In this case, the root nodes values would be of the type and only serialized when stored in the database. Benefit is that the values would not require to be heap allocated (very small benefit) but there would be a big difference in the enum variant between a link node and leaf node depending on the type which would definitely be a negative.

Don't look too in depth with that commit, just bringing up if you guys think it is a good idea. There is another option which is that I could just bind the type of AMT but only constrain the getters and setters to be of that type, which keeps everything else the same (this is the option I'm thinking of changing to since I don't like switching the implementation until benchmarks exist).

@austinabell
Copy link
Contributor Author

So to update, I did just add a commit to constrain the AMT values to one type: ca43a60

The only function signature that could change in the future if implemented is the set function, where the value would be moved in instead of passing a reference and cloning. I kept as is because I didn't want to make the bulk set function have to clone everything (only usage as of now)

@austinabell
Copy link
Contributor Author

Ended up going with the generic and typesafe version because the other way was actually incompatible, so sorry for making changes again but the way it was is going to be janky to serialize and deserialize correctly. Also this fixes #203 because I'm using a fork of multihash where I added this functionality and also checked the cid for AMTs against ones generated through Lotus

@@ -10,7 +10,7 @@ crypto = { path = "../../crypto" }
message = { package = "forest_message", path = "../../vm/message" }
clock = { path = "../../node/clock" }
cid = { package = "forest_cid", path = "../../ipld/cid" }
multihash = "0.9.3"
multihash = { git = "https://github.com/austinabell/rust-multihash", rev = "56a8304b1b47697660dba7252d214be9829e137d" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming we can update these back given latest merge

let bz: Vec<u8> = serde_bytes::Deserialize::deserialize(deserializer)?;

// Get bitmap byte from serialized bytes
let bmap: BitMap = bz
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to bind to bmap here? Can't just return?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust will actually compile each identically, this is just for readability

/// Root of an AMT vector, can be serialized and keeps track of height and count
#[derive(PartialEq, Debug)]
pub(super) struct Root<V> {
pub(super) height: u32,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought. But these fields can be pub instead of pub(super) since your struct is pub(super), right? Doesn't matter though, just a though :~)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more explicit for if the Root visibility changes in future. All of this is internal so no need for that <|:)

@austinabell austinabell merged commit 8b1b61b into master Feb 4, 2020
@austinabell austinabell deleted the austin/amt branch February 4, 2020 16:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement AMT structure
3 participants