Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement table-based MAST #1349

Merged
merged 109 commits into from
Jun 18, 2024
Merged

Implement table-based MAST #1349

merged 109 commits into from
Jun 18, 2024

Conversation

plafer
Copy link
Contributor

@plafer plafer commented Jun 5, 2024

Closes #1217

core/src/mast/mod.rs Outdated Show resolved Hide resolved
core/src/mast/mod.rs Outdated Show resolved Hide resolved
MastNode::Call(node) => self.execute_call_node(node, mast_forest),
MastNode::Dyn => self.execute_dyn_node(mast_forest),
MastNode::External(node_digest) => {
// TODOP: Is this how we do it? Is an `External` guaranteed to be part of the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My take: an External node can be in the MastForest, or it might not be, in which case we would need to try to load the given MAST root from the MAST object store (presumably to be added once this PR lands and we've finished spec'ing out the final details of that), and if both attempts fail, then raise an error. I think External is a bit of a misnomer now, Proxy is probably more appropriate, since it reflects more precisely what it represents - but I don't have any strong feelings about it.

We could maybe have an explicit MastNode variant for the local case, and when loading a MastForest, resolve proxy nodes to either a Local or External variant, but in my opinion it is probably cleaner to just handle references-by-digest with a single variant, since there isn't any performance benefit to splitting them here, unlike in a traditional VM.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an External node can be in the MastForest, or it might not be

Under what circumstances could we end up with an External node which is in the MastForest? It seems to me that if that happens, we can always do a pass over the MastForest to remove such external nodes and replace them with direct references (by ID). For example: Call -> External -> Join could be replaced with just Call -> Join. Or am I missing something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we add the object store (probably in a future PR), executing of external nodes could look like this:

MastNode::External(node_digest) => {
    let mast_forest = mast_forest.loader.get(node_digest)?;
    let node_id = mast_forest.get_node_id_by_digest(node_digest)?;
    self.execute_mast_node(node_id, mast_forest)
}

The above assumes that we do guarantee that External nodes are not in the current MAST forest - but even if not, the above code would get just slightly more complicated.

This also assumes that on the object store we have the following method:

trait ObjectStore {
    /// Returns MAST forest containing a non-external node with the specified digest.
    fn get(&self, node_digest: Digest) -> Option<&MastForest)>
}

The above does assume that a given node digest resolves to a single MAST forest - so, we'll need to come up with a strategy to handle MastForests in the object sore which have overlapping set of digests - but I don't think that should be all that challenging.

Copy link
Contributor

@bitwalker bitwalker Jun 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see much of an advantage in trying to ensure External nodes aren't in the current MastForest, you can simply always check if the digest is in the current MastForest before attempting to resolve via the loader. We're not talking about a significant amount of overhead here, and it would be better to avoid additional overhead in the loader, than the negligible overhead of the lookup during execution of the node. Traversing the whole MastForest to try and remove External nodes which are currently in the forest could be quite expensive.

As for how this happens in the first place, there are a couple of ways AFAIK:

  1. During compilation, we resolve any invocation of a procedure to its MAST root, and use a proxy node to avoid cloning the call graph under that callee at every callsite. Thus, as I understand it, we will end up with External nodes that aren't actually external.
  2. During compilation, an invocation of a procedure for which we have a MAST root, but not the code, is referenced via proxy as above. However, if the module containing that MAST root is later added to the compilation graph, it will end up in the same MastForest, so again, the code is not actually external at that point.
  3. One can merge multiple MastForests together into a single forest, so what was previously external, might not be post-merge.

As an aside, I think it would be useful to be able to use MastForest as a more dynamic structure, i.e. like a read-through cache, loading more of the forest as-needed during program execution. Even more ideal would be to allow parts of the forest to be unloaded if rarely used, but that may be impractical without making it overly complex. Assuming the read-through cache behavior, an External node could always theoretically be in the current MastForest.

The above does assume that a given node digest resolves to a single MAST forest

By definition this must always be the case, as a given digest is the root of a MAST tree.

..we'll need to come up with a strategy to handle MastForests in the object store which have overlapping set of digests

Remember, my suggestion for the object store was that it would break apart MastForests into smaller ones containing individual trees (corresponding to procedures) when storing them on disk (using the MAST root of that procedure as the key), with references to other procedures done by proxy, not stored inline. The object store would then maintain an in-memory cache of heavily used objects, loading them from disk at a granular level as needed. The VM (and perhaps even the object store) can maintain a MastForest that contains all of the objects loaded for a given program. Loading into that MastForest can be done eagerly or lazily, but this design is optimized for the lazy case, by using the proxy/external references to drive loading new objects into the forest as they are reached during execution - first by checking if they are in the forest already, second by checking if they are loaded in the object store (and simply be copying them into the current forest), and lastly, by loading them into the object store cache (or directly into the current forest, depending on the caching heuristic).

The data structure used for individual objects (which again, are assumed here to represent code for a single procedure) would be a MastForest, which is why I was saying elsewhere that it is important to be able to merge them, as well as split them apart into separate forests (where you would be splitting on procedure boundaries). Doing so efficiently relies on not inlining references to MAST roots within the forest.

To the extent that multiple procedures reference MAST nodes (which do not correspond to a procedure) with the same digest, they would be present in the MastForest for all of those procedures (when split apart), but de-duplicated when merging those forests together, since every MAST node in the forest with a given digest is only stored once.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see much of an advantage in trying to ensure External nodes aren't in the current MastForest

The way I was thinking about it, the advantage was related to serialization/deserialization, but also reading the above I think we may have slightly different views of the role of the MastForest struct. I am thinking about it as a relatively static struct, while you are thinking of it is a dynamic struct.

Remember, my suggestion for the object store was that it would break apart MastForests into smaller ones containing individual trees (corresponding to procedures) when storing them on disk

This clarifies things. I was thinking of MastForest as corresponding to modules or maybe even entire libraries - but I think splitting everything into procedures makes sense. I do wonder, however, if this implies that MastForests need to be "dynamic". I think it makes sense that the ObjectStore would need to dynamically load/unload some MAST forests but I'm not sure there would be a need to combine MAST forests together.

I'm also thinking that maybe ObjectStore does not need to be contained in the MastForest (as described in #1226 (comment)), but rather it should be a an object with which the VM is instantiated (e.g., a property of the Process struct).

So, the overall architecture could look something like this:

The ObjectStore trait would still be defined something like this:

trait ObjectStore {
    /// Returns MAST forest corresponding to the specified digest.
    fn get(&self, node_digest: Digest) -> Option<&MastForest>
}

At the time when a new MAST forest is added the object store, it would be broken up into smaller MAST forests corresponding to individual procedures. For this, we may need to modify the current implementation of MastForest to make identifying procedures easier (more on this later). From that point on, we can assume that any time we get a MastForest from the object store, it corresponds to a single procedure. How object store maintains the procedures internally (e.g., what's on disc vs. what's in memory) is an implementation detail of a specific object store.

During program execution, when we come across an external node, we could do something like this:

MastNode::External(node_digest) => {
    match mast_forest.get_node_id_by_digest(node_digest) {
        Some(node_id) => self.execute_mast_node(node_id, mast_forest),
        None => {
            let mast_forest = self.loader.get(node_digest)?;
            let node_id = mast_forest.get_node_id_by_digest(node_digest)?;
            self.execute_mast_node(node_id, mast_forest)
        }
    }
}

The above code first checks if the node_digest is in the current MAST forest, and if not, tries to get the corresponding MAST forest from the object store. However, I'm still not sure if that's actually needed. And the concern here is not the overhead of performing this extra check (agreed that it would be minimal), but rather the need to maintain a map of all node_digest -> node_id within MastForest.

Specifically, I'm not sure we'll need to have node_id_by_hash field in MastForest. Instead, we could have roots field containing all "exported" roots from a given MAST forest. So, for example, for a MAST forest containing a single procedure, roots would contain a single value corresponding to the root of this procedure. (for a MAST forest corresponding to a module, roots would contain MAST roots of all procedures).

So, assuming ObjectStore always gives us MastForest for a single procedure, if External nodes always imply nodes not present in the current MAST forest, we can get rid of node_id_by_hash map from MastForest.

Traversing the whole MastForest to try and remove External nodes which are currently in the forest could be quite expensive.

This could be the case, but:

  1. In most cases this would be done "at compile time" - and performance here is much less critical than at runtime.
  2. This can probably be integrated into the procedure of splitting large MAST forests into smaller MAST forests corresponding to individual procedures and the extra overhead there could be pretty small.
  1. During compilation, we resolve any invocation of a procedure to its MAST root, and use a proxy node to avoid cloning the call graph under that callee at every callsite. Thus, as I understand it, we will end up with External nodes that aren't actually external.

I think the same can be accomplished by adding MAST root of the procedure to the roots field I mentioned above. The cloning would also be avoided "by construction" (i.e., in the MastForest there will be no duplicate nodes).

  1. During compilation, an invocation of a procedure for which we have a MAST root, but not the code, is referenced via proxy as above. However, if the module containing that MAST root is later added to the compilation graph, it will end up in the same MastForest, so again, the code is not actually external at that point.

I think this can be resolved via a separate pass right before the final MastForest is output. Since this happens at compile time, I think we can take a performance hit (assuming it is not too big, which I don't think should be the case).

  1. One can merge multiple MastForests together into a single forest, so what was previously external, might not be post-merge.

If this happens at compile time, I think we can resolve the issue as mentioned above. However, I think we should try to avoid doing this at runtime because merging MastForests will require recomputing quite a few node indexes and that could be a relatively expensive procedure. We can of course do this if the benefits outweigh the costs, but I'm not yet seeing what we'd gain by merging MAST forests in the object store.

Comment on lines 57 to 61
pub(super) fn start_join_node(
&mut self,
node: &JoinNode,
mast_forest: &MastForest,
) -> Result<(), ExecutionError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should make this method (and other similar methods) oblivious to the fact that mast_forest exists. For example, this method could look like:

pub(super) fn start_join_node(
    &mut self,
    node: &JoinNode,
    children: [&MastNode; 2],
) -> Result<(), ExecutionError>

And then the job of fetching children of a given node would reside in the Process struct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And then the job of fetching children of a given node would reside in the Process struct.

Note that start_join_node() is also a method of Process.

I personally prefer how it is now, since it makes execute_join_node() (and others) very clean. Needing to fetch the children there would obfuscate that nice high-level view of what executing a JOIN node looks like.

@@ -172,7 +173,7 @@ fn test_falcon512_probabilistic_product_failure() {
expect_exec_error!(
test,
ExecutionError::FailedAssertion {
clk: 17490,
clk: 31615,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I changed the hardcoded value, as I expect the increase in clock cycles needed to be related to the fact that we don't merge basic blocks anymore.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah interesting! So, basically, because we are no longer merging basic blocks the cycle count went up almost 2x?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, confirmed. I ran the test before and after the removal and combine_basic_blocks(), and the cycle count jumped 2x. Pretty massive!

Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you! I left some comments inline - most are pretty small, and the bigger ones can probably be addressed in the future PR.

One thing I'm wondering: how did this affect program execution time (e.g., for running a Blake3 hash example for 200 iterations). Given some info you listed in one of the previous comments, I'm expecting cycle count and execution time to double (because we are no longer merging basic blocks) - but would be good to confirm this.

core/src/mast/errors.rs Outdated Show resolved Hide resolved
core/Cargo.toml Outdated
Comment on lines 34 to 37
miette = { version = "7.1.0", git = "https://github.com/bitwalker/miette", branch = "no-std", default-features = false, features = [
"fancy-no-syscall",
"derive",
] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: is this needed for pretty-printing? If so, should we put it into miden-formatting somehow? (not in this PR).

Copy link
Contributor Author

@plafer plafer Jun 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only needed for here (and another similar place), but this would be better answered by @bitwalker

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The miette crate is used for error handling (i.e. defining error types), and also exports the trait referenced by @plafer's link (WrapErr). That stuff is all re-exported from the miden-assembly crate though (in the diagnostics namespace), so a direct dependency on miette is only needed when miden-assembly isn't a dependency.

I believe the reason it is added to core here is that @plafer defined a new error type using the new diagnostics infrastructure based on miette, but core can't depend on miden-assembly itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the dependency miette dependency from core: d5c14c6

miden/README.md Outdated Show resolved Hide resolved
miden/src/examples/blake3.rs Outdated Show resolved Hide resolved
miden/src/examples/fibonacci.rs Outdated Show resolved Hide resolved
assembly/src/assembler/context.rs Outdated Show resolved Hide resolved
Comment on lines 29 to +30
ctx: &mut AssemblyContext,
) -> Result<Option<CodeBlock>, AssemblyError> {
mast_forest: &mut MastForest,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: should mast_forest be a part of the AssemblyContext? Seems like most of the time they are passed around together.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understood AssemblyContext to be more about metadata, and so put MastForest in Assembler instead. But I'd be curious to see what @bitwalker thinks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking about it as more like "anything that's needed during assembly of a specific program but doesn't need to be persisted after that." So, in my mind, MAST of the program being currently assembled would go into AssemblyContext - but also, we can come back to this in the next PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stuff that can be reused across compilations should go in Assembler IMO, so as to avoid recomputing a bunch of stuff we already know - but the degree to which that is used/useful at the moment is pretty limited, as we are typically instantiating/compiling/discarding the Assembler for each program individually in one go. The AssemblyContext is, as @plafer pointed out, mostly about configuration and what not for a single invocation of the Assembler APIs.

In my opinion, we should define how the Assembler is meant to be used (single-use vs multi-use compilation, the latter being useful when compiling many programs that use the same, or many of the same, dependencies). Once that decision is made, we can clean up the Assembler API and tailor it for how it is meant to be used. Right now it is having a bit of an identity crisis.

IMO, we should probably just make the Assembler single-use, and get rid of AssemblyContext. If we want to support the multiple-use case, we probably want to move more stuff into the AssemblyContext.

assembly/src/assembler/mod.rs Show resolved Hide resolved
assembly/src/assembler/mod.rs Show resolved Hide resolved
assembly/src/assembler/mod.rs Show resolved Hide resolved
@bobbinth
Copy link
Contributor

Oh - and let's also update the changelog.

@plafer
Copy link
Contributor Author

plafer commented Jun 16, 2024

Oh - and let's also update the changelog.

Done.

Given some info you listed in one of the previous comments, I'm expecting cycle count and execution time to double (because we are no longer merging basic blocks) - but would be good to confirm this.

It can get much worse than 2x. I think the worst case scenario is with repeat, where all iterations used to get combined into a single basic block, but not instead each iteration is a separate basic block (joined by a tree of JOINs).

Our generate_fibonacci_program() is exactly this. For a repeat.16 instantiation of the program,

  • before PR: 50 steps (64 steps, 21% padded)
  • after PR: 235 (256 steps, 8% padded)

A ~5x increase in pre-padding trace length.

@plafer
Copy link
Contributor Author

plafer commented Jun 16, 2024

@bitwalker was the purpose of this line (#[cfg(feature = "nope")]) to disable the tests? Currently, the code in there no longer compiles, as it is not tracked neither by the compiler nor rust-analyzer. Should I remove it?

@bobbinth
Copy link
Contributor

I think all looks good here. There are a couple of small questions for @bitwalker (i.e., this one and this one) - but other than that, we should be good to merge.

@bitwalker
Copy link
Contributor

bitwalker commented Jun 18, 2024

@bitwalker was the purpose of this line (#[cfg(feature = "nope")]) to disable the tests? Currently, the code in there no longer compiles, as it is not tracked neither by the compiler nor rust-analyzer. Should I remove it?

@plafer I'm actually not 100% sure why that is still in there, I use that fake feature trick to disable a section of code temporarily, but never as a permanent thing. That said, I'm pretty sure the end goal was to remove that code entirely, because the serialization referred to there is basically DOA - we're switching to MAST, so serializing the Miden Assembly syntax tree isn't important. In fact, we probably should just remove all of the serialization-related code from the assembly crate once the MAST refactoring is complete, for now it is still used by the code that emits .masl files (which itself is going away, once the new package format is finalized).

@plafer
Copy link
Contributor Author

plafer commented Jun 18, 2024

I'm actually not 100% sure why that is still in there, I use that fake feature trick to disable a section of code temporarily, but never as a permanent thing.

Removed the disabled tests; we can remove any other test that still needs to be removed in the next PR

Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you! I just left a couple of small comments. Also, let's make the CI green.

miden/README.md Outdated Show resolved Hide resolved
miden/src/repl/mod.rs Outdated Show resolved Hide resolved
miden/src/tools/mod.rs Outdated Show resolved Hide resolved
@bobbinth bobbinth merged commit 34fde66 into next Jun 18, 2024
15 checks passed
@bobbinth bobbinth deleted the plafer-table-based-mast branch June 18, 2024 17:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants