-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat/refactor: general improvements to ir, workspace, and implementation of new analyses/transforms #18
Conversation
This commit contains a number of changes intended to lower the impedance mismatch between Miden IR and Miden Assembly, while also providing instructions that will aid in lowering from WebAssembly to Miden IR. In addition to those sorts of changes, there are also a few other general quality of life improvements. * Identifiers are now interned symbols to reduce unnecessary allocations and to make identifiers more suitable for use in hash tables in place of arbitrary generated 32-bit ids. * There is now a `Program` structure, intended to be the top-level container for compiled programs in the IR. * Support for global variables, and various forms of global variable access patterns common in frontend languages. These are linked together in a `Program` prior to code generation. * Some changes to the instruction set: * More precise integer immediate opcodes * Added a variant of `ret` for returning immediates * Added `incr`, i.e. the increment operation * Added `unreachable`, i.e. an assertion that always fails when an unreachable code section is entered * Added `syscall`, a call instruction for invoking kernel functions in Miden Assembly * Added `memory.grow`, intended to be the equivalent of the Wasm instruction * Added an instruction to represent global ops * Removal of the `assert.test` instruction, as it has no corollary in MASM * Removal of the `addrof` instruction, replaced with `alloca`, which is used to allocate a temporary of a given type, returning its address. This removal is due to the fact that there is no meaningful translation of `addrof` to MASM. * There are number of changes to the `Module` and `Function` APIs, either to reduce duplication, incorporate some of the changes mentioned above, or to better support parallelization of the compiler: * Functions can be built independently of a module and added later, and modules can likewise be built independently of a program and added later. * Function signatures now incorporate more information about the behavior of parameters and results, and the `Visibility` type was replaced with a more general concept of `Linkage`. * Function calling conventions have been extended with `Fast` and `Kernel` conventions. The latter is of particular note, as it makes possible the definition of kernel modules in Miden IR, as well as the ability to perform syscalls. * Both modules and functions are now kernel-aware, i.e. validation will ensure that kernel functions are defined in kernel modules, and that kernel modules do not export functions without the kernel calling convention. * There is a new ModuleBuilder API intended to make imperatively constructing a `Module` and its functions more fluent, and provide some additional conveniences useful in certain situations, such as testing. * Support for arbitrary constant data has been added, currently only for use with globals, but can be exposed for use in instructions in later iterations. There are a number of very small changes that are too numerous to list here, but are also unlikely to be noticeable unless you are working with the internals of the IR.
This commit contains the implementations for a set of transformation passes intended to prepare Miden IR for stackification/code generation. * `SplitCriticalEdges`, does what it says on the tin; it splits critical edges in the control flow graph by introducing new blocks between a predecessor block with multiple successors and a successor with multiple predecessors. This eases analysis of the control flow graph. * `Treeify`, this converts a control flow graph (a directed, acyclic graph) and ensures that it is a tree by duplicating subtrees of the graph as needed, such that no block has more than a single predecessor. This transformation does not modify loop headers however, as by definition those introduce cycles in the graph. That suits us just fine though, as the purpose here is to ensure that the control flow graph for a function can be trivially lowered to Miden Assembly, which does not have jumps, and thus requires programs to form a tree. We handle the translation of loops using the high-level looping ops in Miden Assembly - as long as the body of the loop is a tree, we're good. * `InlineBlocks`, is applied after `Treeify` to simplify the control flow graph, by removing redundant blocks/branches which were either introduced in the original IR, as a result of critical edge splitting, or due to duplicating blocks during treeification that were previously join points in the CFG, but aren't anymore.
c9f94a1
to
f80bcf5
Compare
Sorry, but this isn't going to happen. The PR contains around 10,000 lines of code, and there is just no way this gets reviewed properly in a day (especially not a day when github has issues). You can have a proper review in the time it takes to do a proper review, or you can spend some time separating the PR into multiple, smaller PRs (which is Best Practice). Also, do you mind describing the actual changes this PR makes, so that I have a proper chance of figuring out what is going on in the code? It is not particularly helpful to refer to it as "a big set of changes containing much of my work from the last month or so" - not only is that not helpful to the reviewer, but we also need the description in the commit history so that we can track the reasons why changes were made. |
That's fine, we should merge this anyway. I was quite clear that these changes are meant to be merged without an expectation of in-depth review, and that any issues raised here, whether before or after the changes are merged, would be addressed in subsequent bug reports/PRs. The entire point was to get us back in a cycle of small changes, ASAP.
I have no intention of breaking up this PR into smaller ones - I'm fine with these changes as-is, at least at this point in time. To the extent that there are issues with the code here, I expect those to be raised as bugs; addressed with subsequent PRs that introduce missing or stubbed-out functionality (e.g. the linker); or are addressed in the course of making more general code quality improvements, after things stabilize and we have time to polish. Whether you wish to do a proper review is your prerogative, but that was not the purpose of this PR. I would prefer that you sign off, and review at your leisure, so that @greenhat and I can start working on merging our streams of work together as soon as possible - but if you do not want to do that, then another alternative to consider is pairing up and walking through the changes together to try and simplifiy the process, since having me on hand to explain and provide context will save a lot of turnaround time, and likely be more useful to you anyway. Aside from that, I think you'll need to justify your reticence to merge this PR, aside from questions of process, if you expect that your review will push beyond this week to complete. If need be, I'll merge these changes directly, but I'm only inclined to do that if it starts blocking other work that is in progress. For the time being, I appreciate any review you have time to provide.
You'll have to be more specific - the individual commit messages are descriptive, in some cases quite verbose, and the code itself is well documented, especially the analyses and transforms. I also pointed out in my initial PR description the specific areas relevant to what you are working on that you may want to look at. What areas of the code in particular are you finding unclear and could be improved with additional documentation? In an effort to save time and perhaps anticipate some contextual questions you may have, here is some additional (perhaps redundant) summary:
If I've left something out, feel free to ask questions. |
I rebased my branch on this one and I'm going through the process of exploring your changes and adopting my code. I expect to finish it today (your morning) and provide feedback. |
Awesome! Let me know if you have any questions, identify any bugs, or find anything missing/awkward to work with. We can address those changes/improvements in subsequent PRs, but I'll make sure to prioritize anything that is a blocker. |
I adopted my code and fixed the build and tests (except the test for unsupported ops). I pushed my branch in my PR #17
Also, I see binary and comparison ops expect only integers now. Makes total sense. Following our call, I think it's better to remove all the float ops translations and return the unsupported error. Sound good? Besides that, I did not find anything missing or awkward. I have not tried to implement globals and data segments yet. That would be my next move after I fix some minor todos I left in the code on rebase. |
Yep, that sounds good for now, we'll probably need to add dedicated floating point operations anyway when it comes time to support that in the future, if we ever decide to do so. As an aside, you probably noticed, but just in case it got lost in the sea of changes, I did add support for a few of the unsupported ops you had listed, namely
We should probably discuss data segments before you dig in to the implementation there, shoot me a message when you have time, and we can set up a quick call to work through it quick, make sure we have what we need, and that it will play nice with how things are going to be represented in MASM. As for global variables, I think everything you need should be there now, but there may be some kinks to work out, or improvements to the ergonomics of the APIs we can make. I should note that a |
Thanks for reminding me about
They seem like a "nice to have" ops.
Sure! Let me clean up my todo list (new ops, globals, etc.), and I'll ping you when I get close to it.
Sounds good! I'll dig into it. |
I've pushed a commit that expands our type system with signed and unsigned equivalents, all of the standard operators (comparison, arithmetic, etc.) will be compiled based on the semantics of the types involved. I still have to work out what to do about signed integers in general - our only native option in MASM for signed integers are field elements, the support for u32 and u64 is only useful for unsigned ops, as you'd expect. That probably means that we either need to implement our own primitives for representing, say, signed 32-bit integers, or promote all signed 32-bit integers to field elements, but that comes with its own complexities. I suspect for now we'll need some combination of the two, i.e. using field elements for signed integers, with some additional generated code around ops involving them to protect the range of the type in question. I think we can probably kick this can down the road a bit though.
I think from your end, you can use
I've got these on my todo list, because I think I'll need them for some of the low-level stuff I'm doing in the address space translation layer, but not sure exactly when I'll get them implemented. I'll keep you posted.
I'm assuming this is a standard arithmetic shift (i.e. sign-extending the high bits after shifting). There isn't native support for this in MIden, but this is something we can emit an instruction sequence for pretty easily, just like we will for If you see any instructions coming through from Rust-compiled Wasm modules that are blocking simple programs, can you leave a comment on #19? We can track what is unsupported, but wanted there. |
Nice! I'll rebase and try them out.
Yeah,
Yep, here's the spec for it:
Sure. Will do. |
@@ -127,6 +127,13 @@ impl DataFlowGraph { | |||
self.values[v].ty() | |||
} | |||
|
|||
pub fn value_span(&self, v: Value) -> SourceSpan { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also ValueData::span()
, which now seems redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, since it only provides the span for a single variant, it probably can go away
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved, as agreed in today's meeting.
This is a big set of changes containing much of my work from the last month or so that I would like to get merged today/tomorrow. See the individual commits for information on the changes contained within.
This is going to get merged pretty much as-is, and iterated on further in more targeted PRs later. That said, feel free to use this opportunity to leave comments, ask questions, etc.
NOTE: There are still a few minor details that are still work-in-progress. Namely the
Program
andLinker
structs, which go together, and are largely stubbed out at the moment until I finish up work on the codegen crate, which is not present in this changeset. That will follow in its own PR in the next couple days.@greenhat This likely affects you in a couple (relatively minor) ways - I'm most interested in your feedback on whether there are things you'd like to see, or that you find awkward. Some of the public interfaces, particularly around the builders are not super refined yet, so this is definitely the time to tweak how those work.
@jjcnn There are a few small changes to how the IR gets output (we have additional data that needs to be represented now), let me know if you have any questions about that when it comes to the parser for the IR. Some of the changes should have actually made it easier for constructing the IR in the parser, since there is basically no dependency between a
Function
and its parentModule
anymore. The main thing to be aware of is the newIdent
,FunctionIdent
, andSymbol
types, used for identifiers/interned strings.