feat/refactor: general improvements to ir, workspace, and implementation of new analyses/transforms #18

bitwalker · 2023-09-04T21:31:28Z

This is a big set of changes containing much of my work from the last month or so that I would like to get merged today/tomorrow. See the individual commits for information on the changes contained within.

This is going to get merged pretty much as-is, and iterated on further in more targeted PRs later. That said, feel free to use this opportunity to leave comments, ask questions, etc.

NOTE: A few minor details are still work-in-progress, namely the Program and Linker structs, which go together and are largely stubbed out at the moment until I finish up work on the codegen crate, which is not present in this changeset. That will follow in its own PR in the next couple of days.

@greenhat This likely affects you in a couple (relatively minor) ways - I'm most interested in your feedback on whether there are things you'd like to see, or that you find awkward. Some of the public interfaces, particularly around the builders are not super refined yet, so this is definitely the time to tweak how those work.

@jjcnn There are a few small changes to how the IR gets output (we have additional data that needs to be represented now), let me know if you have any questions about that when it comes to the parser for the IR. Some of the changes should have actually made it easier for constructing the IR in the parser, since there is basically no dependency between a Function and its parent Module anymore. The main thing to be aware of is the new Ident, FunctionIdent, and Symbol types, used for identifiers/interned strings.

This commit contains a number of changes intended to lower the impedance mismatch between Miden IR and Miden Assembly, while also providing instructions that will aid in lowering from WebAssembly to Miden IR. In addition to those sorts of changes, there are also a few other general quality of life improvements.

* Identifiers are now interned symbols to reduce unnecessary allocations and to make identifiers more suitable for use in hash tables in place of arbitrary generated 32-bit ids.
* There is now a `Program` structure, intended to be the top-level container for compiled programs in the IR.
* Support for global variables, and various forms of global variable access patterns common in frontend languages. These are linked together in a `Program` prior to code generation.
* Some changes to the instruction set:
  * More precise integer immediate opcodes
  * Added a variant of `ret` for returning immediates
  * Added `incr`, i.e. the increment operation
  * Added `unreachable`, i.e. an assertion that always fails when an unreachable code section is entered
  * Added `syscall`, a call instruction for invoking kernel functions in Miden Assembly
  * Added `memory.grow`, intended to be the equivalent of the Wasm instruction
  * Added an instruction to represent global ops
  * Removal of the `assert.test` instruction, as it has no counterpart in MASM
  * Removal of the `addrof` instruction, replaced with `alloca`, which is used to allocate a temporary of a given type, returning its address. This removal is due to the fact that there is no meaningful translation of `addrof` to MASM.
* There are a number of changes to the `Module` and `Function` APIs, either to reduce duplication, incorporate some of the changes mentioned above, or to better support parallelization of the compiler:
  * Functions can be built independently of a module and added later, and modules can likewise be built independently of a program and added later.
  * Function signatures now incorporate more information about the behavior of parameters and results, and the `Visibility` type was replaced with a more general concept of `Linkage`.
  * Function calling conventions have been extended with `Fast` and `Kernel` conventions. The latter is of particular note, as it makes possible the definition of kernel modules in Miden IR, as well as the ability to perform syscalls.
  * Both modules and functions are now kernel-aware, i.e. validation will ensure that kernel functions are defined in kernel modules, and that kernel modules do not export functions without the kernel calling convention.
  * There is a new ModuleBuilder API intended to make imperatively constructing a `Module` and its functions more fluent, and provide some additional conveniences useful in certain situations, such as testing.
* Support for arbitrary constant data has been added, currently only for use with globals, but can be exposed for use in instructions in later iterations.

There are a number of very small changes that are too numerous to list here, but are also unlikely to be noticeable unless you are working with the internals of the IR.
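To make the kernel-awareness rule above concrete, here is a minimal sketch of the two checks it describes. Everything in it is an assumption for illustration: the `Module` and `Function` structs are toy stand-ins, `CallConv::Default` and the two `Linkage` variants are hypothetical, and only the names `Fast`, `Kernel`, and `Linkage` come from the commit message. It is not the validation code in this PR.

```rust
/// Calling conventions. `Fast` and `Kernel` are named in the commit message;
/// `Default` is a hypothetical placeholder for any other convention.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CallConv {
    Default,
    Fast,
    Kernel,
}

/// Hypothetical two-variant linkage; the real `Linkage` enum may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Linkage {
    Internal,
    External,
}

/// Toy function record: just enough data to express the two rules.
pub struct Function {
    pub name: &'static str,
    pub cc: CallConv,
    pub linkage: Linkage,
}

/// Toy module record with a flag marking it as a kernel module.
pub struct Module {
    pub is_kernel: bool,
    pub functions: Vec<Function>,
}

/// Checks the two rules from the commit message:
/// 1. kernel functions must be defined in kernel modules;
/// 2. kernel modules must not export functions that lack the kernel convention.
pub fn validate(module: &Module) -> Result<(), String> {
    for f in &module.functions {
        let is_kernel_fn = f.cc == CallConv::Kernel;
        if is_kernel_fn && !module.is_kernel {
            return Err(format!(
                "kernel function `{}` defined outside a kernel module",
                f.name
            ));
        }
        if module.is_kernel && f.linkage == Linkage::External && !is_kernel_fn {
            return Err(format!(
                "kernel module exports non-kernel function `{}`",
                f.name
            ));
        }
    }
    Ok(())
}
```

In this sketch an internal helper with the `Fast` convention is still allowed inside a kernel module, since the rule only constrains exported functions.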

This commit contains the implementations for a set of transformation passes intended to prepare Miden IR for stackification/code generation.

* `SplitCriticalEdges`, does what it says on the tin; it splits critical edges in the control flow graph by introducing new blocks between a predecessor block with multiple successors and a successor with multiple predecessors. This eases analysis of the control flow graph.
* `Treeify`, this converts a control flow graph (a directed, acyclic graph) and ensures that it is a tree by duplicating subtrees of the graph as needed, such that no block has more than a single predecessor. This transformation does not modify loop headers however, as by definition those introduce cycles in the graph. That suits us just fine though, as the purpose here is to ensure that the control flow graph for a function can be trivially lowered to Miden Assembly, which does not have jumps, and thus requires programs to form a tree. We handle the translation of loops using the high-level looping ops in Miden Assembly - as long as the body of the loop is a tree, we're good.
* `InlineBlocks`, is applied after `Treeify` to simplify the control flow graph, by removing redundant blocks/branches which were either introduced in the original IR, as a result of critical edge splitting, or due to duplicating blocks during treeification that were previously join points in the CFG, but aren't anymore.
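As a rough illustration of the first two passes, the sketch below works on a toy CFG stored as successor lists indexed by block number. The `Cfg` alias, the `Tree` struct, and both function names are hypothetical and are not the pass implementations in this PR; loops are ignored entirely, so `treeify` here assumes an acyclic input.

```rust
/// Successor lists indexed by block id; a toy stand-in for the real CFG.
pub type Cfg = Vec<Vec<usize>>;

/// Splits every critical edge (multi-successor source, multi-predecessor
/// target) by routing it through a fresh block. Returns the number of splits.
pub fn split_critical_edges(cfg: &mut Cfg) -> usize {
    // Count incoming edges for each block.
    let mut preds = vec![0usize; cfg.len()];
    for succs in cfg.iter() {
        for &s in succs {
            preds[s] += 1;
        }
    }
    let original_len = cfg.len();
    let mut split = 0;
    for block in 0..original_len {
        if cfg[block].len() < 2 {
            continue; // a single-successor source has no critical edges
        }
        for i in 0..cfg[block].len() {
            let target = cfg[block][i];
            if preds[target] > 1 {
                // Insert a new block whose only successor is the old target.
                let new_block = cfg.len();
                cfg.push(vec![target]);
                cfg[block][i] = new_block;
                split += 1;
            }
        }
    }
    split
}

/// A tree of blocks; shared blocks of the input appear once per path.
#[derive(Debug, PartialEq)]
pub struct Tree {
    pub block: usize,
    pub children: Vec<Tree>,
}

/// Expands an acyclic CFG into a tree by duplicating any block reachable
/// through more than one predecessor.
pub fn treeify(dag: &[Vec<usize>], root: usize) -> Tree {
    Tree {
        block: root,
        children: dag[root].iter().map(|&s| treeify(dag, s)).collect(),
    }
}
```

For a diamond-shaped graph, `treeify` yields two copies of the join block, one under each branch, which is the duplication the `Treeify` description refers to.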

jjcnn · 2023-09-05T20:57:09Z

> I would like to get merged today/tomorrow.

Sorry, but this isn't going to happen. The PR contains around 10,000 lines of code, and there is just no way this gets reviewed properly in a day (especially not a day when github has issues).

You can have a proper review in the time it takes to do a proper review, or you can spend some time separating the PR into multiple, smaller PRs (which is Best Practice).

Also, do you mind describing the actual changes this PR makes, so that I have a proper chance of figuring out what is going on in the code? It is not particularly helpful to refer to it as "a big set of changes containing much of my work from the last month or so" - not only is that not helpful to the reviewer, but we also need the description in the commit history so that we can track the reasons why changes were made.

bitwalker · 2023-09-06T04:50:26Z

> Sorry, but [review by today/tomorrow] isn't going to happen

That's fine, we should merge this anyway. I was quite clear that these changes are meant to be merged without an expectation of in-depth review, and that any issues raised here, whether before or after the changes are merged, would be addressed in subsequent bug reports/PRs. The entire point was to get us back in a cycle of small changes, ASAP.

> You can have a proper review in the time it takes to do a proper review, or you can spend some time separating the PR into multiple, smaller PRs (which is Best Practice).

I have no intention of breaking up this PR into smaller ones - I'm fine with these changes as-is, at least at this point in time. To the extent that there are issues with the code here, I expect those to be raised as bugs; addressed with subsequent PRs that introduce missing or stubbed-out functionality (e.g. the linker); or addressed in the course of making more general code quality improvements, after things stabilize and we have time to polish. Whether you wish to do a proper review is your prerogative, but that was not the purpose of this PR. I would prefer that you sign off, and review at your leisure, so that @greenhat and I can start working on merging our streams of work together as soon as possible - but if you do not want to do that, then another alternative to consider is pairing up and walking through the changes together to try and simplify the process, since having me on hand to explain and provide context will save a lot of turnaround time, and likely be more useful to you anyway. Aside from that, I think you'll need to justify your reticence to merge this PR, aside from questions of process, if you expect that your review will push beyond this week to complete. If need be, I'll merge these changes directly, but I'm only inclined to do that if it starts blocking other work that is in progress. For the time being, I appreciate any review you have time to provide.

> Also, do you mind describing the actual changes this PR makes, so that I have a proper chance of figuring out what is going on in the code?

You'll have to be more specific - the individual commit messages are descriptive, in some cases quite verbose, and the code itself is well documented, especially the analyses and transforms. I also pointed out in my initial PR description the specific areas relevant to what you are working on that you may want to look at. What areas of the code in particular are you finding unclear and could be improved with additional documentation?

In an effort to save time and perhaps anticipate some contextual questions you may have, here is some additional (perhaps redundant) summary:

* To the extent that there is a notion of a complete feature(s) here, it is in the implementation of the analyses and transforms that operate on the IR, which are well documented. If there are critical docs missing, let me know and I'll make sure to add them. The transforms in particular are pre-requisites for the stackification pass, which will come in a follow-on PR.
* Everything else is either something required by the Wasm frontend, or falls out from working out the kinks in the IR that was initially sketched out; mostly as a result of working through lowering the full instruction set to Miden Assembly.
* Support for globals is required by Wasm (and many languages more generally). Those in turn require us to introduce a linking step, in order to de-duplicate and lay out globals on the heap prior to code generation, since we need to translate access of global symbols into absolute addresses. There is no native notion of globals in Miden, so we are implementing it in software for now.
* Most changes to Module and Function are intended to make it easier to parallelize the compiler, allowing one to compile using a function per-thread, or a module per-thread, or a combination of the two, while making it easier to link them all together later.
* There are changes to Signature to better represent certain low-level aspects of arguments and results, describe linkage, and to make the signature itself independent of function name.
* The Visibility bitflags were removed in favor of the Linkage enum, which is used for both functions and globals.
* ExternalFunction was introduced to represent a dependency on another function, from within a Function. It consists of just the name and signature, and corresponds to a function declaration, rather than definition.
* Function is now always created with an entry block that matches the function signature, as it was never necessary to represent bodyless functions anyway.
* There is no purpose in explaining each of the instruction changes, because the initial instruction set was a rough draft to begin with - with a number of missing operations, insufficient specificity, incorrect typing rules, etc. However, some of these changes are mentioned in the relevant commit message. Most of them are obvious, but if you have specific changes you are curious about, I'm happy to provide more details.
* The introduction of the ModuleBuilder is intended to make imperatively building a module and its functions more concise, as well as provide additional validation when doing so. This is primarily for the benefit of constructing IR for testing, but obviously it is useful more generally as well. FunctionBuilder still exists, and can be used either to build up a function from scratch, or modify an already constructed function.
* A significant number of asserts/validations were added to the InstBuilderBase trait, to protect against the construction of invalid IR. Some supporting macros were added to make expressing common validations easier.
* Symbol/string interning was added to make it possible to use natural identifiers to reference modules/functions efficiently, rather than arbitrary integer ids, while retaining the ability to get the string representation, order by name, etc. All uses of String for identifiers were replaced with an appropriate interned equivalent. This also was important for decoupling Module and Function for multi-threaded compilation.
* Our Rust toolchain was out of date; it was updated to the latest stable release.
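For readers unfamiliar with string interning, the idea behind the symbol change can be sketched as follows. This is a minimal, single-threaded illustration; the `Interner` type and its methods are hypothetical and do not reflect the actual API of the hir-symbol crate added in this PR.

```rust
use std::collections::HashMap;

/// A small, copyable handle standing in for an interned string.
/// Comparing or hashing it costs the same as comparing or hashing a `u32`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Symbol(u32);

/// Hypothetical interner: maps each distinct string to one `Symbol`.
#[derive(Default)]
pub struct Interner {
    ids: HashMap<String, Symbol>,
    strings: Vec<String>,
}

impl Interner {
    /// Returns the existing symbol for `s`, or allocates a new one.
    pub fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&sym) = self.ids.get(s) {
            return sym;
        }
        let sym = Symbol(self.strings.len() as u32);
        self.strings.push(s.to_string());
        self.ids.insert(s.to_string(), sym);
        sym
    }

    /// Recovers the original string, e.g. for printing or ordering by name.
    pub fn resolve(&self, sym: Symbol) -> &str {
        &self.strings[sym.0 as usize]
    }
}
```

Interning the same name twice returns the same `Symbol`, so identifiers can be used as hash-table keys without repeated string allocation, while `resolve` keeps the textual form available.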

If I've left something out, feel free to ask questions.

greenhat · 2023-09-06T05:23:46Z

> @greenhat This likely affects you in a couple (relatively minor) ways - I'm most interested in your feedback on whether there are things you'd like to see, or that you find awkward. Some of the public interfaces, particularly around the builders are not super refined yet, so this is definitely the time to tweak how those work.

I rebased my branch on this one and I'm going through the process of exploring your changes and adapting my code. I expect to finish it today (your morning) and provide feedback.

bitwalker · 2023-09-06T06:01:10Z

> I rebased my branch on this one and I'm going through the process of exploring your changes and adapting my code. I expect to finish it today (your morning) and provide feedback.

Awesome! Let me know if you have any questions, identify any bugs, or find anything missing/awkward to work with. We can address those changes/improvements in subsequent PRs, but I'll make sure to prioritize anything that is a blocker.

greenhat · 2023-09-06T15:03:14Z

> > I rebased my branch on this one and I'm going through the process of exploring your changes and adapting my code. I expect to finish it today (your morning) and provide feedback.
>
> Awesome! Let me know if you have any questions, identify any bugs, or find anything missing/awkward to work with. We can address those changes/improvements in subsequent PRs, but I'll make sure to prioritize anything that is a blocker.

I adapted my code and fixed the build and tests (except the test for unsupported ops). I pushed my branch in my PR #17.
I had to make two changes to your code (see comments in my PR):

* Get FunctionBuilder from ModuleFunctionBuilder (Wasm -> Miden IR translation #17, review)
* Fix type mismatch for Load Miden IR op (Wasm -> Miden IR translation #17, review)

Also, I see binary and comparison ops expect only integers now. Makes total sense. Following our call, I think it's better to remove all the float ops translations and return the unsupported error. Sound good?

Besides that, I did not find anything missing or awkward. I have not tried to implement globals and data segments yet. That would be my next move after I fix some minor todos I left in the code on rebase.

bitwalker · 2023-09-06T15:37:17Z

> Following our call, I think it's better to remove all the float ops translations and return the unsupported error. Sound good?

Yep, that sounds good for now, we'll probably need to add dedicated floating point operations anyway when it comes time to support that in the future, if we ever decide to do so.

As an aside, you probably noticed, but just in case it got lost in the sea of changes, I did add support for a few of the unsupported ops you had listed, namely unreachable and select. I can't recall if there were any others. I need to dig into the semantics of some of the Wasm instructions in the unsupported list to determine whether or not they can be handled with the instructions that are there now, or if we need new ones, but I don't know that any of them are particularly critical to implement at this point, but let me know if you think there are some big ones that would be nice to have.

> Besides that, I did not find anything missing or awkward. I have not tried to implement globals and data segments yet. That would be my next move after I fix some minor todos I left in the code on rebase.

We should probably discuss data segments before you dig in to the implementation there; shoot me a message when you have time, and we can set up a quick call to work through it, make sure we have what we need, and that it will play nice with how things are going to be represented in MASM.

As for global variables, I think everything you need should be there now, but there may be some kinks to work out, or improvements to the ergonomics of the APIs we can make. I should note that a Function has no notion of the actual global variable data itself, instead it has a notion of a global value, like in Cranelift, and global variables are actually defined/exported at the Module level. The linker, which isn't fully implemented yet, will be responsible for stitching together the global data, validating that all symbols are present, de-duplicating, that sort of thing. Then, during code generation, we'll compute absolute addresses to the global data based on the layout chosen by the linker. One of the things you can assume is that all global variables will be laid out in memory so that their addresses are either word- or felt-aligned, so that we can always use aligned loads/stores with them. We may even be able to use native pointer types with them, but that's an optimization we can look into later.
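The layout step described above (de-duplicate globals, then assign each an aligned absolute address) can be sketched roughly as below. This is only an illustration under stated assumptions: the function names are hypothetical, sizes and addresses are expressed in arbitrary units, the alignment is passed in as a parameter rather than derived from word or felt size, and de-duplication is by name only. The real linker was still stubbed out at the time of this PR.

```rust
use std::collections::BTreeMap;

/// Rounds `offset` up to the next multiple of `align` (`align` must be non-zero).
fn align_up(offset: u32, align: u32) -> u32 {
    (offset + align - 1) / align * align
}

/// Assigns an aligned absolute address to each named global.
/// `globals` holds `(name, size)` pairs; a name seen a second time is skipped,
/// which models de-duplication of the same symbol across modules.
pub fn layout_globals(globals: &[(&str, u32)], base: u32, align: u32) -> BTreeMap<String, u32> {
    let mut addrs = BTreeMap::new();
    let mut next = align_up(base, align);
    for &(name, size) in globals {
        if addrs.contains_key(name) {
            continue; // already placed by an earlier definition
        }
        addrs.insert(name.to_string(), next);
        // Advance past this global and re-align for the next one.
        next = align_up(next + size, align);
    }
    addrs
}
```

Because every address is a multiple of `align`, code generation in this model could always use aligned loads and stores when accessing a global.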

greenhat · 2023-09-07T04:02:44Z

> As an aside, you probably noticed, but just in case it got lost in the sea of changes, I did add support for a few of the unsupported ops you had listed, namely unreachable and select. I can't recall if there were any others. I need to dig into the semantics of some of the Wasm instructions in the unsupported list to determine whether or not they can be handled with the instructions that are there now, or if we need new ones, but I don't know that any of them are particularly critical to implement at this point, but let me know if you think there are some big ones that would be nice to have.

Thanks for reminding me about unreachable and select! I added them to my PR's todo list. I've scanned the unsupported list, and the following ops caught my eye:

* signed integer comparison ops: gt_s, le_s, etc.;
* i32.wrap_i64 (i64_val mod 2^32 should work, so probably no need for separate IR op);
* ctz, clz;
* shr_s (bitwise shift right signed).

They seem like "nice to have" ops.

> We should probably discuss data segments before you dig in to the implementation there; shoot me a message when you have time, and we can set up a quick call to work through it, make sure we have what we need, and that it will play nice with how things are going to be represented in MASM.

Sure! Let me clean up my todo list (new ops, globals, etc.), and I'll ping you when I get close to it.

> As for global variables, I think everything you need should be there now, but there may be some kinks to work out, or improvements to the ergonomics of the APIs we can make. I should note that a Function has no notion of the actual global variable data itself, instead it has a notion of a global value, like in Cranelift, and global variables are actually defined/exported at the Module level. The linker, which isn't fully implemented yet, will be responsible for stitching together the global data, validating that all symbols are present, de-duplicating, that sort of thing. Then, during code generation, we'll compute absolute addresses to the global data based on the layout chosen by the linker. One of the things you can assume is that all global variables will be laid out in memory so that their addresses are either word- or felt-aligned, so that we can always use aligned loads/stores with them. We may even be able to use native pointer types with them, but that's an optimization we can look into later.

Sounds good! I'll dig into it.

bitwalker · 2023-09-07T04:50:51Z

> I've scanned the unsupported list, and the following ops caught my eye:
> ..
> signed integer comparison ops: gt_s, le_s, etc.

I've pushed a commit that expands our type system with signed and unsigned equivalents; all of the standard operators (comparison, arithmetic, etc.) will be compiled based on the semantics of the types involved. I still have to work out what to do about signed integers in general - our only native option in MASM for signed integers is field elements; the support for u32 and u64 is only useful for unsigned ops, as you'd expect. That probably means that we either need to implement our own primitives for representing, say, signed 32-bit integers, or promote all signed 32-bit integers to field elements, but that comes with its own complexities. I suspect for now we'll need some combination of the two, i.e. using field elements for signed integers, with some additional generated code around ops involving them to protect the range of the type in question. I think we can probably kick this can down the road a bit though.
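As one illustration of the "implement our own primitives" option, signed comparisons can be built from unsigned ones if a signed 32-bit value is kept as its two's-complement bit pattern in a `u32`. The sketch below is an assumption about one possible encoding, not what the pushed commit does, and the function names simply mirror the Wasm op names.

```rust
/// Signed less-than on two's-complement bit patterns, using only an
/// unsigned comparison. Flipping the sign bit maps the signed range
/// [-2^31, 2^31) onto [0, 2^32) while preserving order.
pub fn lt_s(a: u32, b: u32) -> bool {
    const SIGN: u32 = 0x8000_0000;
    (a ^ SIGN) < (b ^ SIGN)
}

/// Signed greater-than, expressed through `lt_s`.
pub fn gt_s(a: u32, b: u32) -> bool {
    lt_s(b, a)
}

/// Signed less-than-or-equal, expressed through `lt_s`.
pub fn le_s(a: u32, b: u32) -> bool {
    !lt_s(b, a)
}
```

The same trick extends to 64-bit values with a 64-bit sign mask; whether it is cheaper than the field-element approach is an open question in this thread.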

> i32.wrap_i64 (i64_val mod 2^32 should work, so probably no need for separate IR op);

I think from your end, you can use trunc for this, since I think the semantics are basically discarding the upper 32-bits; but I'm just throwing that out there off-hand, it may turn out we need a dedicated opcode for this.
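To check the claim that truncation and `mod 2^32` agree, here is a small sketch in plain Rust. The function names are hypothetical, and Rust's `as` cast stands in for the IR's `trunc`; it says nothing about how `trunc` is actually lowered.

```rust
/// Wraps a 64-bit value to 32 bits by discarding the upper 32 bits.
pub fn wrap_by_trunc(x: u64) -> u32 {
    x as u32
}

/// Wraps a 64-bit value to 32 bits by reducing it modulo 2^32.
pub fn wrap_by_mod(x: u64) -> u32 {
    (x % (1u64 << 32)) as u32
}
```

Both functions return the low 32 bits of the input, so they agree on every `u64`, including values whose upper half is non-zero.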

> ctz, clz

I've got these on my todo list, because I think I'll need them for some of the low-level stuff I'm doing in the address space translation layer, but not sure exactly when I'll get them implemented. I'll keep you posted.
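For reference, both operations can be expressed with nothing more than shifts, masks, and a loop, which is roughly the kind of instruction sequence that would have to be emitted when there is no native support. The sketch below is a naive illustration, not a proposed lowering; it follows the Wasm convention that the result for zero is the bit width.

```rust
/// Counts leading zero bits of a 32-bit value; returns 32 for zero.
pub fn clz(mut x: u32) -> u32 {
    if x == 0 {
        return 32;
    }
    let mut n = 0;
    // Shift left until the top bit is set.
    while x & 0x8000_0000 == 0 {
        x <<= 1;
        n += 1;
    }
    n
}

/// Counts trailing zero bits of a 32-bit value; returns 32 for zero.
pub fn ctz(mut x: u32) -> u32 {
    if x == 0 {
        return 32;
    }
    let mut n = 0;
    // Shift right until the bottom bit is set.
    while x & 1 == 0 {
        x >>= 1;
        n += 1;
    }
    n
}
```

A branch-free binary search over the bits would need fewer steps; the loop form is used here only because it is the easiest to read.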

> shr_s (bitwise shift right signed)

I'm assuming this is a standard arithmetic shift (i.e. sign-extending the high bits after shifting). There isn't native support for this in Miden, but this is something we can emit an instruction sequence for pretty easily, just like we will for ctz/clz and others. I think for now we can leave this unsupported, but adding support for it should be straightforward when we have a bit of time for it.
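A minimal sketch of such an instruction sequence, written in Rust over `u32` bit patterns so that only unsigned operations are used. The function name mirrors the Wasm op; the masking of the shift count follows Wasm's rule that the count is taken modulo the bit width. This is an illustration, not the lowering the compiler will use.

```rust
/// Arithmetic (sign-extending) right shift of a 32-bit two's-complement
/// bit pattern, built from a logical shift plus a fill mask.
pub fn shr_s(x: u32, shift: u32) -> u32 {
    let k = shift % 32; // Wasm takes the shift count modulo 32
    let logical = x >> k;
    let negative = x & 0x8000_0000 != 0;
    if negative && k != 0 {
        // Fill the k vacated high bits with ones.
        logical | (u32::MAX << (32 - k))
    } else {
        logical
    }
}
```

The `k != 0` guard avoids shifting by the full bit width when building the mask, which would overflow in Rust and is unnecessary since a zero shift leaves the value unchanged.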

If you see any instructions coming through from Rust-compiled Wasm modules that are blocking simple programs, can you leave a comment on #19? We can track what is unsupported, but wanted there.

greenhat · 2023-09-07T10:33:10Z

> > I've scanned the unsupported list, and the following ops caught my eye:
> > ..
> > signed integer comparison ops: gt_s, le_s, etc.

> I've pushed a commit that expands our type system with signed and unsigned equivalents; all of the standard operators (comparison, arithmetic, etc.) will be compiled based on the semantics of the types involved. I still have to work out what to do about signed integers in general - our only native option in MASM for signed integers is field elements; the support for u32 and u64 is only useful for unsigned ops, as you'd expect. That probably means that we either need to implement our own primitives for representing, say, signed 32-bit integers, or promote all signed 32-bit integers to field elements, but that comes with its own complexities. I suspect for now we'll need some combination of the two, i.e. using field elements for signed integers, with some additional generated code around ops involving them to protect the range of the type in question. I think we can probably kick this can down the road a bit though.

Nice! I'll rebase and try them out.

> > i32.wrap_i64 (i64_val mod 2^32 should work, so probably no need for separate IR op);

> I think from your end, you can use trunc for this, since I think the semantics are basically discarding the upper 32-bits; but I'm just throwing that out there off-hand, it may turn out we need a dedicated opcode for this.

Yeah, trunc should do the job.

> > shr_s (bitwise shift right signed)

> I'm assuming this is a standard arithmetic shift (i.e. sign-extending the high bits after shifting). There isn't native support for this in Miden, but this is something we can emit an instruction sequence for pretty easily, just like we will for ctz/clz and others. I think for now we can leave this unsupported, but adding support for it should be straightforward when we have a bit of time for it.

Yep, here's the spec for it:
https://webassembly.github.io/JS-BigInt-integration/core/exec/numerics.html#op-ishr-s

> If you see any instructions coming through from Rust-compiled Wasm modules that are blocking simple programs, can you leave a comment on #19? We can track what is unsupported, but wanted there.

Sure. Will do.

greenhat · 2023-09-07T12:36:52Z

hir/src/dataflow.rs

@@ -127,6 +127,13 @@ impl DataFlowGraph {
        self.values[v].ty()
    }

+    pub fn value_span(&self, v: Value) -> SourceSpan {


There is also ValueData::span(), which now seems redundant.

Yeah, since it only provides the span for a single variant, it probably can go away

jjcnn

Approved, as agreed in today's meeting.

bitwalker added 7 commits August 27, 2023 13:33

chore: move cfg to analysis module

d38644c

feat: add preorder dominance tree and dominance frontier analyses

16f26f1

feat: add infra to support decorating textual format

7e8b097

feat: implement liveness analysis

72df4f4

chore: move ir to hir

f32d35b

chore: split up hir crate

1c54a52

feat: add hir-symbol crate

212f3fc

bitwalker self-assigned this Sep 4, 2023

bitwalker added 6 commits September 5, 2023 01:02

feat: implement select instruction

991c160

feat: distinguish bitwise vs logical boolean operations

6f37dc8

fix: hir and hir-analysis tests

2635987

ci: update toolchain

f80bcf5

bitwalker force-pushed the bitwalker/wip branch from c9f94a1 to f80bcf5 Compare September 5, 2023 05:04

ci: update cargo.lock

ee07d3c

bitwalker requested a review from jjcnn September 5, 2023 15:34

feat: distinguish signed/unsigned types, native/emulated pointers

bf684c1

bitwalker mentioned this pull request Sep 6, 2023

Implement liveness analysis #13

Closed

3 tasks

feat: add dataflow apis for getting value/inst spans

d1a1724

greenhat reviewed Sep 7, 2023


jjcnn approved these changes Sep 7, 2023


bitwalker merged commit 62598b1 into main Sep 7, 2023
2 checks passed

bitwalker deleted the bitwalker/wip branch September 7, 2023 20:43

bitwalker restored the bitwalker/wip branch September 8, 2023 14:13

greenhat mentioned this pull request Sep 9, 2023

Wasm -> Miden IR translation #22

Merged

bitwalker mentioned this pull request Oct 16, 2023

Implement compiler executable #30

Closed

bitwalker deleted the bitwalker/wip branch September 6, 2024 20:08
