Conversation
- …e than 6 arguments. This lays the groundwork for other on-the-fly optimisations, like passing literals through in order to do const folding in linear time, while compiling.
- …rbage values from the stack)
- …al if its value will be changed. Currently the implementation of materializing locals causes compilation to be non-linear in degenerate cases.
- …DO comments (v. important)
I've now made an initial scan through the code, with a few comments inline, and one organizational issue here. My initial impression is that microwasm looks simple and flexible, and I'm interested to see where it goes.
This PR also removes the work-in-progress support for standard wasm, which is unfortunate. One of the things I think would be really valuable from microwasm is the ability to directly compare microwasm with wasm along various dimensions. Also, many potential Wasmtime users will need to support standard wasm for the foreseeable future. You're welcome to set your own priorities; I would just ask whether it's at least feasible to organize the code so that standard wasm could be supported as an option alongside microwasm, so that someone who wishes to work on standard wasm support could do so.
Also, for the record, for anyone following along: I'm not necessarily endorsing microwasm at this time; for now, I myself am approaching this as an experiment where I'm interested in seeing the results.
wabt = "0.7"
lazy_static = "1.2"
quickcheck = "0.7"
Can `quickcheck` be a dev-dependency?
Yes, and it should be. Good catch; I always forget to make things dev-dependencies.
impl Block {
    fn should_serialize_args(&self) -> bool {
        self.calling_convention.is_none()
            && (self.num_callers != Some(1) || self.has_backwards_callers)
It would be ok with me if you wanted to drop the "callers" terminology within the implementation and call them "predecessors".
I actually quite like the "callers" terminology from the funclets proposal - we reuse a lot of code between function calls and block calls. The only difference between the two is that a block can continue after a function call and so we have more bookkeeping to do. "Predecessors" seems more opaque to me, even if it's the more widely-used terminology in compiler literature.
calling_convention
    .as_ref()
    .map(Either::as_ref)
    .and_then(Either::left)
This is admittedly an aesthetic preference, but I prefer declaring a custom enum rather than using `either`, because I always forget which arm is "left" and which is "right".
I actually think that would make this much more readable.
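For what it's worth, a minimal sketch of what that could look like; all of the names here are made up for illustration and aren't from this PR:

```rust
// A purpose-built enum carries its meaning in the variant names, whereas
// `Either::Left`/`Either::Right` forces the reader to remember which arm is
// which.
#[derive(Debug, Clone)]
enum BlockCc {
    // A fully specified ("physical") calling convention...
    Specific(String),
    // ...versus the looser "virtual" one discussed further down. `String` is
    // just a stand-in payload.
    Virtual(String),
}

impl BlockCc {
    // Plays the role of `.map(Either::as_ref).and_then(Either::left)` above:
    // hand back the specific convention if that's what we have.
    fn specific(&self) -> Option<&String> {
        match self {
            BlockCc::Specific(cc) => Some(cc),
            BlockCc::Virtual(_) => None,
        }
    }
}
```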
; jmp =>label.0
);
#[derive(Debug, Clone)]
pub struct VirtualCallingConvention {
Do I understand correctly that a `VirtualCallingConvention` is a convention for a block that is only reached from within the function, rather than called from other functions?
No, it's just a really poor choice of name (it actually derives from a previous version of the code where I just had `VirtualStack`, but then I added `StackDepth` too). This actually represents a calling convention that doesn't specify physical locations for its arguments, and so can have arguments in duplicated locations, or constants for arguments. It's for blocks that we are certain have precisely one caller. For example, for a loop header we always have to allocate unique locations for each argument, since we don't know whether two arguments that happen to be in overlapping slots when we translate the first call will be in overlapping slots for every call.

A compiler that produces Microwasm directly could avoid this problem: if a block is always called with overlapping arguments it can simply take fewer arguments and use `pick` at the beginning of the block, and likewise if one of the arguments is always `const` it can simply emit the `const` inside the block itself. This is really only a problem when we're translating from Wasm.
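To make the "duplicated locations or constants" point concrete, a hypothetical sketch; none of these names or fields come from the actual implementation:

```rust
// A "virtual" calling convention doesn't pin each argument to a unique
// physical location, so an argument may alias another argument's slot or
// simply be a known constant that never needs to be passed at all.
#[derive(Debug, Clone, PartialEq)]
enum ArgLoc {
    Reg(u8),    // a physical register, by some illustrative index
    Stack(i32), // an offset into the physical stack
    Const(i64), // the value is statically known
}

#[derive(Debug, Clone)]
struct VirtualCc {
    // Two entries may refer to the same `Reg`/`Stack` slot. That's only sound
    // when the block has exactly one caller; a loop header (with backwards
    // callers) must instead be given unique locations for each argument.
    args: Vec<ArgLoc>,
}
```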
use super::{Signedness, SignfulInt, Size};

pub const I32: SignfulInt = SignfulInt(Signedness::Signed, Size::_32);
pub const I64: SignfulInt = SignfulInt(Signedness::Signed, Size::_64);
Would it make sense to name these `S32` and `S64`, since wasm's `i32` and `i64` are not signed?
This is a good point. Naming these constants is actually quite difficult, since we need to differentiate between a type that could be signed but isn't and a type for which the sign is irrelevant. Ideally I'd use a generic `const fn`, but that's not possible in Rust right now.
pub const SU32: SignfulType = Type::Int(sint::U32);
pub const SU64: SignfulType = Type::Int(sint::U64);
pub const SF32: SignfulType = Type::Float(Size::_32);
pub const SF64: SignfulType = Type::Float(Size::_64);
Using `S` as a prefix for these names makes them awkward to read. I was thinking "What's a signed unsigned 64?" when reading the code ;-). Would it work to just drop the leading `S` for these?
I could have `mod signful`/`mod signless` with constants that have overlapping names, but unfortunately I can't have a generic constant (or a generic constant function) that can return either, since Rust doesn't support `const fn`s in traits. The prefix is there because we have "signless" and "signful" versions.
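A sketch of that layout with stand-in types rather than Lightbeam's real ones:

```rust
// The constants in the two modules deliberately share names, so call sites
// read `signless::I32` or `signful::U32` instead of carrying an `S` prefix
// everywhere.
#[derive(Debug, Clone, Copy)]
pub enum Sign { Signed, Unsigned }

#[derive(Debug, Clone, Copy)]
pub struct Int {
    pub sign: Option<Sign>, // `None` means the sign is irrelevant ("signless")
    pub bits: u8,
}

pub mod signless {
    use super::Int;
    pub const I32: Int = Int { sign: None, bits: 32 };
    pub const I64: Int = Int { sign: None, bits: 64 };
}

pub mod signful {
    use super::{Int, Sign};
    pub const I32: Int = Int { sign: Some(Sign::Signed), bits: 32 };
    pub const U32: Int = Int { sign: Some(Sign::Unsigned), bits: 32 };
}
```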
RuntimeFunc {
    func_start: start,
    sig_hash: quickhash(ty) as u32,
Is it the case that using `quickhash` here is just a temporary solution to get the code up and running?
Yes! I want to transition to having Lightbeam support Wasmtime's method of signature checking but I haven't got around to it yet. This is just a stopgap that allows us to check that signature checking works in theory.
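As an illustration of the stopgap approach only (this is not the actual `quickhash`), the idea is roughly:

```rust
// Derive a cheap check value by hashing the signature, then compare hashes at
// call time. Colliding hashes would wrongly pass, which is why this is only a
// placeholder until Lightbeam adopts Wasmtime's signature-checking scheme.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Hash)]
struct Sig {
    params: Vec<u8>,  // stand-ins for the real parameter/return types
    returns: Vec<u8>,
}

fn quickhash_like(sig: &Sig) -> u32 {
    let mut h = DefaultHasher::new();
    sig.hash(&mut h);
    h.finish() as u32
}
```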
It's not clear to me whether the "microwasm format" is the same as the "wasm format", i.e. whether it is strictly a subset of Wasm minus some constructs. Did I get that right? If so, then any compiler can produce a microwasm (wasm-compatible) binary, which will run on any runtime. Is that correct?
@sunfishcode Maybe it's not clear, but this absolutely does not drop support for Wasm. Right now Microwasm is just an MIR for Lightbeam, and the ultimate goal is to give both equally first-class support. Since Microwasm is far simpler, compiling Wasm to it as an intermediate step makes implementing new features (right now I'm implementing floats) far simpler. It does change the API of …
@Hywan The plan is that any compiler that can produce Wasm can ignore the fact that Microwasm exists entirely. Although we could produce better assembly if we were provided Microwasm by an optimising compiler, the conversion to Microwasm can be treated as an implementation detail for now: simply a way to let Lightbeam reduce the complexity of the backend so we can implement new features such as floats and SSE without making either the code unmaintainable or the output low-quality.

For example, before Microwasm it would have been untenably complex to implement incremental stack expansion, so that we can store volatile registers on the stack before calls etc. After the work was done to implement Microwasm, doing so was so simple that I implemented it in just an hour. Likewise for adding code to free up scratch registers if we run out - I haven't implemented it just yet because I don't have any code to test it on, but I already know how I would implement it if it became necessary. It also massively simplifies the handling of unreachable code: previously we'd generate unreachable assembly, now (AFAIK) we never do. It's not necessary for Microwasm to be binary-compatible with Wasm because Wasm can be converted to Microwasm on-the-fly with 0-element lookahead and minimal runtime cost.
OK, it's for internal use only. Thanks for the clarification. However, what is the cost for a 1 MB Wasm binary? The Microwasm equivalent would be more or less 1 MB in memory, is that correct? I'm wondering whether Lightbeam could provide a tool to "compress" Wasm to Microwasm. That assumption holds if and only if Microwasm is binary-compatible with Wasm, and under those conditions I'm also wondering whether Microwasm should be implemented in Walrus, for instance. I'm thinking out loud. If Microwasm is for internal use only, then this paragraph is pointless.
For now it's for internal use, but the intention is that eventually a compiler could produce Microwasm directly and Lightbeam could consume it - essentially making Lightbeam agnostic over Wasm or Microwasm. Currently there is no specified binary encoding for Microwasm, so it would be difficult to write a tool that outputs it to a file, but if and when we get to the point where we want tools like LLVM to output it then we'll have to start specifying it. Certainly the metadata (…)

As for whether support should be implemented in other libraries - be my guest! Do be aware though that the format is in flux right now, and it's unlikely to see support outside of Lightbeam until compilers start producing it directly - as a compiler IR it's quite constrained by the need to produce it on-the-fly. A non-streaming compiler like Cranelift or SpiderMonkey can use an IR that is less constrained, and to my knowledge no other optimising streaming compilers exist.
OK. So Microwasm only exists in memory; it's not intended to be written out to a binary file. Is that correct?
For now, yes, but we will have to specify a binary encoding eventually.
Thanks for the clarification 👍.
Talking with @Vurich, I realized that I misunderstood how the wasm -> microwasm translation was structured. wasm -> microwasm -> native code happens in a single pass, which is great. So merging this sounds good; that way we can iterate on trunk instead of in a huge PR :-).
@Vurich, is there work going on anywhere to specify microwasm? I'm curious whether there's more that can be done with it; some of the unnecessary performance limitations of WebAssembly are really irritating.
This implements the "microwasm" MIR for lightbeam. An example of what this IR looks like is below (using straw-man text syntax).
Here's the Wasm for a simple fibonacci function (this was generated by Rustc):
Now here's that compiled to Microwasm. Note that this compilation is single-pass and streaming.
To allow us to do streaming compilation, the "def" block type/metadata definitions seen at the top are actually interleaved with the rest of the assembly, but in the disassembly I put them at the top for clarity.
The main differences are:
- Function return is just a `br .return`. This simplifies a lot of the backend because now we only have to care about a single way of passing values and we have no callstack.
- `pick` and `swap` instructions to emulate `get_local` and `set_local`/`tee_local` respectively (a rough sketch of this mapping is below). One nice side-effect of this is that the `*_local` functions become no-ops at runtime, since `pick` and `swap` only affect the virtual stack.
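As a rough sketch of that mapping, with made-up operator types and depth conventions (the real translation in this PR may differ):

```rust
// Locals live at known depths on the Microwasm value stack, so each `*_local`
// operator turns into stack shuffling that the backend tracks purely in its
// compile-time model of the stack.
enum WasmOp { GetLocal(u32), SetLocal(u32), TeeLocal(u32) }

enum MicrowasmOp {
    Pick(u32), // copy the value `n` slots below the top onto the top
    Swap(u32), // exchange the top with the value `n` slots below it
    Drop,
}

// `depth_of` stands in for the translator's bookkeeping: how far below the
// top of the stack local `idx` currently sits.
fn translate(op: WasmOp, depth_of: impl Fn(u32) -> u32) -> Vec<MicrowasmOp> {
    match op {
        // Reading a local is just duplicating a known stack slot.
        WasmOp::GetLocal(idx) => vec![MicrowasmOp::Pick(depth_of(idx))],
        // Writing a local: move the new value into the local's slot, then
        // discard the old value that the swap brought to the top.
        WasmOp::SetLocal(idx) => vec![
            MicrowasmOp::Swap(depth_of(idx)),
            MicrowasmOp::Drop,
        ],
        // `tee_local` also keeps the value on the stack, so duplicate it
        // first and then do the same as `set_local`. The `+ 1` accounts for
        // the duplicate pushing the local one slot deeper.
        WasmOp::TeeLocal(idx) => vec![
            MicrowasmOp::Pick(0),
            MicrowasmOp::Swap(depth_of(idx) + 1),
            MicrowasmOp::Drop,
        ],
    }
}
```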
Although the quality of code that we can produce in a streaming compiler is not improved by this PR, since we're still working with the same source information, this does make it so that we can massively simplify the backend and therefore produce better code without making it unmaintainable. The Wasm->Microwasm compiler is simple (and I've realised some things that could make it simpler still, but I haven't implemented them yet) and the Microwasm->native compiler is simple too. Deleting hundreds of lines of code related to locals and block returns was an extremely cathartic experience.
The improvement in output even before I start writing actual optimisations is incredible. Here's a simple Wasm function:
Before Microwasm, this compiled to the following assembly:
Even with Microwasm V1, the implementation of which is way simpler than the implementation that gave us the above code, we get vastly superior output:
The two inefficiencies that I can see here:
- `add ecx, eax; mov rax, rcx` instead of `add eax, ecx`. This is only avoidable with some kind of thunking system so that we know which register we want to put the `add` result into (a thunking system would also allow us to emit `lea` instead of `mov`+`add`, and 3-argument `mul` instead of `mov`+2-argument `mul`). I suggested implementing some kind of thunking system before, but it was foiled by the complexities of integrating it with locals, so maybe it'd be something to look at now that locals are removed from the format. Probably the runtime cost is mitigated by register renaming. A rough sketch of the idea is below the list.
- `rsi` and `rdx` are copied into temporary registers. This is avoidable, since we know that the `then` and `else` branches of the if are each called precisely once.
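As a very rough sketch of the thunking idea, entirely hypothetical and not code from this PR:

```rust
// Instead of materialising an `add` into a fixed register immediately, keep a
// closure that emits the instruction once the *consumer* has decided where
// the result should live. That lets `add ecx, eax; mov rax, rcx` collapse
// into `add eax, ecx`.
type Reg = &'static str;

enum Value {
    // The value already lives in a register.
    InReg(Reg),
    // Deferred computation: given the destination register, produce the
    // assembly text that puts the result there.
    Thunk(Box<dyn FnOnce(Reg) -> String>),
}

fn add(lhs: Reg, rhs: Reg) -> Value {
    Value::Thunk(Box::new(move |dst| {
        if dst == lhs {
            format!("add {dst}, {rhs}")
        } else if dst == rhs {
            format!("add {dst}, {lhs}")
        } else {
            // Only fall back to a move when neither operand is the destination.
            format!("mov {dst}, {lhs}\nadd {dst}, {rhs}")
        }
    }))
}

fn main() {
    // The consumer (e.g. the code emitting the function return) knows the
    // result must end up in eax, so it resolves the thunk with that
    // destination and we get `add eax, ecx` instead of an add plus a move.
    if let Value::Thunk(emit) = add("eax", "ecx") {
        println!("{}", emit("eax"));
    }
}
```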
The most important things to implement to get us back to parity with the pre-microwasm implementation:
- Discarding extra values at the end of a block, e.g. `(block (result i32) (i32.const 1) (i32.const 1))`. This is hard because the values to discard will be in the middle of the stack for us; we need to emit a series of `swap`s and `drop`s (see the sketch below).
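A sketch of such a `swap`/`drop` sequence, with made-up instruction types; the depth semantics here are assumptions, not the PR's definitions:

```rust
// Bubble each unwanted value, buried beneath the block's results, to the top
// with a `swap`, then `drop` it.
#[derive(Debug)]
enum Op {
    // Exchange the top of the stack with the value `n` slots below it.
    Swap(u32),
    // Discard the top of the stack.
    Drop,
}

fn discard_below(to_keep: u32, to_discard: u32) -> Vec<Op> {
    let mut ops = Vec::new();
    for _ in 0..to_discard {
        // Swap the deepest unwanted value past the `to_keep` results...
        ops.push(Op::Swap(to_keep));
        // ...and throw it away. Note that this permutes the kept values, so a
        // real implementation would also have to restore their order.
        ops.push(Op::Drop);
    }
    ops
}

fn main() {
    // `(block (result i32) (i32.const 1) (i32.const 1))`: one result to keep,
    // one value beneath it to discard.
    println!("{:?}", discard_below(1, 1)); // prints: [Swap(1), Drop]
}
```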
UPDATE: Even after adding only extremely basic optimisations, we're now within ~10-20% of direct-to-native compilation performance for the following function:
The native optimised performance compared to the Rust->Wasm->Microwasm->native performance is below: