
Microwasm #18

Merged
merged 62 commits into CraneStation:master on Mar 1, 2019

@Vurich (Collaborator) commented Feb 15, 2019

This implements the "microwasm" MIR for Lightbeam. An example of what this IR looks like is below (using a straw-man text syntax).

Here's the Wasm for a simple fibonacci function (this was generated by Rustc):

(module
  (func $fib (param $p0 i32) (result i32)
    (local $l1 i32)
    (set_local $l1
      (i32.const 1))
    (block $B0
      (br_if $B0
        (i32.lt_u
          (get_local $p0)
          (i32.const 2)))
      (set_local $l1
        (i32.const 1))
      (loop $L1
        (set_local $l1
          (i32.add
            (call $fib
              (i32.add
                (get_local $p0)
                (i32.const -1)))
            (get_local $l1)))
        (br_if $L1
          (i32.gt_u
            (tee_local $p0
              (i32.add
                (get_local $p0)
                (i32.const -2)))
            (i32.const 1)))))
    (get_local $l1)))

Now here's that compiled to Microwasm. Note that this compilation is single-pass and streaming.

def .L1_end :: [i32, i32]
def .L2 :: [i32, i32] num_callers=1
def .L3 :: [i32, i32] has_backwards_callers
def .L3_end :: [i32, i32]
def .L4 :: [i32, i32] num_callers=1

.fn_0:
      const 0i32
      const 1i32
      swap 1
      drop
      pick 1
      const 2i32
      u32.lt
      br_if .L1_end, .L2
.L2:
      const 1i32
      swap 1
      drop
      br .L3
.L3:
      pick 1
      const -1i32
      i32.add
      call 0
      pick 1
      i32.add
      swap 1
      drop
      pick 1
      const -2i32
      i32.add
      swap 2
      const 1i32
      u32.gt
      br_if .L3, .L4
.L4:
      br .L1_end
.L1_end:
      pick 0
      br .return

To allow us to do streaming compilation, the "def" block type/metadata definitions seen at the top are actually interleaved with the rest of the assembly; in this disassembly I've put them at the top for clarity.

The main differences are:

  • No block returns - only calling a new block. Returning from the function is br .return. This simplifies a lot of the backend, because now we only have to care about a single way of passing values and we have no call stack.
  • Raw labels instead of hierarchical blocks like Wasm's - this allows us to represent any arbitrary CFG.
  • No locals - we have pick and swap instructions to emulate get_local and set_local/tee_local respectively. One nice side-effect of this is that the *_local operations become no-ops at runtime, since pick and swap only affect the virtual stack.
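As a rough illustration of the last point, here is a toy model of the virtual stack. All the names here are invented for this sketch (they are not Lightbeam's actual types); the point is that `pick` and `swap` only rearrange compile-time bookkeeping, so the emulated local accesses cost nothing in the emitted code.

```rust
// Toy model of Microwasm's virtual stack (invented types, not Lightbeam's).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Const(i32),
    // In the real compiler this would be a register or stack slot.
    Slot(usize),
}

struct VirtualStack(Vec<Value>);

impl VirtualStack {
    /// `pick n`: push a copy of the value `n` entries below the top.
    fn pick(&mut self, n: usize) {
        let v = self.0[self.0.len() - 1 - n].clone();
        self.0.push(v);
    }

    /// `swap n`: exchange the top value with the one `n` entries below it.
    fn swap(&mut self, n: usize) {
        let top = self.0.len() - 1;
        self.0.swap(top, top - n);
    }

    fn drop_top(&mut self) {
        self.0.pop();
    }
}

fn main() {
    // Wasm's `set_local 1` becomes `swap 1; drop` in Microwasm: the new
    // value replaces the old one without any runtime work.
    let mut s = VirtualStack(vec![Value::Slot(0), Value::Slot(1), Value::Const(7)]);
    s.swap(1);
    s.drop_top();
    assert_eq!(s.0, vec![Value::Slot(0), Value::Const(7)]);

    // Wasm's `get_local 0` becomes `pick 1` here: duplicate Slot(0) on top.
    s.pick(1);
    assert_eq!(s.0.last(), Some(&Value::Slot(0)));
    println!("{:?}", s.0);
}
```

This is exactly the `const 1i32; swap 1; drop` pattern visible in the disassembly above.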

Although the quality of code that we can produce in a streaming compiler is not improved by this PR since we're still working with the same source information, this does make it so that we can massively simplify the backend and therefore produce better code without making it unmaintainable. The Wasm->Microwasm compiler is simple (and I've realised some things that could make it simpler still but I haven't implemented them yet) and the Microwasm->native compiler is simple too. Deleting hundreds of lines of code related to locals and block returns was an extremely cathartic experience.

The improvement in output even before I start writing actual optimisations is incredible. Here's a simple Wasm function:

(module
  (func (param i32) (param i32) (result i32)
    get_local 1
    (block (result i32)
        get_local 0
        get_local 0
        br_if 0
        unreachable
    )
    i32.add
  )
)

Before Microwasm, this compiled to the following assembly:

  push  rbp
  mov   rbp, rsp
  sub   rsp, 0x18
  mov   rax, rsi
  test  eax, eax
  je    .L0
  mov   rax, rsi
  jmp   .L1
.L0:
  jmp   .L2
.L1:
  add   eax, edx
  mov   rsp, rbp
  pop   rbp
  ret   
.L2:
  ud2 

Even with Microwasm V1, the implementation of which is way simpler than the implementation that gave us the above code, we get vastly superior output:

  test  esi, esi
  mov   rax, rsi
  mov   rcx, rdx
  jne   .L0
  ud2
.L0:
  add   ecx, eax
  mov   rax, rcx
  ret

The two inefficiencies that I can see here:

  • add ecx, eax; mov rax, rcx instead of add eax, ecx. This is only avoidable with some kind of thunking system, so that we know which register we want the add result to end up in (a thunking system would also allow us to emit lea instead of mov+add, and 3-argument mul instead of mov plus 2-argument mul). I suggested implementing a thunking system before, but it was foiled by the complexity of integrating it with locals, so it may be worth revisiting now that locals are removed from the format. The runtime cost is probably mitigated by register renaming anyway.
  • Unnecessary duplication of rsi and rdx into temporary registers. This is avoidable, since we know that each of the then and else branches of the if is called precisely once.

The most important things to implement to get us back to parity with the pre-microwasm implementation:

  • calls (this requires spilling arguments to the stack)
  • Blocks leaving some number of values on the stack when returning, like (block (result i32) (i32.const 1) (i32.const 1)). This is hard because the values to discard will be in the middle of the stack, so we need to emit a series of swaps and drops.
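The swap-and-drop lowering from the second bullet can be sketched as follows. This is a hypothetical illustration (the function name, stack model, and rotation strategy are mine, not Lightbeam's): to discard a value buried `keep` entries below the top, the sequence `swap 1 .. swap keep` cyclically rotates it to the top, where a `drop` removes it while preserving the order of the values above it.

```rust
// Sketch: drop `extra` stale values sitting below the top `keep` results,
// emitting the Microwasm ops that would perform the cleanup.
fn discard_under(stack: &mut Vec<char>, keep: usize, extra: usize) -> Vec<String> {
    let mut ops = Vec::new();
    for _ in 0..extra {
        // `swap 1 .. swap keep` is a cyclic rotation that brings the
        // deepest junk value to the top without reordering the results...
        for i in 1..=keep {
            let top = stack.len() - 1;
            stack.swap(top, top - i);
            ops.push(format!("swap {}", i));
        }
        // ...where it can be discarded.
        stack.pop();
        ops.push("drop".to_string());
    }
    ops
}

fn main() {
    // Two stale values ('a', 'b') sit below the two block results ('x', 'y').
    let mut stack = vec!['a', 'b', 'x', 'y'];
    let ops = discard_under(&mut stack, 2, 2);
    // emits: swap 1, swap 2, drop, swap 1, swap 2, drop
    assert_eq!(stack, vec!['x', 'y']);
    println!("{:?}", ops);
}
```

The cost is `keep` swaps per discarded value, but since the swaps are virtual-stack bookkeeping, much of that can be free at runtime.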

UPDATE: Even after adding extremely basic optimisations, we're now within ~10-20% of the performance of direct compilation to native for the following function:

fn fib(n: i32) -> i32 {
    if n == 0 || n == 1 {
        1
    } else {
        fib(n - 1) + fib(n - 2)
    }
}

The native optimised performance compared to the Rust->Wasm->Microwasm->native performance is below:

 name                        out.bench ns/iter  bench ns/iter  diff ns/iter  diff %  speedup 
 tests::bench_fibonacci_run  21,835             24,878                3,043  13.94%   x 0.88 

pepyakin and others added some commits Nov 29, 2018

  • Register allocation V2 - This lays the groundwork for other on-the-fly optimisations, like passing literals through in order to do const folding in linear time, while compiling.
  • Improve behaviour when adding literal to non-literal, materialise local if its value will be changed - Currently the implementation of materialising locals causes compilation to be non-linear in degenerate cases.

@Vurich Vurich force-pushed the Vurich:microwasm branch from fc6d6c7 to ab8c6e1 Feb 19, 2019

@Vurich Vurich marked this pull request as ready for review Feb 19, 2019

@Vurich Vurich force-pushed the Vurich:microwasm branch from ab8c6e1 to f1d9ccb Feb 19, 2019

@sunfishcode (Member) left a comment

I've now made an initial scan through the code, with a few comments inline, and one organizational issue here. My initial impression is that microwasm looks simple and flexible, and I'm interested to see where it goes.

This PR also removes the work-in-progress support for standard wasm, which is unfortunate. One of the things I think would be really valuable from microwasm is the ability to directly compare microwasm with wasm along various dimensions. Also, many potential Wasmtime users will need to support standard wasm for the foreseeable future. You're welcome to your own priorities, so I would just ask if it's at least feasible to organize the code such that standard wasm could be supported as an option, alongside microwasm, so that someone who wishes to work on the standard wasm support could do so.

Also, for the record, for anyone following along: I'm not necessarily endorsing microwasm at this time; for now, I myself am approaching this as an experiment where I'm interested in seeing the results.

wabt = "0.7"
lazy_static = "1.2"
quickcheck = "0.7"

@sunfishcode (Member) commented Feb 20, 2019

Can quickcheck be a dev-dependency?

@Vurich (Author, Collaborator) commented Feb 20, 2019

Yes, and it should be. Good catch; I always forget to make things dev-dependencies.

impl Block {
    fn should_serialize_args(&self) -> bool {
        self.calling_convention.is_none()
            && (self.num_callers != Some(1) || self.has_backwards_callers)

@sunfishcode (Member) commented Feb 20, 2019

It would be ok with me if you wanted to drop the "callers" terminology within the implementation and call them "predecessors".

@Vurich (Author, Collaborator) commented Feb 20, 2019

I actually quite like the "callers" terminology from the funclets proposal - we reuse a lot of code between function calls and block calls. The only difference between the two is that a block can continue after a function call and so we have more bookkeeping to do. "Predecessors" seems more opaque to me, even if it's the more widely-used terminology in compiler literature.

calling_convention
    .as_ref()
    .map(Either::as_ref)
    .and_then(Either::left)

@sunfishcode (Member) commented Feb 20, 2019

This is admittedly an aesthetic preference, but I prefer declaring a custom enum rather than using either, because I always forget which arm is "left" and which is "right".

@Vurich (Author, Collaborator) commented Feb 20, 2019

I actually think that would make this much more readable.

; jmp =>label.0
);
#[derive(Debug, Clone)]
pub struct VirtualCallingConvention {

@sunfishcode (Member) commented Feb 20, 2019

Do I understand correctly that a VirtualCallingConvention is a convention for a block that is only reached from within the function, rather than called from other functions?

@Vurich (Author, Collaborator) commented Feb 20, 2019

No, it's just a really poor choice of name (it actually derives from a previous version of the code, where I just had VirtualStack but then added StackDepth too). It actually represents a calling convention that doesn't specify physical locations for its arguments, and so can have arguments in duplicated locations, or constants for arguments. This is for blocks that we are certain have precisely one caller. For a loop header, by contrast, we always have to allocate unique locations for each argument, since we don't know whether two arguments that happen to overlap at the call site we translate first will overlap at every call site.

A compiler that produces Microwasm directly could avoid this problem: if a block is always called with overlapping arguments, it can simply take fewer arguments and use pick at the beginning of the block; likewise, if one of the arguments is always const, it could simply emit the const inside the block itself. This is really only a problem when we're translating from Wasm.
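The distinction described here can be modelled roughly as follows. All types and names are invented for this sketch (they are not the PR's actual code): a single-caller block can adopt whatever argument layout its one call site happens to have, constants and aliased registers included, while a loop header must pin every argument to a unique physical location because it cannot trust that future backward callers will reproduce that layout.

```rust
// Invented types sketching "virtual" vs pinned calling conventions.
#[derive(Debug, Clone, PartialEq)]
enum ArgLoc {
    Const(i64), // a known constant needs no physical location at all
    Reg(u8),    // physical register index
}

/// Straw-man allocator for a loop header: give every argument its own
/// register, because backward callers may not reproduce the aliasing
/// seen at the first call site.
fn pin_args(args: &[ArgLoc]) -> Vec<ArgLoc> {
    (0..args.len() as u8).map(ArgLoc::Reg).collect()
}

fn main() {
    // At the only observed call site, two args alias register 0 and the
    // third is a constant. A single-caller block can keep this layout
    // verbatim as its "virtual" convention...
    let at_call = vec![ArgLoc::Reg(0), ArgLoc::Reg(0), ArgLoc::Const(1)];

    // ...but a loop header must pin unique locations:
    let pinned = pin_args(&at_call);
    assert_eq!(pinned, vec![ArgLoc::Reg(0), ArgLoc::Reg(1), ArgLoc::Reg(2)]);
    println!("virtual: {:?}, pinned: {:?}", at_call, pinned);
}
```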

use super::{Signedness, SignfulInt, Size};

pub const I32: SignfulInt = SignfulInt(Signedness::Signed, Size::_32);
pub const I64: SignfulInt = SignfulInt(Signedness::Signed, Size::_64);

@sunfishcode (Member) commented Feb 20, 2019

Would it make sense to name these S32 and S64, since wasm's i32 and i64 are not signed?

@Vurich (Author, Collaborator) commented Feb 20, 2019

This is a good point. Naming these constants is actually quite difficult, since we need to differentiate between a type that could be signed but isn't and a type for which the sign is irrelevant. Ideally I'd use a generic const fn, but that's not possible in Rust right now.

pub const SU32: SignfulType = Type::Int(sint::U32);
pub const SU64: SignfulType = Type::Int(sint::U64);
pub const SF32: SignfulType = Type::Float(Size::_32);
pub const SF64: SignfulType = Type::Float(Size::_64);

@sunfishcode (Member) commented Feb 20, 2019

Using S as a prefix for these names makes them awkward to read. I was thinking "What's a signed unsigned 64?" when reading the code ;-). Would it work to just drop the leading S for these?

@Vurich (Author, Collaborator) commented Feb 20, 2019

I could have mod signful/mod signless with constants that have overlapping names, but unfortunately I can't have a generic constant/generic constant function that can return either since Rust doesn't support const fns in traits. The prefix is because we have "signless" and "signful" versions.
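The `mod signful`/`mod signless` idea mentioned here could look roughly like this (a hypothetical sketch, not the PR's code): sibling modules carry the sign information in the path, so constants can share short names instead of taking an `S` prefix.

```rust
// Hypothetical layout: overlapping constant names in sibling modules.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Signedness { Signed, Unsigned }

#[derive(Debug, Clone, Copy, PartialEq)]
enum Size { _32, _64 }

#[derive(Debug, Clone, Copy, PartialEq)]
struct SignfulInt(Signedness, Size);

#[derive(Debug, Clone, Copy, PartialEq)]
struct SignlessInt(Size);

mod signful {
    use super::{SignfulInt, Signedness, Size};
    pub const I32: SignfulInt = SignfulInt(Signedness::Signed, Size::_32);
    pub const U32: SignfulInt = SignfulInt(Signedness::Unsigned, Size::_32);
}

mod signless {
    use super::{SignlessInt, Size};
    pub const I32: SignlessInt = SignlessInt(Size::_32);
}

fn main() {
    // The module path disambiguates instead of a prefix: signful::U32
    // rather than SU32, signless::I32 rather than I32-with-a-caveat.
    assert_eq!(signful::U32, SignfulInt(Signedness::Unsigned, Size::_32));
    assert_ne!(signful::U32, signful::I32);
    assert_eq!(signless::I32, SignlessInt(Size::_32));
    println!("ok");
}
```

As the comment notes, this still can't be unified behind one generic constant, since const fns in traits aren't supported; the two modules simply make the split explicit at the call site.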


RuntimeFunc {
    func_start: start,
    sig_hash: quickhash(ty) as u32,

@sunfishcode (Member) commented Feb 20, 2019

Is it the case that using quickhash here is just a temporary solution to get the code up and running?

@Vurich (Author, Collaborator) commented Feb 20, 2019

Yes! I want to transition to having Lightbeam support Wasmtime's method of signature checking but I haven't got around to it yet. This is just a stopgap that allows us to check that signature checking works in theory.

@Hywan commented Feb 20, 2019

It's not clear to me whether the “microwasm format” is the same as the “wasm format”, i.e. whether it is strictly a subset of Wasm minus some constructions. Did I get that right?

If that's correct, then any compiler can produce a microwasm (wasm-compatible) binary, which will run on any runtime. Is that correct?

@Vurich (Author, Collaborator) commented Feb 20, 2019

> I've now made an initial scan through the code, with a few comments inline, and one organizational issue here. My initial impression is that microwasm looks simple and flexible, and I'm interested to see where it goes.
>
> This PR also removes the work-in-progress support for standard wasm, which is unfortunate. One of the things I think would be really valuable from microwasm is the ability to directly compare microwasm with wasm along various dimensions. Also, many potential Wasmtime users will need to support standard wasm for the foreseeable future. You're welcome to your own priorities, so I would just ask if it's at least feasible to organize the code such that standard wasm could be supported as an option, alongside microwasm, so that someone who wishes to work on the standard wasm support could do so.
>
> Also, for the record, for anyone following along: I'm not necessarily endorsing microwasm at this time; for now, I myself am approaching this as an experiment where I'm interested in seeing the results.

@sunfishcode Maybe it's not clear, but this absolutely does not drop support for Wasm. Right now Microwasm is just an MIR for Lightbeam, and the ultimate goal is for both to have equally first-class support. Since Microwasm is far simpler, compiling Wasm to it as an intermediate step makes implementing new features (right now I'm implementing floats) far simpler. It does change the API of function_body to drop direct support for Wasm, but it's trivial to write a wrapper that consumes Wasm by passing it through MicrowasmConv (in fact, this is how all of Lightbeam's tests work now). I'll have to write a simple wrapper like this for integrating this version of the code with Wasmtime. I have no plans to remove support for Wasm; one of the main goals of Microwasm is that converting Wasm to it first should never produce worse code than if the Wasm had been converted to native code directly.

@Vurich (Author, Collaborator) commented Feb 20, 2019

> It's not clear for me whether the “microwasm format” is the same than the “wasm format”, i.e. if it is strictly a subset of Wasm minus some constructions. Did I get it right?
>
> If that's correct, then any compiler can produce a microwasm (wasm-compatible) binary, which will run on any runtimes. Is that correct?

@Hywan The plan is that any compiler that can produce Wasm can ignore the fact that Microwasm exists entirely. Although we could produce better assembly if we were provided Microwasm by an optimising compiler, the conversion to Microwasm can be treated as an implementation detail for now, simply a method to allow Lightbeam to reduce the complexity of the backend so we can implement new features such as floats and SSE without making either the code unmaintainable or the output low-quality.

For example, before Microwasm it would have been untenably complex to implement incremental stack expansion, so that we can store volatile registers on the stack before calls etc. After the work was done to implement Microwasm, doing so was so simple that I implemented it in just an hour. Likewise for adding code to free up scratch registers if we run out - I haven't implemented it just yet because I don't have any code to test it on, but I already know how I would implement it if it became necessary. It also massively simplifies the handling of unreachable code - previously we'd generate unreachable assembly, now (AFAIK) we never do. It's not necessary for Microwasm to be binary-compatible with Wasm because Wasm can be converted to Microwasm on-the-fly with 0-element lookahead and minimal runtime cost.

@Hywan commented Feb 20, 2019

> It's not necessary for Microwasm to be binary-compatible with Wasm because Wasm can be converted to Microwasm on-the-fly with 0-element lookahead and minimal runtime cost.

OK, it's for internal usage only. Thanks for the clarification. However, what is the cost for a 1MB Wasm binary? Would the Microwasm equivalent be more or less 1MB in memory?

I'm wondering whether Lightbeam could provide a tool to “compress” Wasm to Microwasm. This assumption holds if and only if Microwasm is binary-compatible with Wasm, and thus, under those conditions, whether Microwasm should be implemented in Walrus, for instance. I'm thinking out loud. If Microwasm is for internal usage only, then this paragraph is pointless.

@Vurich (Author, Collaborator) commented Feb 20, 2019

For now it's for internal use, but the intention is that eventually a compiler could produce Microwasm directly and Lightbeam could consume it - essentially making Lightbeam agnostic over Wasm or Microwasm. Currently there is no specified binary encoding for Microwasm, so it would be difficult to write a tool that outputs it to a file, but if and when we get to the point where we want tools like LLVM to output it then we'll have to start specifying it. Certainly the metadata (table, memory, type sections and the like) will be encoded identically to allow compilers with Wasm backends to reuse that infrastructure.

As for whether support should be implemented in other libraries - be my guest! Do be aware though that the format is in flux right now and it's unlikely to see support outside of Lightbeam until compilers start producing it directly - as a compiler IR it's quite constrained by the need to produce it on-the-fly. A non-streaming compiler like Cranelift or SpiderMonkey can use an IR that is less constrained and to my knowledge no other optimising streaming compilers exist.

@Hywan commented Feb 20, 2019

OK. So Microwasm only exists in memory; it's not intended to be written down in a binary file. Is that correct?

@Vurich (Author, Collaborator) commented Feb 20, 2019

For now, yes, but we will have to specify a binary encoding eventually.

@Hywan commented Feb 20, 2019

Thanks for the clarification 👍.

@sunfishcode (Member) commented Feb 28, 2019

Talking with @Vurich, I realized that I misunderstood how the wasm -> microwasm translation was structured. wasm -> microwasm -> native code happens in a single pass, which is great. So merging this sounds good, so we can iterate on trunk instead of in a huge PR :-).

@Vurich Vurich merged commit 74d0e41 into CraneStation:master Mar 1, 2019
