Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimized transitions for cranelift #1126

Closed
shravanrn opened this issue Sep 27, 2019 · 16 comments
Closed

Optimized transitions for cranelift #1126

shravanrn opened this issue Sep 27, 2019 · 16 comments
Labels
cranelift Issues related to the Cranelift code generator

Comments

@shravanrn
Copy link

This bug is in the context of use of Cranelift to generate wasm sandboxed libraries for use in Firefox - i.e. this is a non web embedding. Full details available here https://bugzilla.mozilla.org/show_bug.cgi?id=1562797

We are using lucet as our wasm compiler which uses Cranelift under the covers. For performance reasons, we need to optimize transitions between native code and wasm sandboxed code beyond what lucet provides out of the box. Key to this is generating efficient trampoline functions that minimize overhead.

To this end we have rewritten trampolines that make a few assumptions based on " properties derivable from the WASM spec about register and stack use of a WASM compiler". These assumptions reduces overhead when using a sandboxed version of libGraphite by several 100's of percent.

Since this is something indirectly implied from the WASM spec, I am hoping that Cranelift can guarantee these and wanted to confirm if these are OK.

There are 3 assumptions - one about stacks, and two about register use

  1. Assembly functions generated by Cranelift WASM never underflows the stack
    • This is guaranteed from the WASM spec due to the fact that WASM functions are proper stack machines
    • This would allow us to use the same stack for the application and wasm module
    • However, our reliance on this depends on how bug-free we think Cranelift's conversion from the WASM stack machine
  2. Assembly functions generated by Cranelift never reads an uninitialized register
    • This is an outcome of all function calls to WASM and within WASM (including indirect jumps) being typesafe.
    • This means there should never be code generated that reads from a scratch register prior to writing to it.
    • This also means there should never be code generate for a function that takes 2 parameters, that attempts to read a value from the register that would hold a third parameter (if it had existed)
    • This allows us to avoid clearing scratch and unused parameter registers during WASM function invocation.
  3. Assembly functions generated by Cranelift always restore callee save registers
    • Like point 2, this is an outcome of all functions being typesafe combined with the fact that WASM is a stack machine, which cannot manipulate the stack in a way that affects register spills.
    • This means that there should no path through the function that would result in a callee save register not being restored
    • This allows us to avoid explicitly restoring registers on the way out
@shravanrn
Copy link
Author

shravanrn commented Sep 27, 2019

This bug was initially opened on the bugzilla. Pasting initial response from Lars T Hansen [:lth] below. Moved to github per suggestion below.


It strikes me that it may be more fruitful to have this discussion in the cranelift repository, https://github.com/CraneStation/cranelift.

I'm not sure how much we can say that the structure of Wasm bytecode leads to these properties being upheld. Take (2) for example. It is true that wasm is typesafe and that all locations start out initialized. But it does not follow that the generated code never reads an uninitialized register -- that depends on code generation strategies. It probably follows from typesafe-ness that the value of an uninitialized register is not used in a semantics-affecting way, but that is a slightly weaker property than you propose, though it has the effect that you're after, I expect.

Cranelift is subject to restrictions like the ones you propose from having to fit into SpiderMonkey and Firefox, however. Both (1) and the slightly weakened version of (2) I believe to be the case. As for (3), it is true at points when the callee-save registers are observable, that is, it is possible that eg trap handling can violate this property while a trap is being unwound, but the property will have been reestablished once the system reaches a state where it is possible to look at the registers, eg, at the boundary where control leaves compiled wasm code and returns into embedder code.

I should say, whether these properties hold may be ABI-dependent. As you know, Cranelift has support for multiple ABIs; the Firefox embedding uses the "baldrdash" ABI which guarantees the properties above. The other ABIs, System-V and Fastcall, may also guarantee them, but I don't know this for a fact.

@bjorn3
Copy link
Contributor

bjorn3 commented Sep 27, 2019

Assembly functions generated by Cranelift WASM never underflows the stack

True, no callconv I now of allows that :)

Assembly functions generated by Cranelift never reads an uninitialized register

True. To read a register, there needs to be a value stored there first.

Assembly functions generated by Cranelift always restore callee save registers

True, it restores the necessary callee save regs for the specified call conv.

@shravanrn
Copy link
Author

shravanrn commented Sep 27, 2019

It probably follows from typesafe-ness that the value of an uninitialized register is not used in a semantics-affecting way, but that is a slightly weaker property than you propose, though it has the effect that you're after, I expect.

I think this sounds right. For instance, if we have an uninitialized register $rsi, and this is cleared with xor $rsi $rsi, which is "technically" a read, this is perfectly fine. What I want to rule out is the possibility of an attacker ever constructing a C program compiled with Cranelift that has the following effect

void foo(/* wasm_heap_argument*/)
{
  long rsi = get_value_of_uninitialized_rsi();
  printf("%ld\n", rsi);
}

Assuming the System V Linux x64 calling convention, $rsi should not be inferable by the program as it is supposed to hold the second function argument, but foo does not have this.

it is possible that eg trap handling can violate this property while a trap is being unwound,

Oh, good catch! This might be subtle as a trap handler may fire in the middle of execution of a wasm function - and so any execution after the trap would not necessarily start from a typesafe starting point. This is definitely something that I'll think about some more. Let me know if you have any more thoughts on this!

but the property will have been reestablished once the system reaches a state where it is possible to look at the registers, eg, at the boundary where control leaves compiled wasm code and returns into embedder code

Awesome! 😃 I think this is definitely a key property I am looking for!

True, no callconv I now of allows that :) True. To read a register, there needs to be a value stored there first. True, it restores the necessary callee save regs for the specified call conv.

Sounds great! 😃 At this time I am only looking at this System V x64 ABI. The main difficulty I was hoping to clarify here is that this is still fine when viewed adversarially i.e. an adversary should never be able to write code compiled with Cranelift with the System V ABI that violates the above properties. From your description it does seem like Cranelift should guarantee these properties (modulo bugs) which is awesome! (Please let me know if my understanding is incorrect)

Fwiw, I think the my overall goal of this discussion is the following - Would it be considered a Cranelift bug if these properties could be violated for the System V ABI (ignore trap handlers for a moment).

@bjorn3
Copy link
Contributor

bjorn3 commented Sep 27, 2019

@shravanrn
Copy link
Author

shravanrn commented Sep 27, 2019

trap handling can violate this property while a trap is being unwound

https://cranelift.readthedocs.io/en/latest/compare-llvm.html#undefined-behavior can be useful.

Thanks! I had a quick look at the wasm runtime for the non web embedding - it looks like all traps are fatal and we don't ever execute the wasm module after a trap. It does seem like restoring registers as part of the trap response is something that is controlled by the wasm runtime (please correct me if I'm wrong here 😃).

From this, I think, the easy and safe option would be to ensure both 1) no further execution of wasm code after a trap 2) kill the embedding thread so we never return to the embedding application when registers are potentially unrestored.

@bjorn3
Copy link
Contributor

bjorn3 commented Sep 27, 2019

it looks like all traps are fatal and we don't ever execute the wasm module after a trap.

there is trapnf

It does seem like restoring registers as part of the trap response is something that is controlled by the wasm runtime

correct, before returning from a nonfatal trap, you should restore all regs to the value at the time of the trap.

@shravanrn
Copy link
Author

there is trapnf

Oh interesting. Is there anywhere I could get more info on this i.e.

  • what sort of application code uses trapnf?
  • is the runtime required to provide a handler for this?

@bjorn3
Copy link
Contributor

bjorn3 commented Sep 27, 2019

I misremembered the name. It is called resumable_trap and not trapnf: https://docs.rs/cranelift-codegen/0.44.0/cranelift_codegen/ir/trait.InstBuilder.html#method.resumable_trap

It was introduced in bytecodealliance/cranelift#871.

is the runtime required to provide a handler for this?

It compiles to ud2 like any other trap, so you need to catch SIGILL.

@shravanrn
Copy link
Author

Gotcha! I will double check how the wasm runtime were using (a modified version of lucet), handles this and get back.

Thanks a bunch for the info!

@shravanrn
Copy link
Author

Looks like lucet is just in the process of migrating to a newer cranelift so they don't support resumable traps yet. So I will follow up there regarding this.

Thanks a bunch for your help! So just to confirm, is it reasonable to say breaking of the invariants listed would be a cranelift bug (modulo resumable traps doing something bad)?

@bnjbvr
Copy link
Member

bnjbvr commented Oct 18, 2019

Thanks a bunch for your help! So just to confirm, is it reasonable to say breaking of the invariants listed would be a cranelift bug (modulo resumable traps doing something bad)?

Yes, I think so.

Should we keep this bug, or is there actual work that's required to be done in Cranelift?

@bjorn3
Copy link
Contributor

bjorn3 commented Oct 18, 2019

Maybe we should document all invariants.

@shravanrn
Copy link
Author

shravanrn commented Oct 29, 2019

From the discussion, I don't believe there is any new work here (so feel free to close as required). But I definitely agree that it may be useful to document this invariant, so that downstream consumers feel ok relying on this 😃

@shravanrn shravanrn changed the title Need info about safe assumptions for Cranelift's register use Optimized transitions for cranelift Feb 25, 2020
@shravanrn
Copy link
Author

(Updating title as we want to point people to this bug until we can update docs)

@cfallin
Copy link
Member

cfallin commented May 4, 2022

It looks like this issue's purpose was mainly to establish answers to the above questions; it looks like there is also a pending question of whether the invariants should be documented. However, it doesn't strike me that they are particularly noteworthy beyond what is already documented: the question is basically whether Cranelift follows the calling convention (restores callee-saved registers), and whether the dataflow of its output is correct (never reads an uninitialized register). These fall out of "correct compiler that follows the spec" and aren't any special additional guarantee. So I'll go ahead and close, but please feel free to ask for other clarifications as needed!

@cfallin cfallin closed this as completed May 4, 2022
@shravanrn
Copy link
Author

Sounds good. I had initially raised this question as we are relying on a performance optimization that made the above assumptions. If it is of interest, we have subsequently shown with formal semantics, that any spec conforming Wasm compiler must automatically meet the above restrictions, as well as the workloads and performance benefits we get. Details available here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

No branches or pull requests

5 participants