
Indirect calls: how do they work? #89

Closed
jfbastien opened this issue May 29, 2015 · 22 comments

@jfbastien
Member

The current AST semantics document states:

Indirect calls may be made to a value of function-pointer type. A function-pointer value may be obtained for a given function as specified by its index in the function table.

  • CallIndirect - call function indirectly
  • AddressOf - obtain a function pointer value for a given function
    Function-pointer values are comparable for equality and the AddressOf operator is monomorphic. Function-pointer values can be explicitly coerced to and from integers (which, in particular, is necessary when loading/storing to the heap since the heap only provides integer types). For security and safety reasons, the integer value of a coerced function-pointer value is an abstract index and does not reveal the actual machine code address of the target function.

In v.1 function pointer values are local to a single module. The dynamic linking feature is necessary for two modules to pass function pointers back and forth.

IIUC this basically is what Emscripten does.
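The quoted semantics can be sketched in a few lines. This is an illustrative model only, not an implementation; all names here (`Module`, `address_of`, `call_indirect`) are invented for the sketch:

```python
# Hypothetical sketch of the proposed semantics: AddressOf yields an
# abstract index into a function table rather than a machine address,
# and CallIndirect dispatches through that table.

class Module:
    def __init__(self):
        self.table = []          # function table: index -> callable
        self.index_of = {}       # memoize AddressOf so equality works

    def address_of(self, fn):
        # Comparable for equality: the same function always yields the
        # same abstract integer index, never a code address.
        if fn not in self.index_of:
            self.index_of[fn] = len(self.table)
            self.table.append(fn)
        return self.index_of[fn]

    def call_indirect(self, index, *args):
        return self.table[index](*args)

m = Module()
def square(x): return x * x
p = m.address_of(square)          # an abstract index, not a code address
assert p == m.address_of(square)  # equality is well-defined
assert m.call_indirect(p, 7) == 49
```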

I'd like us to discuss this a bit more to make sure we consider alternatives before choosing a specific approach:

  • Are different implementations of Web Assembly allowed to return different abstract integers for a function pointer?
  • What's the performance cost?
  • Does this have caveats with C++ UB on function pointers that happen to work on most implementations?
  • Does this support C++ pointer to member function sufficiently?
  • How does this interact with dynamic linking, late binding, and relocations?
  • Is this sufficient for non-C++ languages?
    • Does ObjectiveC work properly / efficiently?
    • Multimethods?
  • Can sanitizers (such as control-flow sanitizer) be implemented efficiently (without Web Assembly runtime involvement)?
  • Can the Web Assembly implementation use a sandboxing approach that doesn't rely on a language VM for security?
    • Can this target NaCl efficiently?
    • can this target MinSFI efficiently?
    • Are we hindering future sandboxing research?

Anything else?

@kripken
Member

kripken commented May 29, 2015

Excellent questions!

For now I have a thought on the first of them, which you made me realize. It seems dangerous to let implementations pick their own values. For one thing, they could just pick the actual pointer value as the simplest solution, which is unsafe. Probably no serious implementation would do that, but a second concern is that it leaves more room for differences between implementations. Of course, no serious program should depend on such differences, but this seems a particularly risky area: we already have semantics that differ from many native platforms, namely that we don't allow calling a function (directly or not) with the wrong types. In practice, this is not as rare in the real world as we would hope. This makes for some fun bugs, and having function pointer values that change between implementations, or between runs on a single implementation, would compound the confusion.

I would therefore suggest specifying the pointer values. Perhaps one of these:

  • Methods that want to have an AddressOf taken must declare their numeric id for that operator in their definition. This would be nice since it means the compiler knows function pointers at compile time, and can write those into the memory initialization segment, avoiding work at startup. (This almost means we can get rid of AddressOf, but I'm not sure; dynamic linking might want it.)
  • Methods are numbered from 1 upward, in order of declaration.

@lukewagner
Member

How does this interact with dynamic linking, late binding, and relocations?

The current proposal actually differs from normal Emscripten output (which bakes in integer literals directly instead of symbolically via AddressOf), and the goal of this is specifically to support dynamic linking. The point is that the value of AddressOf depends on the order in which a module was loaded.

@titzer

titzer commented May 30, 2015


Why not just use a module number (determined by link order) and the function's index in the module?

If we go with allowing function pointers (rather than separate per-signature indirect tables) we're going to need a dynamic type (signature) check for indirect function calls. It'd be relatively cheap to go into a table for functions and check whether the type of that function matches the call site. There's probably a bounds check, too.
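The two dynamic checks described above can be sketched as follows. This is a model under assumed semantics, with invented signature strings; a real engine would compare canonicalized signature ids, not strings:

```python
# Sketch of an indirect call through a single shared function table:
# a bounds check plus a dynamic signature check at every call site.

class TrapError(Exception):
    pass

# Each table entry pairs a signature id with the function itself.
table = [
    ("i32->i32", lambda x: x + 1),
    ("i32,i32->i32", lambda x, y: x * y),
]

def call_indirect(expected_sig, index, *args):
    if index >= len(table):            # bounds check
        raise TrapError("index out of bounds")
    sig, fn = table[index]
    if sig != expected_sig:            # dynamic signature check
        raise TrapError("signature mismatch")
    return fn(*args)

assert call_indirect("i32->i32", 0, 41) == 42
```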



@kg
Contributor

kg commented May 30, 2015

We should think carefully about how JIT would interact with this, since applications JITting functions are going to end up with a considerable number of function pointers. They might also want to jump into the middle of a JITted function. If it's a module/function-index pair, that could mean JITting code requires creating a new module for it (or perhaps it could work like LCG in .NET, where you append the functions onto the end of an existing module).

@kripken
Member

kripken commented May 30, 2015

@titzer: yes, an automatic numbering would need to take into account the module too. Perhaps it could be just a deterministic running counter that continues through shared modules as they are linked?

The more I think about this, the startup overhead seems like a large issue, as it's common to have function pointers in the data segment. In shared modules it seems unavoidable to have to adjust those function pointers at runtime, but for a singleton non-shared-module, it seems like a shame to require that. An alternative might be to allow marking function pointers in the data segment somehow - "there is an AddressOf of function X here", which is what LLVM IR and PNaCl have, but it still leaves work for the VM at runtime.
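The "mark AddressOf slots in the data segment" idea can be sketched like this. All names and index values below are invented; the point is only that the segment records which offsets hold a function pointer for which function, and the loader patches in the final abstract indices once they are known:

```python
# Hypothetical sketch of data-segment function-pointer relocations:
# the segment carries (offset, function) markers, and the VM writes
# the load-time-assigned abstract indices into those slots.

import struct

data = bytearray(16)
relocs = [(0, "foo"), (8, "bar")]   # "there is an AddressOf of X here"

# Indices assigned at load time, e.g. by a deterministic counter.
assigned = {"foo": 5, "bar": 6}

for offset, name in relocs:
    # Patch a 32-bit little-endian index into the segment.
    struct.pack_into("<I", data, offset, assigned[name])

assert struct.unpack_from("<I", data, 0)[0] == 5
assert struct.unpack_from("<I", data, 8)[0] == 6
```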

@lukewagner
Member

@titzer That's a nice idea but, assuming we allow coercion to/from int32 (which is ultimately what you'd have to do before/after storing/loading from the heap), then we'd have to somehow OR those two numbers together. Instead, I was assuming, more like what @kripken was saying, a running counter that is incremented in batches after each module is loaded. Also, the current proposal (specifically the "monomorphic" requirement on AddressOf) does imply a single, logical table of functions, not the per-signature tables of asm.js. (The polyfill (as well as an impl that wanted to avoid the dynamic signature check) would be able to work around this by padding up each table with jumps to throw functions so that a given function was only in its signature's table.)

@kg I think it's important that we don't scope-creep dynamic linking to support the full fine-grained code compilation/loading required by JITing. JIT support is already a separate feature on the list that we should address directly as it will likely require specific support anyway to be able to do normal things like ICs and patching.
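The running-counter scheme described above can be sketched in a few lines. Module names and sizes here are invented; the point is that each loaded module claims the next contiguous batch of indices, so pointer values are deterministic given load order:

```python
# Sketch of batch index assignment: one counter bump per module load.

counter = 0
base_of = {}

def load_module(name, num_funcs):
    global counter
    base_of[name] = counter        # this module's first abstract index
    counter += num_funcs           # increment in one batch per module

load_module("main", 10)
load_module("libfoo", 4)

# AddressOf(function k of module m) == base_of[m] + k
assert base_of["main"] == 0
assert base_of["libfoo"] == 10
```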

@MikeHolman
Member

Can sanitizers (such as control-flow sanitizer) be implemented efficiently (without Web Assembly runtime involvement)?

@jfbastien What are you thinking exactly? Maybe I misunderstand, but I don't see how a control flow sanitizer can be implemented anywhere other than in the Web Assembly runtime. (To make sure we're on the same page, my reference point is our control flow guard, but maybe that is completely different than what you are talking about).

@jfbastien
Member Author

Can sanitizers (such as control-flow sanitizer) be implemented efficiently (without Web Assembly runtime involvement)?

@jfbastien What are you thinking exactly? Maybe I misunderstand, but I don't see how a control flow sanitizer can be implemented anywhere other than in the Web Assembly runtime. (To make sure we're on the same page, my reference point is our control flow guard, but maybe that is completely different than what you are talking about).

There are two uses for sanitizer in Web Assembly:

  1. Browser vendors trying to protect web users from untrusted code.
  2. Web developers trying to use sanitizers on their code, either as a testing / fuzzing tool, or to protect their code from malicious input.

NaCl is an example of 1. in that it enforces control-flow integrity and data-flow integrity, but only to protect web users from untrusted code. PNaCl makes it possible to translate code to NaCl, MinSFI or an OS-sandbox only (bare-metal mode) by leaving unspecified what happens when a function call is invalid. We use these three approaches in different circumstances. I want to ensure that we can do the same in Web Assembly.

I also want to allow developers to use sanitizers on their own code for 2., which is similar to turning on control flow guard. Doing this is basically running a sandbox within a sandbox (sanitizer within Web Assembly), and making the inner one efficient is a bit tricky: LLVM's control-flow sanitizer plays with function tables, and LLVM's address sanitizer uses shadow memory which requires mmap support for MAP_FIXED to be efficient.
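The MAP_FIXED point above comes from how ASan-style shadow memory works: each 8 bytes of application memory map to 1 shadow byte at a fixed linear offset, so the check is just a shift and an add with no table lookup, which only works if the shadow region sits at a known address. A sketch (the offset value here is an illustrative example, not wasm's):

```python
# Illustration of the shadow-memory address computation used by
# AddressSanitizer-style tools: shadow(addr) = (addr >> 3) + offset.

SHADOW_OFFSET = 0x7fff8000  # example value; real offsets vary by platform

def shadow_address(addr):
    # One shadow byte describes the state of one 8-byte granule.
    return (addr >> 3) + SHADOW_OFFSET

# Adjacent 8-byte granules map to adjacent shadow bytes.
assert shadow_address(0x1008) - shadow_address(0x1000) == 1
```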

@titzer

titzer commented Jun 4, 2015


I don't have an objection to extra sandboxing. But I think that level of sandboxing would be to protect against bugs in the wasm engine, right?



@jfbastien
Member Author

I don't have an objection to extra sandboxing. But I think that level of sandboxing would be to protect against bugs in the wasm engine, right?

The inner sandboxing is something developers choose to use because they want to protect against bugs in their own code, not in wasm, the same way applications currently use e.g. stack canaries: the goal is not to protect against compiler or OS bugs, but against bugs in their own code.

@titzer

titzer commented Jun 4, 2015


I don't get it then. Wasm with a trusted stack and locals gives far better guarantees (see other thread).



@lukewagner
Member

The example in my mind of what I guessed @jfbastien is talking about is heap guards that a *San tool might insert. I agree we shouldn't need much to help us on the stack since it's trusted.

@jfbastien
Member Author

UB in a developer's original C++ code isn't all detectable by wasm. @lukewagner has it exactly right: developers should be able to turn on the *san tools when they compile. wasm needs to help them a bit for this to work well (mmap supporting MAP_FIXED, some configuration on the runtime), but overall there's no sanitizer-specific support needed from wasm itself: these capabilities are useful beyond the sanitizers.

In the specific case of indirect calls (what this issue is about): does wasm make it possible to efficiently implement control-flow sanitizer? The LLVM approach claims almost no overhead on Chromium, and I hope the same holds when targeting wasm.

@lukewagner
Member

So what specifically are you interested in a control-flow sanitizer catching, given a trusted stack? All I can think of is some guards for the heap-stack.

@titzer

titzer commented Jun 4, 2015

I only see the value of NaCl-style sandboxing as a second, added safeguard against bugs in the wasm engine if we are going to assume the trusted stack model. In that case the wasm engine is already non-conformant, so there's nothing to spec.


@jfbastien
Member Author

A trusted stack in wasm doesn't protect an application from vtable clobber attacks which lead to a call to a function with the same signature but for the wrong object type. That's one example of where user-mode CFI, such as provided by pcc's work, is desirable.
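The vtable-clobber scenario can be made concrete with a small sketch, in the spirit of LLVM's cfi-vcall check: before a virtual call, verify the object's vtable is one that is valid for the static type at the call site, even when the callee's signature would otherwise match. All class and vtable names here are invented:

```python
# Sketch of a user-mode CFI vcall check. A clobbered vtable pointer
# that still points at a same-signature function is caught because its
# vtable is not in the valid set for the static type.

class CFIViolation(Exception):
    pass

# Valid vtables per static class, as a compiler would emit them.
valid_vtables = {
    "Shape": {"Circle_vtable", "Square_vtable"},
}

def checked_vcall(static_type, obj, method, *args):
    if obj["vtable"] not in valid_vtables[static_type]:
        raise CFIViolation("vtable not valid for " + static_type)
    return obj[method](*args)

circle = {"vtable": "Circle_vtable", "area": lambda: 3.14}
evil = {"vtable": "Evil_vtable", "area": lambda: 0}  # clobbered object

assert checked_vcall("Shape", circle, "area") == 3.14
```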

@lukewagner
Member

@jfbastien That's a good example. If we say anything about CFI in our docs, it'd be nice to explain what we mean above and beyond a trusted stack.

@titzer

titzer commented Jun 4, 2015

Well, vtable pointers are part of the heap as part of the C++ object header, so they could be damaged directly by errant writes or an attack, since no matter what we do we have to allow user programs to store some representation of an indirect function pointer in the heap.


@lukewagner
Member

I assumed that the *San mode involves some extra type checking to make sure the callee's assumed object type matches `this`'s object type, and that it's not infallible, just "tighter".

@jfbastien
Member Author

@titzer: what @lukewagner said is exactly right, the cfi sanitizer adds extra code to LLVM bitcode (which then would become extra wasm code) to make sure vtable types match.

@sunfishcode
Member

#278 is a pull request which is my attempt to capture the present conclusions of this conversation.

@binji
Member

binji commented Oct 23, 2015

This has been resolved by #392.

@binji binji closed this as completed Oct 23, 2015