Discuss implementation defined behavior #75

Closed
jfbastien opened this issue May 19, 2015 · 21 comments

@jfbastien
Member

Opening this bug so I go back and write documentation about this.

We want to avoid all forms of undefined behavior which can lead to nasal demons, and instead discuss how the wasm platform allows for implementation-defined behavior and what acceptable behavior is.

C/C++ UB is progressively refined by the compiler, and can be affected by tools such as sanitizers. The wasm platform then nails down some behaviors and leaves others open to the implementation. The implementation can then decide, based on the OS/ISA it's executing on, what the behavior is.

Note that behaviors include: "what happens if an enum is out of range", "shift by bitwidth or larger", "what do out-of-bounds accesses do", "what about unaligned accesses", "data races", and much more exciting things!

As a reference PNaCl has a non-comprehensive list of undefined behavior.

@sunfishcode
Member

Our present answers:

  • "what happens if an enum is out of range" -> C++ compilers lower enums; that's not wasm's problem
  • "shift by bitwidth or larger" -> just works
  • "what do out-of-bounds accesses do" -> they trap. Or in the non-ideal semantics presently being discussed, there are specific unsavory possibilities.
  • "what about unaligned accesses" -> they work, but if the alignment is less than is claimed, they may be slow
  • "data races" -> Garbage values and non-deterministic orderings seem unavoidable. I am hoping we can draw the line there, but it's not formalized at present.
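
A minimal sketch of the "they trap" and "they work" answers above, in Python for illustration only. `Trap`, `LinearMemory`, `load_i32` and `store_i32` are hypothetical names, not part of any wasm API. The point is that an out-of-bounds access stops execution rather than touching neighbouring memory, while an unaligned access inside the bounds still returns the right value.

```python
import struct


class Trap(Exception):
    """Hypothetical stand-in for a wasm trap."""


class LinearMemory:
    """Toy linear memory with bounds-checked 32-bit accesses (illustrative)."""

    def __init__(self, size_bytes):
        self.data = bytearray(size_bytes)

    def _check(self, addr, width):
        # Any access that is not fully inside the memory traps.
        if addr < 0 or addr + width > len(self.data):
            raise Trap(f"out-of-bounds access at {addr}")

    def store_i32(self, addr, value):
        self._check(addr, 4)
        struct.pack_into("<I", self.data, addr, value & 0xFFFFFFFF)

    def load_i32(self, addr):
        self._check(addr, 4)
        return struct.unpack_from("<I", self.data, addr)[0]


mem = LinearMemory(16)
mem.store_i32(1, 0xDEADBEEF)      # address 1 is not 4-byte aligned
unaligned_value = mem.load_i32(1)  # still reads back what was stored
```

In this sketch `mem.load_i32(13)` raises `Trap`, because bytes 13 through 16 do not all fit in a 16-byte memory.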

There's also:

  • NaN sign bits and payloads after floating-point operations are implementation-dependent
  • SIMD may want to retain the "subnormals may or may not be flushed" clause
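
To illustrate the NaN point: two bit patterns can differ in sign and payload and still both be NaN, so a portable comparison should not depend on those bits. The sketch below is in Python; the helper names (`is_nan_bits`, `equivalent_results`) and the two sample patterns are chosen for illustration and do not come from the wasm design documents.

```python
import math
import struct

# Two 64-bit patterns that are both NaN but differ in sign bit and payload.
NAN_POSITIVE = 0x7FF8000000000000
NAN_NEGATIVE_WITH_PAYLOAD = 0xFFF8000000000001


def f64_to_bits(value):
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def f64_from_bits(bits):
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def is_nan_bits(bits):
    """A double is NaN when the exponent is all ones and the fraction is non-zero."""
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return exponent == 0x7FF and fraction != 0


def equivalent_results(a_bits, b_bits):
    """Compare two results while ignoring NaN sign and payload differences."""
    if is_nan_bits(a_bits) and is_nan_bits(b_bits):
        return True
    return a_bits == b_bits
```

A test harness comparing output across implementations could use something like `equivalent_results` instead of a raw bitwise comparison.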

@kg
Contributor

kg commented May 19, 2015

"shift by bitwidth or larger" -> "just works" probably isn't precise enough. IIRC some architectures have different answers to what "just works" means here: extra bits masked off (mod N), clamped, treated as 0, etc. The threshold at which the masking/clamping happens varies too.

@sunfishcode
Member

AstSemantics.md has the full scoop. Shift counts are unsigned, unmasked, unclamped, and not treated as zero unless they are zero.
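
A rough sketch of the difference, in Python with 32-bit values emulated by masking. The first three functions model the alternatives raised in the previous comment; the `*_unmasked` functions model the rule stated here. All function names are made up for illustration, "clamped" is read as "count limited to 31", and the sign-filling right shift is an assumption about how the rule extends to arithmetic shifts.

```python
MASK32 = 0xFFFFFFFF


def shl_masked(x, count):
    """Count taken mod 32 (the 'extra bits masked off' reading)."""
    return (x << (count & 31)) & MASK32


def shl_clamped(x, count):
    """Count limited to 31 (one possible reading of 'clamped')."""
    return (x << min(count, 31)) & MASK32


def shl_count_as_zero(x, count):
    """An oversized count is treated as 0, so the value comes back unchanged."""
    return x & MASK32 if count >= 32 else (x << count) & MASK32


def shl_unmasked(x, count):
    """Count used as-is: shifting left by 32 or more pushes every bit out."""
    return 0 if count >= 32 else (x << count) & MASK32


def shr_u_unmasked(x, count):
    """Logical right shift with the count used as-is."""
    return 0 if count >= 32 else (x & MASK32) >> count


def shr_s_unmasked(x, count):
    """Arithmetic right shift: large counts leave only copies of the sign bit."""
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> count) & MASK32
```

For `x = 1` and `count = 33` the four left-shift variants give 2, 0x80000000, 1 and 0 respectively, which is why "just works" needs to be pinned down.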

@sunfishcode
Member

A few more things:

  • there's a maximum callstack depth which depends on dynamic conditions; if the program exceeds that, it traps (see #77, "Mention that stack overflow is checked")
  • dynamically resizing the heap may fail due to allocation failure
  • programs may fail to start for numerous reasons
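
The first two items can be sketched as resource limits enforced by the engine. This is a toy model in Python. `Engine`, `Trap`, `grow_heap`, and the specific limits are hypothetical, and reporting heap-growth failure through a boolean return value is an assumption made for the example, not something the thread specifies.

```python
class Trap(Exception):
    """Hypothetical stand-in for a wasm trap."""


class Engine:
    """Toy engine with a call-depth limit and a heap-size limit (illustrative)."""

    def __init__(self, max_call_depth, max_heap_pages):
        self.max_call_depth = max_call_depth
        self.max_heap_pages = max_heap_pages
        self.depth = 0
        self.heap_pages = 0

    def call(self, func, *args):
        # Exceeding the (dynamic) depth limit traps instead of corrupting state.
        if self.depth >= self.max_call_depth:
            raise Trap("call stack exhausted")
        self.depth += 1
        try:
            return func(self, *args)
        finally:
            self.depth -= 1

    def grow_heap(self, delta_pages):
        # Growth may fail; the program observes the failure and carries on.
        if self.heap_pages + delta_pages > self.max_heap_pages:
            return False
        self.heap_pages += delta_pages
        return True


def countdown(engine, n):
    """Recursive function that makes n nested calls through the engine."""
    return 0 if n == 0 else 1 + engine.call(countdown, n - 1)


engine = Engine(max_call_depth=8, max_heap_pages=4)
shallow_result = engine.call(countdown, 5)
```

With these limits, `engine.call(countdown, 20)` raises `Trap`, and a `grow_heap` request that would exceed four pages returns `False` without changing the heap size.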

Unless I've missed something, this is a comprehensive list of incompletely specified behavior in the language itself, at present.

@titzer

titzer commented May 20, 2015

On Wed, May 20, 2015 at 4:30 AM, Dan Gohman notifications@github.com
wrote:

A few more things:

We should group these two under the category "exceeding resources of
execution engine."

  • programs may fail to start for numerous reasons

We should list these reasons. For example, linking failures, verification
failures, resource exhaustion.

Unless I've missed something, this is a comprehensive list of incompletely
specified behavior in the language itself, at present.



@sunfishcode
Member

Another thing:

  • SIMD.js is currently proposed to have reciprocal and reciprocal sqrt approximation functions. As approximations, the specific results may vary between platforms.
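
As a rough illustration of why approximation results can differ, the Python sketch below models two hypothetical platforms whose reciprocal estimates keep a different number of mantissa bits. The bit counts (12 and 14), the function names, and the tolerance are invented for the example and are not taken from SIMD.js or any real instruction set.

```python
import struct


def keep_mantissa_bits(x, keep_bits):
    """Round x to float32, then zero all but the top keep_bits mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = 23 - keep_bits
    bits &= ~((1 << drop) - 1) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def recip_platform_a(x):
    """Hypothetical platform A: roughly 12 bits of precision."""
    return keep_mantissa_bits(1.0 / x, 12)


def recip_platform_b(x):
    """Hypothetical platform B: roughly 14 bits of precision."""
    return keep_mantissa_bits(1.0 / x, 14)


def close_enough(actual, expected, rel_tol=2.0 ** -11):
    """Portable check: compare against a tolerance, not for exact equality."""
    return abs(actual - expected) <= rel_tol * abs(expected)


a = recip_platform_a(3.0)
b = recip_platform_b(3.0)
```

Here `a` and `b` are different floats, yet both pass `close_enough(value, 1.0 / 3.0)`. Code that relies on such functions would need to tolerate that kind of spread.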

@sunfishcode
Member

I created #87 to start a document collecting the list here.

@jfbastien
Member Author

I'm hoping that we can explain UB as a progressive filtering: C++ has wide UB, the compiler narrows it somehow, sanitizers can narrow it more, and then wasm filters it more into implementation-defined behavior.

@sunfishcode
Member

I agree, that sounds useful.

@sunfishcode
Member

On the other hand, this isn't specific to WebAssembly; it's just how C++ works, on any platform. So while there's value in explaining how C++ works to C++ developers, it's not clear where this would fit into the WebAssembly documentation.

@titzer

titzer commented Jun 1, 2015

I think it's important that we limit the scope of undefined behavior or
implementation-defined behavior in wasm. That doesn't seem to be a priority
in the C++ world, but it'd be nice to say, e.g. a misaligned load doesn't
cause your program to jump into the middle of "sqrt" or trash half the heap.

-B


@sunfishcode
Member

We are indeed very strenuously limiting the scope of undefined behavior and implementation-defined behavior in wasm.

And we do have pretty good control flow integrity, since return addresses are stored on the trusted stack and can't be clobbered, and indirect calls will always call into the beginning of some function, never into the middle of a function or into garbage memory. We should advertise this more in the documentation.
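
The indirect-call property can be sketched as a table lookup: the program supplies an index, never a raw code address, so a call can only land on the entry point of a known function or trap. This Python model is illustrative; `FunctionTable`, `call_indirect`, and `Trap` are hypothetical names, and real engines may enforce the property differently.

```python
class Trap(Exception):
    """Hypothetical stand-in for a wasm trap."""


class FunctionTable:
    """Toy table of callable entry points, addressed by index only."""

    def __init__(self, functions):
        self._functions = list(functions)

    def call_indirect(self, index, *args):
        # An index outside the table traps; there is no way to name an
        # address in the middle of a function or in data memory.
        if not 0 <= index < len(self._functions):
            raise Trap(f"invalid function index {index}")
        return self._functions[index](*args)


def increment(x):
    return x + 1


def double(x):
    return x * 2


table = FunctionTable([increment, double])
result = table.call_indirect(1, 21)
```

In this sketch a bogus index such as `table.call_indirect(7, 0)` raises `Trap` instead of jumping to an arbitrary location.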

However, we can't change C++ itself. After a program is compiled to wasm, its behavior will be relatively fixed (races and other documented details notwithstanding), but before that, C++ optimizers are known to take extensive advantage of the threat of nasal demons, and can trash half the heap if they think they're optimizing something.

@titzer

titzer commented Jun 2, 2015

On Tue, Jun 2, 2015 at 3:56 AM, Dan Gohman notifications@github.com wrote:

We are indeed very strenuously limiting the scope of undefined behavior
and implementation-defined behavior in wasm (IncompletelySpecifiedBehavior.md).

And we do have pretty good control flow integrity, since return addresses
are stored on the trusted stack and can't be clobbered, and indirect calls
will always call into the beginning of some function, never into the
middle of a function or into garbage memory. We should advertise this more
in the documentation.

However, we can't change C++ itself. After a program is compiled to wasm,
its behavior will be relatively fixed (races and other documented details
notwithstanding), but before that, C++ optimizers are known to take
extensive advantage of the threat of nasal demons, and can trash half the
heap if they think they're optimizing something.

Agree; nasal demons are C++ compiler territory; we should just make this
explicit in our documentation.

-B



@sunfishcode
Member

#102 is an attempt at addressing the concerns discussed here.

@jfbastien
Member Author

I'd like to capture a point I made in #102:

I'm not sure that we want to guarantee that there is a trusted call stack, that branches always have a valid destination, or that an application can't clobber the call stack. In the context of running untrusted code on the web we definitely want this guarantee, but I see it as an implementation detail. We should make it possible to implement Web Assembly with entirely different sandboxes, or under entirely different security models.

Two examples:

  • Targeting NaCl means there doesn't need to be a trusted stack to enforce security when running untrusted code.
  • Environments such as node.js have a different security boundary, and don't necessarily need to treat code as untrusted and pay the associated cost.

@jfbastien
Member Author

Linking to issue #105: Alignment will probably require implementation-defined behavior.

@lukewagner
Member

Update on #105: unless @titzer's search finds anything, looks like we get to keep deterministic behavior concerning alignment.

@sunfishcode
Member

@jfbastien does #102 address the concerns here, or is there more you'd like to do here?

@jfbastien
Member Author

I'll want to revisit this with @davidsehr and others, but that can wait until after going public. Let's just leave it open for now, and try to close before MVP.

@jfbastien jfbastien added this to the MVP milestone Jun 10, 2015
@jfbastien jfbastien self-assigned this Jun 10, 2015
@binji
Member

binji commented Oct 23, 2015

Closing along with #107 being closed.

@binji binji closed this as completed Oct 23, 2015