Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement interrupting wasm code, reimplement stack overflow #1490

Merged
merged 13 commits into from
Apr 21, 2020

Conversation

alexcrichton
Copy link
Member

This commit is a relatively large change for wasmtime with two main
goals:

  • Primarily this enables interrupting executing wasm code with a trap,
    preventing infinite loops in wasm code. Note that resumption of the
    wasm code is not a goal of this commit.

  • Additionally this commit reimplements how we handle stack overflow to
    ensure that host functions always have a reasonable amount of stack to
    run on. This fixes an issue where we might longjmp out of a host
    function, skipping destructors.

Lots of various odds and ends end up falling out in this commit once the
two goals above were implemented. The strategy for implementing this was
also lifted from Spidermonkey and existing functionality inside of
Cranelift. I've tried to write up thorough documentation of how this all
works in crates/environ/src/cranelift.rs where gnarly-ish bits are.

A brief summary of how this works is that each function and each loop
header now checks to see if they're interrupted. Interrupts and the
stack overflow check are actually folded into one now, where function
headers check to see if they've run out of stack and the sentinel value
used to indicate an interrupt, checked in loop headers, tricks functions
into thinking they're out of stack. An interrupt is basically just
writing a value to a location which is read by JIT code.

When interrupts are delivered and what triggers them has been left up to
embedders of the wasmtime crate. The wasmtime::Store type has a
method to acquire an InterruptHandle, where InterruptHandle is a
Send and Sync type which can travel to other threads (or perhaps
even a signal handler) to get notified from. It's intended that this
provides a good degree of flexibility when interrupting wasm code. Note
though that this does have a large caveat where interrupts don't work
when you're interrupting host code, so if you've got a host import
blocking for a long time an interrupt won't actually be received until
the wasm starts running again.

Some fallout included from this change is:

  • Unix signal handlers are no longer registered with SA_ONSTACK.
    Instead they run on the native stack the thread was already using.
    This is possible since stack overflow isn't handled by hitting the
    guard page, but rather it's explicitly checked for in wasm now. Native
    stack overflow will continue to abort the process as usual.

  • Unix sigaltstack management is now no longer necessary since we don't
    use it any more.

  • Windows no longer has any need to reset guard pages since we no longer
    try to recover from faults on guard pages.

  • On all targets probestack intrinsics are disabled since we use a
    different mechanism for catching stack overflow.

  • The C API has been updated with interrupts handles. An example has
    also been added which shows off how to interrupt a module.

Closes #139
Closes #860
Closes #900

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator fuzzing Issues related to our fuzzing infrastructure wasmtime:api Related to the API of the `wasmtime` crate itself wasmtime:c-api Issues pertaining to the C API. labels Apr 9, 2020
@github-actions
Copy link

github-actions bot commented Apr 9, 2020

Subscribe to Label Action

This issue or pull request has been labeled: "cranelift", "fuzzing", "wasmtime:api", "wasmtime:c-api"

Users Subscribed to "cranelift"
Users Subscribed to "fuzzing"
Users Subscribed to "wasmtime:api"
Users Subscribed to "wasmtime:c-api"

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

@fitzgen
Copy link
Member

fitzgen commented Apr 9, 2020

Haven't looked at the code yet, but I have a question:

The wasmtime::Store type has a
method to acquire an InterruptHandle,

Does this mean that the finest granularity of interrupt is a store?

That seems fine for now, in the single-threaded wasm world, but will likely need reworking to be finer grained when we support threads. Do you have an idea how much more work it would be to make this per-instance?

@alexcrichton
Copy link
Member Author

@fitzgen yeah adapting this shouldn't be too hard at all, it's just a shared Arc data structure and it would be easy to arrange all instances linked together to have access to that Arc

@github-actions github-actions bot added the wasi Issues pertaining to WASI label Apr 9, 2020
@github-actions
Copy link

github-actions bot commented Apr 9, 2020

Subscribe to Label Action

This issue or pull request has been labeled: "wasi"

Users Subscribed to "wasi"

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

cranelift/codegen/src/ir/function.rs Outdated Show resolved Hide resolved
cranelift/reader/src/run_command.rs Show resolved Hide resolved
///
/// By default this option is 1 MB.
pub fn max_wasm_stack(&mut self, size: usize) -> &mut Self {
self.max_wasm_stack = size;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert that the size is greater than zero? (or some reasonable min)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it'd help much to single-out zero here, 1 byte of stack allocation is just as nonsensical in a sense. I personally think it's ok to take everything here, and if someone is toying around with 0-byte wasm stacks for testing that seems like it's fine to allow.

crates/environ/src/cranelift.rs Outdated Show resolved Hide resolved

Note that on Windows and macOS the command will be similar, but you'll need
to tweak the `-lpthread` and such annotations as well as the name of the
`libwasmtime.a` file on Windows.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm I don't think this is actually that necessary, we already run all the examples on CI with cargo run -p run-examples?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, I guess #1463 was unnecessary

crates/fuzzing/src/generators/api.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@pepyakin pepyakin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks just great! A couple of nits

crates/runtime/src/traphandlers.rs Outdated Show resolved Hide resolved
crates/runtime/src/traphandlers.rs Outdated Show resolved Hide resolved
@alexcrichton
Copy link
Member Author

Ok I've pushed up a commit which uses a GlobalValue for the stack limit instead of a closure, and there's a "mini interpreter" of this GlobalValue since ins().global_value(...) can't be used this late after legalization. It's hopefully sufficient enough for now but also with better future-compatibility.

I'm not really sure how to serialize this out or parse it in the text format. If that's required can folks point me in the right direction of how to do that?

@bjorn3
Copy link
Contributor

bjorn3 commented Apr 17, 2020

You should serialize it in write_function and then parse it in cranelift-reader.

crates/environ/src/cranelift.rs Outdated Show resolved Hide resolved
crates/environ/src/cranelift.rs Outdated Show resolved Hide resolved
crates/environ/src/cranelift.rs Outdated Show resolved Hide resolved
crates/environ/src/tunables.rs Outdated Show resolved Hide resolved
crates/environ/src/vmoffsets.rs Outdated Show resolved Hide resolved
crates/runtime/src/traphandlers.rs Outdated Show resolved Hide resolved
examples/interrupt.rs Show resolved Hide resolved
@alexcrichton
Copy link
Member Author

Ok @sunfishcode, should be updated!

src/commands/run.rs Outdated Show resolved Hide resolved
crates/runtime/src/traphandlers.rs Outdated Show resolved Hide resolved
@alexcrichton
Copy link
Member Author

Ok after some discussion with @sunfishcode I brought back the SA_ONSTACK flag so we can be sure to still execute the fallback to libstd's segfault handler on the sigaltstack. We already have the code to make the sigaltstack bigger, so that should cover our own purposes for now as well. This means that all signals still go through the sigaltstack.

Copy link
Member

@sunfishcode sunfishcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good; the one thing left to do is serialize/deserialize the stack_limit field in ir::Function. This probably should be a new declaration in the preamble (see parse_preamble in cranelift/reader/src/parser.rs and write_preamble in cranelift/codegen/src/write.rs, with a syntax like "stack_limit = gv0" or so. I can answer any questions about this code, or if you want, we could probably defer this part to a separate PR, as the PR here is already pretty big.

@alexcrichton
Copy link
Member Author

Ok the last commit should be the stack limit parsing via the syntax you suggested

This commit is a relatively large change for wasmtime with two main
goals:

* Primarily this enables interrupting executing wasm code with a trap,
  preventing infinite loops in wasm code. Note that resumption of the
  wasm code is not a goal of this commit.

* Additionally this commit reimplements how we handle stack overflow to
  ensure that host functions always have a reasonable amount of stack to
  run on. This fixes an issue where we might longjmp out of a host
  function, skipping destructors.

Lots of various odds and ends end up falling out in this commit once the
two goals above were implemented. The strategy for implementing this was
also lifted from Spidermonkey and existing functionality inside of
Cranelift. I've tried to write up thorough documentation of how this all
works in `crates/environ/src/cranelift.rs` where gnarly-ish bits are.

A brief summary of how this works is that each function and each loop
header now checks to see if they're interrupted. Interrupts and the
stack overflow check are actually folded into one now, where function
headers check to see if they've run out of stack and the sentinel value
used to indicate an interrupt, checked in loop headers, tricks functions
into thinking they're out of stack. An interrupt is basically just
writing a value to a location which is read by JIT code.

When interrupts are delivered and what triggers them has been left up to
embedders of the `wasmtime` crate. The `wasmtime::Store` type has a
method to acquire an `InterruptHandle`, where `InterruptHandle` is a
`Send` and `Sync` type which can travel to other threads (or perhaps
even a signal handler) to get notified from. It's intended that this
provides a good degree of flexibility when interrupting wasm code. Note
though that this does have a large caveat where interrupts don't work
when you're interrupting host code, so if you've got a host import
blocking for a long time an interrupt won't actually be received until
the wasm starts running again.

Some fallout included from this change is:

* Unix signal handlers are no longer registered with `SA_ONSTACK`.
  Instead they run on the native stack the thread was already using.
  This is possible since stack overflow isn't handled by hitting the
  guard page, but rather it's explicitly checked for in wasm now. Native
  stack overflow will continue to abort the process as usual.

* Unix sigaltstack management is now no longer necessary since we don't
  use it any more.

* Windows no longer has any need to reset guard pages since we no longer
  try to recover from faults on guard pages.

* On all targets probestack intrinsics are disabled since we use a
  different mechanism for catching stack overflow.

* The C API has been updated with interrupts handles. An example has
  also been added which shows off how to interrupt a module.

Closes bytecodealliance#139
Closes bytecodealliance#860
Closes bytecodealliance#900
Allows libstd to print out stack overflow on failure still.
@sunfishcode
Copy link
Member

Looks good! There's now just the two CI failures. One is an easy fix I just commented on above.

There other is a failure in host_segfault.rs on macos-latest CI:

     Running target/release/deps/host_segfault-de61d05b549d9afb

thread 'main' panicked at 'assertion failed: stderr.contains("thread 'main' has overflowed its stack")', tests/host_segfault.rs:95:9
stack backtrace:
   0: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
   1: core::fmt::write
   2: std::io::Write::write_fmt
   3: std::panicking::default_hook::{{closure}}
   4: std::panicking::default_hook
   5: std::panicking::rust_panic_with_hook
   6: std::panicking::begin_panic
   7: host_segfault::main
   8: std::rt::lang_start::{{closure}}
   9: std::panicking::try::do_call
  10: __rust_maybe_catch_panic
  11: std::rt::lang_start_internal
  12: main

@alexcrichton
Copy link
Member Author

Ok all green now!

@sunfishcode
Copy link
Member

Great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift Issues related to the Cranelift code generator fuzzing Issues related to our fuzzing infrastructure wasi Issues pertaining to WASI wasmtime:api Related to the API of the `wasmtime` crate itself wasmtime:c-api Issues pertaining to the C API.
Projects
None yet
7 participants