Implement interrupting wasm code, reimplement stack overflow #1490

alexcrichton · 2020-04-09T19:08:16Z

This commit is a relatively large change for wasmtime with two main
goals:

* Primarily this enables interrupting executing wasm code with a trap,
  preventing infinite loops in wasm code. Note that resumption of the
  wasm code is not a goal of this commit.
* Additionally this commit reimplements how we handle stack overflow to
  ensure that host functions always have a reasonable amount of stack to
  run on. This fixes an issue where we might longjmp out of a host
  function, skipping destructors.

Lots of various odds and ends end up falling out in this commit once the
two goals above were implemented. The strategy for implementing this was
also lifted from SpiderMonkey and existing functionality inside of
Cranelift. I've tried to write up thorough documentation of how this all
works in `crates/environ/src/cranelift.rs`, where the gnarly-ish bits are.

A brief summary of how this works is that each function and each loop
header now checks to see if it has been interrupted. Interrupts and the
stack overflow check are folded into one: function headers check
whether they've run out of stack, and the sentinel value used to
indicate an interrupt (which loop headers check for directly) tricks
functions into thinking they're out of stack. An interrupt is basically
just writing a value to a location which is read by JIT code.
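
The folded check described above can be modelled in plain Rust. This is only an illustrative simulation, not wasmtime's generated code: the `Limits` type, the `Trap` enum, the method names, and the exact sentinel value are all assumptions made for the example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical sentinel: a "limit" above any realistic stack pointer,
/// so every function-entry check fails once it has been stored.
const INTERRUPTED: usize = usize::MAX;

#[derive(Debug, PartialEq)]
enum Trap {
    StackOverflow,
    Interrupt,
}

/// Stand-in for the memory location that JIT code reads.
struct Limits {
    stack_limit: AtomicUsize,
}

impl Limits {
    fn new(stack_limit: usize) -> Self {
        Limits { stack_limit: AtomicUsize::new(stack_limit) }
    }

    /// Function header: the stack grows down, so `sp` below the limit
    /// means "out of stack". The sentinel makes this true for any
    /// realistic `sp`.
    fn function_entry(&self, sp: usize) -> Result<(), Trap> {
        let limit = self.stack_limit.load(Ordering::Relaxed);
        if sp >= limit {
            return Ok(());
        }
        // Only after the cheap comparison fails is the trap classified.
        if limit == INTERRUPTED {
            Err(Trap::Interrupt)
        } else {
            Err(Trap::StackOverflow)
        }
    }

    /// Loop header: compare the same location against the sentinel.
    fn loop_header(&self) -> Result<(), Trap> {
        if self.stack_limit.load(Ordering::Relaxed) == INTERRUPTED {
            Err(Trap::Interrupt)
        } else {
            Ok(())
        }
    }

    /// Delivering an interrupt is a single store to the shared location.
    fn interrupt(&self) {
        self.stack_limit.store(INTERRUPTED, Ordering::SeqCst);
    }
}
```

In this model a function entered after `interrupt()` is called traps even though it has plenty of stack left, which is the "trick" the summary refers to.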

When interrupts are delivered and what triggers them has been left up to
embedders of the `wasmtime` crate. The `wasmtime::Store` type has a
method to acquire an `InterruptHandle`, where `InterruptHandle` is a
`Send` and `Sync` type which can travel to other threads (or perhaps
even a signal handler) and be used to trigger the interrupt from there.
It's intended that this provides a good degree of flexibility when
interrupting wasm code. Note though that this does have a large caveat:
interrupts don't take effect while host code is running, so if you've
got a host import blocking for a long time an interrupt won't actually
be received until the wasm starts running again.
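
The embedder-side pattern can be sketched with standard-library types only. Everything here is a hypothetical stand-in rather than the `wasmtime` API: `Handle` plays the role of `InterruptHandle`, `Guest` plays the role of a store running an infinite wasm loop, and `run_with_timeout` is an invented helper showing a watchdog thread delivering the interrupt.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for an interrupt handle. It is `Send + Sync`
/// because it only wraps an `Arc<AtomicBool>`.
#[derive(Clone)]
struct Handle(Arc<AtomicBool>);

impl Handle {
    fn interrupt(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
}

/// Hypothetical stand-in for a store executing guest code.
struct Guest {
    interrupted: Arc<AtomicBool>,
}

impl Guest {
    fn new() -> Self {
        Guest { interrupted: Arc::new(AtomicBool::new(false)) }
    }

    fn interrupt_handle(&self) -> Handle {
        Handle(Arc::clone(&self.interrupted))
    }

    /// Models an infinite guest loop that polls the flag at each
    /// loop header and "traps" once it is set.
    fn run_forever(&self) -> Result<(), &'static str> {
        loop {
            if self.interrupted.load(Ordering::SeqCst) {
                return Err("interrupted");
            }
            std::hint::spin_loop();
        }
    }
}

/// Runs the guest and interrupts it from another thread after `timeout`.
fn run_with_timeout(timeout: Duration) -> Result<(), &'static str> {
    let guest = Guest::new();
    let handle = guest.interrupt_handle();
    let watchdog = thread::spawn(move || {
        thread::sleep(timeout);
        handle.interrupt();
    });
    let result = guest.run_forever();
    watchdog.join().unwrap();
    result
}
```

The sketch also shows the caveat from the description: if `run_forever` were blocked inside a host call instead of polling, the store to the flag would go unnoticed until polling resumed.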

Some fallout included from this change is:

* Unix signal handlers are no longer registered with `SA_ONSTACK`.
  Instead they run on the native stack the thread was already using.
  This is possible since stack overflow isn't handled by hitting the
  guard page, but rather it's explicitly checked for in wasm now. Native
  stack overflow will continue to abort the process as usual.
* Unix sigaltstack management is now no longer necessary since we don't
  use it any more.
* Windows no longer has any need to reset guard pages since we no longer
  try to recover from faults on guard pages.
* On all targets probestack intrinsics are disabled since we use a
  different mechanism for catching stack overflow.
* The C API has been updated with interrupt handles. An example has
  also been added which shows off how to interrupt a module.

Closes #139
Closes #860
Closes #900

github-actions · 2020-04-09T19:32:38Z

Subscribe to Label Action

This issue or pull request has been labeled: "cranelift", "fuzzing", "wasmtime:api", "wasmtime:c-api"

Users Subscribed to "cranelift"

@bnjbvr

Users Subscribed to "fuzzing"

@fitzgen

Users Subscribed to "wasmtime:api"

@peterhuene

Users Subscribed to "wasmtime:c-api"

@peterhuene

fitzgen · 2020-04-09T19:46:04Z

Haven't looked at the code yet, but I have a question:

> The wasmtime::Store type has a method to acquire an InterruptHandle,

Does this mean that the finest granularity of interrupt is a store?

That seems fine for now, in the single-threaded wasm world, but will likely need reworking to be finer grained when we support threads. Do you have an idea how much more work it would be to make this per-instance?

alexcrichton · 2020-04-09T19:47:16Z

@fitzgen yeah adapting this shouldn't be too hard at all, it's just a shared Arc data structure and it would be easy to arrange all instances linked together to have access to that Arc

github-actions · 2020-04-09T20:05:11Z

Subscribe to Label Action

This issue or pull request has been labeled: "wasi"

Users Subscribed to "wasi"

@kubkon

cranelift/codegen/src/ir/function.rs

cranelift/reader/src/run_command.rs

fitzgen · 2020-04-10T16:32:43Z

crates/api/src/runtime.rs

+    ///
+    /// By default this option is 1 MB.
+    pub fn max_wasm_stack(&mut self, size: usize) -> &mut Self {
+        self.max_wasm_stack = size;


assert that the size is greater than zero? (or some reasonable min)

I don't think it'd help much to single out zero here; 1 byte of stack allocation is just as nonsensical in a sense. I personally think it's ok to take everything here, and if someone is toying around with 0-byte wasm stacks for testing, that seems fine to allow.

crates/environ/src/cranelift.rs

fitzgen · 2020-04-10T17:16:58Z

examples/interrupt.c

+
+Note that on Windows and macOS the command will be similar, but you'll need
+to tweak the `-lpthread` and such annotations as well as the name of the
+`libwasmtime.a` file on Windows.


Also add a test for this file over here: https://github.com/bytecodealliance/wasmtime/blob/master/crates/c-api/tests/wasm-c-examples.rs

Hm I don't think this is actually that necessary, we already run all the examples on CI with cargo run -p run-examples?

Ah ok, I guess #1463 was unnecessary

crates/fuzzing/src/generators/api.rs

pepyakin

Looks just great! A couple of nits

crates/runtime/src/traphandlers.rs

crates/environ/src/cranelift.rs

alexcrichton · 2020-04-17T22:05:07Z

Ok I've pushed up a commit which uses a GlobalValue for the stack limit instead of a closure, and there's a "mini interpreter" of this GlobalValue since ins().global_value(...) can't be used this late after legalization. It's hopefully sufficient for now while also being more future-compatible.
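
As a rough picture of what a "mini interpreter" over a global value might look like, the sketch below walks a chain of simplified global-value definitions. It is an assumption-laden model, not Cranelift's code: `GlobalValueData` here is a cut-down stand-in enum, global values are plain indices, and `eval` computes a number over simulated memory where the real code would emit instructions.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical stand-in for a global value definition.
/// `base` fields are indices into the same table.
#[derive(Clone, Copy)]
enum GlobalValueData {
    /// The vmctx pointer passed to the function.
    VmContext,
    /// Load a pointer-sized value from `base + offset`.
    Load { base: usize, offset: i32 },
    /// Add an immediate to `base`.
    IAddImm { base: usize, offset: i64 },
}

/// Recursively resolves global value `gv` against a simulated memory.
/// Returns `None` if a load hits an address with no simulated contents.
/// Assumes the table contains no cycles.
fn eval(
    gvs: &[GlobalValueData],
    gv: usize,
    vmctx: u64,
    memory: &HashMap<u64, u64>,
) -> Option<u64> {
    match gvs[gv] {
        GlobalValueData::VmContext => Some(vmctx),
        GlobalValueData::Load { base, offset } => {
            let base = eval(gvs, base, vmctx, memory)?;
            let addr = base.wrapping_add(offset as i64 as u64);
            memory.get(&addr).copied()
        }
        GlobalValueData::IAddImm { base, offset } => {
            let base = eval(gvs, base, vmctx, memory)?;
            Some(base.wrapping_add(offset as u64))
        }
    }
}
```

With a table such as `gv0 = vmctx`, `gv1 = load gv0+8`, `gv2 = load gv1+16`, resolving `gv2` follows two loads starting from the vmctx pointer, which is one plausible shape for locating a stack limit.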

I'm not really sure how to serialize this out or parse it in the text format. If that's required can folks point me in the right direction of how to do that?

bjorn3 · 2020-04-17T22:08:25Z

You should serialize it in write_function and then parse it in cranelift-reader.

crates/environ/src/cranelift.rs

crates/environ/src/tunables.rs

crates/environ/src/vmoffsets.rs

crates/runtime/src/traphandlers.rs

examples/interrupt.rs

alexcrichton · 2020-04-20T15:11:06Z

Ok @sunfishcode, should be updated!

src/commands/run.rs

crates/runtime/src/traphandlers.rs

alexcrichton · 2020-04-20T15:50:56Z

Ok after some discussion with @sunfishcode I brought back the SA_ONSTACK flag so we can be sure to still execute the fallback to libstd's segfault handler on the sigaltstack. We already have the code to make the sigaltstack bigger, so that should cover our own purposes for now as well. This means that all signals still go through the sigaltstack.

sunfishcode

This looks good; the one thing left to do is serialize/deserialize the stack_limit field in ir::Function. This probably should be a new declaration in the preamble (see parse_preamble in cranelift/reader/src/parser.rs and write_preamble in cranelift/codegen/src/write.rs), with a syntax like "stack_limit = gv0" or so. I can answer any questions about this code, or if you want, we could probably defer this part to a separate PR, as the PR here is already pretty big.

alexcrichton · 2020-04-20T20:53:36Z

Ok the last commit should be the stack limit parsing via the syntax you suggested
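
To make the suggested "stack_limit = gv0" syntax concrete, here is a small standalone round-trip sketch. It is not the code in cranelift-reader or `write.rs`: the function names `parse_stack_limit` and `write_stack_limit` are hypothetical, and a global value is modelled as a bare `u32` index.

```rust
/// Parses a preamble line of the form `stack_limit = gvN` and returns N.
/// Returns `None` for anything that does not match that shape.
fn parse_stack_limit(line: &str) -> Option<u32> {
    let rest = line.trim().strip_prefix("stack_limit")?;
    let rest = rest.trim_start().strip_prefix('=')?;
    let digits = rest.trim().strip_prefix("gv")?;
    // Require plain decimal digits so inputs like `gv+3` are rejected.
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    digits.parse().ok()
}

/// Writes the declaration back out in the same textual form.
fn write_stack_limit(gv: u32) -> String {
    format!("stack_limit = gv{}", gv)
}
```

Writing and then parsing should give back the same index, which is the property a text-format round trip needs.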

crates/api/tests/stack-overflow.rs

sunfishcode · 2020-04-21T03:54:56Z

Looks good! There's now just the two CI failures. One is an easy fix I just commented on above.

The other is a failure in host_segfault.rs on macos-latest CI:

     Running target/release/deps/host_segfault-de61d05b549d9afb

thread 'main' panicked at 'assertion failed: stderr.contains("thread 'main' has overflowed its stack")', tests/host_segfault.rs:95:9
stack backtrace:
   0: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
   1: core::fmt::write
   2: std::io::Write::write_fmt
   3: std::panicking::default_hook::{{closure}}
   4: std::panicking::default_hook
   5: std::panicking::rust_panic_with_hook
   6: std::panicking::begin_panic
   7: host_segfault::main
   8: std::rt::lang_start::{{closure}}
   9: std::panicking::try::do_call
  10: __rust_maybe_catch_panic
  11: std::rt::lang_start_internal
  12: main

alexcrichton · 2020-04-21T17:43:17Z

Ok all green now!

sunfishcode · 2020-04-21T18:03:24Z

Great!

github-actions bot added the cranelift, fuzzing, wasmtime:api, and wasmtime:c-api labels Apr 9, 2020

github-actions bot added the wasi label Apr 9, 2020

bjorn3 suggested changes Apr 9, 2020

cranelift/codegen/src/ir/function.rs (outdated)

alexcrichton mentioned this pull request Apr 10, 2020

Add new MachInst backend and ARM64 support. #1494

Merged

alexcrichton force-pushed the catch-stack-overflow branch from dca6731 to f253d3b Compare April 10, 2020 16:23

fitzgen reviewed Apr 10, 2020

alexcrichton force-pushed the catch-stack-overflow branch from 030e6d1 to fc1029f Compare April 14, 2020 14:23

pepyakin reviewed Apr 14, 2020

crates/runtime/src/traphandlers.rs (outdated)

crates/runtime/src/traphandlers.rs (outdated)

alexcrichton force-pushed the catch-stack-overflow branch from 8147d95 to 55d042a Compare April 15, 2020 13:41

alexcrichton mentioned this pull request Apr 17, 2020

Support MinGW as a wasmtime target #1535

Closed

alexcrichton force-pushed the catch-stack-overflow branch from 55d042a to c7ede53 Compare April 17, 2020 20:35

abrown reviewed Apr 17, 2020

crates/environ/src/cranelift.rs (outdated)

alexcrichton force-pushed the catch-stack-overflow branch from c7ede53 to 583bb80 Compare April 17, 2020 22:03

alexcrichton force-pushed the catch-stack-overflow branch from 583bb80 to 3cc664f Compare April 17, 2020 22:24

sunfishcode reviewed Apr 18, 2020

alexcrichton force-pushed the catch-stack-overflow branch from 3cc664f to 2925241 Compare April 20, 2020 14:38

sunfishcode reviewed Apr 20, 2020

src/commands/run.rs (outdated)

crates/runtime/src/traphandlers.rs (outdated)

sunfishcode reviewed Apr 20, 2020

alexcrichton force-pushed the catch-stack-overflow branch from a70cf78 to f65f3f5 Compare April 20, 2020 20:52

alexcrichton added 10 commits April 20, 2020 13:58

Update comment about magical interrupt value

aa839cb

Store stack limit as a global value, not a closure

a35fa64

Run rustfmt

edf4f57

Handle review comments

f6fbe17

Add a comment about SA_ONSTACK

d83371c

Use usize for type of INTERRUPTED

42f4edb

Parse human-readable durations

46029aa

Bring back sigaltstack handling

e954b16

Allows libstd to print out stack overflow on failure still.

Add parsing and emission of stack limit-via-preamble

4037d57

alexcrichton force-pushed the catch-stack-overflow branch from f65f3f5 to 4037d57 Compare April 20, 2020 20:58

Fix new example for new apis

646b567

sunfishcode reviewed Apr 21, 2020

crates/api/tests/stack-overflow.rs (outdated)

alexcrichton added 2 commits April 21, 2020 07:20

Fix host segfault test in release mode

35c533f

Fix new doc example

9798fff

sunfishcode merged commit c9a0ba8 into bytecodealliance:master Apr 21, 2020

alexcrichton deleted the catch-stack-overflow branch April 21, 2020 18:31

thomastaylor312 mentioned this pull request Apr 21, 2020

Add stopping of long running WASM processes krustlet/krustlet#212

Closed

alexcrichton mentioned this pull request Apr 21, 2020

Stack limit protections not implemented for AArch64 #1569

Closed

bacongobbler mentioned this pull request May 12, 2020

stopping running WASI function calls wasm3/wasm3#138

Closed

ppedziwiatr mentioned this pull request Oct 28, 2021

feat: isolated contract execution environment warp-contracts/warp#27

Closed

alexcrichton mentioned this pull request May 28, 2024

Remove ArgumentPurpose::StackLimit #8700

Merged
