Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weired utf8 parsing error. #7951

Closed
MendyBerger opened this issue Feb 16, 2024 · 4 comments · Fixed by bytecodealliance/wit-bindgen#846
Closed

Weired utf8 parsing error. #7951

MendyBerger opened this issue Feb 16, 2024 · 4 comments · Fixed by bytecodealliance/wit-bindgen#846
Labels
bug Incorrect behavior in the current implementation that needs fixing

Comments

@MendyBerger
Copy link

MendyBerger commented Feb 16, 2024

This issue is a weird one. There seems to be some very specific set of inputs that trigger a utf8 parsing error, and modifying the inputs even just a little can stop it from happening. This was really hard to get to reproduce reliably.

I ran into this issue twice, and this is my smallest reproduction that triggers it. Sorry if it's too convoluted, don't know how to make it smaller, and already spent quite a few hours on it.

Test Case

Wasm input:
https://github.com/MendyBerger/wasmtime-utf8-parsing-error/blob/main/wasm/src/lib.rs

Actual wasm:
https://github.com/MendyBerger/wasmtime-utf8-parsing-error/blob/main/wasm/out.wasm

Runtime code:
https://github.com/MendyBerger/wasmtime-utf8-parsing-error/blob/main/runtime/src/main.rs

Steps to Reproduce

Clone the repo at https://github.com/MendyBerger/wasmtime-utf8-parsing-error.
Compile the wasm module.
Run with the runtime.

Here's a list of commands

# clone the repo
git clone https://github.com/MendyBerger/wasmtime-utf8-parsing-error
cd wasmtime-utf8-parsing-error

# compile the wasm module
cd wasm
cargo build --release --target wasm32-unknown-unknown
wasm-tools component new ./target/wasm32-unknown-unknown/release/wasm.wasm -o out.wasm

# run the runtime
cd ../runtime
cargo run

Expected Results

Expect it to just run.

Actual Results

Fails with the following error:

Error: error while executing at wasm backtrace:
    0: 0x1596ef - wit-component:shim!indirect-$root-my-func
    1:  0x4ab - <unknown>!run
note: using the `WASMTIME_BACKTRACE_DETAILS=1` environment variable may show more debugging information

Caused by:
    invalid utf-8 sequence of 1 bytes from index 0

Versions and Environment

Wasmtime version or commit: 17.0.1

Operating system: Ubuntu

Architecture: x86

Extra Info

Some of the weirdness:

  • Try to change the 'p' to a 'j' here, notice how it no longer triggers the issue 🙃
  • Remove a single 'p' from this string, and the issue no longer gets triggered 😲
  • Change the 'p's here for another char, and the issue still gets triggered 🤔
@MendyBerger MendyBerger added the bug Incorrect behavior in the current implementation that needs fixing label Feb 16, 2024
@alexcrichton
Copy link
Member

Thanks for the report, and very much appreciate the work put in to minimizing this! Looking at the code I haven't reproduce this exactly just yet but I suspect you're running into bytecodealliance/wit-bindgen#817 where this is a bug in the guest generated code where there's UB in the rust/wasm layer, which I think could explain why you're seeing such surprising behavior here.

I'll try to see if I can fix that issue soon and see if it fixes this as well

@MendyBerger
Copy link
Author

Thanks!! I'll keep an eye on that one.

alexcrichton added a commit to alexcrichton/witx-bindgen that referenced this issue Feb 17, 2024
This commit fixes an soundness issue in the Rust code generator's generated
code. Previously unsound code was generated when:

* An import had a parameter
* That had either a list or a variant of an aggregate where:
* Where one field was a `list<T>` or `string`
* And another field was an owned resource

In this situation the Rust generator uses an "owned" type for the argument,
such as `MyStruct`. This is done to reflect how ownership of the resource in
the argument is lost when the function is called. The problem with this,
however, is that the rest of bindings generation assumes that imported
arguments are all "borrowed" meaning that raw pointers from lists/strings can
be passed through to the canonical ABI. This is not the case here because the
argument is owned, meaning that the generated code would record an argument to
be passed to the canonical ABI and then promptly deallocate the argument.

The fix here is preceded by a refactoring to how Rust manages owned types to
make the fix possible. The general idea for the fix though is that while `x:
MyStruct` is the argument to the function all references internally are through
`&x` to ensure that it remains rooted as an argument, preserving all pointers
to lists and such. This unfortunately means that ownership can no longer model
movement of resources and instead interior mutability must be used (since we
have to move out of `&Resource<T>` since everything is borrowed). Fixing that,
however, is probably not worth the complexity at this time.

Closes bytecodealliance#817
Closes bytecodealliance/wasmtime#7951
@alexcrichton
Copy link
Member

Ok I've confirmed that your issue should be fixed by bytecodealliance/wit-bindgen#846, so I think it was indeed that wit-bindgen issue.

@MendyBerger
Copy link
Author

@alexcrichton thanks so much!!! I super appreciate how quickly you did this!

alexcrichton added a commit to alexcrichton/witx-bindgen that referenced this issue Feb 20, 2024
This commit fixes an soundness issue in the Rust code generator's generated
code. Previously unsound code was generated when:

* An import had a parameter
* That had either a list or a variant of an aggregate where:
* Where one field was a `list<T>` or `string`
* And another field was an owned resource

In this situation the Rust generator uses an "owned" type for the argument,
such as `MyStruct`. This is done to reflect how ownership of the resource in
the argument is lost when the function is called. The problem with this,
however, is that the rest of bindings generation assumes that imported
arguments are all "borrowed" meaning that raw pointers from lists/strings can
be passed through to the canonical ABI. This is not the case here because the
argument is owned, meaning that the generated code would record an argument to
be passed to the canonical ABI and then promptly deallocate the argument.

The fix here is preceded by a refactoring to how Rust manages owned types to
make the fix possible. The general idea for the fix though is that while `x:
MyStruct` is the argument to the function all references internally are through
`&x` to ensure that it remains rooted as an argument, preserving all pointers
to lists and such. This unfortunately means that ownership can no longer model
movement of resources and instead interior mutability must be used (since we
have to move out of `&Resource<T>` since everything is borrowed). Fixing that,
however, is probably not worth the complexity at this time.

Closes bytecodealliance#817
Closes bytecodealliance/wasmtime#7951
github-merge-queue bot pushed a commit to bytecodealliance/wit-bindgen that referenced this issue Feb 20, 2024
* Refactor how Rust handles ownership

This commit is the next in a long line of refactors to try to model how
the Rust generator handles ownership. The Rust generator has an
`ownership` knob for deciding how to render types. The Rust generator
additionally models ownership of resources/lists in params/results of
imports/exports. Putting all of this together has led to many attempts
to wrangle this in a form that's understandable but every time a bug
comes up I feel like I don't understand what's going on. I've made
another attempt in this commit.

Here the goal is to centralize knowledge about types as much as
possible. Instead of being spread out over a few methods everything is
hopefully now centralized in a single type and a single "main method".
All other little pieces stem from these two and are modeled to be
relatively simple compared to the "main method". Whether or not this
stands the test of time we'll see.

This does change generated code in some situations as can be seen by the
test that was updated. Nothing major should be changed, but a few more
borrows are now required in places which probably should have been
borrows before.

More comments are found in the code about specific changes made here.

* Fix unsoundness in generated Rust code

This commit fixes an soundness issue in the Rust code generator's generated
code. Previously unsound code was generated when:

* An import had a parameter
* That had either a list or a variant of an aggregate where:
* Where one field was a `list<T>` or `string`
* And another field was an owned resource

In this situation the Rust generator uses an "owned" type for the argument,
such as `MyStruct`. This is done to reflect how ownership of the resource in
the argument is lost when the function is called. The problem with this,
however, is that the rest of bindings generation assumes that imported
arguments are all "borrowed" meaning that raw pointers from lists/strings can
be passed through to the canonical ABI. This is not the case here because the
argument is owned, meaning that the generated code would record an argument to
be passed to the canonical ABI and then promptly deallocate the argument.

The fix here is preceded by a refactoring to how Rust manages owned types to
make the fix possible. The general idea for the fix though is that while `x:
MyStruct` is the argument to the function all references internally are through
`&x` to ensure that it remains rooted as an argument, preserving all pointers
to lists and such. This unfortunately means that ownership can no longer model
movement of resources and instead interior mutability must be used (since we
have to move out of `&Resource<T>` since everything is borrowed). Fixing that,
however, is probably not worth the complexity at this time.

Closes #817
Closes bytecodealliance/wasmtime#7951
rvolosatovs pushed a commit to bytecodealliance/wrpc that referenced this issue May 23, 2024
* Refactor how Rust handles ownership

This commit is the next in a long line of refactors to try to model how
the Rust generator handles ownership. The Rust generator has an
`ownership` knob for deciding how to render types. The Rust generator
additionally models ownership of resources/lists in params/results of
imports/exports. Putting all of this together has led to many attempts
to wrangle this in a form that's understandable but every time a bug
comes up I feel like I don't understand what's going on. I've made
another attempt in this commit.

Here the goal is to centralize knowledge about types as much as
possible. Instead of being spread out over a few methods everything is
hopefully now centralized in a single type and a single "main method".
All other little pieces stem from these two and are modeled to be
relatively simple compared to the "main method". Whether or not this
stands the test of time we'll see.

This does change generated code in some situations as can be seen by the
test that was updated. Nothing major should be changed, but a few more
borrows are now required in places which probably should have been
borrows before.

More comments are found in the code about specific changes made here.

* Fix unsoundness in generated Rust code

This commit fixes an soundness issue in the Rust code generator's generated
code. Previously unsound code was generated when:

* An import had a parameter
* That had either a list or a variant of an aggregate where:
* Where one field was a `list<T>` or `string`
* And another field was an owned resource

In this situation the Rust generator uses an "owned" type for the argument,
such as `MyStruct`. This is done to reflect how ownership of the resource in
the argument is lost when the function is called. The problem with this,
however, is that the rest of bindings generation assumes that imported
arguments are all "borrowed" meaning that raw pointers from lists/strings can
be passed through to the canonical ABI. This is not the case here because the
argument is owned, meaning that the generated code would record an argument to
be passed to the canonical ABI and then promptly deallocate the argument.

The fix here is preceded by a refactoring to how Rust manages owned types to
make the fix possible. The general idea for the fix though is that while `x:
MyStruct` is the argument to the function all references internally are through
`&x` to ensure that it remains rooted as an argument, preserving all pointers
to lists and such. This unfortunately means that ownership can no longer model
movement of resources and instead interior mutability must be used (since we
have to move out of `&Resource<T>` since everything is borrowed). Fixing that,
however, is probably not worth the complexity at this time.

Closes #817
Closes bytecodealliance/wasmtime#7951
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior in the current implementation that needs fixing
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants