-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Weired utf8 parsing error. #7951
Comments
Thanks for the report, and very much appreciate the work put in to minimizing this! Looking at the code I haven't reproduce this exactly just yet but I suspect you're running into bytecodealliance/wit-bindgen#817 where this is a bug in the guest generated code where there's UB in the rust/wasm layer, which I think could explain why you're seeing such surprising behavior here. I'll try to see if I can fix that issue soon and see if it fixes this as well |
Thanks!! I'll keep an eye on that one. |
This commit fixes an soundness issue in the Rust code generator's generated code. Previously unsound code was generated when: * An import had a parameter * That had either a list or a variant of an aggregate where: * Where one field was a `list<T>` or `string` * And another field was an owned resource In this situation the Rust generator uses an "owned" type for the argument, such as `MyStruct`. This is done to reflect how ownership of the resource in the argument is lost when the function is called. The problem with this, however, is that the rest of bindings generation assumes that imported arguments are all "borrowed" meaning that raw pointers from lists/strings can be passed through to the canonical ABI. This is not the case here because the argument is owned, meaning that the generated code would record an argument to be passed to the canonical ABI and then promptly deallocate the argument. The fix here is preceded by a refactoring to how Rust manages owned types to make the fix possible. The general idea for the fix though is that while `x: MyStruct` is the argument to the function all references internally are through `&x` to ensure that it remains rooted as an argument, preserving all pointers to lists and such. This unfortunately means that ownership can no longer model movement of resources and instead interior mutability must be used (since we have to move out of `&Resource<T>` since everything is borrowed). Fixing that, however, is probably not worth the complexity at this time. Closes bytecodealliance#817 Closes bytecodealliance/wasmtime#7951
Ok I've confirmed that your issue should be fixed by bytecodealliance/wit-bindgen#846, so I think it was indeed that wit-bindgen issue. |
@alexcrichton thanks so much!!! I super appreciate how quickly you did this! |
This commit fixes an soundness issue in the Rust code generator's generated code. Previously unsound code was generated when: * An import had a parameter * That had either a list or a variant of an aggregate where: * Where one field was a `list<T>` or `string` * And another field was an owned resource In this situation the Rust generator uses an "owned" type for the argument, such as `MyStruct`. This is done to reflect how ownership of the resource in the argument is lost when the function is called. The problem with this, however, is that the rest of bindings generation assumes that imported arguments are all "borrowed" meaning that raw pointers from lists/strings can be passed through to the canonical ABI. This is not the case here because the argument is owned, meaning that the generated code would record an argument to be passed to the canonical ABI and then promptly deallocate the argument. The fix here is preceded by a refactoring to how Rust manages owned types to make the fix possible. The general idea for the fix though is that while `x: MyStruct` is the argument to the function all references internally are through `&x` to ensure that it remains rooted as an argument, preserving all pointers to lists and such. This unfortunately means that ownership can no longer model movement of resources and instead interior mutability must be used (since we have to move out of `&Resource<T>` since everything is borrowed). Fixing that, however, is probably not worth the complexity at this time. Closes bytecodealliance#817 Closes bytecodealliance/wasmtime#7951
* Refactor how Rust handles ownership This commit is the next in a long line of refactors to try to model how the Rust generator handles ownership. The Rust generator has an `ownership` knob for deciding how to render types. The Rust generator additionally models ownership of resources/lists in params/results of imports/exports. Putting all of this together has led to many attempts to wrangle this in a form that's understandable but every time a bug comes up I feel like I don't understand what's going on. I've made another attempt in this commit. Here the goal is to centralize knowledge about types as much as possible. Instead of being spread out over a few methods everything is hopefully now centralized in a single type and a single "main method". All other little pieces stem from these two and are modeled to be relatively simple compared to the "main method". Whether or not this stands the test of time we'll see. This does change generated code in some situations as can be seen by the test that was updated. Nothing major should be changed, but a few more borrows are now required in places which probably should have been borrows before. More comments are found in the code about specific changes made here. * Fix unsoundness in generated Rust code This commit fixes an soundness issue in the Rust code generator's generated code. Previously unsound code was generated when: * An import had a parameter * That had either a list or a variant of an aggregate where: * Where one field was a `list<T>` or `string` * And another field was an owned resource In this situation the Rust generator uses an "owned" type for the argument, such as `MyStruct`. This is done to reflect how ownership of the resource in the argument is lost when the function is called. The problem with this, however, is that the rest of bindings generation assumes that imported arguments are all "borrowed" meaning that raw pointers from lists/strings can be passed through to the canonical ABI. This is not the case here because the argument is owned, meaning that the generated code would record an argument to be passed to the canonical ABI and then promptly deallocate the argument. The fix here is preceded by a refactoring to how Rust manages owned types to make the fix possible. The general idea for the fix though is that while `x: MyStruct` is the argument to the function all references internally are through `&x` to ensure that it remains rooted as an argument, preserving all pointers to lists and such. This unfortunately means that ownership can no longer model movement of resources and instead interior mutability must be used (since we have to move out of `&Resource<T>` since everything is borrowed). Fixing that, however, is probably not worth the complexity at this time. Closes #817 Closes bytecodealliance/wasmtime#7951
* Refactor how Rust handles ownership This commit is the next in a long line of refactors to try to model how the Rust generator handles ownership. The Rust generator has an `ownership` knob for deciding how to render types. The Rust generator additionally models ownership of resources/lists in params/results of imports/exports. Putting all of this together has led to many attempts to wrangle this in a form that's understandable but every time a bug comes up I feel like I don't understand what's going on. I've made another attempt in this commit. Here the goal is to centralize knowledge about types as much as possible. Instead of being spread out over a few methods everything is hopefully now centralized in a single type and a single "main method". All other little pieces stem from these two and are modeled to be relatively simple compared to the "main method". Whether or not this stands the test of time we'll see. This does change generated code in some situations as can be seen by the test that was updated. Nothing major should be changed, but a few more borrows are now required in places which probably should have been borrows before. More comments are found in the code about specific changes made here. * Fix unsoundness in generated Rust code This commit fixes an soundness issue in the Rust code generator's generated code. Previously unsound code was generated when: * An import had a parameter * That had either a list or a variant of an aggregate where: * Where one field was a `list<T>` or `string` * And another field was an owned resource In this situation the Rust generator uses an "owned" type for the argument, such as `MyStruct`. This is done to reflect how ownership of the resource in the argument is lost when the function is called. The problem with this, however, is that the rest of bindings generation assumes that imported arguments are all "borrowed" meaning that raw pointers from lists/strings can be passed through to the canonical ABI. This is not the case here because the argument is owned, meaning that the generated code would record an argument to be passed to the canonical ABI and then promptly deallocate the argument. The fix here is preceded by a refactoring to how Rust manages owned types to make the fix possible. The general idea for the fix though is that while `x: MyStruct` is the argument to the function all references internally are through `&x` to ensure that it remains rooted as an argument, preserving all pointers to lists and such. This unfortunately means that ownership can no longer model movement of resources and instead interior mutability must be used (since we have to move out of `&Resource<T>` since everything is borrowed). Fixing that, however, is probably not worth the complexity at this time. Closes #817 Closes bytecodealliance/wasmtime#7951
This issue is a weird one. There seems to be some very specific set of inputs that trigger a utf8 parsing error, and modifying the inputs even just a little can stop it from happening. This was really hard to get to reproduce reliably.
I ran into this issue twice, and this is my smallest reproduction that triggers it. Sorry if it's too convoluted, don't know how to make it smaller, and already spent quite a few hours on it.
Test Case
Wasm input:
https://github.com/MendyBerger/wasmtime-utf8-parsing-error/blob/main/wasm/src/lib.rs
Actual wasm:
https://github.com/MendyBerger/wasmtime-utf8-parsing-error/blob/main/wasm/out.wasm
Runtime code:
https://github.com/MendyBerger/wasmtime-utf8-parsing-error/blob/main/runtime/src/main.rs
Steps to Reproduce
Clone the repo at https://github.com/MendyBerger/wasmtime-utf8-parsing-error.
Compile the wasm module.
Run with the runtime.
Here's a list of commands
Expected Results
Expect it to just run.
Actual Results
Fails with the following error:
Versions and Environment
Wasmtime version or commit:
17.0.1
Operating system: Ubuntu
Architecture: x86
Extra Info
Some of the weirdness:
The text was updated successfully, but these errors were encountered: