
Benchmarks with Wasmtime 0.22.1-preview1 #12

Closed · christophwille opened this issue Jan 20, 2021 · 9 comments
@christophwille (Owner) commented Jan 20, 2021

https://github.com/christophwille/csharp-opa-wasm/blob/master/src/Opa.Wasm.Benchmarks/Program.cs

Times went up, and I got a so-far non-reproducible error (it happens sometimes, but not every time; e.g., just now it happened in WorkloadWarmup):

WorkloadActual 24: 128 op, 624120800.00 ns, 4.8759 ms/op
Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
at Wasmtime.Interop.wasm_memory_delete(IntPtr)
at Wasmtime.Interop.wasm_memory_delete(IntPtr)
at Wasmtime.Interop+MemoryHandle.ReleaseHandle()
at System.Runtime.InteropServices.SafeHandle.InternalRelease(Boolean)
at System.Runtime.InteropServices.SafeHandle.Finalize()
ExitCode != 0
// Benchmark Process 28328 has exited with code -1073741819

@peterhuene any ideas?

@peterhuene commented Jan 20, 2021

It looks like your WriteString call is writing out of bounds (it really should do a bounds check to prevent out-of-bounds writes), which is why this randomly fails when trying to delete the memory from a finalizer. (I'd recommend calling Dispose on _envMemory from the policy, by the way; it won't fix this issue, though.)

It seems like either opa_malloc is not growing the memory properly (at the time of the WriteString call, _envMemory.Size remains 2 while the addresses being written are beyond 2 Wasm pages) or something in the Wasmtime API is not being updated properly.

Note that setting the memory minimum to 3 "fixes" the issue because now the writes are no longer out-of-bounds.
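The bounds check suggested above could look roughly like this. This is a minimal sketch against a plain byte array standing in for linear memory; MemoryWriter and its members are hypothetical illustrations, not the wasmtime-dotnet or csharp-opa-wasm API, and the NUL terminator is an assumption about how the string is laid out:

```csharp
using System;
using System.Text;

// Hypothetical bounds-checked writer; "memory" is a plain byte array
// standing in for Wasm linear memory. Not the wasmtime-dotnet API.
static class MemoryWriter
{
    public const int WasmPageSize = 65536;

    public static int WriteString(byte[] memory, int address, string value)
    {
        // NUL-terminated here for illustration (assumption, not confirmed
        // from the benchmark source).
        byte[] bytes = Encoding.UTF8.GetBytes(value + '\0');

        // The bounds check: refuse writes past the current memory size
        // instead of silently corrupting adjacent native memory.
        if (address < 0 || address + bytes.Length > memory.Length)
            throw new ArgumentOutOfRangeException(nameof(address),
                "write would exceed the current linear memory size");

        bytes.CopyTo(memory, address);
        return bytes.Length;
    }
}
```

With a check like this, an out-of-bounds write fails with a managed exception at the call site rather than surfacing later as an AccessViolationException on the finalizer thread.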

I'll keep investigating.

@peterhuene

Actually, I take that back. The writes appear to be in bounds. Still digging into the problem.

@peterhuene

There's clearly some memory corruption taking place as I saw it manifest as this panic during one of the benchmark runs:

thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', crates/environ/src/vmoffsets.rs:379:22
stack backtrace:
   0: _rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::panicking::panic
   3: wasmtime_environ::vmoffsets::VMOffsets::vmctx_memories_begin
   4: wasmtime_runtime::instance::Instance::imported_memory_grow
   5: wasmtime_runtime::libcalls::wasmtime_imported_memory32_grow
   6: <unknown>
   7: <unknown>
   8: <unknown>
   9: <unknown>
  10: _RegisterSetjmp
  11: wasmtime_runtime::traphandlers::catch_traps
  12: wasmtime::func::Func::call
  13: _wasmtime_func_call
  14: _wasm_func_call

@christophwille (Owner, Author)

The funny thing is that the sample hasn't changed since the very start of the project; it has only been recompiled with newer versions of OPA. And we already had one memory issue with it quite a while back. It seems the benchmark has once again tripped something.

@peterhuene commented Jan 20, 2021

Current theory: the finalizer thread is corrupting some reference counters (internal to Wasmtime), causing premature frees and other unpleasantness.

Right now there are function handles (the returns of DefineFunction are not being disposed) and a memory handle (Dispose isn't called on _envMemory) that get put on the finalizer queue for the benchmark test.

What I believe is happening is that, due to a race with the finalizer thread, the reference count of Store is being incorrectly decremented to 0 and deallocated while the store is still being used. When a store is deallocated, all associated instances are also deallocated.

When an instance is deallocated while still in use, all sorts of unpleasantness follows as the VMContext may become corrupted.

Unfortunately, the underlying C API has thread-affinity for objects associated with a store. So this leaves us in a bind for .NET as we must either:

  • Synchronize all accesses to a store-associated handle thereby allowing the finalizer thread to call the Wasmtime API, which will hurt performance.
  • Leak memory if Dispose isn't called.
  • Do what we do now: don't leak memory, but introduce a finalizer thread race to undefined behavior.

For now you should be able to work around the problem by calling Dispose on each return from DefineFunction and also disposing _envMemory.
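The workaround amounts to the standard Dispose pattern: release every handle deterministically on the thread that owns the store, so nothing is ever left for the finalizer thread to touch. A self-contained sketch of the idea follows; NativeHandle is an illustrative stand-in, not an actual wasmtime-dotnet type:

```csharp
using System;

// Illustrative stand-in for a handle like those returned by
// DefineFunction, or wrapped around _envMemory. Not a real
// wasmtime-dotnet type.
sealed class NativeHandle : IDisposable
{
    public bool Disposed { get; private set; }

    public void Dispose()
    {
        if (Disposed) return;
        Disposed = true;
        // Release the native resource here, on the owning thread.
        // Suppressing finalization means the finalizer thread never
        // touches this handle, which removes the race described above.
        GC.SuppressFinalize(this);
    }

    // If Dispose is never called, this runs on the finalizer thread --
    // exactly the unsynchronized access that can corrupt the internal
    // reference counts.
    ~NativeHandle() => Dispose();
}
```

In the benchmark, that means calling Dispose on each Function returned by DefineFunction and on _envMemory before the store goes away.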

I also noticed that _store in OpaModule doesn't seem to be used and can be removed.

@peterhuene

I've opened bytecodealliance/wasmtime-dotnet#53 to track the underlying issue here.

@christophwille (Owner, Author)

Oh, so I totally misunderstood the DefineFunction API (I thought you define it once and everything's fine forever)... Now, if the return is disposable: does that mean that after my BuildHost() method completes, the defined function is gone too? Would I need to hold on to those disposables until Dispose()? Or, if I dispose them right away, is the definition still there?

@christophwille (Owner, Author)

Thanks for catching the missing Dispose()!

@peterhuene

Everything is reference counted internally, so you don't need to keep the Function objects around after calling DefineFunction unless you want to pass them as funcref values to Wasm. Calling Dispose, though, suppresses finalization and removes the race with the finalizer thread.
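The "reference counted internally" behavior can be pictured with a toy counter: disposing your Function object drops one reference, but the definition stays alive because the store still holds its own reference. RefCounted is a made-up illustration of the concept, not Wasmtime's actual implementation:

```csharp
using System;

// Toy reference counter illustrating why disposing the Function
// returned by DefineFunction does not remove the definition: the
// store still holds its own reference. Made-up illustration only.
sealed class RefCounted
{
    public int Count { get; private set; }
    public bool Freed => Count == 0;

    public RefCounted(int initial) => Count = initial;

    public void AddRef() => Count++;

    public void Release()
    {
        if (Count == 0) throw new InvalidOperationException("already freed");
        Count--;
        // The underlying object is freed only when the last reference
        // (typically the store's) goes away.
    }
}
```

So disposing the returned Function right away is safe: the definition survives until the store itself is deallocated.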
