Add test for validating linear memory size updates #2078
Conversation
Force-pushed from 2e83ff2 to 743a913
Review comment on:

```c
} app_state_t;
typedef struct {
```

[minor]

```suggestion
} app_state_t;
typedef struct {
```
Force-pushed from 743a913 to 7418940
```diff
@@ -33,6 +33,7 @@ typedef float64 CellType_F64;
 #define CHECK_MEMORY_OVERFLOW(bytes) \
     do { \
         uint64 offset1 = (uint64)offset + (uint64)addr; \
+        linear_mem_size = get_linear_memory_size(memory, node); \
```
Hi, this change may be a bit big and could impact performance significantly: on each memory load/store there are two function calls, a lock/unlock, and a load operation. There should be some optimization opportunities:
- No need to re-load the linear memory size when multi-threading isn't enabled; the local `linear_mem_size` will be updated after invoking the `memory.grow` opcode and after calling other functions which may call `memory.grow`.
- For multi-threading, in the memory access boundary check we can load the linear memory size from `memory->memory_data_size` directly: (1) after some earlier refactoring, we store the linear memory size in `memory->memory_data_size`, which is always equal to `memory->num_bytes_per_page * memory->cur_page_count`; (2) the load operation is atomic, since `memory->memory_data_size` is a `uint32`, a basic data type: the load can be finished by one CPU instruction, and only one CPU instruction can be loading/storing at the same time. We can see that for the atomic `i32.load` opcode, LLVM translates it into a simple `mov` instruction. For the AOT memory boundary check, we also just load `mem_check_bound` directly on each boundary check, and in your test the issue isn't reproduced in AOT:
https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/core/iwasm/compilation/aot_emit_memory.c#L69-L74
There may be some fragmentary work to fix the issue and refine the code. I uploaded another PR #2088 to fix it; could you help review and test whether it works?
@wenyongh I had a similar idea to what you've got in #2088; however, my major concern was that `memory_data_size` is not atomic. I think the better option would be to make the variable atomic, but I'm not sure if we want to take a dependency on that in the `iwasm` core (we already have a dependency on it in the wasi-libc wrapper, but that's an optional module). If we're happy with bringing atomics into `iwasm`, I can rework this PR and submit an update to use that instead of mutexes.
@loganek Do we need to declare the `memory_data_size` variable atomic? Per my understanding, we only need to make the load operation atomic, since the store operation is protected by `shared_mem_lock` in `wasm_enlarge_memory`. If you have concerns about the load operation, we could replace:

```c
#define get_linear_mem_size() memory->memory_data_size
```

with

```c
#define get_linear_mem_size() atomic_load(&memory->memory_data_size)
```

Note that I have already introduced `atomic_thread_fence` when implementing shared memory's `atomic.fence` opcode:
https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/core/shared/platform/include/platform_api_extension.h#L122-L128
https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/core/iwasm/interpreter/wasm_interp_classic.c#L3475
I think we can add `atomic_load` for `memory_data_size`. Is that OK?
BTW, another way to do the synchronization may be to apply an algorithm similar to a GC's stop-the-world reclaim: when a thread needs to `memory.grow`, it issues a request to stop the world, i.e. suspend all other threads. It waits until all other threads are in the waiting status, then grows the memory, and after that notifies all other threads. After each other thread is woken up, it updates its local `linear_mem_size`. This way, no atomic operation is needed for `memory_data_size`. Note that this algorithm isn't implemented in the GC branch yet; we may implement it in the future if it is needed.
@loganek Did you mean to use atomic load/store for all operations on the memory data size? Yes, that may be better if the suspend/restore-threads mechanism hasn't been well implemented. My concern is that it may be a big change and would require a lot of testing to ensure it works well in multi-threading mode and doesn't impact the non-threading mode. Currently we have spent a lot of time testing, have almost finished all the tests based on #2088, and are preparing to release a new version; if there is a big change, we need to re-test again. My suggestion is to merge #2088 (or add an atomic operation for the loading of `memory->memory_data_size` as an enhancement) and release version 1.2.1; per my understanding, it should also work well since the load is an atomic operation. After that, please refactor the code with atomic load/store if you want, and we had better implement the suspend/restore-threads mechanism in the future. The latter may be the better option, since it doesn't require atomic load/store operations and there is only a very short pause while other threads wait for one thread to grow the memory: the memory.grow operation for shared memory is quick (unlike some Java/JavaScript GCs, which need to mark/sweep/unmark objects). What's your opinion? Thanks.
Hi @wenyongh, sorry for the late reply. Yes, I think it'd still be valuable to either make use of atomic operations or the suspend/restore mechanism here, even though your fix in #2088 solves the vast majority of cases. I've limited the scope of this PR to just adding the test so we can merge it straight away, and I'll work on the improvements in a separate PR.
OK, thanks, let's merge this PR first.
Force-pushed from 7418940 to ff38032
When the memory.grow operation is called, it updates a `linear_mem_size` variable in `wasm_interp_call_func_bytecode()`; that update is not propagated to all the threads. Therefore, if a function is already running and wants to access memory that was added (after `memory.grow`), e.g. by calling `memory.copy` where we check boundaries, the `memory.copy` fails with `Exception: out of bounds memory access`. I'm adding a test for now, but will work on the implementation later. In the meantime, suggestions on how to approach this problem are very welcome.
Force-pushed from ff38032 to 0306704
When the memory.grow operation is called, it updates a `linear_mem_size` variable in `wasm_interp_call_func_bytecode()`; that update is not propagated to all the threads. Therefore, if a function is already running and wants to access memory that was added (after `memory.grow`), e.g. by calling `memory.copy` where we check boundaries, the `memory.copy` fails with `Exception: out of bounds memory access`.

It has only been tested on the classic and fast interpreters. The issue couldn't be reproduced on AOT (the mechanisms are different there; I don't think it's an issue for AOT).
The current implementation is not optimal, as it takes a mutex lock on memory access operations. I'd suggest merging it anyway to have a working, bug-free solution for the upcoming release; we can optimize it in the future if needed.