
Add test for validating linear memory size updates #2078

Merged

Conversation


@loganek loganek commented Mar 28, 2023

When the memory.grow operation is called, it updates a linear_mem_size variable in wasm_interp_call_func_bytecode(). That update is not propagated to the other threads, so if a function is already running and tries to access memory that was added by memory.grow, e.g. via memory.copy (where we check boundaries), the memory.copy fails with: Exception: out of bounds memory access.

This has only been tested on the classic and fast interpreters. The issue couldn't be reproduced on AOT (the mechanisms are different there; I don't think it's an issue for AOT).

The current implementation is not optimal, as it takes a mutex lock on every memory access operation. I'd suggest merging it anyway so that we have a working, bug-free solution for the upcoming release; we can optimize it in the future if needed.
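As a rough sketch of the mutex-based approach described above (all names and types here are illustrative, not the actual WAMR code): every bounds check re-reads the shared size under a lock, so a memory.grow performed by one thread is immediately visible to all others, at the cost of a lock/unlock pair per memory access.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical shared-memory state: the size is protected by a mutex
   so a grow in one thread is visible to bounds checks in another. */
typedef struct {
    uint32_t memory_data_size; /* bytes currently accessible */
    pthread_mutex_t lock;
} shared_memory_t;

static uint32_t
get_linear_memory_size(shared_memory_t *mem)
{
    pthread_mutex_lock(&mem->lock);
    uint32_t size = mem->memory_data_size;
    pthread_mutex_unlock(&mem->lock);
    return size;
}

static void
grow_linear_memory(shared_memory_t *mem, uint32_t extra_bytes)
{
    pthread_mutex_lock(&mem->lock);
    mem->memory_data_size += extra_bytes;
    pthread_mutex_unlock(&mem->lock);
}

/* A bounds check in the spirit of CHECK_MEMORY_OVERFLOW: the size is
   re-loaded on every access instead of being cached per thread. */
static int
check_memory_overflow(shared_memory_t *mem, uint64_t addr, uint64_t bytes)
{
    return addr + bytes <= (uint64_t)get_linear_memory_size(mem);
}
```

This is correct but pays the full locking cost on the hot path, which is the performance concern raised in the review below.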

@loganek loganek force-pushed the loganek/linear-memory-update branch 2 times, most recently from 2e83ff2 to 743a913 Compare March 28, 2023 11:51
Comment on lines +12 to +14:

    } app_state_t;
    typedef struct {
Contributor

[minor]

Suggested change (original, then suggestion):

    } app_state_t;
    typedef struct {

    } app_state_t;
    typedef struct {

@loganek loganek force-pushed the loganek/linear-memory-update branch from 743a913 to 7418940 Compare March 29, 2023 22:48
@@ -33,6 +33,7 @@ typedef float64 CellType_F64;
     #define CHECK_MEMORY_OVERFLOW(bytes) \
         do { \
             uint64 offset1 = (uint64)offset + (uint64)addr; \
+            linear_mem_size = get_linear_memory_size(memory, node); \
Contributor

@wenyongh wenyongh Mar 30, 2023
Hi, this change may be a bit too big and could impact performance a lot: on each memory load/store there are two function calls, a lock/unlock pair, and a load operation. There should be some optimization opportunities:

  1. There is no need to re-load the linear memory size when multi-threading isn't enabled; the local linear_mem_size is updated after the memory.grow opcode is invoked and after calling other functions which may call memory.grow.
  2. For multi-threading, the memory access boundary check can load the linear memory size from memory->memory_data_size directly: (1) after some earlier refactoring, we store the linear memory size in memory->memory_data_size, which is always equal to memory->num_bytes_per_page * memory->cur_page_count; (2) the load operation is effectively atomic, since memory->memory_data_size is a uint32, a basic data type, so the load can be completed by a single CPU instruction, and only one CPU instruction can be loading or storing it at any moment. We can see that LLVM translates the atomic i32.load opcode into a simple mov instruction. For the AOT memory boundary check we likewise just load mem_check_bound directly on every boundary check, and in your test the issue wasn't reproduced in AOT:
    https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/core/iwasm/compilation/aot_emit_memory.c#L69-L74

There may be some fragmentary work to fix the issue and refine the code. I uploaded another PR #2088 to fix it; could you help review and test whether it works?
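The direct-load idea in point 2 above can be sketched roughly like this (hypothetical struct and names, assuming the invariant that memory_data_size always equals num_bytes_per_page * cur_page_count is maintained under the shared-memory lock on the store side):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative memory structure: memory_data_size is kept equal to
   num_bytes_per_page * cur_page_count whenever the memory grows. */
typedef struct {
    uint32_t num_bytes_per_page;
    uint32_t cur_page_count;
    uint32_t memory_data_size;
} wasm_memory_t;

/* The boundary check simply re-reads the current size on every access.
   On mainstream CPUs an aligned 32-bit load is a single instruction,
   so no lock is taken on this hot path. */
#define get_linear_mem_size(memory) ((memory)->memory_data_size)

static int
bounds_check(const wasm_memory_t *memory, uint64_t addr, uint64_t bytes)
{
    return addr + bytes <= (uint64_t)get_linear_mem_size(memory);
}
```

The trade-off is that a plain (non-atomic) read of a concurrently written variable is a data race under the C memory model, even if it happens to behave atomically on common hardware; that is the concern raised in the next comment.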

Collaborator Author

@wenyongh I had a similar idea to what you've got in #2088; however, my major concern was that memory_data_size is not atomic. I think the better option would be to make the variable atomic, but I'm not sure if we want to take a dependency on that in the iwasm core (we already depend on it in the wasi-libc wrapper, but that's an optional module). If we're happy with bringing atomics into iwasm, I can rework this PR and submit an update to use that instead of mutexes.

Contributor

@loganek Do we need to declare the memory_data_size variable atomic? Per my understanding, we only need to make the load operation atomic, since the store operation is guarded by shared_mem_lock in wasm_enlarge_memory. If you are concerned about the load operation, we could replace:

#define get_linear_mem_size() memory->memory_data_size

to

#define get_linear_mem_size() atomic_load(&memory->memory_data_size)

Note that I already introduced atomic_thread_fence when implementing the shared memory's atomic.fence opcode:
https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/core/shared/platform/include/platform_api_extension.h#L122-L128
https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/core/iwasm/interpreter/wasm_interp_classic.c#L3475

I think we can add atomic_load for memory_data_size. Is that OK?
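A minimal sketch of that macro replacement using C11 atomics (names are illustrative, and memory_order_relaxed is an assumption on my part; the actual ordering requirements would need review):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative memory structure with an atomic size field.  The store
   side is still serialized by the shared-memory lock in memory.grow;
   readers use an atomic load so the access is well-defined under the
   C11 memory model instead of relying on hardware behavior. */
typedef struct {
    _Atomic uint32_t memory_data_size;
} wasm_memory_t;

#define get_linear_mem_size(memory) \
    atomic_load_explicit(&(memory)->memory_data_size, memory_order_relaxed)

static void
set_linear_mem_size(wasm_memory_t *memory, uint32_t size)
{
    atomic_store_explicit(&memory->memory_data_size, size,
                          memory_order_relaxed);
}
```

On x86-64 a relaxed atomic load of an aligned uint32 compiles to the same mov as a plain load, so this formalizes the existing behavior without adding runtime cost on that architecture.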

BTW, another way to do the synchronization might be to apply an algorithm similar to a GC's stop-the-world during reclaim: when a thread needs to memory.grow, it issues a request to stop the world, i.e. to suspend all other threads. It waits until all other threads are in waiting status, then grows the memory, and afterwards notifies the other threads. When another thread is woken up, it updates its local linear_mem_size. This way, no atomic operation is needed for memory_data_size. Note that this algorithm isn't implemented in the GC branch yet; we may implement it in the future if needed.
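A condition-variable sketch of that stop-the-world scheme (entirely hypothetical, not from the WAMR codebase): the growing thread raises a flag, waits until every other interpreter thread has parked at a safepoint, grows the memory, then releases everyone. Parked threads re-read the linear memory size when they wake, so the fast path needs no atomic loads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t all_parked;   /* signalled when all others are parked */
    pthread_cond_t resume;       /* broadcast when the grow is finished */
    int stop_requested;
    int threads;                 /* total interpreter threads */
    int waiting;                 /* threads currently parked */
    uint32_t memory_data_size;
} runtime_t;

/* Called by every thread at a safepoint (e.g. loop back-edges);
   returns the current size so the caller can refresh its local cache. */
static uint32_t
safepoint(runtime_t *rt)
{
    pthread_mutex_lock(&rt->lock);
    while (rt->stop_requested) {
        rt->waiting++;
        if (rt->waiting == rt->threads - 1)
            pthread_cond_signal(&rt->all_parked);
        pthread_cond_wait(&rt->resume, &rt->lock);
        rt->waiting--;
    }
    uint32_t size = rt->memory_data_size;
    pthread_mutex_unlock(&rt->lock);
    return size;
}

/* Called by the thread executing memory.grow. */
static uint32_t
grow_stop_the_world(runtime_t *rt, uint32_t extra_bytes)
{
    pthread_mutex_lock(&rt->lock);
    rt->stop_requested = 1;
    while (rt->waiting < rt->threads - 1)
        pthread_cond_wait(&rt->all_parked, &rt->lock);
    rt->memory_data_size += extra_bytes; /* safe: world is stopped */
    rt->stop_requested = 0;
    pthread_cond_broadcast(&rt->resume);
    uint32_t size = rt->memory_data_size;
    pthread_mutex_unlock(&rt->lock);
    return size;
}
```

The pause is short because growing a shared memory is cheap compared to a full GC cycle, which is the point made above.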

Contributor

@wenyongh wenyongh Apr 1, 2023

@loganek Did you mean to use atomic load/store for all operations on the memory data size? Yes, that may be better while the suspend/restore-threads mechanism hasn't been fully implemented. My concern is that it would be a big change and would require a lot of testing to ensure it works well in multi-threading mode and doesn't impact non-threading mode. We have already spent a lot of time testing, have almost finished all the tests based on #2088, and are preparing to release a new version; a big change would mean re-testing everything. My suggestion is to merge #2088 (or add an atomic operation for the load of memory->memory_data_size as an enhancement) and release version 1.2.1; per my understanding it should work well, since the load is an atomic operation. After that, please refactor the code with atomic load/store if you want, and we should implement the suspend/restore-threads mechanism in the future. The latter may be the better option, since it doesn't require atomic load/store operations and there is only a very short pause while other threads wait for one thread to grow the memory: the memory.grow operation for shared memory is quick (unlike some Java/JavaScript GCs, which need to mark/sweep/unmark objects). What's your opinion? Thanks.

Collaborator Author

Hi @wenyongh, sorry for the late reply. Yes, I think it'd still be valuable to use either atomic operations or the suspend/restore mechanism here, even though your fix in #2088 solves the vast majority of cases. I've limited the scope of this PR to just adding the test so we can merge it straight away, and I'll work on the improvements in a separate PR.

Contributor

OK, thanks, let's merge this PR first.

@loganek loganek force-pushed the loganek/linear-memory-update branch from 7418940 to ff38032 Compare April 7, 2023 11:09
When the memory.grow operation is called, it updates a `linear_mem_size` variable
in `wasm_interp_call_func_bytecode()`; that update is not propagated to
all the threads, so if a function is already running and tries to
access memory that was added (after `memory.grow`), e.g. by calling
`memory.copy` where we check boundaries, the `memory.copy` fails with an
`Exception: out of bounds memory access`. I'm adding a test for now,
but will work on the implementation later.

In the meantime, suggestions on how to approach this problem are very welcome.
@loganek loganek force-pushed the loganek/linear-memory-update branch from ff38032 to 0306704 Compare April 7, 2023 11:13
@wenyongh wenyongh merged commit a2d4744 into bytecodealliance:main Apr 8, 2023
victoryang00 pushed a commit to victoryang00/wamr-aot-gc-checkpoint-restore that referenced this pull request May 27, 2024
@loganek loganek deleted the loganek/linear-memory-update branch June 10, 2024 12:48