aot/jit native stack bound check improvement #2244

Merged
merged 107 commits into from
Jun 21, 2023

@yamt yamt commented May 30, 2023

summary:

Move the native stack overflow check from the caller to the callee because the former doesn't work for call_indirect and imported functions.

Make the stack usage estimation more accurate.
Instead of guessing from the number of wasm locals in the function, use LLVM's idea of the stack size of each MachineFunction. The former is inaccurate because a) it doesn't reflect optimization passes and b) wasm locals are not the only reason to use the stack.

To use the post-compilation stack usage information without requiring 2-pass compilation or machine-code immediate rewrites, introduce a global array to store the stack consumption of each function.
For JIT, use a custom IRCompiler with an extra pass to fill the array.
For AOT, use the equivalent of clang -fstack-usage instead, because we support external llc.

Re-implement function call stack usage estimation to reflect the real calling conventions better.
(aot_estimate_stack_usage_for_function_call)

Re-implement stack estimation logic (--enable-memory-profiling) based on the new machinery.
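
To illustrate the callee-side check backed by a global stack-size array, here is a minimal sketch in C. It is not the code from this PR: all names (`stack_sizes`, `stack_boundary`, `fill_stack_sizes`, `callee_stack_check`) are hypothetical, the real check is emitted as LLVM IR rather than written in C, and a downward-growing native stack is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; this only models the idea. */
#define NUM_FUNCS 3

/* Native stack consumption of each function, in bytes. Generated code
 * only refers to the array slot, so the values can be filled in after
 * code generation, once the real frame sizes are known. */
static uint32_t stack_sizes[NUM_FUNCS];

/* Lowest address the native stack is allowed to reach
 * (assumes the stack grows toward lower addresses). */
static uintptr_t stack_boundary;

/* Post-compilation step: store the frame sizes reported by the backend. */
static void
fill_stack_sizes(const uint32_t *reported, size_t count)
{
    assert(count <= NUM_FUNCS);
    for (size_t i = 0; i < count; i++)
        stack_sizes[i] = reported[i];
}

/* Callee-side check: conceptually the first thing a compiled function
 * does. Because it runs in the callee, it is reached no matter how the
 * function was called (direct call, call_indirect, or via an import).
 * Returns true if the function's frame fits above the boundary. */
static bool
callee_stack_check(uint32_t func_idx, uintptr_t current_sp)
{
    assert(func_idx < NUM_FUNCS);
    uintptr_t needed = stack_sizes[func_idx];

    /* Guard against unsigned wrap-around before subtracting. */
    if (current_sp < needed)
        return false;
    return current_sp - needed >= stack_boundary;
}
```

In this model the comparison reads its operand from memory instead of an immediate baked into the machine code, which is why neither a second compilation pass nor immediate patching is needed.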

discussions:
#2105

todo/known issues/open questions:

@yamt yamt marked this pull request as ready for review June 19, 2023 06:22

yamt commented Jun 19, 2023

i fixed the xtensa case. it's still inefficient, but not broken.
while i haven't tested on real hardware yet, the wamrc output looks reasonable.

@wenyongh wenyongh left a comment

LGTM

yamt commented Jun 21, 2023

> i fixed the xtensa case. it's still inefficient, but not broken. while i haven't tested on real hardware yet, the wamrc output looks reasonable.

lightly tested on esp32-devkitc. it worked as expected so far.

@xujuntwt95329 xujuntwt95329 left a comment

LGTM

@wenyongh

> > i fixed the xtensa case. it's still inefficient, but not broken. while i haven't tested on real hardware yet, the wamrc output looks reasonable.
>
> lightly tested on esp32-devkitc. it worked as expected so far.

OK, it seems there is no comment from other developers, let's merge this PR?

yamt commented Jun 21, 2023

> > > i fixed the xtensa case. it's still inefficient, but not broken. while i haven't tested on real hardware yet, the wamrc output looks reasonable.
> >
> > lightly tested on esp32-devkitc. it worked as expected so far.
>
> OK, it seems there is no comment from other developers, let's merge this PR?

i have no problem with it

@wenyongh wenyongh merged commit cd7941c into bytecodealliance:main Jun 21, 2023
371 checks passed
victoryang00 pushed a commit to victoryang00/wamr-aot-gc-checkpoint-restore that referenced this pull request May 27, 2024
Move the native stack overflow check from the caller to the callee because the
former doesn't work for call_indirect and imported functions.

Make the stack usage estimation more accurate. Instead of guessing from the
number of wasm locals in the function, use LLVM's idea of the stack size of
each MachineFunction. The former is inaccurate because a) it doesn't reflect
optimization passes, and b) wasm locals are not the only reason to use the stack.

To use the post-compilation stack usage information without requiring 2-pass
compilation or machine-code immediate rewriting, introduce a global array to
store the stack consumption of each function:
For JIT, use a custom IRCompiler with an extra pass to fill the array.
For AOT, use the equivalent of `clang -fstack-usage` because we support external llc.

Re-implement function call stack usage estimation to reflect the real calling
conventions better. (aot_estimate_stack_usage_for_function_call)

Re-implement stack estimation logic (--enable-memory-profiling) based on the new
machinery.

Discussions: bytecodealliance#2105.

3 participants