Update memfd image construction to avoid excessively large images #3819

Merged

Conversation

alexcrichton (Member)

Previously memfd-based image construction had a hard limit of a 1GB
memory image, but this meant that tiny wasm modules could allocate up to
1GB of memory, which became a bit excessive, especially in terms of
memory usage during fuzzing. To fix this, the conversion to a static
memory image has been updated to first convert to paged memory
initialization, which is sparse, followed by a second conversion to
static memory initialization.

The sparse construction in the paged step makes the upper/lower bounds
of the initialization image easy to compute, and that extent can then be
checked against some heuristics to determine whether we're willing to
commit to building a whole static image for the module. The heuristics
have been changed from "must be less than 1GB" to requiring that one of
two conditions holds:

  • Either the total memory image size is at most twice the size of the
    original paged data itself.

  • Otherwise the memory image size must be smaller than a reasonable
    threshold, currently 1MB.

We'll likely need to tweak this over time and it's still possible to
cause a lot of extra memory consumption, but for now this should be
enough to appease the fuzzers.
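A minimal sketch of the heuristic described above; the constant and
function names here are illustrative, not the actual wasmtime internals:

```rust
// Illustrative sketch of the two-condition heuristic; names are
// hypothetical, not wasmtime's real code.

/// Threshold below which an image is always acceptable (1 MiB).
const SMALL_IMAGE_THRESHOLD: u64 = 1 << 20;

/// Decide whether to build a dense static image spanning `image_size`
/// bytes out of `data_size` bytes of sparse paged initialization data.
fn should_build_static_image(image_size: u64, data_size: u64) -> bool {
    // Condition 1: the image is at most twice the original data size.
    image_size <= data_size.saturating_mul(2)
        // Condition 2: otherwise the image must still be reasonably small.
        || image_size < SMALL_IMAGE_THRESHOLD
}

fn main() {
    // A tiny module writing one byte near 1 GiB is rejected now, where
    // the old "< 1GB" rule would have built a ~1 GiB image.
    assert!(!should_build_static_image(1 << 30, 1));
    // A dense module (8 MiB image from 5 MiB of data) passes the 2x rule.
    assert!(should_build_static_image(8 << 20, 5 << 20));
    // Any image under 1 MiB is accepted regardless of density.
    assert!(should_build_static_image(512 << 10, 1));
    println!("ok");
}
```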

Closes #3815

@cfallin cfallin left a comment

This is great, thanks so much for taking care of this! Just a few nits below.

crates/environ/src/module.rs — two review threads (outdated, resolved)
@alexcrichton alexcrichton merged commit b62fe21 into bytecodealliance:main Feb 17, 2022
@alexcrichton alexcrichton deleted the less-memory-on-compile branch February 17, 2022 16:37
cfallin commented Feb 18, 2022

@alexcrichton I've been playing a bit with different modules to see whether the above heuristics hold up, and I'm becoming increasingly worried. For example, I have a wizened SpiderMonkey wasm with an init_size of 7.8MB and a data_size of 4.7MB; with some more playing with allocation ordering and such, I'm sure I could push that under the 50%-density limit. Wasm modules that are wizened from garbage-collected languages are especially vulnerable, because the memory layout tends to be more special than just "keep appending data", but really any process that snapshots a heap can produce lower-density images, since the right malloc/free pattern can create fragmentation.
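Plugging the cited numbers into the 2x heuristic shows this particular module currently (just) qualifies, which is why a slightly sparser layout could tip it over the cliff; a quick check:

```rust
fn main() {
    // Numbers from the wizened SpiderMonkey module mentioned above.
    let init_size: f64 = 7.8e6; // span of the initialization image, bytes
    let data_size: f64 = 4.7e6; // bytes actually initialized
    let density = data_size / init_size;
    // The "image <= 2x data" rule is equivalent to requiring density >= 50%.
    let passes = init_size <= 2.0 * data_size;
    println!("density = {:.0}%, passes 2x rule: {}", density * 100.0, passes);
}
```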

My fundamental worry is, still, the performance cliff: if the user gets unlucky, they fall into a much lower-performance mode. It feels like we're optimizing for the wrong thing here: an abstract notion of sparse memory usage and leanness, at the cost of real-world scenarios.

To give another example: if I write a program with a native toolchain and embed data in it, and that data happens to have lots of zeroes, ld is happy to produce a .data or .rodata that is huge, proportional to my program's initial memory footprint. It's optimizing for mmap-ability of the data.

I think we should do the same thing: fundamentally, we're compiling to a representation that is "close to the metal", and part of that closeness is that it has a one-to-one image of what will go into memory. Making that a behavior that one has to fit the right heuristics to get just feels wrong somehow; like JavaScript engines all over again.

So, I'd like to argue that we revert this change, and go back to a static limit of some sort, as I had proposed above (EDIT: actually in the discussion in #3815). We could perhaps make it user-configurable (I'd happily agree that baked-in arbitrary limits are bad), but I want to make sure the cliff is harder to hit than "oops, did too much wizening and heap is a bit sparse" :-)

Thoughts?

cfallin commented Feb 18, 2022

Ah, and one more advantage of a static limit-based approach is that we can take advantage of the "image and leftovers" aspect of our initialization data structure: we can build a dense image, ready to mmap, for all memory up to the last data segment that is below our bound; and then only do eager initialization for data beyond that.

This means that e.g. if we have relatively dense heap down in the 0..N MB range, and then a random byte up at 1GiB, we don't reject the whole thing and do eager init of N megabytes; instead we continue to do memfd as normal and then just do eager init of the one random high byte.
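The "image and leftovers" split described above could be sketched roughly as follows; the types and names here are hypothetical, not wasmtime's actual data structures:

```rust
// Hypothetical sketch of the "dense image plus leftovers" idea; the
// types and names are illustrative, not wasmtime's real internals.

/// One data segment: an offset into linear memory plus its bytes.
struct Segment {
    offset: u64,
    data: Vec<u8>,
}

/// Build a dense, mmap-able image out of every segment that ends below
/// `image_limit`; segments beyond the limit become "leftovers" that are
/// eagerly copied in at instantiation time instead.
fn split_image(segments: &[Segment], image_limit: u64) -> (Vec<u8>, Vec<&Segment>) {
    let mut image = Vec::new();
    let mut leftovers = Vec::new();
    for seg in segments {
        let end = seg.offset + seg.data.len() as u64;
        if end <= image_limit {
            // Zero-fill any gap, then splat the segment into the image.
            if (image.len() as u64) < end {
                image.resize(end as usize, 0);
            }
            image[seg.offset as usize..end as usize].copy_from_slice(&seg.data);
        } else {
            leftovers.push(seg);
        }
    }
    (image, leftovers)
}

fn main() {
    let segments = vec![
        Segment { offset: 0, data: vec![1, 2, 3] },
        Segment { offset: 4096, data: vec![4, 5] },
        // A lone byte up at 1 GiB no longer forces a huge image.
        Segment { offset: 1 << 30, data: vec![9] },
    ];
    let (image, leftovers) = split_image(&segments, 1 << 20);
    assert_eq!(image.len(), 4098); // dense image ends at the last low segment
    assert_eq!(leftovers.len(), 1); // the high byte is eagerly initialized
    println!("image: {} bytes, leftovers: {}", image.len(), leftovers.len());
}
```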

I'll go ahead and create an issue for this rather than braindumping on a closed PR, sorry :-)

mpardesh pushed a commit to avanhatt/wasmtime that referenced this pull request Mar 17, 2022
…tecodealliance#3819)

* Update memfd image construction to avoid excessively large images


* Review comments
Successfully merging this pull request may close these issues.

Excessive memory usage compiling small module with memfd