
[wasm] Add custom mmap/munmap implementation for anonymous mappings #101871

Merged: 24 commits merged into dotnet:main on May 10, 2024

Conversation

kg (Contributor) commented May 4, 2024

Introduction

The biggest hotspot in many of my startup measurements is memset. Out of memset's various callers in the runtime, the main culprits here are sgen_alloc_os_memory and sgen_alloc_os_memory_aligned, and memset is being performed on their behalf by the mmap implementation in emscripten's libc:

[profiler screenshot: memset dominating the startup profile, invoked on behalf of sgen_alloc_os_memory and sgen_alloc_os_memory_aligned via emscripten's mmap]

Emscripten's libc implements mmap/munmap as a broken adapter on top of malloc (not calloc), which means it has no choice but to invoke memset on every allocation in order to provide properly zeroed bytes for our allocation requests. This is theoretically fixable on the emscripten end (I'm still looking into it, but it's not trivial), but in the process of testing my theory that the memset isn't necessary, I noticed some other problems and decided to try to put together a complete implementation of mmap/munmap for our needs that can skip calling memset. The implementation is controlled by a runtime option, so you can turn it off to get emscripten's mmap instead.
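To make the problem concrete, here's a minimal sketch of the shape of such a shim (hypothetical code for illustration, not emscripten's actual source): because malloc returns uninitialized bytes, the only way for the adapter to honor MAP_ANONYMOUS's zero-fill guarantee is to memset the whole region on every call.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/mman.h>

// Hypothetical mmap-over-malloc shim, not emscripten's actual code.
// addr/prot/flags/fd/offset are effectively ignored for anonymous mappings.
void *
mmap_over_malloc (void *addr, size_t length, int prot, int flags, int fd, off_t offset)
{
	void *result = malloc (length);
	if (!result)
		return MAP_FAILED;
	// The hotspot: this runs even when the memory is already zeroed,
	// and even when the caller doesn't need zeroed memory at all.
	memset (result, 0, length);
	return result;
}
```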

Some more details in no particular order:

  • As mentioned, emscripten's mmap has to memset() every allocation to zero, even if it's already zero and even if the caller doesn't need it to be zeroed. Since it's a shim over malloc, it could theoretically use calloc (which allocates zeroed memory), but...
  • Emscripten offers two production-quality allocators, dlmalloc and mimalloc. Both of them have calloc implementations that could theoretically skip memset, but due to how emscripten configures them, that optimization doesn't engage, so they still call memset. Fixing this in mimalloc looks possible (but nontrivial), while dlmalloc's version of the optimization requires access to mmap (which it can't use, since mmap is implemented using malloc here).
  • Emscripten's version of sbrk works by growing the webassembly memory, and that produces pre-zeroed pages. So memory allocated via sbrk is pre-zeroed and we shouldn't need to call memset on it. Emscripten's dlmalloc and mimalloc are configured to get their memory from sbrk.
  • According to spec, munmap should attempt to unmap any pages in the specified region, and silently ignore any pages in that region that aren't mapped. Emscripten's version of munmap fails if you attempt to do a partial unmap or to unmap multiple mappings in one call. It also locates mappings by doing a full scan of an unordered list of all mappings, so the cost of munmap goes up as you allocate more memory. My testing shows that we have code that expects munmap to behave correctly, and it's just not checking for the error code emscripten is returning.
  • According to spec, mmap and munmap operate on whole pages, and in some cases should fail if passed addresses that aren't page-aligned (see the sketch of these semantics after this list). Emscripten's implementation ignores the existence of pages entirely, so you can ask for 96 bytes and get back 96 bytes. (It does try to align the starting address, though!)
  • sgen_init_internal_allocator has a bunch of safety checks that haven't been running on wasm, because wasm's page size is so big that the termination condition on the loop is met immediately. If you enable the checks, the assertions fail. It seems like this is harmless, but I'm not certain.
  • The lock-free allocator has a hard limit of 16KB for its allocations, and those allocations are serviced via mmap, which means each time it allocates a block on wasm it throws 3/4 of a page away (in theory). It seems to work if I raise the limit to 64KB, but only with a real implementation of mmap; a limit of 64KB with emscripten's mmap causes crashes.
  • major_free_swept_blocks in sgen is hard-coded to not free pages on wasm. It appears this is because we get random crashes when freeing pages with emscripten's munmap; with my implementation we can turn freeing pages back on without issues. (I don't know if we actually want to.)
  • Our aligned valloc implementation on wasm adds some overhead and memory waste, but with a real mmap that provides page size and alignment guarantees, we can usually skip running the pad-and-align logic and just allocate a page directly, knowing that it will be aligned.
  • Emscripten's sbrk doesn't return aligned addresses or round allocation sizes up to the wasm page size, so it's not possible to use it to allocate a single page directly; we have to pad our allocations as a result and try to clean up the waste somehow.
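To illustrate the page-granularity semantics referenced above, here's a minimal sketch of the rounding and alignment rules a spec-conforming implementation follows (illustrative names and helpers, not code from this PR):

```c
#include <stddef.h>
#include <stdint.h>

// Illustrative constant, not the PR's actual identifier.
#define WASM_PAGE_SIZE ((size_t)65536)

// Per POSIX, mmap operates on whole pages: a 96-byte request consumes a
// full page, and munmap must accept any page-aligned region, silently
// ignoring pages within it that were never mapped.
static inline size_t
round_up_to_pages (size_t length)
{
	return (length + WASM_PAGE_SIZE - 1) & ~(WASM_PAGE_SIZE - 1);
}

// munmap should reject addresses that aren't page-aligned.
static inline int
is_page_aligned (const void *addr)
{
	return ((uintptr_t)addr & (WASM_PAGE_SIZE - 1)) == 0;
}
```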

Design overview

mono_wasm_page_manager (mwpm) is an extremely simple mmap implementation with a statically allocated page table covering the whole 32-bit wasm address space. It manages fixed-size pages that match the native WASM page size (64KB), though it can be configured for smaller pages (my testing shows no real advantage from doing this).
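To give a sense of scale (an illustrative declaration under assumed names, not the PR's exact representation): the 32-bit wasm address space is 4GiB, which at 64KB per page is 65536 pages, so a one-byte-per-page table covering the entire address space is itself only 64KB.

```c
#include <stdint.h>

// Illustrative layout. The PR's actual entries also pack per-page skip
// data (discussed later in the thread) into the table.
#define PAGE_SIZE_BYTES (64 * 1024)
#define PAGE_COUNT      (0x100000000ULL / PAGE_SIZE_BYTES) /* 65536 */

enum {
	PAGE_UNKNOWN,     // owned by another sbrk caller; a dead zone, never touched
	PAGE_FREE_ZEROED, // fresh from sbrk, guaranteed zero
	PAGE_FREE_DIRTY,  // previously unmapped, contents stale
	PAGE_ALLOCATED,
};

// One state byte per page for the whole 32-bit address space: 64 KB total.
static uint8_t page_table[PAGE_COUNT];
```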

To service allocations, mwpm uses sbrk to grab large chunks of pre-zeroed memory, and updates its page table to record that those pages are currently free and zeroed. As a result, when a caller asks for zeroed memory there is no need to call memset. Pages controlled by other callers of sbrk remain dead zones in the page table and will not be used. In order to get aligned pages out of sbrk, mwpm allocates an extra page worth of memory and discards the excess on either side of the allocated pages. If mwpm discovers that it has called sbrk twice in a row without another allocator (e.g. dlmalloc) getting to it, it welds the two allocations together, recovering most of the alignment waste that was lost due to sbrk's lack of alignment.
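Here's a rough sketch of that over-allocate-and-align step, building on the illustrative constants in the previous sketch (assumed names; the welding bookkeeping is omitted):

```c
#include <unistd.h>
#include <stdint.h>
#include <stddef.h>

// Sketch only: sbrk makes no alignment promise, so request one extra page
// to guarantee an aligned run of 'page_count' pages fits, then discard the
// slop on either side. If the next sbrk chunk turns out to be contiguous
// with this one, the trailing slop can be welded into it (not shown).
static void *
grab_aligned_pages (size_t page_count)
{
	size_t bytes = page_count * PAGE_SIZE_BYTES;
	uint8_t *raw = sbrk (bytes + PAGE_SIZE_BYTES);
	if (raw == (void *)-1)
		return NULL;
	uintptr_t aligned = ((uintptr_t)raw + PAGE_SIZE_BYTES - 1)
		& ~(uintptr_t)(PAGE_SIZE_BYTES - 1);
	// Pages in [raw, aligned) and past aligned+bytes are alignment waste
	// unless a later contiguous sbrk call recovers them.
	return (void *)aligned;
}
```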

Finding available pages to service allocations is done with a naive linear scan; I tested more complex strategies and they produced worse fragmentation, though it's possible I missed the perfect solution. The first suitable spot for an allocation is the one that gets used.
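In sketch form, the first-fit scan looks something like this (continuing with the illustrative page table from above; the PR's real scan also consults per-page skip data, discussed later in the thread):

```c
// Naive first-fit: return the index of the first run of 'needed' free
// pages, or -1 if no suitable run exists.
static long
find_first_fit (size_t needed)
{
	size_t run = 0;
	for (size_t i = 0; i < PAGE_COUNT; i++) {
		uint8_t state = page_table[i];
		if (state == PAGE_FREE_ZEROED || state == PAGE_FREE_DIRTY)
			run++;
		else
			run = 0;
		if (run == needed)
			return (long)(i - needed + 1); // first page of the run
	}
	return -1;
}
```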

When pages are unmapped, they are returned to the page table in a 'free and dirty' state. Requests for zeroed memory will eagerly reuse these pages, calling memset to prepare them before returning; if the caller doesn't request zeroed memory, the memset is skipped.
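The per-page zeroing decision then reduces to something like this (again a sketch using the illustrative states from above):

```c
#include <string.h>
#include <stdint.h>

// Only dirty pages pay for zeroing, and only when the caller actually
// asked for zeroed memory; pages fresh from sbrk are handed out untouched.
static void
prepare_page (size_t page_index, int need_zeroed)
{
	uint8_t *address = (uint8_t *)(uintptr_t)(page_index * PAGE_SIZE_BYTES);
	if (need_zeroed && page_table[page_index] == PAGE_FREE_DIRTY)
		memset (address, 0, PAGE_SIZE_BYTES);
	page_table[page_index] = PAGE_ALLOCATED;
}
```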

Performance

From an algorithmic perspective this is a bad implementation, but in testing the performance is on par with emscripten's mmap.

S.T.J.Tests finishes with a memory size of 1.409GB using emscripten libc, and a memory size of 1.410GB using mwpm. mwpm finishes the test run with 76.6% of its allocated pages in use, and enough space to allocate 461 sequential pages, so fragmentation is mostly under control. The mwpm version of the test run completed in 3:09, vs emscripten at 2:57, so it's not catastrophically slower (if it's slower at all... I still have to figure out a good way to do realistic benchmarking). As one would expect, the memset startup bottleneck isn't present with mwpm active.

We could probably optimize page table searches and modifications using WASM SIMD once it's enabled for the runtime.

TODOs

  • Locking for multithreaded builds
  • Wire up a way for mono to communicate that it doesn't actually need zeroed memory when allocating pages
  • Instead of memsetting the entire allocation, memset only the pages that need to be zeroed
  • Find a way to make LOCK_FREE_ALLOC_SB_MAX_SIZE dynamic based on whether mwpm is active
  • Use a more complex page table representation so we don't ever have to scan pages we don't control
  • Realistic benchmark measurements other than 'is startup faster' and 'did tests get way slower'
  • Separate arenas of some sort for 'small' (1-2 page?) and 'large' allocations to control fragmentation?

kg added the NO-MERGE (The PR is not ready for merge yet; see discussion for detailed reasons) and arch-wasm (WebAssembly architecture) labels May 4, 2024
Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.

kg marked this pull request as ready for review May 7, 2024 15:06
kg requested a review from BrzVlad as a code owner May 7, 2024 15:06
kg (Contributor, Author) commented May 7, 2024

There's more to be done to improve on this, but it feels mergeable so it would be great to at least get initial feedback.

lambdageek (Member) left a review comment:
looking good

Review threads on:
  • src/mono/mono/utils/mono-wasm-pagemgr.h (resolved)
  • src/mono/mono/utils/mono-wasm-pagemgr.c (outdated, resolved)
  • src/mono/mono/utils/mono-wasm-pagemgr.c (resolved)
lambdageek (Member) commented:
Should we turn this off for threaded wasm? Is it already off for threaded wasm? Is it off for WASI?

kg (Contributor, Author) commented May 7, 2024

> Should we turn this off for threaded wasm? Is it already off for threaded wasm? Is it off for WASI?

Our current implementation is actually subtly broken for threaded wasm. I'm open to disabling this for MT anyway, but it should be thread-safe thanks to the mutex. I think it should be enabled for WASI.

kg removed the NO-MERGE label May 7, 2024
kg (Contributor, Author) commented May 8, 2024

After thinking some more about the skip structure I realized it has a failure mode if you perform partial unmappings.

For a full unmapping, it returns the page table to a fully correct state, since all of the skip pages created by the original mapping get reset. But if you unmap the middle of an allocation, you can end up with left-over skip pages that jump past free space, like so:

```
| alloced | free |
a98765432100000000

        |
        v

|alloc| free     |
a98765400000000000
```

It's also the case that if you have a big section of free blocks with skip data, you need to allocate from the front of it for the skip data to stay valid; otherwise, in the same way, you could end up skipping over free space because the preceding skip data hasn't been updated.

I'm not sure how to fix this yet - maybe since skip values only go up to 64, I can just recalculate the skip values for the preceding 64 pages any time I modify pages? EDIT: That is what I did
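A sketch of that repair pass, under hypothetical accessor names (get_skip/set_skip stand in for the PR's actual skip-data encoding): because a skip value never exceeds 64 pages, it suffices to walk the 64 pages preceding a newly freed region and clamp any skip that would jump past it.

```c
#include <stddef.h>

// Stand-ins for the PR's actual skip-data accessors (hypothetical).
unsigned get_skip (size_t page_index);
void set_skip (size_t page_index, unsigned skip);

// After freeing pages starting at 'first_freed', clamp skip values in the
// preceding 64 pages so no entry skips past the newly freed space.
static void
repair_preceding_skips (size_t first_freed)
{
	size_t start = (first_freed > 64) ? first_freed - 64 : 0;
	for (size_t i = start; i < first_freed; i++) {
		unsigned skip = get_skip (i);
		if (i + skip > first_freed)
			set_skip (i, (unsigned)(first_freed - i));
	}
}
```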

kg added 2 commits May 8, 2024 11:35:
  • Address PR feedback
  • After freeing pages, repair the skip values for the preceding 64 pages so they don't erroneously skip over newly free pages
lambdageek (Member) left a review comment:
LGTM. Although we should make a follow-up work item to figure out why MT was unhappy

kg merged commit 6b03ebd into dotnet:main May 10, 2024
119 checks passed
Ruihan-Yin pushed a commit to Ruihan-Yin/runtime that referenced this pull request May 30, 2024
[wasm] Add custom mmap/munmap implementation for anonymous mappings (dotnet#101871)

* Emscripten libc implements mmap/munmap as a broken adapter on top of malloc (not calloc), which means it has no choice but to invoke memset on every allocation in order to provide properly zeroed bytes for our allocation requests. This commit adds a custom mmap/munmap implementation that can skip zeroing already-zeroed pages
* Re-enable freeing of pages in sgen on wasm if custom mmap is active
* Add runtime option for custom mmap
* Add warning switches to fix build on debian
@github-actions github-actions bot locked and limited conversation to collaborators Jun 9, 2024
Labels: arch-wasm (WebAssembly architecture), area-VM-meta-mono