Wasm needs a better memory management story #1397

Open
juj opened this issue Feb 15, 2021 · 61 comments

Comments

@juj

juj commented Feb 15, 2021

Hi all,

after a video call with Google last week, I was encouraged to start a conversation here about issues we at Unity have with Wasm memory allocation.

The short summary is that Wasm currently has grave limitations that make many applications infeasible to deploy reliably on mobile browsers. I stress the word reliably: things may work on some devices for some percentage of the users you deploy to, depending on how much memory your wasm page needs, but as your application's memory needs grow, the percentage of users you can deploy to can fall dramatically.

These issues already occur when the Wasm page uses only a fraction of total RAM of the device. (e.g. at 300MB-500MB)

These issues have been raised as browser issues, but the underlying theme is recognizing that the wasm spec is not robust enough for mobile deployment to customers.

These troubles stem from the following limitations:

  1. No way to control in a guaranteed fashion when new memory commit vs address space reserve occurs.
  2. No way to uncommit used memory pages.
  3. No way to shrink the allocated Wasm Memory.
  4. No virtual memory support (leading applications to either expect to always be able to grow, or have to implement memory defrag solutions)
  5. If Memory is Shared, then application needs to know the Maximum memory size ahead of time, or gratuitously reserve all that it can.

So basically Wasm memory story is "you can only grab more memory, with no guarantee if the memory you got is a reserve or a commit".
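To make the "grab more only" point concrete, here is a minimal sketch using the standard JS API. The sizes are arbitrary values chosen for illustration. It only shows that grow() is the single size-changing operation available; it says nothing about whether an engine commits or merely reserves the pages.

```javascript
// Sketch: the only size-changing operation on a WebAssembly.Memory is grow().
const PAGE_SIZE = 65536; // bytes per Wasm page

// Arbitrary illustrative sizes: start with 1 page, allow up to 16.
const memory = new WebAssembly.Memory({ initial: 1, maximum: 16 });
const before = memory.buffer.byteLength;   // 1 page

const previousPages = memory.grow(3);      // returns the old size in pages
const after = memory.buffer.byteLength;    // 4 pages

// There is no counterpart to grow(): nothing here shrinks or uncommits pages.
const hasShrink = typeof memory.shrink === "function";

console.log({ before, after, previousPages, hasShrink });
```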

These are not newly recognized issues: the memory model has been the same since the MVP, and we have been dealing with them ever since the early asm.js days. But applications are becoming more complex, developers' expectations about which applications they can deploy on which devices are growing, and developers are now aiming to ship to paying customers, where reliability needs to be near 100%. As a result, we are seeing hard ceilings on this issue in the wild.

Note that listing the limitations above does not imply that the fix would be for the wasm spec to somehow add support for all of these. It is meant to set the stage that these are the limitations that exist, since their combination is what causes headaches for developers.

The way that Wasm VM implementations seem to tackle these issues is to try to be smart/automatic under the hood about reserve vs commit behavior, and esp. around shared vs non-shared memory. However it is still the application developer's responsibility to concretely navigate the app in the low-memory landscape, and this leads to developers needing to "decipher" the VM's behavior patterns around commit vs reserve outside the spec. For an example of the vendor-specific suggestions that this leads to, see https://bugs.chromium.org/p/chromium/issues/detail?id=1175564#c7 .

On desktop, the Wasm spec memory issues have so far fallen in the "awkward" category at most, because i) all OSes and browsers have completed migration to 64-bit already, ii) desktops can afford large 16GB+ RAM sizes (and RAM sizes are expandable on many desktops), and iii) desktops have large disk sizes for the OS to swap pages out to, so even large numbers of committed pages may not be the end of the world (just "awkward") esp. if they go unused for most parts.

On mobile, none of that is true.

Note that the wasm memory64 proposal does not relate to or solve this problem. That proposal is about letting applications use more than 4GB of memory, but this issue is about Wasm applications not being able to safely manage much smaller amounts of memory on mobile devices. (The opposite is probably true: attempting to deploy wasm64 on mobile devices would cause even more issues.)

Currently allocating more than ~300MB of memory is not reliable on Chrome on Android without resorting to Chrome-specific workarounds, nor in Safari on iOS. As per the suggestions in the Chromium thread, applications should either know up front at compile time how much memory they will need, or gratuitously reserve everything that they can. Neither of these suggestions is viable.

Why Wasm requires developers to know the needed memory size at compile time

The Wasm spec says that one can conveniently set initial memory size to what they need to launch, and then grow more when the situation demands it. Setting maximum is optional, to allow for unbounded growth. On paper this suggests that developers might not need to know how much they need at compile time.

Reality is quite different, for the following reasons:

  • in the wild we have reports that the memory allocation success rate can be better when you initially allocate K MB than when you first allocate less and later try to grow to K MB. The conversation in https://bugs.chromium.org/p/chromium/issues/detail?id=1175564#c7 also suggests that.
  • if shared memory is used, one does need to know an upper bound for the maximum memory usage.
  • since an application will need to account for the largest memory usage it may need (or it will fail at some point of its lifetime), practically initial == maximum memory.
  • one cannot set a gratuitous upper bound, since that can fail the allocation,
  • one cannot probe the largest upper bound that works in practice, since that can suffocate the browser or cause other JS allocations to fail.

In practice, especially on memory constrained devices, the current spec necessitates developers to somehow "just know" how much memory will be needed.

Why expecting developers to set memory size at compile time is not feasible

With respect to memory usage patterns, there are generally three types of apps/app workloads:

  1. app workloads that use an unknown amount of memory (AutoCAD/OpenOffice/etc document editors with "bring your own workload")
  2. app workloads that use varying amounts of memory ("game menu needs 100MB, game level 1 800MB, game level 2 400MB, etc.")
  3. app workloads that need a known constant amount of memory

App developers cannot know the wasm memory size of apps of the first type. To enable everyone's work size, they must generally reserve everything they can, and this has problems:

App developers of type 2) share many of the problems that apps of type 1) have. One might argue they should be expected to find the max needed size throughout their app lifetime and allocate that, but finding that limit can be hard work, and you may not be able to do it with 100% certainty.

Or developers of apps of type 3) might certainly be expected to choose the right needed amount and be happy with it. Initially it sounds like developers who have an app of type 3 can profile their apps to come up with a suitable initial memory size and never grow. However this has issues:

  • sometimes you don't know if your app certainly is of type 3). Hence you might allocate an initial K MB, but choose a maximum of K+delta MB to account for unexpected growth. This can cause failures to your app when you do need to grow, since the mobile device might fail the growth. (but it might have succeeded had you chosen initial:K+delta in the first place). Same goes for apps of type 2)
  • because profiling memory usage can be hard, or it may be something developers don't know how to do, application developers may choose to just allocate everything they can to "remove a problem" without being aware of the consequences. We routinely see this in practice, where e.g. on itch.io you can see simple 2D games that run with a 1.5 Gig Wasm heap of which most is unused. There is uncertainty if that is wasted committed memory, or just reservation, because the spec gives no guarantees. Then they complain that web browsers/wasm is crap when their game doesn't work on mobile.
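The "initial K, maximum K+delta" pattern from the first bullet can be sketched as below. tryGrow is a hypothetical helper name and the page counts are made up for the example. The sketch relies on grow() throwing a RangeError when it cannot be satisfied, which is what the JS API specifies, so the app at least gets a chance to react.

```javascript
// Hypothetical helper: attempt a grow and report success instead of throwing.
function tryGrow(memory, deltaPages) {
  try {
    memory.grow(deltaPages);
    return true;
  } catch (e) {
    if (e instanceof RangeError) return false; // growth refused
    throw e;                                   // anything else is unexpected
  }
}

// Illustrative numbers only: K = 2 pages, delta = 2 pages.
const memory = new WebAssembly.Memory({ initial: 2, maximum: 4 });

const withinMax = tryGrow(memory, 2); // 2 -> 4 pages, allowed by maximum
const beyondMax = tryGrow(memory, 1); // 4 -> 5 pages would exceed maximum

console.log({ withinMax, beyondMax, bytes: memory.buffer.byteLength });
```

Here the second grow fails deterministically because of the declared maximum. On a memory-constrained device the first grow could fail the same way even though it is within the maximum, which is the failure mode described above.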

Android app switching is a major Wasm usability pain

The documentation at https://developer.android.com/topic/performance/memory-overview at the very bottom of the page states:

Note: The less memory your app consumes while in the cache, the better
its chances are not to be killed and to be able to quickly resume.

It is a common game development QA test to perform "fast app switching" testing, which can kill game UX and player interest if it does not work. For example if a user is playing a game, then gets a WhatsApp message, they will quickly switch over to WhatsApp, type in a message, and then switch back in to the game, and expect the game to still be running. Or switch over to email, or Instagram, or whatever you have, and come back a few minutes later.

The less memory your application is consuming, the better chances you have that the page will not need to reload. With native applications this prompts the developer to push their memory usage down as much as possible when they are switched out. Mobile devices do not swap memory out to disk (at least not like desktops do), but they will kill background apps if they run out of memory.

For wasm apps running in a browser, this means that if an app has an extra gigabyte in its Wasm heap going unused because it cannot release it back to the OS, the browser will become a prime target for being killed, and when you task switch back to the app page, the page will reload from scratch, killing fast switching.

Safari even kills you in the foreground if you allocate too much, but you have no way of knowing how much "too much" is.

Some applications need address space, not memory

Natively compiled wasm applications behave very similarly to native applications. A native application often needs to reserve a lot of address space in order to get access to a chunk of linearly consecutive memory (when existing memory allocations cannot find a linear block). Wasm applications sometimes need that too. Currently the only way to do that is to .grow() by a large amount. This means that whatever smaller bits of fragmented memory a wasm app has can go unused, but still be committed in memory. This causes wasm apps to use more committed memory than their native counterparts.

The amount of this overhead depends on the amount of fragmentation that the wasm app causes. Most native applications have not needed to care about this for ages, but for wasm, this can all of a sudden be a huge issue. Note that the memory64 proposal again does not resolve this, because it does not bring virtual memory to wasm. It just changes the ISA to accept 64-bit addresses (to the best of my knowledge).

Summarising the problems

Reiterating, the main problems that we currently see:

  1. wasm spec expects developers to need to know the required memory size, which is not feasible for the reasons described above,
  2. wasm apps may need to run with large overallocated memories, leading to browser failures, JS alloc failures, or if lucky, "just" to Android app switching UX problems,
  3. wasm apps consume more memory than native counterparts, because of memory fragmentation, lack of virtual memory, and lack of unmapping memory pages

What can be done about the problem?

In a recent video call with ARM, we discussed the (lack of) adoption of Unity3D on Wasm on ARM mobile devices, and the short summary is that these memory issues are a hard wall for feasibility of Unity3D on Wasm on Android. There have been existing conversations in #1396 and #1300 about how to shrink memory, but no concrete progress.

On the concrete bugs front, if Chrome eventually migrates to a 64-bit process on Android, it can help Wasm applications larger than 300MB to work on Chrome. (However, an issue here may be that manufacturers are still releasing 32-bit-only Android hardware in 2020, because of old inventory stock or something else; we have no idea.) If Safari fixes their eager page kill behavior, maybe it will help developers gauge the max limits on iPhones. But those will not help with the problem that a committed memory page is still a committed memory page, and a mobile device does have to carry it around somewhere.

Besides that, here are some ideas:

  1. Would it be possible to make the commit vs reserve behavior explicit for Wasm? Maybe as a browser coordinated extension if not for the core spec? This would give guarantees to application developers as to what the best practices initial vs maximum vs grow semantics should be. The current situation where one browser vendor recommends to probe the max amount of memory that can be reserved, vs another browser vendor expecting that apps allocate only the minimum needed amount or be killed if they exceed that, strongly suggests that the spec is missing something to connect the expectations together.

  2. Would it be possible to add support for unmapping memory pages from Wasm? Then e.g. Emscripten could implement unmapping of memory pages into its dlmalloc() and emmalloc() implementations, fixing memory commit issues, and the related Safari "high memory consumption" process killing, and Android task switch killing troubles?

  3. Would it be possible to somehow make a softer version of WebAssembly.Memory maximum field? If an app allocates Memory with maximum=4gb, which risks the rest of the browser/JS losing its address space (in 32-bit contexts), then maybe the browser could start reclaiming the highest parts of that reserved address space for its own purposes if the wasm app hasn't .grow()n that memory into its own use yet?

Then if one allocated a Memory with maximum probed to as much as it can go, but then allocated a large regular ArrayBuffer, maybe the browser could just steal some of that maximum back, if the Wasm app hasn't .grow()n into it? Likewise, if there was a .shrink() operation that an app could make use of, then maybe paired with this kind of address space stealing logic, the Wasm app and the rest of the browser could coordinate to "trade" address space, depending on how much of it was actually committed in the wasm heap, vs not actually used.

I hope the impressions here will not be a "this should be left to implementation details", since when I raised these concerns as a browser implementation bug, the message was that maybe the wasm spec should address this. And currently browsers are certainly not providing common enough implementations to enable developers to succeed with Wasm on mobile devices.

Thanks if you read all the way to the end on the long post!

@conrad-watt
Contributor

conrad-watt commented Feb 15, 2021

Thanks @juj, this is a great write-up! I just wanted to add a supplementary comment, but I hope someone else can chime in with a more holistic perspective (I did read the whole thing, I'm just not qualified to respond to most of it):

  3. Would it be possible to somehow make a softer version of WebAssembly.Memory maximum field? If an app allocates Memory with maximum=4gb, which risks the rest of the browser/JS losing its address space (in 32-bit contexts), then maybe the browser could start reclaiming the highest parts of that reserved address space for its own purposes if the wasm app hasn't .grow()n that memory into its own use yet?

IIUC, this is already permitted by the specification, since even when setting a maximum size it is permitted for memory.grow to start failing arbitrarily at a smaller size. This may tie into your point that even though the specification allows certain mitigations for memory problems in theory, browser divergences limit what applications can rely on (edit: and therefore we may need more spec guidance). I appreciate that even if every browser performed this mitigation on mobile, it wouldn't necessarily solve all the problems you bring up.

@pipcet

pipcet commented Feb 15, 2021

This is a really interesting read, thank you. I may not be particularly qualified to comment on this, but my outsider's perspective is that wasm as it stands today assumes, and prohibits deviations from, a simulated physical memory model.

It should continue requiring only such a model, but allow for "full" virtual memory capabilities (with the possible exception of such pains as mapping thread-local storage into the shared address space).

This should happen in the wasm spec, rather than simply stating that all memory issues are implementation-dependent. That is because while the virtual memory model does offer near-endless possibilities, most of them can be accessed through standardized and extensible interfaces which would not be beyond the scope of such a specification. We're talking about a small number of POSIX system calls, and having reasonable fallbacks for them (such as copying rather than remapping memory).

In other words, I think this is a case where the benefits of going for a general solution outweigh the burden of implementing a few ENOSYS wrappers on low-end implementations. The initial model was way too limited, and replacing it by one that's still quite limited seems like a bad idea to me.

@KronicDeth

KronicDeth commented Feb 15, 2021

Not having access to virtual memory, and to the distinction between committed and reserved memory, is one of the reasons why, for the WASM target, Lumen (our AoT, single-binary Erlang runtime/compiler) needs a different memory allocator than one closer to how the BEAM VM for Erlang does memory management. @bitwalker can go into more details of the changes.

@lukewagner
Member

lukewagner commented Feb 16, 2021

Hi @juj! There's a lot to address in your comment, but just to focus on the subsection "Why Wasm requires developers to know the needed memory size at compile time", w.r.t this bullet:

one cannot set a gratuitous upper bound, since that can fail the allocation,

Maybe I'm misunderstanding the problem or the current implementation strategies in Chrome/Safari, but the intention of having a separate initial and maximum is that the engine only fails when it isn't able to allocate initial; it should never fail trying to allocate more than initial. For maximum, the engine is encouraged to make a best-effort attempt to reserve some amount of memory between initial and maximum. For example, in Firefox, if reserving maximum fails, Firefox tries iteratively smaller allocations, down to initial. Assuming this implementation, it seems like Unity could set initial to some super-low value (below which it would be impossible to run in any case) and set maximum unconditionally to some gratuitously-high value.

Would that address this part of the problem? If so, perhaps we could ask the Chrome/Safari engineers if this matches their current implementation.
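The reservation fallback described above happens inside the engine. As a rough illustration only, the same back-off idea can be sketched at the application level. createMemoryWithFallback is a hypothetical name, halving is an arbitrary step choice, and this is not a description of Firefox's actual implementation.

```javascript
// Hypothetical helper: ask for a large maximum and back off toward `initial`
// whenever the engine refuses the request with a RangeError.
function createMemoryWithFallback(initialPages, wantedMaxPages) {
  let maxPages = wantedMaxPages;
  for (;;) {
    try {
      const memory = new WebAssembly.Memory({
        initial: initialPages,
        maximum: maxPages,
      });
      return { memory, maxPages };
    } catch (e) {
      // Give up once even maximum == initial cannot be satisfied.
      if (!(e instanceof RangeError) || maxPages <= initialPages) throw e;
      maxPages = Math.max(initialPages, Math.floor(maxPages / 2));
    }
  }
}

// Super-low initial, gratuitously high maximum (32767 pages, just under 2 GiB).
const result = createMemoryWithFallback(1, 32767);
console.log(result.maxPages, result.memory.buffer.byteLength);
```

Whether the maximum that ends up accepted reflects a real address space reservation is still up to the engine, so this only helps where the engine treats maximum as a reservation hint in the first place.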

@conrad-watt
Contributor

conrad-watt commented Feb 16, 2021

@lukewagner one aspect of the problem mentioned in that same subsection is that, at least on V8, that approach leads to memory.grow failing more often. The OP links this bug report (https://bugs.chromium.org/p/chromium/issues/detail?id=1175564#c7) where it's stated that V8 on 32-bit platforms only allocates exactly the initial memory size and performs subsequent grows using realloc.

This ties into the point made in idea (1) towards the end of the post, that the optimal strategy for picking initial is currently different depending on the browser.

EDIT: if the Firefox implementation is aggressive in reserving as much memory/address space as it can, does it ever try to release any if it's not grown into after some amount of time (in line with my comment)? One other aspect of the OP is that Wasm programs making large reservations can cause problems for mobile devices.

@kmiller68
Contributor

kmiller68 commented Feb 17, 2021

Would that address this part of the problem? If so, perhaps we could ask the Chrome/Safari engineers if this matches their current implementation.

JavaScriptCore only reserves the requested initial. That said, currently JSC's wasm only ships on 64-bit so VA space is less of an issue. Although, we do put WASM into a large "caged" VA space so they could run out of VA there but they're much more likely to get killed by the OS before that. If we ever shipped on 32-bit we would certainly have the same issue as V8.

EDIT: if the Firefox implementation is aggressive in reserving as much memory/address space as it can, does it ever try to release any if it's not grown into after some amount of time (in line with my comment)? One other aspect of the OP is that Wasm programs making large reservations can cause problems for mobile devices.

My assumption is that FF is mprotecting with PROT_NONE, which only dirties page table data in the OS. I'm not sure what they do for 32-bit, though.

@penzn

penzn commented Feb 17, 2021

On the surface, it looks like the biggest pain point is inability to release memory. #1396 describes a workaround - reinstantiate while preserving compiled module, but that requires a high degree of compartmentalization and might not be feasible for some apps.

Shrinking memory within existing model isn't trivial. While we can grow memory by adding more pages at the end of the address space, if we do the same for shrinking it we would require defragmentation (to ensure those are actually empty), which means that a simple free won't be able to release memory. On the other hand, dropping pages from the middle would break linear indexing.
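A toy model may help show why shrinking from the end depends on defragmentation. The allocation records and the releasablePagesAtEnd helper below are hypothetical, made up for this sketch. The point is only that the number of trailing pages that could be dropped is bounded by the highest live allocation, not by the total amount of free memory.

```javascript
const PAGE_SIZE = 65536; // bytes per Wasm page

// Hypothetical model: live allocations as { offset, size } byte ranges.
// Returns how many whole pages at the end of the heap hold no live data.
function releasablePagesAtEnd(totalPages, liveAllocations) {
  let highestByteInUse = 0;
  for (const { offset, size } of liveAllocations) {
    highestByteInUse = Math.max(highestByteInUse, offset + size);
  }
  const pagesInUse = Math.ceil(highestByteInUse / PAGE_SIZE);
  return Math.max(0, totalPages - pagesInUse);
}

// A 16-page heap where 14 pages are entirely unused, but one small block
// happens to live in the last page.
const fragmented = [
  { offset: 0, size: 1024 },
  { offset: 15 * PAGE_SIZE, size: 64 },
];

// The same two blocks after a hypothetical compaction pass.
const compacted = [
  { offset: 0, size: 1024 },
  { offset: 1024, size: 64 },
];

const fragmentedFree = releasablePagesAtEnd(16, fragmented);
const compactedFree = releasablePagesAtEnd(16, compacted);
console.log(fragmentedFree, compactedFree); // 0 15
```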

I am not sure whether using the memory buffer for anything else would open the door to security vulnerabilities (probably not in an obvious way, but it would probably require a bit of hardening), but more importantly any solution would require new instructions. I think we need a memory buffer management tied to primitives accessible from host memory management routines, which is probably close to approach 2.

There is a multi-memory proposal, maybe it would be possible to map large allocations to new memories which would get GC'd once unreferenced.

@conrad-watt
Contributor

conrad-watt commented Feb 17, 2021

I think we need a memory buffer management tied to primitives accessible from host memory management routines, which is probably close to approach 2.

How close is this to adding a GC'd reference type representing a first-class byte buffer, with operations analogous to ArrayBuffer?

Related, there is a JS proposal for a ResizableArrayBuffer, which if implemented successfully could have implications for the viability of shrink in Wasm, at least for non-shared memories/hypothetical first-class byte buffers.
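For reference, a small sketch of what in-place resizing looks like on the JS side, assuming a runtime that implements the proposal in the form it was later standardized (a maxByteLength constructor option plus ArrayBuffer.prototype.resize). The sizes are arbitrary, and the feature check is there because not every runtime ships it. It does not show anything about Wasm memories, only plain non-shared ArrayBuffers.

```javascript
// Feature-detect: resize() only exists where resizable buffers are implemented.
const supportsResize = typeof ArrayBuffer.prototype.resize === "function";

let sizes = null;
if (supportsResize) {
  // Arbitrary sizes: start at 64 KiB, allow up to 1 MiB.
  const buffer = new ArrayBuffer(64 * 1024, { maxByteLength: 1024 * 1024 });
  const initial = buffer.byteLength;

  buffer.resize(256 * 1024); // grow in place
  const grown = buffer.byteLength;

  buffer.resize(16 * 1024);  // shrink in place
  const shrunk = buffer.byteLength;

  sizes = { initial, grown, shrunk };
}

console.log(supportsResize ? sizes : "resize() not available in this runtime");
```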

@lukewagner
Member

lukewagner commented Feb 17, 2021

@conrad-watt Oops, I had missed that comment, sorry.

Just to give a bit more historical background: half the motivation for adding maximum was specifically to address these tensions Jukka explains w.r.t choosing the right initial size by encouraging the reservation scheme I mentioned above. Firefox used to experience asm.js OOMs acutely on Win32, motivating maximum, and, with the maximum-reservation impl techniques (especially when combined with a fresh process), we had a significant drop in Win32 OOMs, confirmed by partner telemetry.

EDIT: if the Firefox implementation is aggressive in reserving as much memory/address space as it can, does it ever try to release any if it's not grown into after some amount of time (in line with my comment)? One other aspect of the OP is that Wasm programs making large reservations can cause problems for mobile devices.

FF clamps the max reservation size to 1gb which, in practice, seems to leave enough room for the other allocations, although I could imagine also choosing a somewhat lower clamp. It's hard to design a heuristic that knows when you've seen the "last" memory.grow, so attempting to give back the reservation could cause unnecessary late OOMs. But maybe a compromise could be to hook into the system's low-memory notification and at that point release unused virtual address space?

@lukewagner
Member

On the separate topic of shrinking: do people actually want a memory.shrink (which, given a normal fragmented malloc heap, will rarely be possible to do for any significant amount -- it seems like you'd need a custom global memory management scheme to shrink with confidence) or just some way to achieve an madvise(MADV_DONTNEED) call (which is already called by some malloc impls, like jemalloc, and in general can be more-easily adopted in an ad hoc manner).

@juj
Author

juj commented Feb 18, 2021

To concretely help gauge the differences in browsers on this behavior, I wrote a mobile friendly interactive memory allocation test page, available at http://clb.confined.space/wasm_grow.html (self contained HTML you can download, or run live)
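The test page itself is not reproduced here. As a rough sketch of the kind of probing involved, the loop below grows a memory in fixed steps until grow() throws. growUntilFailure, the step size and the safety cap are assumptions made for this example, and the cap is deliberately small so that running the sketch does not exhaust a device.

```javascript
const PAGE_SIZE = 65536; // bytes per Wasm page

// Grow in fixed steps until grow() throws or a safety cap is reached.
// Returns the number of pages the memory ended up with.
function growUntilFailure(memory, stepPages, capPages) {
  let pages = memory.buffer.byteLength / PAGE_SIZE;
  while (pages + stepPages <= capPages) {
    try {
      memory.grow(stepPages);
    } catch (e) {
      break; // typically a RangeError when the request cannot be satisfied
    }
    pages += stepPages;
  }
  return pages;
}

const memory = new WebAssembly.Memory({ initial: 1 });

// Demo values: 1 MiB steps, capped at 257 pages (about 16 MB).
const reachedPages = growUntilFailure(memory, 16, 257);
const reachedMB = (reachedPages * PAGE_SIZE) / (1024 * 1024);
console.log(`${reachedMB.toFixed(2)} MB`);
```

Note that a loop like this can only observe failures that surface as exceptions. A page reload or process kill ends the script before it can record anything.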

Here is what I see:

Huawei P10 Plus (6GB of RAM) + Android 8.0.0 + Chrome 88.0.4324.152

  1. new WebAssembly.Memory({ initial: 1 }); followed by Wasm grows, followed by JS allocations:
  • heap can be grown up to 512MB. [Chromium 1175564]❌ After that one can still allocate 767MB more of (noncontiguous) JS memory (max chunk size of 256MB) ✔️. Both Wasm and JS grow attempts later fail gracefully as JS exceptions ✔️.
  2. Same, but specify {maximum:32767} for a gratuitous maximum reservation(?) or allocation(?):
  • identical result (no help from {maximum:32767}) ❌
  3. Same as before, but also specify shared: true:
  • identical result (no help from shared: true either) ❌
  4. new WebAssembly.Memory({ initial: ? });: probe maximum allocatable Wasm size, followed by JS allocations:
  • a considerably larger 975.938 MB heap is allocated. ✔️ If this was a reserve, then it might be fine, but based on earlier comments the impression is that this is a commit(?). On top of this, a further 1.3GB of (noncontiguous) JS memory can be allocated. ✔️ Concerned about what will happen with Chrome in the "old browser address space" scenario under this allocation scheme.
  5. new WebAssembly.Memory({ initial: 1, maximum: 900*1024*1024/65536 }) to try to improve on 4) above, by reserving only up to the known max size that it was able to reach (and not a gratuitous maximum):
  • Wasm heap still caps out at 512MB. ❌
  6. 512MB of JS allocations, followed by new WebAssembly.Memory({ initial: 1 });, followed by Wasm grows:
  • (this test tries to simulate a scenario where the browser may have been open and running for a while ("old address space"), and a scenario where the page might allocate some JS content before the first wasm allocation) This is where varying results kick in: after having first allocated JS memory, sometimes Wasm heap caps out at 320MB, other times at 384MB, etc. ❌
  7. Fast app switching test: allocate max wasm heap and 256MB of JS, and 900MB of Wasm heap, then task switch out to some other browser tabs or apps (Instagram, email) for a short period, and come back.
  • Browser is evicted rather immediately, causing page reload when navigating back, losing the app state. ❌
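Since WebAssembly.Memory sizes are expressed in 64 KiB pages rather than bytes, the page counts used in these tests are easy to get wrong. A small conversion sketch follows; mbToPages and pagesToBytes are hypothetical helper names, not part of any API.

```javascript
const PAGE_SIZE = 64 * 1024; // one Wasm page is 65536 bytes

// Convert a size in megabytes (MiB) to a whole number of Wasm pages.
function mbToPages(megabytes) {
  return Math.ceil((megabytes * 1024 * 1024) / PAGE_SIZE);
}

// Convert a page count back to bytes.
function pagesToBytes(pages) {
  return pages * PAGE_SIZE;
}

console.log(mbToPages(900));      // 14400 pages for a 900 MB maximum
console.log(mbToPages(512));      // 8192 pages for a 512 MB heap
console.log(pagesToBytes(32767)); // 2147418112 bytes, 64 KiB short of 2 GiB
```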

Huawei P10 Plus (6GB of RAM) + Android 8.0.0 + Firefox 85.1.3

  1. from above:
  • heap can be grown up to 2GB-64K, ✔️ after which about 1GB more of JS memory can be allocated ✔️, before browser silently reloading the page (no OOM JS exception) [Bugzilla 1693256]
  2. and 3. from above: 2GB-64K alloc ✔️

  4. from above: 2GB-64K alloc. ✔️

  5. N/A

  6. Can allocate 2GB of JS memory ✔️, after which a 1GB Wasm heap allocation still succeeds ✔️. Attempting to grow wasm heap past that to 2GB will cause silent page reload with no JS exception. ❌

  7. Eviction happens, but subjectively maybe not as fast as with Chrome. ❌

iPhone Xs + iOS Safari 13.3.1

(apologies for not testing on a newer iOS Safari, but iOS update is not working to update to newer version on this phone, and I do not have any other one to test with. I hope this data is still relevant)

  1. new WebAssembly.Memory({ initial: 1 }); followed by Wasm grows, followed by JS allocations:
  • on first test, heap could be grown up to 768MB. After that one can still allocate ~700MB of JS memory, after which the browser will silently reload the page. [WebKit 221530]
  • repeating the test by reloading the page, heap could be grown only to 544MB. After that allocating ~100MB of JS memory resulted in browser reload of the page. ❌
  • repeating the test a third time, heap could be grown to 512MB, and attempting to grow wasm heap even more would cause the browser to reload the page. ❌
  2. Specifying maximum: 1GB does not help, page still reloads at ~512MB-768MB ❌

  3. No help from shared: true either. ❌

  4. Probing initial enables a whopping 1.8593GB heap to be acquired! ✔️ But after that, allocating even the tiniest 64KB JS ArrayBuffer will cause the page to immediately reload. ❌

  5. No help from specifying a more modest maximum either. Page reloads at 512MB. ❌

  6. Able to allocate 512MB of JS memory, and after that the same 512MB of Wasm heap. Allocating more JS memory will cause a page reload. ❌

  7. Eviction was noticeably harder to reproduce, but did occur after launching a bit more memory consuming apps. ✔️/❌

Summary

The aforementioned issues pop up in different forms in the tests:

  • not being able to get enough initial memory,
  • needing to overallocate initial memory to do better than .grow(),
  • not getting JS exceptions when memory grow fails, but page reloads/OOMs,
  • not being able to release memory to keep Fast App Switching alive

The danger of suffocating the browser's native address space would not show up in this test, mainly because this test does not call out to any memory intensive browser APIs (XHR/Fetch/WebGL/WebAudio) that might risk exhausting memory. It is hard to say how prevalent such issues are on 32-bit Chrome. Firefox had excellent memory allocation success in this test.

Testing some of this behavior is extremely fuzzy, for two main reasons:

  • it is hard to capture "old browser address space" behavior in test conditions, since it almost requires one to daily drive the device and browser for a while, and then do the testing.
  • it is hard to come up with a solid Fast App Switching eviction test, because what people do in the wild vs "lab conditions" can be quite different.

@juj
Author

juj commented Feb 18, 2021

Would it be possible to somehow make a softer version of WebAssembly.Memory maximum field?

IIUC, this is already permitted by the specification, since even when setting a maximum size it is permitted for memory.grow to start failing arbitrarily at a smaller size.

one cannot set a gratuitous upper bound, since that can fail the allocation,

Maybe I'm misunderstanding the problem or the current implementation strategies in Chrome/Safari, but the intention of having a separate initial and maximum is that the engine only fails when it isn't able to allocate initial; it should never fail trying to allocate more than initial.

Just to give a bit more historical background: half the motivation for adding maximum was specifically to address these tensions Jukka explains w.r.t choosing the right initial size by encouraging the reservation scheme I mentioned above. Firefox used to experience asm.js OOMs acutely on Win32, motivating maximum, and, with the maximum-reservation impl techniques (especially when combined with a fresh process), we had a significant drop in Win32 OOMs, confirmed by partner telemetry.

Hi Luke! I recall this thread of conversation well, as I was also working with that partner collaboration. It did indeed help 32-bit Firefox to a great extent based on their telemetry. In the test scheme above, Firefox on Android performs well, and is able to allocate large heaps. (not sure if it is a 64-bit process already on Android?)

Though in the test scheme above, it looks like no browser performed any differently when a gratuitous maximum: 32767 was passed.

Even with that recollection, the current behavior we have been seeing with maximum did still confuse me into thinking the implementations were attempting to guarantee satisfying the maximum, and hence failing. The test scheme above shows that is not the case: they are failing already when maximum is omitted.

Although, we do put WASM into a large "caged" VA space so they could run out of VA there but they're much more likely to get killed by the OS before that.

We have received reports of some odd behavior (maybe due to this?) in Safari, where people report that when they have an "old browser process" (long-running process/lots of tabs open?), they may fail to launch Unity pages due to OOMs or page reloads, but killing the Safari process and reopening it will help a Unity game to launch again. It has been very difficult to raise a bug report about this, since producing an "old browser process" in QA is quite a fuzzy and nonrepeatable procedure. (In fact, we also occasionally get similar reports for Firefox and Chrome, but not quite as often as for Safari.)

Although now in the above test, this "shrinking" of available memory was reproduced, i.e. first page load got 768MB of Wasm heap, first page reload 544MB, and second page reload was down to 512MB. Opened [WebKit 222097] about this.

On the surface, it looks like the biggest pain point is inability to release memory.

I tend to agree, since if there is a way to release memory, then it would probably fall out of that that the initial commit vs reserve semantics would need to become well defined across implementations. One could then release all the memory that was initially committed (if it happened to cause a commit).

The memory allocation problems with test results that I have in the above post could presumably be dealt with as implementation-specific bugs (Chrome being 32-bit, not getting graceful JS OOM throws on large alloc failures, etc).

Shrinking memory within existing model isn't trivial. While we can grow memory by adding more pages at the end of the address space, if we do the same for shrinking it we would require defragmentation (to ensure those are actually empty), which means that a simple free won't be able to release memory.

On the separate topic of shrinking: do people actually want a memory.shrink (which, given a normal fragmented malloc heap, will rarely be possible to do for any significant amount -- it seems like you'd need a custom global memory management scheme to shrink with confidence)

On its own, a .shrink() would not be enough. Indeed it would be an opportunistic behavior where an emmalloc/dlmalloc impl could only .shrink() when the freed allocations occurred at the top of the heap, which may not be the case for many applications (and needs the programmer to be memory fragmentation aware). Although in some apps, this could "trivially" be the case when they do large transitions in application lifetime (user closes edited document, player exits a game level back to main menu), where user navigation flow has been able to guarantee this kind of stacking allocation order.
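A small sketch of why such a .shrink() would be opportunistic. This is only a model: `shrinkablePages` is a made-up helper, and no shrink operation exists in Wasm today. It computes how many whole 64 KiB pages at the top of the heap are free, given the set of live allocations.

```javascript
const PAGE_SIZE = 65536;

// Hypothetical helper: given live allocations as {addr, size} in bytes and the
// current heap size in pages, return how many whole pages at the top are free.
function shrinkablePages(liveAllocs, heapPages) {
  let highWater = 0;
  for (const a of liveAllocs) highWater = Math.max(highWater, a.addr + a.size);
  const pagesNeeded = Math.ceil(highWater / PAGE_SIZE);
  return Math.max(0, heapPages - pagesNeeded);
}

// Stack-like case: only a small block near the bottom of a 64 MB heap is live.
const stackLike = shrinkablePages([{ addr: 1024, size: 4096 }], 1024);

// Fragmented case: the same heap, plus a 16-byte block sitting in the last page.
const fragmented = shrinkablePages(
  [{ addr: 1024, size: 4096 }, { addr: 1024 * PAGE_SIZE - 64, size: 16 }],
  1024
);
```

In the stack-like case nearly the whole heap could be returned, while a single tiny allocation in the top page pins everything below it.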

The intent with .shrink() was that maybe it could help give some address space back to a 32-bit browser, i.e. let wasm apps run at all times with the smallest heap size that they need to fit all their own memory into. Then the browser would also know at runtime how much of that gratuitously reserved maximum: 32767 address space it could reclaim, if the JS side or another browser operation caused a large JS/native allocation. I.e. to assist in not suffocating the browser's own address space. (Though maybe browsers would have a hard time actually benefiting from such an opportunity in practice?)

One pragmatic thing that such a .shrink() would certainly help if nothing else, are the large number of bug reports people produce about Wasm apps consuming large amounts of memory, or having a memory leak when they enter and exit a full game scene in Unity. What people are doing is they look at their Chrome/Firefox/Safari DevTools Memory tab, and see the effects of .grow() when they enter a scene, but when they exit back and the scene is unloaded and memory cleared, they can not observe any shrink in the wasm heap in DevTools, leading them to think that a memory leak must have occurred. In other words, browser DevTooling is unable to account for the actually used memory in Wasm.

I wonder what would happen on desktop when wasm64 becomes a thing. If a wasm64 app performs a huge/maximum address space reservation, could such operation cause a 64-bit browser to be address space constrained on the native side? Could a wasm64 app be desired to be able to .shrink() the address space back to the browser? Or maybe wasm64 will still not allow an app to reserve the full 64-bit address space, but a much more modest fraction of it, so that browser still will have plenty for its own.

Btw, after seeing https://github.com/bytecodealliance/wasm-micro-runtime earlier my first thought was to wonder how they deal with the lack of .shrink() in extremely memory constrained systems that may not have a concept of virtual address space at all(?).

or just some way to achieve an madvise(MADV_DONTNEED) call (which is already called by some malloc impls, like jemalloc, and in general can be more-easily adopted in an ad hoc manner).

This would certainly be the main remedy that I can think of. Instead of .shrink(), that would allow all apps to benefit, and help the Fast App Switching problems.

Orthogonally to all of this, even already today without any spec changes, I wonder if the current browser DevTools implementations could be improved to detect and display how much of the wasm heap is actually committed vs just reserved? Currently all browsers will show a huge opaque block of Memory for the Wasm allocation. It would be nice to have DevTools display a "committed size, reserved size, % committed" type of visuals, where one could then see how much memory their application is impacting in practice.

What this would help is that developers would better understand the behavior they are getting when they are doing browser specific workarounds to WebAssembly.Memory() allocation patterns. Also when writing Emscripten's emmalloc I have wondered whether the memory region marking strategy can cause excess page commits for unused memory pages that applications may not ever be using, so would be great to see how that behaves in practice.

@aardappel

Some applications need address space, not memory

Somewhat related: discussion on "Support for reserving address space" in Memory64: WebAssembly/memory64#4

I generally would be very much in support of adding features related to mmap / reservation / shrinking / probing etc. to Wasm. Besides needing them for memory constrained devices, we will also need these for the opposite: programs wishing to manage large amounts of address space.

@lukewagner
Member

@juj It seems like, if browsers did implement the FF maximum-reservation scheme, then with a 2gb maximum specified, there should be no difference in your experiment between (1) the memory successfully allocated by new WA.Memory({initial:X}) probing and (2) the size to which you can eventually memory.grow. And (2) would have the added benefit of only being reserved memory.
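For reference, the two measurements being compared can be sketched roughly as follows. The helper names and the small bounds are assumptions for illustration; the actual experiment earlier in the thread may have been structured differently.

```javascript
const PAGE_SIZE = 65536;

// (1) Probe for the largest `initial` (in pages) that can be allocated, by
// binary search between lo and hi. Assumes success is monotonic in size.
function probeMaxInitialPages(lo, hi) {
  let best = 0;
  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    try {
      new WebAssembly.Memory({ initial: mid }); // throws RangeError on failure
      best = mid;
      lo = mid + 1;
    } catch (e) {
      hi = mid - 1;
    }
  }
  return best;
}

// (2) Start empty with a declared maximum and grow in steps until refused.
function growUntilFailure(maximumPages, stepPages) {
  const mem = new WebAssembly.Memory({ initial: 0, maximum: maximumPages });
  try {
    while (mem.buffer.byteLength / PAGE_SIZE + stepPages <= maximumPages) {
      mem.grow(stepPages);
    }
  } catch (e) {
    // grow was refused; keep whatever size was reached
  }
  return mem.buffer.byteLength / PAGE_SIZE;
}

// Small bounds keep this cheap; a real probe would go up to a 2GB maximum.
const probed = probeMaxInitialPages(1, 64);
const grown = growUntilFailure(64, 8);
```

Under the maximum-reservation scheme described above, one would expect the two numbers to match on a given device.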

Jukka, would a good deal of your needs be addressed if:

  1. all browsers implemented the maximum-reservation scheme
  2. there was a discard instruction (as mentioned in future features and briefly entertained as an MVP instruction)

?

@penzn

penzn commented Feb 19, 2021

How close is this to adding a GC'd reference type representing a first-class byte buffer, with operations analogous to ArrayBuffer?

@conrad-watt ideally very close, I just wasn't sure how this would work in the existing memory model. Sorry, I have not been following GC proposal close enough, can an object like this be accessed as part of linear memory?

@conrad-watt
Contributor

conrad-watt commented Feb 19, 2021

The "simplest" version of (my interpretation of) this idea would be to make such buffers like any other GC object. That is, each buffer would have an entirely disjoint address space from any other (enforced by bounds checking), they'd have their own family of load/store operations, and would be stored (by reference) in a table, or as a field of another GC object, rather than in linear memory.

I wasn't sure if this was what you had in mind, or if the idea was to tie more closely to the existing linear memory (by having a host procedure to manage chunks of linear memory that are still manually accessible through regular load/store?).

@juj
Author

juj commented Feb 19, 2021

@juj It seems like, if browsers did implement the FF maximum-reservation scheme, then with a 2gb maximum specified, there should be no difference in your experiment between (1) the memory successfully allocated by new WA.Memory({initial:X}) probing and (2) the size to which you can eventually memory.grow. And (2) would have the added benefit of only being reserved memory.

Jukka, would a good deal of your needs be addressed if:

1. all browsers implemented the `maximum`-reservation scheme

2. there was a `discard` instruction (as mentioned in [future features](https://github.com/WebAssembly/design/blob/master/FutureFeatures.md#finer-grained-control-over-memory) and [briefly entertained as an MVP instruction](https://github.com/WebAssembly/design/issues/384))

?

That would certainly be expected to fix the Chrome and Safari issues that allocating a large initial is better than growing from a small initial. That would also be expected to fix the Fast App Switching problem.

I am not sure if after those we will still have stability issues on 32-bit browsers, caused by a Wasm page reserving a large 2GB part of the process address space, leaving the browser with <=2GB left for its own use. Currently nothing stops a browser from gnawing back from top end of that address space if JS side does large XHRs or memory intensive WebGL operations, but if the page happened to temporarily have done a huge 2GB alloc (to grow() to consume the whole heap) but then freed all of it, that address space would then permanently be off limits for the browser to chip into. Would a .shrink() operation to enable address space stealing be too contrived to implement?

One particular detail about .discard() is the behavior that should happen when an app attempts to touch the memory to commit it again, but there is not enough memory available to commit. Regular JS ArrayBuffer allocations and Wasm .grow()s are "blocky" in that if I want to allocate e.g. 1GB, the allocation is monolithic in that 1GB, and if that fails, I should be able to gracefully get a JS exception/trap out of it, and decide to do something else. This is super-important for stability.

But touching memory to commit it will not be blocky, but will roll in one page at a time, so one might get 900MB of that 1GB reserve committed, and then run into a page that finally exhausts the available physical memory. We would not want the browser to silently reload the page, as current Firefox/Safari/Chrome behavior on OOM can be. Instead, one would prefer to have a way to gracefully manage the page commit failure, and avoid the 1GB allocation altogether (and probably uncommit that 900MB from before, to avoid the browser OOMing itself on a small allocation right after). So the exact semantics of what should happen when a page commit fails on a memory store are important. (Also, what should happen when one attempts to load memory from an uncommitted page?)

Maybe in addition to a memory store implicitly committing a page, there could be a dedicated instruction .commit(addr, length) that would commit the address range from addr to addr + length and synchronously return the number of pages that were committed? That would make it possible to implement memory allocators that can check that the memory they are handing out is guaranteed to be committed and available to the caller (instead of the caller having to find out on its Nth memory store to the allocated region).

Would the commit vs reserve page size be fixed (to the same 64K of the wasm page size?), or variable size depending on the underlying architecture? If variable size, can there be an instruction to query this size?

Finally, would it make sense to give applications an instruction to programmatically query a) if a given address (range?) is committed or not, and b) ask the number of committed pages total in wasm memory? Those would be nice to help implement debugging and profiling support to applications and allocators.

@lukewagner
Member

I am not sure if after those we will still have stability issues on 32-bit browsers, caused by a Wasm page reserving a large 2GB part of the process address space, leaving the browser with <=2GB left for its own use.

For these issues, I'd like to re-highlight my earlier comment (second half) about (1) assumed engine clamping of the maximum internal reservation and (2) releasing reserved-by-maximum vmem at low-memory or allocation failure notifications.

One particular detail about .discard() is the behavior that should happen when an app attempts to touch the memory to commit it again, but there is not enough memory available to commit.

That's a great point. More generally, from talking about this with @lars-t-hansen today, it seems like, on systems where a random i32.load might OOM-kill the process, the implementation of initial/memory.grow-allocation should try to eagerly and fallibly "populate" the newly-available memory. I'm not positive, but from reading the man pages, this might be achievable on Android with mmap(MAP_ANONYMOUS|MAP_POPULATE) (using MAP_FIXED for in-place memory.grow), which would hopefully fail gracefully (not crash) if the region can't be populated. (Another candidate is madvise(MADV_WILLNEED), but it's not clear if that's just a hint that won't fail in the cases we want it to fail.)

(As a side note on terminology, and I'm not sure if I'm correct here, so happy to have corrections, but, IIUC: "committed" means neither "virtual address space allocated" nor "RAM pages allocated to page table entries"; rather, it means "you can access this region without SIGSEGV, but it might not be backed by RAM, so you may have a kernel trap on access that may OOM-kill you". Given this, it seems like "committed" doesn't imply the desired property of "not crashing at random i32.loads" (c.f., Linux "overcommit"); instead you want this more subtle, ephemeral and heuristic (since presumably the kernel can do whatever it wants) concept of "populated".)

Returning to the hypothetical discard instruction (which called madvise(MADV_DONTNEED)), it seems like this would un-populate the region, and thus have the possibility of crashing on the first i32.load in the region. Incorporating your idea above, maybe there could be an additional populate instruction which took a range and returned a bool (i32) indicating "I was able to populate this region". Semantically, it would have no side-effects, but when used in conjunction with discard it could be used by a malloc impl to achieve the goal of releasing unused RAM to the system while avoiding crash-on-i32.load.
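To illustrate how a malloc impl might pair the two, here is a toy model in JS. Neither instruction exists; `PopulationModel` and its fixed "physical page budget" are invented purely to show the intended contract: discard zeroes and releases a range, and populate is all-or-nothing, returns 1 or 0, and never changes memory contents.

```javascript
const PAGE_SIZE = 65536;

// Toy model, not a real API: tracks which pages of a linear memory are
// "populated" against a fixed budget of physical pages.
class PopulationModel {
  constructor(memory, physicalBudgetPages) {
    this.memory = memory;
    this.budget = physicalBudgetPages;
    this.populated = new Set();
  }
  // Models the hypothetical `discard`: contents read as zero afterwards and
  // the pages no longer count against the budget.
  discard(firstPage, numPages) {
    new Uint8Array(this.memory.buffer, firstPage * PAGE_SIZE, numPages * PAGE_SIZE).fill(0);
    for (let p = firstPage; p < firstPage + numPages; p++) this.populated.delete(p);
  }
  // Models the hypothetical `populate`: succeeds for the whole range or not at all.
  populate(firstPage, numPages) {
    let missing = 0;
    for (let p = firstPage; p < firstPage + numPages; p++) {
      if (!this.populated.has(p)) missing++;
    }
    if (this.populated.size + missing > this.budget) return 0;
    for (let p = firstPage; p < firstPage + numPages; p++) this.populated.add(p);
    return 1;
  }
}

const memory = new WebAssembly.Memory({ initial: 8 });
const model = new PopulationModel(memory, 4); // pretend only 4 pages of RAM exist
const bytes = new Uint8Array(memory.buffer);

const okSmall = model.populate(0, 3); // fits in the budget
bytes[10] = 42;
const okLarge = model.populate(3, 4); // would need 7 pages in total, so refused
model.discard(0, 3);                  // release pages 0..2; contents become zero
const afterDiscard = bytes[10];
const okRetry = model.populate(3, 4); // fits now that pages 0..2 were released
```

An allocator built on this contract could call populate before handing out a large block and return NULL on failure, rather than letting the caller hit the problem on some later store.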

@juj
Author

juj commented Feb 20, 2021

(As a side note on terminology, and I'm not sure if I'm correct here, so happy to have corrections, but, IIUC: "committed" means neither "virtual address space allocated" nor "RAM pages allocated to page table entries"; rather, it means "you can access this region without SIGSEGV, but it might not be backed by RAM, so you may have a kernel trap on access that may OOM-kill you".

I must admit that I am not familiar with the Linux/Unix parlance of these terms, but I hope my use of "reserved" vs "committed" in earlier messages follows the correct semantics that Windows uses them with (https://docs.microsoft.com/en-us/previous-versions/ms810627(v=msdn.10) ).

For these issues, I'd like to re-highlight my earlier comment (second half) about (1) assumed engine clamping of the maximum internal reservation and (2) releasing reserved-by-maximum vmem at low-memory or allocation failure notifications.

In the absence of a .shrink() operation, or a way for the browser to steal back uncommitted (unpopulated?) pages, it seems to me that such releasing of reserved-by-maximum vmem would only work if the app was still in its initial pristine condition. Later in the app lifecycle, there may not exist any reserve left, as the application has grown to consume all of it, but it would have no way of telling the browser whether it is still actually using it or not.

It might be brittle if the reservation stealing would only work if the wasm app was still pristine, but not if it had earlier temporarily used a lot of memory.

Or maybe .shrink() is not needed and such reservation stealing would also work on unpopulated pages at the top end of the heap, where the browser could take those away in low mem scenarios even if app had .grow()n to them but later discarded the pages; and forbid the wasm app from populating any of the high pages if the browser needed to use them to avoid OOMing?

It is true that such .shrink() only from the top end type may require developers to pay extra attention to fragmentation, but I do see that as being better option, compared to the possible problems that might arise if wasm apps that have temporarily used a lot of memory can make the browser more prone to OOMing.

Returning to the hypothetical discard instruction (which called madvise(MADV_DONTNEED)), it seems like this would un-populate the region, and thus have the possibility of crashing on the first i32.load in the region. Incorporating your idea above, maybe there could be an additional populate instruction which took a range and returned a bool (i32) indicating "I was able to populate this region". Semantically, it would have no side-effects, but when used in conjunction with discard it could be used by a malloc impl to achieve the goal of releasing unused RAM to the system while avoiding crash-on-i32.load.

This sounds very good. That would help apps decide to do something else on large OOMs without risking of populating up to last available page in the browser and then failing.

What would the semantics of memory loads and stores in general be like to unpopulated pages, when there is plenty of memory available? Would each touch of a page implicitly populate under the hood? Or would it trap? It feels like either behavior could be useful, not sure which way to lean on this.

Also, would it make sense to have an instruction to switch a page to be read-write vs read-only vs noaccess? Those could be interesting to help debugging and error catching.

@lukewagner
Member

In the absence of a .shrink() operation, or a way for browser to steal back uncommitted (unpopulated?) pages, it seems to be that such releasing reserved-by-maximum vmem would only work if the app was still in its initial pristine condition.

This is where it's important to distinguish "reserved-by-maximum vmem" from "memory accessible to wasm via initial or memory.grow". The former is only reserved, not committed, and thus wasm will trap if attempting to access it. Thus, memory reserved by maximum is necessarily pristine/unpopulated, so the browser can unobservably (other than failing a future memory.grow) release it.

What would the semantics of memory loads and stores in general be like to unpopulated pages, when there is plenty of memory available? Would each touch of a page implicitly populate under the hood? Or would it trap? It feels like either behavior could be useful, not sure which way to lean on this.

Although you could imagine a trapping semantics being useful for catching bugs, this would place a major requirement on wasm engines to use signal-handler tricks to avoid costly per-memory-access checks (which not all engines can do now or in the future). That's why I proposed above that populate have no semantic side effect on linear memory or future loads/stores. You could imagine a debug-mode that caught such errors.

Also, would it make sense to have an instruction to switch a page to be read-write vs read-only vs noaccess?

Definitely agreed that these would be valuable, but the same caveat applies that implementing this feature without the benefit of memory-protection+signal-handlers would be pretty expensive. At least, that's what has held us back so far; maybe we should revisit this at some point. There's also challenging questions in the JS API for how to handle typed array views that overlap read-only or inaccessible regions.

@juj
Author

juj commented Mar 8, 2021

Thus, memory reserved by maximum is necessarily pristine/unpopulated, so the browser can unobservably (other than failing a future memory.grow) release it.

I do understand that, but that is not the scenario I have in mind when I am concerned about the browser not being able to release it.

If the wasm page temporarily uses all of the reserved max memory, i.e. .grow()s to take over it, but then later releases most of it, then the memory region would again be unpopulated, but currently the browser cannot recognize it, and cannot claim any of it to its own use. This temp large .grow() is what I am concerned about, since that cannot be undone unless a .shrink() would be supported.

That's why I proposed above that populate have no semantic side effect on linear memory or future loads/stores. You could imagine a debug-mode that caught such errors.

That does make sense. discarding a page will then be the same as memsetting it to zero?

Definitely agreed that these would be valuable, but the same caveat applies that implementing this feature without the benefit of memory-protection+signal-handlers would be pretty expensive. At least, that's what has held us back so far; maybe we should revisit this at some point. There's also challenging questions in the JS API for how to handle typed array views that overlap read-only or inaccessible regions.

Gotcha - memory protection is something that I don't see critical at all for solving mobile memory problems, so that can certainly be left out. Was just rather curious whether that would have come practically "for free" on the side.

@lukewagner
Member

If the wasm page temporarily uses all of the reserved max memory, i.e. .grow()s to take over it, but then later releases most of it, then the memory region would again be unpopulated, but currently the browser cannot recognize it, and cannot claim any of it to its own use.

Ah, that's a different case than I was replying to. For the memory.shrink case, as I was saying in my earlier comment, unless a custom memory allocation scheme was employed, I would imagine that usual internal fragmentation problems would prevent memory.shrink from helping much with malloc()+free() (e.g., if there is even 1 tiny malloc() performed after the large temporary allocations, it would prevent the memory.shrink). Are you imagining the use of such a custom allocation scheme? When I imagine how such a custom allocation scheme would need to be implemented, it seems tricky: to prevent malloc() from going after the large temporary allocations, you'd need to reserve a fixed amount of space before the large temporary allocations, and sizing this region would require some of the same hard questions (how much, with penalties for too-much and too-little) you outlined at the root of this thread.

For the general case, I think discard would get you most of what you need: releasing the physical RAM without paging -- it's only the vmem range that's not being released back to the browser. When I imagine a typical loading sequence, it seems like load-time (when the temporary large allocation is made) is the point at which vmem is most scarce, and once the app "survives" this pinch-point, it's mostly good. If the app performs a sequence of loads (e.g., loading levels), there's even a risk that, after giving vmem back after the first load, subsequent fragmentation prevents the second load from growing again. Thus, it's a question of whether the engine should even give back vmem to the browser after a memory.shrink.

The reason I push back on memory.shrink is that it opens a big can of worms for shared memory and a small can of worms for non-shared memory, so it's something I was hoping we could avoid, if indeed it has limited practical applicability.

That does make sense. discarding a page will then be the same as memsetting it to zero?

Yep!

@titzer

titzer commented Mar 8, 2021

Based on the memory management in V8 for array buffers, especially shared buffers, it would be quite difficult to support memory.shrink, so I mostly concur with @lukewagner here.

@juj
Author

juj commented Mar 9, 2021

I would imagine that usual internal fragmentation problems would prevent memory.shrink from helping much with malloc()+free() (e.g., if there is even 1 tiny malloc() performed after the large temporary allocations, it would prevent the memory.shrink). Are you imagining the use of such a custom allocation scheme?

Yes, indeed. I'll repeat the rationale:

Indeed it would be an opportunistic behavior where an emmalloc/dlmalloc impl could only .shrink() when the freed allocations occurred at the top of the heap, which may not be the case for many applications (and needs the programmer to be memory fragmentation aware). Although in some apps, this could "trivially" be the case when they do large transitions in application lifetime (user closes edited document, player exits a game level back to main menu), where user navigation flow has been able to guarantee this kind of stacking allocation order.

In some applications it is easily the case that on "grand scale" the allocations have good stack-like characteristics when one transitions between e.g. main menu and the game levels. Applications may need to develop custom memory pools to manage this kind of behavior, but that is not much different from wasm today.

Already in the absence of .shrink(), Wasm developers need to be mindful about memory fragmentation, so introducing a .shrink() would not change that fact.

When I imagine a typical loading sequence, it seems like load-time (when the temporary large allocation is made) is the point at which vmem is most scarce, and once the app "survives" this pinch-point, it's mostly good.

This is perhaps a bit too simplistic a model. If we look at the app loading flow, then under the current .grow()-only model, it will actually be the "second document load" (document/game level/asset/...) that causes the highest simultaneous address space pressure, e.g.:

  1. Load a large, say, 300MB, document to JS memory (e.g. from XHR or IndexedDB - former could stream, latter cannot)
  2. wasm.grow() memory +300MB to fit the document, memcpy the document to Wasm memory
  3. unload doc from JS memory, -300MB,
  4. wasm.grow() a second time, for, say, +1GB to unpack/expand the doc in wasm, and fit a working memory area for processing the document
  5. document unloads, -1.3GB of unused memory in wasm (would .shrink() here if available)
  6. load a (the same?) document again, e.g. +300MB to JS memory, but OOM since cannot find address space for 1.3GB + 300MB simultaneously.
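The accounting in the steps above can be sketched as a small simulation. The sizes come from the steps (with 1GB taken as 1024MB); the 1500MB address space budget and the idea of simply summing JS and Wasm usage are assumptions made only for illustration.

```javascript
// Assumed address space budget in MB; not a measured value.
const ADDRESS_SPACE_MB = 1500;

function simulate(canShrink) {
  let js = 0, wasmHeap = 0, peak = 0, oomAtStep = null;
  const step = (n, dJs, dWasm) => {
    js += dJs;
    wasmHeap += dWasm;
    peak = Math.max(peak, js + wasmHeap);
    if (oomAtStep === null && js + wasmHeap > ADDRESS_SPACE_MB) oomAtStep = n;
  };
  step(1, +300, 0);                       // load document into JS memory
  step(2, 0, +300);                       // grow wasm heap, copy document in
  step(3, -300, 0);                       // drop the JS copy
  step(4, 0, +1024);                      // grow again for the unpacked working set
  step(5, 0, canShrink ? -wasmHeap : 0);  // unload: space returns only if shrink exists
  step(6, +300, 0);                       // load the next document into JS memory
  return { peak, oomAtStep };
}

const growOnly = simulate(false);
const withShrink = simulate(true);
```

With these numbers the grow-only run exceeds the budget at step 6, while the run with a hypothetical shrink peaks at step 4 and stays under it.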

Outside the loading process, applications can also have large persistent JS side memory allocations long after the wasm heap has been .grow()n to its maximum size. E.g. when

  • working with large amounts of data with IndexedDB,
  • using Web Audio,
  • using WebGL or WebGPU,
  • large web requests, or
  • other marshalling of data between JS and Wasm.

but the size of the needed JS side memory can vary between documents/game levels, so where one level might need more Wasm memory, another level might need more JS memory. E.g. in a Unity game specifically, if there is programmatically heavy computation (pathfinding, AI, noise, skinning, some other game C# computation) in one level, that could cause a lot of wasm .grow()s to occur. If there is a lot of audio, or data marshalling, or cutscene videos, then there will be a lot of JS memory usage. These Wasm vs JS side maximums will not necessarily happen at the same time, but without a .shrink() operation, one cannot do anything to combat this (and should probably pretend as if these maximums did occur simultaneously).

With wasm32 at least 64-bit browsers will be immune to this, so this will be a 32-bit browser only concern. Not sure what will happen with wasm64.

The reason I push back on memory.shrink is that it opens a big can of worms for shared memory and a small can of worms for non-shared memory, so it's something I was hoping we could avoid, if indeed it has limited practical applicability.

It would certainly not be a 100% cure, since a wasm application that was not fragmentation aware would not be able to benefit. Though if one is developing a wasm page with large data sets, unfortunately one will already need to be fragmentation aware, there is no escaping that with or without .shrink().

I do appreciate the trouble with shared memories.

@lukewagner
Member

Thanks for the info @juj. The point I'm trying to dig into, though, is: even though apps may have this stack-like "grand scale" behavior you mention, that doesn't ensure that memory.shrink can be used w/o a very special allocation scheme along with app-wide coordination. (My expectation is that: without explicit global coordination, it would be very easy for tiny mallocs to creep in that break your ability to memory.shrink.) I can theoretically imagine such global coordination schemes, but it seems like potentially a big architectural change, which is why I wonder whether it would be implemented in practice.

In some applications it is easily the case that on "grand scale" the allocations have good stack-like characteristics when one transitions between e.g. main menu and the game levels.

Your loading scenario makes sense, but a slight variation shows how having the browser release vmem dynamically could be equally problematic: imagine step 5 shrinks (and releases vmem) but then step 6 tries to perform a large new wasm allocation which now fails due to fragmentation. This seems like a difficult tension to resolve in general.

What I can imagine being a more reliable way to avoid this kind of thrashing between wasm and JS memory is to avoid pulling in large allocations directly into linear memory all-at-once by instead streaming bounded-sized chunks (from a backing Blob or ArrayBuffer) into wasm memory on-demand. (I know that's not always possible, though.)

Load a large, say, 300MB, document to JS memory (e.g. from XHR or IndexedDB - former could stream, latter cannot)

On a side note, IIUC, on both Chrome and Firefox, Blobs are not kept in memory. Thus, I think you can "stream" a Blob by .slice()ing it into fixed-size chunks that are individually .arrayBuffer()ed.
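A minimal sketch of that chunked approach, using only the standard `Blob.slice()` and `Blob.arrayBuffer()` methods. The `streamBlobIntoMemory` helper, the destination offset, and the tiny chunk size are assumptions for illustration; whether the Blob really stays out of memory depends on the browser.

```javascript
// Hypothetical helper: copy a Blob into wasm linear memory one chunk at a time,
// so that only about `chunkSize` bytes of it are held in a JS ArrayBuffer at once.
async function streamBlobIntoMemory(blob, memory, destOffset, chunkSize) {
  for (let pos = 0; pos < blob.size; pos += chunkSize) {
    // slice() clamps the end index, so the last chunk may be shorter.
    const chunk = await blob.slice(pos, pos + chunkSize).arrayBuffer();
    // Re-create the view each iteration: memory.buffer is replaced if the memory grows.
    new Uint8Array(memory.buffer).set(new Uint8Array(chunk), destOffset + pos);
  }
  return blob.size;
}

// Usage with a tiny Blob and a one-page memory.
const memory = new WebAssembly.Memory({ initial: 1 });
const blob = new Blob([new Uint8Array([1, 2, 3, 4, 5, 6, 7])]);
const done = streamBlobIntoMemory(blob, memory, 16, 3);
```

In a real application the wasm side would typically consume each chunk before the next one is fetched, instead of copying the whole Blob into linear memory.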

@penzn

penzn commented Mar 27, 2021

@conrad-watt sorry for taking this long to reply :)

The "simplest" version of (my interpretation of) this idea would be to make such buffers like any other GC object. That is, each buffer would have an entirely disjoint address space from any other (enforced by bounds checking), they'd have their own family of load/store operations, and would be stored (by reference) in a table, or as a field of another GC object, rather than in linear memory.

That should work as long as we can present the allocated objects as something memory-like to the consumers in the module. Do we have enough support for this in the standard or near-future proposals?

or if the idea was to tie more closely to the existing linear memory (by having a host procedure to manage chunks of linear memory that are still manually accessible through regular load/store?).

This is what I originally thought, since that is closer to how accessing memory works in the native world, though after giving it a little more thought I am not sure this would be easy to support within existing model of linear memory.

@penzn

penzn commented Apr 20, 2021

@conrad-watt's approach can be extended to support POSIX stack emulation: instead of incrementing a global "stack base" symbol on entry and decrementing it on exit, a function can request an object which would be GC'd after it exits. This would free up linear memory and prevent stack walking.

@lars-t-hansen
Contributor

@conrad-watt's approach can be extended to support POSIX stack emulation: instead of incrementing a global "stack base" symbol on entry and decrementing it on exit, a function can request an object which would be GC'd after it exits.

Not in any language that can take the address of stack variables, I think.

@penzn

penzn commented Apr 27, 2021

Not necessarily: if some form of reference were considered an address, it would work; also, this issue would apply to heap objects too. I am not yet sure how this would work, though. My speculation is that via some combination of interface types and GC we can get an object which can be represented as a bag of bytes, and then do something that resembles memory operations on it.

@laughinghan

@lukewagner

The point I'm trying to dig into, though, is: even though apps may have this stack-like "grand scale" behavior you mention, that doesn't ensure that memory.shrink can be used w/o a very special allocation scheme along with app-wide coordination. (My expectation is that: without explicit global coordination, it would be very easy for tiny mallocs to creep in that break your ability to memory.shrink.) I can theoretically imagine such global coordination schemes, but it seems like potentially a big architectural change, which is why I wonder whether it would be implemented in practice.

The kind of fragmentation you're describing makes it sound like you're thinking of general-purpose allocators like dlmalloc/jemalloc, but my understanding is that it's common for games to use (for example) arena allocators (aka bump allocators). The way they work is you can't fine-grained free() individual blocks of memory at all, instead you allocate in constant time by incrementing (bumping) a pointer, and then when you're done rendering that frame, you free the entire arena in constant time by resetting the pointer. E.g.:
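A minimal sketch of such an arena in C, for illustration only. The `Arena` type and the `arena_init`, `arena_alloc`, and `arena_reset` names are hypothetical and not taken from any engine or from Emscripten, and the sketch assumes the backing buffer is itself aligned at least as strictly as the largest alignment requested:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump (arena) allocator over a caller-provided buffer. */
typedef struct {
    uint8_t *base; /* start of the backing buffer */
    size_t   cap;  /* total bytes available in the buffer */
    size_t   used; /* bump pointer, stored as an offset from base */
} Arena;

void arena_init(Arena *a, void *buf, size_t cap) {
    a->base = (uint8_t *)buf;
    a->cap = cap;
    a->used = 0;
}

/* Allocate `size` bytes at an offset aligned to `align` (a power of two).
   Constant time: round the bump offset up, then advance it.
   Returns NULL when the arena does not have enough room left. */
void *arena_alloc(Arena *a, size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t start = (a->used + (align - 1)) & ~(align - 1);
    if (start > a->cap || size > a->cap - start) {
        return NULL;
    }
    a->used = start + size;
    return a->base + start;
}

/* Free every allocation at once by resetting the bump offset.
   There is deliberately no way to free an individual block. */
void arena_reset(Arena *a) {
    a->used = 0;
}
```

A per-frame arena would call `arena_alloc` freely while building the frame and `arena_reset` once the frame has been rendered, so no fragmentation can accumulate inside it.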

I think the idea with most of these is that if you need some info across frames/requests, you just store it globally; but separate arenas with different lifetimes are also a thing, typically called region-based memory management. Obviously these are indeed big architectural decisions with app-wide implications, but I don't think they're unusual at all in practice, especially for games, which are a major use case for Wasm IIUC.

I apologize if you already know all this—most of this ticket is over my head, nor am I a game developer.

@danaugrs

I think memory.shrink would make sense in a lot of cases, for both in-browser and out-of-browser applications.
The instruction would mean "I don't need these last few pages of memory anymore".
The runtime should be able to do as it pleases with that information.
Maybe it gives it back to the OS. Maybe it doesn't. But if it wants to it can.

What's the problem with this approach?

@penzn

penzn commented Sep 15, 2022

Problem with just shrinking is that free pages might not be at the end. Though I am not sure that is a good enough reason to not introduce it: for usage patterns where that would work it would provide a relief, while the rest would stay unchanged.
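To make the limitation concrete, here is a small sketch in C. It assumes a hypothetical per-page "in use" map maintained by the allocator (no such map exists in the Wasm spec, and `trailing_free_pages` is an illustrative name). Only the run of free pages at the very end of linear memory could be handed back by a shrink; free pages in the middle are stranded behind the last page still in use:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count the free pages at the end of memory. Under a shrink-only scheme
   this is the most that could be released, however many pages are free
   elsewhere. `page_in_use[i]` is true if page i holds live data. */
size_t trailing_free_pages(const bool *page_in_use, size_t num_pages) {
    assert(page_in_use != NULL || num_pages == 0);
    size_t n = 0;
    while (n < num_pages && !page_in_use[num_pages - 1 - n]) {
        n++;
    }
    return n;
}
```

For a map such as `{used, free, free, used, free, free}`, four pages are free but only the last two could be released by shrinking.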

@titzer

titzer commented Sep 15, 2022

Others have proposed an instruction which semantically zeroes memory pages but also hints that they will not be needed soon, so that the underlying implementation can do the equivalent of madvise() calls that request the OS release the physical pages of memory.
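As a rough native analogue, the sketch below shows what such an implementation might do on Linux. It relies on Linux-specific behavior: `madvise(..., MADV_DONTNEED)` on a private anonymous mapping drops the physical pages, and later reads see zero-filled pages. Other operating systems treat this flag differently. The helper names `reserve_pages` and `discard_pages` are hypothetical, and this is not how any particular engine implements the feature:

```c
#define _DEFAULT_SOURCE /* expose madvise and MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `len` bytes of zero-initialized, private anonymous memory.
   Returns NULL on failure. The result is page-aligned. */
uint8_t *reserve_pages(size_t len) {
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (uint8_t *)p;
}

/* Tell the OS that [base + offset, base + offset + len) is not needed.
   `base` must come from reserve_pages; `offset` and `len` must be
   multiples of the OS page size. Returns 0 on success, -1 on error.
   On Linux the range stays mapped and reads back as zeros afterwards. */
int discard_pages(uint8_t *base, size_t offset, size_t len) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    assert(offset % page == 0 && len % page == 0);
    return madvise(base + offset, len, MADV_DONTNEED);
}
```

The address range remains valid after the call, which is the key difference from shrinking: the pages can sit anywhere in the mapping, not only at the end.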

(edit: read up the thread a bit, I think the suggestions cover memory.shrink well).

@devshgraphicsprogramming

Problem with just shrinking is that free pages might not be at the end. Though I am not sure that is a good enough reason to not introduce it: for usage patterns where that would work it would provide a relief, while the rest would stay unchanged.

It seems that this whole "feature" of a shrink method possibly not helping much is a product of three things:

  • WASM having a linear address space
  • correct me if I'm wrong but seems like WASM runtime/spec does not mention allowing for paging
  • most memory heavy users of WASM except maybe Mono-WASM don't do garbage collection with compaction

As a dev I can probably fix this for myself by employing some techniques for compacting my data and avoiding fragmentation: on the simplest end of the spectrum I'd prevent long-living objects, and on the most complex I'd develop my own garbage collection library.

I guess that "paging" of the memory poses a security concern or it can't be done for all OSes?

@juj
Author

juj commented Nov 17, 2022

Hey, we are getting towards a 2 year anniversary of this conversation thread - I am wondering if there might have been updated progress or revised thoughts on the WebAssembly group on this topic?

On Unity's side, we are getting a growing number of issue reports about running out of memory on mobile devices, and about Unity content behaving poorly with respect to trying to avoid application switching eviction behavior. More Unity Wasm developers are trying their hand at targeting mobile, and game developers overwhelmingly report that the mobile space is where gaming dominates. At the moment we are in a hard position to officially call "Mobile WebGL" a supported platform at Unity, due to the memory challenges that mobile Wasm content faces.

Most recently, as of yesterday, we have started getting reports about Unity Wasm content running out of memory on mobile devices in the NASA JPL Artemis moon rocket tracking application: https://www.nasa.gov/specials/trackartemis/ that has been developed with Unity. (Those reports have been anecdotal in that we haven't been able to verify them in action, but it did remind me to chime in on this issue.)

@dtig opened the discussion thread #1439 for the proposal https://github.com/dtig/memory-control . There the operation memory.discard was proposed. The description sounds like it would address this concern, although I struggled to find the actual parameters for the proposed call (maybe they haven't been crafted yet). Unfortunately it looks like that proposal has not progressed in the last 10 months, so I infer that it has been paused. I wonder if there is a timeline or plan to pick it up at some point?

Again I want to echo that I would be eager to help test an implementation against Emscripten dlmalloc/emmalloc and Unity Wasm content to provide real-world feedback on how well the feature would work in practice, if/when there would be a browser+LLVM tooling implementation prototype that would become available.

@dtig
Member

dtig commented Nov 17, 2022

@juj The proposal did go on a hiatus for some time for bandwidth reasons, and to figure out how to make memory.map/memory.unmap useful for a broader set of use cases, but I'm picking it back up now. They haven't been updated to the proposal repo, but I have a prototype in progress for memory.discard. I'll follow up offline so we can get an end to end experiment going as experimental data would be really useful in this case.

@eqrion

eqrion commented Feb 15, 2023

@juj SpiderMonkey now also has a prototype of a memory.discard feature, and it's in Firefox Nightly behind the javascript.options.wasm_memory_control flag. There are more details in WebAssembly/memory-control#6.

@juj
Author

juj commented Feb 23, 2023

Hey, this is absolutely amazing news! Made a note to look into experimenting with this, and see how it plays out.

@juj
Author

juj commented Mar 1, 2023

I've now created a branch of Emscripten that adds memory.discard support to the emmalloc memory allocator: emscripten-core/emscripten@main...juj:emscripten:memory_discard

From a super-quick test, it is working out as expected in Firefox Nightly. I'll look to do more comprehensive integrated testing as the next steps.
