Remove sbrk from MVP, add mmap&friends to AstSemantics #285

Closed

7 participants

@kripken
Member
kripken commented Jul 28, 2015

Based on discussion in #227.

I didn't know how to call the sysconf-like method... ideas are welcome.

@sunfishcode
Member

Is it intended that this removes the intent to implement a null guard page mechanism from the MVP?

The "Adjusting memory size and permissions" section seems to only be talking about future features. Would FutureFeatures.md be a better place for this content?

@kripken
Member
kripken commented Jul 28, 2015

Yes - how would we polyfill a null guard page mechanism in the MVP?

I didn't put it in FutureFeatures because, as I see it at least, this isn't a future feature, but more of a PostMVP type thing. But based on the content and other interactions between that document and AstSemantics, AstSemantics seemed better. Unless I am misunderstanding the criteria for divvying content between those two?

@pizlonator
Contributor

Sorry to be late following these threads. I’ve tried to read them, but I’m still confused about the mmap idea.

What is the proposed implementation strategy here and the constraints? Are we claiming that implementors must reserve a huge amount of virtual memory and then protect it all by default, and then unprotect it when an mmap allocates something?

-Filip


@jfbastien jfbastien and 2 others commented on an outdated diff Jul 28, 2015
AstSemantics.md
+
+In the MVP the size of linear memory is fixed: The initial size of linear
+memory will remain unchanged for the life of that WebAssembly module. Later,
+we will support a limited form of `mmap` which can:
+
+ * Allocate pages of memory, in order to increase or decrease
+ the amount of available memory to the WebAssembly module.
+ * Adjust the permissions of a page of memory, for example to make
+ small effective addresses close to `0` behave as if they are
+ [out-of-bounds](AstSemantics.md#out-of-bounds)
+ (see [discussion](https://github.com/WebAssembly/design/issues/204)).
+
+In addition to the `mmap` operation, applications will also have access to
+
+ * `munmap`, to unmap `mmap`ed pages, and
+ * `mmap_get_page_size`, to detect the proper system page size.
@jfbastien
jfbastien Jul 28, 2015 Member

I'd keep the name vague for now, since it's related to feature detection.
Also mention it's a "good enough" guess, not necessarily the actual value.
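As a sketch of that "good enough" flavor (the name `mmap_get_page_size` comes from the PR text; the forwarding-to-`sysconf` strategy and the fallback value are hypothetical):

```c
#include <unistd.h>
#include <stdint.h>

/* Hypothetical polyfill for the page-size query discussed above.
   On a POSIX host it forwards sysconf(_SC_PAGESIZE); if that fails,
   it reports a conservative 64 KiB guess rather than the actual
   hardware value -- "good enough", not necessarily exact. */
static uint32_t mmap_get_page_size(void) {
    long sz = sysconf(_SC_PAGESIZE);
    return sz > 0 ? (uint32_t)sz : 0x10000u; /* 64 KiB fallback */
}
```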

@pizlonator
pizlonator Jul 28, 2015 Contributor

I don’t think this answers my question.

I think we should be clear about whether future performant wasm implementations are required to use page-level protections and large reservations. Do you believe that this is the case?

-Filip


@kripken
kripken Jul 28, 2015 Member

@jfbastien ok, I pushed an edit about the "good enough" aspect.

@pizlonator I might be confused between email and github's interface, but I think @jfbastien was commenting on the pull here, not responding to your question.

Regarding large reservations, I don't think those would be necessary (unless I am missing something?), but page-level protections would be (they were already needed per the docs, which mention a way to protect the area around 0).

I also wrote a general response in the main thread of this pull.

@pizlonator
pizlonator Jul 28, 2015 Contributor

On Jul 28, 2015, at 3:37 PM, Alon Zakai notifications@github.com wrote:

@pizlonator I might be confused between email and github's interface, but I think @jfbastien was commenting on the pull here, not responding to your question.

You’re right! Sorry.

-Filip

@jfbastien jfbastien commented on the diff Jul 28, 2015
AstSemantics.md
@@ -154,6 +151,28 @@ by [available resources](Nondeterminism.md)). In the future, to support
[>4GiB linear memory](FutureFeatures.md#heaps-bigger-than-4gib), support for
indices with type `int64` will be added.
+### Adjusting memory size and permissions: `mmap`
+
+In the MVP the size of linear memory is fixed: The initial size of linear
+memory will remain unchanged for the life of that WebAssembly module. Later,
@jfbastien
jfbastien Jul 28, 2015 Member

How is the size specified?

@kripken
kripken Jul 28, 2015 Member

How as in "where" or "what units", or something else?

I assume the current docs imply it is specified in the wasm binary somewhere, "total memory is 4MB".

@jfbastien
jfbastien Jul 28, 2015 Member

What I suggested in #227 for "removing magic" was that it's specified by the developer's code, typically in _start, using mmap. We can heavily restrict mmap so that it's not super capable in the MVP, but I'd really like to avoid a magical number in the binary that wasm implementations have to effectively translate to mmap for the rest of time. That gets weird when we do add mmap later!

@kripken
kripken Jul 28, 2015 Member

Well, the wasm binary is going to have a data segment, and that has to be written in after memory is set up. It seems like that means that if the user has manual control of mmap for the initial size of memory, that they would also need to have manual control of copying the data into memory, so that they can do so afterwards. Do we actually want that?

@jfbastien
jfbastien Jul 29, 2015 Member

The loader would handle those, with mmap. I guess there still is magic, but it's regular and familiar so I like it.

@kripken
kripken Jul 29, 2015 Member

What do you mean by "loader" here?

@jfbastien
jfbastien Jul 29, 2015 Member

The wasm equivalent of ld. The part that maps the initial address space and code, and does relocations. I'm assuming it'll also trigger the JIT-compilation, perform code caching, ...

@kripken
Member
kripken commented Jul 28, 2015

The general idea is to enable the amount of memory used to be adjusted over time. Previously, the design docs mentioned sbrk as the mechanism for doing so. This pull request suggests generalizing that by:

  • Instead of just having sbrk that can apply a delta to the size of addressable memory, a limited mmap can be used. For example, if the initial memory allocation is 100 pages, the application might allocate 10 more pages right after those, then it might munmap some of them, and so forth. This would allow memory usage to be flexible and also not suffer as much from fragmentation as a pure sbrk approach would. (This does not allow the far more complex things that mmap can do in POSIX, like mapping files, etc., although some of that is already in the design docs in FutureFeatures.)
  • Allow protecting memory, with the main use case being debugging. Specifically, this improves on the current design docs, which say that there is a magic way to protect the area around zero; this would let such protection be part of a more general technique.
@pizlonator
Contributor

It’s a shame I didn’t have time to follow along earlier. I have a philosophical objection to this approach: it precludes good implementations that don’t play OS tricks. On the other hand, I can see that this will basically just work.

For this PR, it seems like excluding mmap from the MVP is unnecessary. What about supporting just this:

mmap(0, N, PROT_READ|PROT_WRITE, MAP_NORESERVE | MAP_PRIVATE | MAP_ANON, -1, 0)

This can be implemented exactly the same way that we would have implemented sbrk.
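To make that concrete: the restricted call above can be lowered onto contiguous growth exactly the way sbrk would be. A sketch, with hypothetical names (`wasm_mmap_anon`, `heap_end`) and an assumed 64 KiB page granularity:

```c
#include <stdint.h>

#define WASM_PAGE_SIZE 0x10000u  /* assumed 64 KiB granularity */

static uint32_t heap_end;        /* current size of linear memory in bytes */
static uint32_t heap_max;        /* maximum the embedder will allow */

/* Restricted anonymous mmap: ignores the address hint and always
   appends contiguously -- exactly what sbrk(N) would have done. */
static uint32_t wasm_mmap_anon(uint32_t len) {
    uint32_t rounded = (len + WASM_PAGE_SIZE - 1) & ~(WASM_PAGE_SIZE - 1);
    if (rounded > heap_max - heap_end)
        return UINT32_MAX;       /* MAP_FAILED analogue */
    uint32_t result = heap_end;  /* new pages start at the old break */
    heap_end += rounded;
    return result;
}
```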

-Filip


@kripken
Member
kripken commented Jul 28, 2015

That (or sbrk) in the MVP would require that we polyfill it using asm.js memory growth, however, which I thought you were opposed to in #227 (reassigning the HEAP variables, etc.)? It seemed simplest to me to just drop sbrk/mmap from the MVP, and implement mmap later along with other features that we don't expect to be polyfillable anyhow.

@pizlonator
Contributor

On Jul 28, 2015, at 3:49 PM, Alon Zakai notifications@github.com wrote:

That (or sbrk) in the MVP would require that we polyfill it using asm.js memory growth, however, which I thought you were opposed to in #227 (reassigning the HEAP variables, etc.)?

We support it, you just get punished for doing it. But your code will run.
It seemed simplest to me to just drop sbrk/mmap from the MVP, and implement mmap later along with other features that we don't expect to be polyfillable anyhow.

That’s somewhat fair, I guess. I would have erred on the side of a more powerful MVP that has some features that may sometimes slow you down if you have to go to polyfill.

-Filip



@jfbastien
Member

@pizlonator sorry, I think we raced on answers, or at least I didn't see your question :-)

I think I've answered your question here: #227 (comment)

It sounds like you're pretty much thinking about the same thing I am when you suggest mmap(0, N, PROT_READ|PROT_WRITE, MAP_NORESERVE | MAP_PRIVATE | MAP_ANON, -1, 0).

@lukewagner
Member

First, let's put aside the feature of changing memory protection; that's almost orthogonal to the rest of this discussion and could be provided regardless of mmap vs sbrk. Similarly, the features of mmaping files and sharing memory between separate linear memories can be implemented by orthogonal builtins.

Now, on the subject of memory allocation, I'm having trouble understanding what mmap is actually buying anyone compared to sbrk in a realistic implementation (signal-support or no). Let's say an app starts out with an initial heap range A. Then later the app mmaps another range B. If the wasm engine wants linear memory accesses to be guarded by a single bounds check and implemented by a load base[index], the wasm engine has to put the allocation of B following A in memory. Yes, we could specify mmap to allocate memory at a non-deterministic address (so there could be a gap between A and B), but we can't use this freedom to stick arbitrary stuff in between A and B since this memory can be directly accessed by the load base[index]. So what does this feature buy us other than a prevalent form of nondeterminism? I say "prevalent" since it has been seen many times that even small changes to a nondeterministic allocation algorithm end up breaking real world code. Even mmap, with its coarse-granularity allocation, isn't immune (cf. ADDR_COMPAT_LAYOUT).

Similarly, munmap achieves two separate things: releases virtual address space and releases committed physical pages. However:

  • an impl of munmap cannot release the actual virtual address space since that range is still accessible via untrusted loads/stores and so the engine must prevent arbitrary browser allocations from going into that space by keeping it reserved;
  • releasing the committed physical pages (w/o releasing the virtual address space) can be exposed as an orthogonal builtin madvise(MADVISE_DONTNEED).

I expect the root motivation here is an anticipated separate-process wasm impl (where you can use NaCl-like sandboxing). However:

  • the nondeterminism and ensuing portability hazards remain;
  • it's a fresh process; what's the win over reserving the virtual address space up front?

Thus, it's important not to think of this issue as mmap vs sbrk ("no brainer, I'll take the one that does 10 things") but, rather, "deterministic allocation" vs. "nondeterministic allocation without teeth" and we'll get the other 9 features of mmap independently.

@titzer
Contributor
titzer commented Jul 29, 2015

I agree with Luke. We're starting to wander into territory that is dictating virtual memory tricks that would be either prohibitively expensive to emulate and/or severely impact colocation in a host process.

In particular, we've floated a couple prototypes past the Chrome Security Team that use signal handling for various purposes, not limited to bounds checks. It's not a simple situation, since there are several attack modes that we didn't initially consider, and Chrome is rightly conservative in the face of a new class of tricky low-level techniques.

@kg
Contributor
kg commented Jul 29, 2015

I think address space management is a reality in modern software and we should design it in from the beginning. That's why I'm personally pushing for mmap. sbrk is sufficient for trivial scenarios (I want to make my allocated heap larger) but not for anything more complicated.

I guess the argument I'm hearing is that we want sbrk to be our fundamental address space model, and have modifying chunks of the sbrk region be the primitive you use to handle things that would have been mmap before? I assume one of the motives here is that the sbrk model is much easier to optimize bounds checks against and requires less effort to secure.

I think the sbrk model is reasonable from a compromise perspective, but we should be really sure that it won't prevent us from doing important things down the road. Will things like read-only pages, copy-on-write pages, guard pages be viable with this model? It seems like all of those features could layer on top of sbrk, and you're just forcing the user to write their own mmap in user space. So maybe that's okay.

One way or another, address space management will be required once we have shared heaps and load-time dynamic linking - the hard-coded heap offsets in each module will conflict with each other, so you end up with each module having a separate base offset in the reserved address space for its statics, etc. It seems like this isn't necessarily incompatible with sbrk, because you can figure out how much reserved static space you need for all your modules at startup. For run-time dynamic linking (which we can't really avoid having eventually), how would this work with the sbrk model? You won't be able to grow the static region because that space will potentially be in use by the heap already. The only option there I can think of would be to put the static region for the new module at some random available offset in the heap and then sbrk() to make more space, which is that same brand of nondeterminism you despise (also: super gross).

Is sbrk-only ultimately just saying that we want address space management to be done in user-space with malloc?
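Pushed into user space, that management layer would look roughly like a page-granular free list over an sbrk heap — a sketch, all names hypothetical:

```c
#include <stdint.h>

/* Sketch of user-space mmap/munmap emulation over an sbrk-style heap:
   "unmapped" pages go on a free list and are reused before the break
   is moved. Works in page indices, not byte addresses, for brevity. */

#define MAX_PAGES 1024

static uint32_t brk_pages;              /* pages obtained via sbrk so far */
static uint32_t free_list[MAX_PAGES];
static uint32_t free_count;

static uint32_t user_mmap_page(void) {  /* returns a page index */
    if (free_count > 0)
        return free_list[--free_count]; /* reuse an "unmapped" page */
    return brk_pages++;                 /* otherwise grow the break */
}

static void user_munmap_page(uint32_t page) {
    if (free_count < MAX_PAGES)
        free_list[free_count++] = page; /* the host never gets it back */
}
```

Note the trade-off the thread keeps circling: the freed page is reusable by the guest, but the host's virtual (and possibly physical) memory footprint never shrinks.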

@pizlonator
Contributor

On Jul 29, 2015, at 7:51 AM, Katelyn Gadd notifications@github.com wrote:

I think address space management is a reality in modern software and we should design it in from the beginning. That's why I'm personally pushing for mmap. sbrk is sufficient for trivial scenarios (I want to make my allocated heap larger) but not for anything more complicated.

I agree with this view. I think that an sbrk-only model is restrictive, and it would be nice to have something better. But because of other constraints, having a mmap/munmap style model that actually does what it claims to do - allows the wasm host to reuse memory freed by the wasm guest - will incur some memory overhead beyond what we might want.

I guess the argument I'm hearing is that we want sbrk to be our fundamental address space model, and have modifying chunks of the sbrk region be the primitive you use to handle things that would have been mmap before? I assume one of the motives here is that the sbrk model is much easier to optimize bounds checks against and requires less effort to secure.

Right. If we had an efficient way of enabling per-wasm-process page permissions in the host, then munmap would be viable. I actually don’t know if this is profitable.

OS page protections probably won’t work, since then calls into and out of wasm will get hammered with an enormous overhead - a bunch of syscalls to change page permissions.

I’ve heard academic results on software-only page protections - I vaguely recall ~5% overhead results in some MS Research paper. I don’t remember the details or the citation. I also don’t have experience optimizing such checks. Hypothetically, you could imagine that instead of a bounds check and offset math for each memory access:

if (address <_{unsigned} limit)
    access(address + base)

you’d have a page check and offset math:

if (perms[address >> logPageSize])
    access(address + base)

Here I’m assuming that “perms” is a byte array rather than bitvector, just to reduce the number of cycles needed to do the check.

I don’t know if anyone has experimented with this or attempted to optimize it. I’d be open to such an approach if we knew that it could be made to be fast enough. Personally, I’d support this even if it was slower than the simple bounds check because I take it as a given that an elastic memory reuse model for wasm is a Good Thing. I’d probably support the software page check and mmap/munmap over bounds check with sbrk even if it meant 10% throughput overhead (versus the bounds check) on some reputable benchmark suite.
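Side by side, the two checks could look like this (a sketch; the names, sizes, and byte-array perms layout are illustrative, not from any real engine):

```c
#include <stdint.h>

/* Illustrative comparison of the two access-checking schemes:
   a single unsigned bounds check (sbrk model) versus a per-access
   page-permission check against a byte array (mmap/munmap model). */

#define LOG_PAGE_SIZE 12                /* 4 KiB pages for the example */
#define HEAP_PAGES    4

static uint8_t *wasm_base;              /* base of linear memory */
static uint32_t wasm_limit;             /* heap size in bytes (sbrk model) */
static uint8_t  wasm_perms[HEAP_PAGES]; /* 1 = mapped (mmap model) */

/* sbrk-style: one unsigned comparison per access. */
static int load_bounds_checked(uint32_t address, uint8_t *out) {
    if (address >= wasm_limit)
        return 0;                       /* trap: out of bounds */
    *out = wasm_base[address];
    return 1;
}

/* mmap-style: perms[address >> logPageSize], a byte array rather than
   a bitvector to keep the check cheap, as described above. */
static int load_page_checked(uint32_t address, uint8_t *out) {
    uint32_t page = address >> LOG_PAGE_SIZE;
    if (page >= HEAP_PAGES || !wasm_perms[page])
        return 0;                       /* trap: unmapped page */
    *out = wasm_base[address];
    return 1;
}
```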

In the absence of a performant page permission check, munmap wouldn’t actually be able to return the memory to the wasm host. So, the host would suffer the same virtual memory consumption as if the munmap was emulated on top of sbrk with the wasm guest keeping the “unmapped” page on a free list.


Is sbrk-only ultimately just saying that we want address space management to be done in user-space with malloc?

Yeah, though I think of it as mmap/munmap emulation in user-space - simply because we’ll probably end up literally implementing mmap/munmap since that’s what so much code expects.



@jfbastien
Member

@titzer @lukewagner I think you misunderstand what I propose. #227 (comment) should explain my proposal pretty concisely:

  • It doesn't have munmap or madvise.
  • It has basic protection because it removes the other PROT_NONE magic.
  • It otherwise behaves exactly the same as sbrk: a module's memory needs to be linear (at least for MVP).

I don't know where the discussion about signals came from. That's not in my proposal.

@lukewagner said:

I expect the root motivation here is an anticipated separate-process wasm impl (where you can use NaCl-like sandboxing)

That's incorrect. The motivation comes from having a clean and non-magical API at MVP, especially since we know we'll do a more capable mmap post-MVP.

@titzer said:

We're starting to wander into territory that is dictating virtual memory tricks that would be either prohibitively expensive to emulate and/or severely impact colocation in a host process.

That's also incorrect, since what I propose is implemented pretty much the same way as sbrk. The API is different, and some of the magic is developer-accessible instead of hidden behind wasm implementations, but the code browsers contain is almost the same.

Let's not recreate POSIX by replaying its history. Why standardize sbrk and then add mmap when we can get to a better API (albeit restricted) from MVP on, and then un-restrict later? POSIX isn't perfect, maybe we want a slightly different API, but going back to sbrk is silly: it'll make it hard to add mmap later! Starting off with a restricted mmap gives us the same capabilities as with sbrk, none of the headache, and implies less magic.

@pizlonator I think the sanitizers are a decent perf measurement of the overhead of developer-side page-protection checks (shadow memory isn't quite the same, but almost).

@lukewagner
Member

@kg I definitely think we can separate the problems of acquiring virtual address space from doing interesting stuff with it (read-only, copy-on-write, etc). It's really only the acquiring step which has all the continuity/bounds-checking concerns.

@pizlonator That manual protection checking route seems like a pretty significant departure from our general approach of removing sources of sandboxing overhead in wasm by design.

@jfbastien So how do you envision implementing mmap other than reserving all the address space up front (and using signal handling to catch PROT_NONE access) OR allocating in disjoint regions and doing per-access protection checks as @pizlonator was describing.

@pizlonator
Contributor

It’s really not a departure. :-) I’m proposing removing a source of sandboxing space overhead by slightly increasing an already-existing source of time overhead.

It’s fair to say that we want to remove sources of time overhead at the expense of space overhead. It’s also fair to say that the space savings of the mmap+page-checks approach over an sbrk approach aren’t worth slowing things down. Do you believe that we should always reduce time overhead even when it means space overhead, or are you just of the view that the trade-off doesn’t make sense in this particular case?

-Filip

@jfbastien
Member

@jfbastien So how do you envision implementing mmap other than reserving all the address space up front (and using signal handling to catch PROT_NONE access) OR allocating in disjoint regions and doing per-access protection checks as @pizlonator was describing.

Exactly the same way you'd implement it for sbrk: one page at a time, all pages are contiguous in the process' virtual address space (you can of course aggregate allocations). The mmap API restrictions would make sure that's all you can do, exactly like for sbrk, and we can loosen these restrictions in the future.

You can implement PROT_NONE without signals on a per-access basis, or not. That's left up to the implementation. If your implementation wants to use signals then they're not visible to the developer, only to the embedder, it's an implementation detail.

@lukewagner
Member

@pizlonator You're right, it's a balance and I don't think we should a priori favor one over the other. But in this case, I think it leans heavily toward time. My biggest reason for thinking so is that mmap doesn't give any memory savings per se, but it avoids OOMs due to fragmentation, which is mainly a problem on 32-bit. 32-bit OOMs are a serious problem, but less so over time as 64-bit becomes prevalent, and I think there are alternate mitigations like some sort of on-page-load hint (say, another tag) to pre-reserve a large contiguous range early and ensure the page load happens in a fresh process.

@pizlonator
Contributor


FWIW, on iOS, there is a limit to the amount of virtual memory that a process can have. So, 64-bit iOS behaves “like” a 32-bit OS in this regard.

Also, on 64-bit OSes that don’t have this restriction, reducing the total number of virtual address space slabs you’re using is a great way of reducing TLB misses. I don’t know if the benefits from TLB miss reduction in a software page-check mmap/munmap approach will outweigh the overhead of the page-check.

-Filip

@lukewagner
Member

@jfbastien Ah, that was not clear from the PR, which seems to hop from no-heap-resizing straight to noncontiguous mmap. I think the real question then is this noncontiguity; figuring that out has a great deal to do with how this stripped-down mmap would evolve. I have no doubt we can get most of the raw functionality of mmap, but do we need to mimic POSIX's API design of overloading one syscall to do everything? We're not trying to preserve Interrupt Vector Table space here.

@pizlonator So does iOS have literally just 4GiB? Are there plans to loosen this over time? Given that it would avoid the runtime protection-checking penalty, what do you think about the "reserve up front hint" idea I mentioned? Also, I don't quite follow the TLB argument: I thought TLBs operate at page granularity and thus wouldn't care whether our address spaces were contiguous or not, so long as they were page granularity and we were touching the same number of pages.

@jfbastien
Member

@lukewagner agreed POSIX isn't perfect. I want something akin to mmap, and more than sbrk, and am open to changing the API (while leaving the door open to evolving new features later). Looking at what other OSes have done would be the right way to go IMO.

@pizlonator I'm not sure I understand your point about TLB. Are you suggesting we expose bigger (huge) pages if the OS has them, or that allocating too many virtual pages is bad? It'll only increase TLB misses if the pages are actually used (and it'll increase page-table-walk cost a bit if not used), but that's not a real problem if you look at how NaCl works on x86-64: allocate 84GiB of virtual space. I'm not saying we should do this (we are in-process for wasm!), I'm saying that wasm's limit on virtual reservation can't possibly reach NaCl's, and therefore can't be as expensive. So, my point is that TLB cost is still just "pay for what you use", and that's irrespective of how much virtual space we allow developers to reserve, unless we let them reserve huge pages (and there may be security issues with that).

@pizlonator
Contributor

> @pizlonator So does iOS have literally just 4GiB? Are there plans to loosen this over time?

It’s more subtle. On some versions, the “64-bit” virtual address space, which is usually really 48-bit on most 64-bit architectures, is less than 48-bit. I don’t remember how much less. Also, there is a limit on how much virtual memory you’re allowed to reserve. That limit can be quite tight in some cases (different processes have different policies, etc). That limit can be as small as hundreds of MBs in some extreme cases. I don’t think that a WebKit process ever runs under such a quota, but JavaScriptCore has many non-WebKit clients and it will probably support wasm even if it’s used outside WebKit, and so we like to be mindful of the kinds of exotic restrictions that any of our clients may have.

> Given that it would avoid the runtime protection-checking penalty, what do you think about the "reserve up front hint" idea I mentioned?

Not sure, will have to think about it.

> Also, I don't quite follow the TLB argument: I thought TLBs operate at page granularity and thus wouldn't care whether our address spaces were contiguous or not, so long as they were page granularity and we were touching the same number of pages.

You might be right, but I think regardless of granularity, it would start to matter once you had multiple wasm guests inside a host. My dream is that starting a wasm guest should be cheap enough that you can start many of them. That stops being the case if each guest needs a large reservation. Even if the OS allows you to make many large reservations, it will surely put pressure on the TLB.

-Filip

@kg
Contributor
kg commented Jul 29, 2015

> You might be right, but I think regardless of granularity, it would start to matter once you had multiple wasm guests inside a host. My dream is that starting a wasm guest should be cheap enough that you can start many of them.

If this is a use case we care about we should consider it seriously - I assume this means things like every tab having one or more wasm guests, or a single host process running dozens of wasm applications in it. That host process is probably 32-bit or has address space constraints like the ones pizlo described. Do we want to address those use cases? Off the top of my head, this sounds like it would explicitly rule out address space reservations and it would make heap growth considerably more difficult.

@lukewagner
Member

More on the "do we need to mimic POSIX's API" question: here are the set of APIs I've been imagining that cover the space of mmap functionality we want to expose. Starting with the APIs that don't require signals to implement efficiently:

  • sbrk(delta) : maps/unmaps new pages contiguously
  • map_file(addr, length, blob, file-offset) : semantically just copies specified range from blob into existing range (addr, length), but implementation encouraged to mmap(addr, length, MAP_FIXED | MAP_PRIVATE, fd).
  • dontneed(addr, length) : semantically zeros, but implementation encouraged to madvise(DONTNEED)
  • shmem_create(length) : create a shared memory object that can be mapped by...
  • map_shmem(addr, length, shmem, shmem-offset) like map_file except MAP_SHARED, which isn't valid on read-only blobs. Maybe we'd want to skip shmems and go straight to a new Web API proposal for a shareable MutableFile object. I dunno.

Features requiring signal handlers for efficient operation:

  • mprotect(addr, length, prot-flags) : change protection on existing range
  • sbrk(delta, prot-flags) : optimize the common case of allocating new memory and immediately setting protection flags

Lastly, if we decide to go down the route of noncontiguous allocation:

  • map_noncontiguous(length [, prot-flags]) : like sbrk, but drop continuity requirement (as well as deterministic return)

Are there any use cases of mmap not covered here? If not, it seems better API design to split out all these quite-different operations into separate builtins.

@lukewagner
Member

> Also, there is a limit of how much virtual memory you’re allowed to reserve. That limit can be quite tight in some cases (different processes have different policies, etc). That limit can be as small as hundreds of MBs in some extreme cases.

Wow, that's surprising. Are you sure that is the limit on virtual address space reservation (so PROT_NONE)? I could see that for PROT_READ|WRITE or RSS, but PROT_NONE seems surprising.

That being said, I don't think there is a problem here. It's important to distinguish the limitations of:

  1. the hard 4GiB virtual address limit imposed by 32-bit archs
  2. the OS-dependent quota on virtual address space reservations

For all iOS cases, where 1 isn't an issue, it seems like you could start out by mmaping the initial heap size (no extra reservation) and using mremap to resize on demand. In this case, you never reserve more than you need, there is (hopefully) no copying overhead on growth, but you stay contiguous. The only reason this isn't a solution in general is because limitation 1, but that's what I'm positing will fade over time.

> My dream is that starting a wasm guest should be cheap enough that you can start many of them. That stops being the case if each guest needs a large reservation.

I share that dream. I'm hoping the above mremap strategy makes this work well on 64-bit.

> Even if the OS allows you to make many large reservations, it will surely put pressure on the TLB.

I'm not a TLB expert so maybe I'm out of date now, but if these are just PROT_NONE reservations, I don't see how they would influence the TLB, which gets populated on accesses, page at a time.

@jfbastien
Member

I think one big use case you're not building for is dynamic linking. We should avoid painting ourselves into a corner here. Supporting a zero page, even without page protections, is easy because it's a single comparison on read/write. Supporting .rodata protections for writes is similarly easy when you only have one module and no dynamic linking, because you can just map it after the zero page, and then it's still just one bounds check per write (albeit with a different limit than the check for reads). Add dynamic linking and the discussions we're having become much more complicated.

I don't think we want an entire new set of memory APIs when we do add dynamic linking post-MVP.

@lukewagner
Member

@jfbastien I don't know who "you" is in "you're not building for". In general, .rodata (dynamic linking or no) doesn't seem any different than mprotect in my list above: it requires signal handling to implement efficiently, otherwise you'd need per-access protection checks as @pizlonator was describing above.

@jfbastien
Member

> @jfbastien I don't know who "you" is in "you're not building for". In general, .rodata (dynamic linking or no) doesn't seem any different than mprotect in my list above: it requires signal handling to implement efficiently, otherwise you'd need per-access protection checks as @pizlonator was describing above.

Agreed, but these per-access protections are way cheaper if you only have one .rodata section mapped at load time than if new ones can be added dynamically: it's a single comparison if you only have one and map it adjacent to the zero page!

@pizlonator
Contributor

> Wow, that's surprising. Are you sure that is the limit on virtual address space reservation (so PROT_NONE)? I could see that for PROT_READ|WRITE or RSS, but PROT_NONE seems surprising.

I believe that reservation is the thing that matters.

> For all iOS cases, where 1 isn't an issue, it seems like you could start out by mmaping the initial heap size (no extra reservation) and using mremap to resize on demand. In this case, you never reserve more than you need, there is (hopefully) no copying overhead on growth, but you stay contiguous.

My point is that when you have this constraint, then multiple wasm guests in the same process start to be a problem, if free memory in one wasm guest cannot be used by a different wasm guest.

> I'm not a TLB expert so maybe I'm out of date now, but if these are just PROT_NONE reservations, I don't see how they would influence the TLB, which gets populated on accesses, page at a time.

I don’t really know the details. I just vaguely recall from my GC days that having widely dispersed virtual memory reservations is less efficient than virtual memory reservations that are close together.

-Filip

@jfbastien
Member

I think some of us are thinking of more than just browser embeddings here, specifically of IoT type stuff, when it comes to restricted virtual address space. Let's leave that open, without going into details.

I'm not a TLB expert so maybe I'm out of date now, but if these are just PROT_NONE reservations, I don't see how they would influence the TLB, which gets populated on accesses, page at a time.

I don’t really know the details. I just vaguely recall from my GC days that having widely dispersed virtual memory reservations is less efficient than virtual memory reservations that are close together.

TLB is only affected when pages are accessed, so PROT_NONE virtual reservations don't affect the TLB. Huge pages do affect TLBs because they only occupy a single entry, but those need a single protection for the entire huge page, aren't available on every HW / OS (the OS needs to opt in, even if the HW has it), and imply security issues.

So, TLBs aren't an issue for this discussion.

@pizlonator
Contributor

> I assume this means things like every tab having one or more wasm guests, or a single host process running dozens of wasm applications in it.

The issue isn’t multiple tabs - those get different processes anyway, at least in WebKit - but rather multiple wasm guests started in the same web page. If we restrict ourselves to gaming applications then the entire web page is probably one monolithic wasm app. But web pages usually have many independent ES things going on, and I can imagine each of those things wanting a wasm module. Probably if two ES modules that know nothing about each other each have some wasm code, then those wasm codes will have independent sandboxes. That’s the case I’m interested in.

-Filip

@lukewagner
Member

My point is that when you have this constraint, then multiple wasm guests in the same process start to
be a problem, if free memory in one wasm guest cannot be used by a different wasm guest.

With the scheme I described to you (no over-reservation, just mremap on growth), each guest would be using exactly as many pages as it needed. To do better than that, it seems like you need the guests sharing a malloc heap which means you want them all dynamically linked and using a single linear memory.

@lukewagner
Member

@jfbastien Sure, but what is it you are arguing for or against here based on these observations?

@pizlonator
Contributor

> With the scheme I described to you (no over-reservation, just mremap on growth), each guest would be using exactly as many pages as it needed. To do better than that, it seems like you need the guests sharing a malloc heap which means you want them all dynamically linked and using a single linear memory.

No, there is a difference. What would your scheme do if one wasm guest freed a lot of memory, leading to the malloc to munmap that memory?

Granted, many UNIXish mallocs don’t munmap, but then again many do (WebKit’s certainly does). In an environment where multiple guests are sharing the same memory space, being able to share pages that got freed is a nice property to have. It’s not as nice as shared malloc, but isn’t as hard to get right from a security standpoint.

-Filip

@kg
Contributor
kg commented Jul 29, 2015

But web pages usually have many independent ES things going on, and I can imagine each of those things wanting a wasm module. Probably if two ES modules that know nothing about each other each have some wasm code, then those wasm codes will have independent sandboxes. That’s the case I’m interested in.

I strongly believe we should solve these cases by encouraging page authors to load multiple wasm modules into a single wasm runtime environment, much like how the vast majority of Windows (and presumably OS X, Linux?) applications are a single process that pulls in a diverse set of libraries. The libraries have their own code and statics and perhaps even threads, but they aren't separate runtimes. We've decided to solve dynamic linking Later, and I agree with that choice, but we shouldn't let that accidentally guide us towards a design that favors bad engineering :-)

@pizlonator
Contributor


This is not bad engineering. A web page is not a monolithic application consisting of code written by one person. Typically you’ll have some JS payload for the things you want to do, plus you’ll pull in many (tens of) frameworks, each doing some service for you:

  • Some UI framework for some funky button or form.
  • Some social framework for … socializing.
  • Some ad framework so that you make money.

You may pull in many of each of these. It’s not reasonable to expect that users will come up with a JS-side protocol for loading multiple wasm modules into the same address space. Also, it would be insecure to do so - for example if a social networking site wanted to run some wasm then they probably wouldn’t like it too much if they had to share an address space with the wasm in a competing ad framework. Currently, it’s common to use frames to achieve some encapsulation, and imperfect as that may be, I don’t think we should make it easier for people to do shady things by enforcing a single shared address space for every web page.

On the other hand, I completely agree that using isolated wasm modules should not be the way that an app modularizes its own code. That’s a separate issue. My concern is about modules that are written by different authors, with no intent to share anything with each other.

-Filip

@kg
Contributor
kg commented Jul 29, 2015

Also, it would be insecure to do so - for example if a social networking site wanted to run some wasm then they probably wouldn’t like it too much if they had to share an address space with the wasm in a competing ad framework. Currently, it’s common to use frames to achieve some encapsulation, and imperfect as that may be, I don’t think we should make it easier for people to do shady things by enforcing a single shared address space for every web page.

OK, I can see how this fits into the picture, now. I still think this is a serious concern we should evaluate, then - if things like ad networks and social buttons move to wasm, we've increased the cost profile of those scripts from a few GC heap objects and jitted functions to an arbitrarily-sized reserved heap and compiled functions. What happens if every library author is a bad actor that makes their heap too big? Would that cause browser aborts on iOS? I can see how maybe this is just an unavoidable reality of how wasm will work in production, since we can't enforce anything resembling a same-origin policy on individual modules unless they have separate heaps.

@lukewagner
Member

@pizlonator Ah, so you're talking about pages in the middle being unused and then unmapped. FWIW, FF uses jemalloc which appears (if I'm reading correctly) to madvise(DONTNEED) when purging internal free space. I think the reasoning is that the big win is freeing up committed pages, not virtual address space (except in the case of huge allocations which jemalloc does synchronously munmap). Furthermore, if you only madvise(DONTNEED), you don't have to re-mmap before use again, you can just go touch it which avoids syscalls.

I can imagine pathological multi-guest situations where each guest consumes some peak amount of memory (claiming a bunch of virtual address space), then shrink down and so you end up in a state where all your virtual address space is allocated while very little is in use. I just don't know that this case sufficiently motivates a design (non-continuity) with such cross-cutting performance costs.

@pizlonator
Contributor

> I still think this is a serious concern we should evaluate, then - if things like ad networks and social buttons move to wasm, we've increased the cost profile of those scripts from a few GC heap objects and jitted functions to an arbitrarily-sized reserved heap and compiled functions.

Yes. It’s a problem, and we need to get this right. :-) The key to me is just elasticity: if one wasm module frees memory, that memory should have some chance of becoming available to other wasm modules. I don’t think this has to be perfect but it does have to be good enough, assuming you’re using a sensible malloc and that malloc can call munmap. The elasticity would then be on page level.

What do you think of this?

> What happens if every library author is a bad actor that makes their heap too big? Would that cause browser aborts on iOS?

Well hopefully we’ll implement this using an exception throw when you cross some quota. Currently at worst it’ll kill the process associated with the tab but the browser will be fine. It’s already the case that you can OOM from JS, and it’s not a tremendous problem.
> I can see how maybe this is just an unavoidable reality of how wasm will work in production, since we can't enforce anything resembling a same-origin policy on individual modules unless they have separate heaps.

Right.



@jfbastien
Member

@pizlonator I'm not sure I understand: do you want to reclaim virtual pages, physical pages, or both?

@pizlonator
Contributor

I meant both.

Hearing the arguments though, I agree that doing this would be prohibitive.

-Filip


@jfbastien
Member

Reclaiming physical pages is cheap :-)

@lukewagner
Member

Trying to sum up the discussion to this point as I understand it:

  • I've gotten some positive feedback offline from @sunfishcode and @jfbastien on splitting mmap into separate functions as suggested above. (Not the exact functions, but the rough idea.)
  • The main sticking point is sbrk.
    • Distilling the sbrk issue further, the main question is: will linear memory be contiguous?
      • Because if "yes", then there's no real difference between sbrk and mmap given the above splitting of mmap into separate functions.
    • The reason to do non-contiguous memory is to allow the engine to be able to stick other stuff (so not reserved PROT_NONE memory, but other modules' heaps or browser data etc) in between.
      • Even for engines using signal support, this requires some sort of software page protection (as described above), which is expected to have non-trivial overhead.
    • So, if we agree that we don't want the overhead of software page protection, then we don't actually want non-contiguous memory so we want sbrk.

Is that accurate?

@jfbastien
Member

Yes, contiguous linear memory it is, if we want to avoid software-based page protection guarding the virtual memory holes left around. Though I still dislike the sbrk name; if we're creating new API names I'd change that one too :-)

I think dynamic linking may throw more wrenches our way. @dschuff is working towards specifying this better.

It may also be good to quantify the overhead of software page protection, and figure out if / when it's useful. I think asan provides a good upper bound on cost? We definitely don't want to have these types of overheads by default, or as a not-so-obvious glass jaw.

@lukewagner
Member

Yeah, the sbrk name clearly has baggage, happy to rename :)

@lukewagner
Member

To try to wrap up this discussion, I created #288.

@kripken
Member
kripken commented Aug 6, 2015

We can close this then.

@kripken kripken closed this Aug 6, 2015
@jfbastien jfbastien deleted the how-is-linear-memory branch Aug 6, 2015