Remove sbrk from MVP, add mmap&friends to AstSemantics #285
Conversation
Is it intended that this removes the intent to implement a null guard page mechanism from the MVP? The "Adjusting memory size and permissions" section seems to only be talking about future features. Would FutureFeatures.md be a better place for this content?
Yes - how would we polyfill a null guard page mechanism in the MVP? I didn't put it in FutureFeatures because, as I see it at least, this isn't a future feature but more of a post-MVP type of thing. But based on the content and other interactions between that document and AstSemantics, AstSemantics seemed better. Unless I am misunderstanding the criteria for divvying content between those two?
Sorry to be late following these threads. I’ve tried to read these threads, but I’m still confused about the mmap idea. What is the proposed implementation strategy here and the constraints? Are we claiming that implementors must reserve a huge amount of virtual memory and then protect it all by default, and then unprotect it when an mmap allocates something? -Filip
> In addition to the `mmap` operation, applications will also have access to
>
> * `munmap`, to unmap `mmap`ed pages, and
> * `mmap_get_page_size`, to detect the proper system page size.
I'd keep the name vague for now, since it's related to feature detection.
Also mention it's a "good enough" guess, not necessarily the actual value.
I don’t think this answers my question.
I think we should be clear about whether future performant wasm implementations are required to use page-level protections and large reservations. Do you believe that this is the case?
-Filip
On Jul 28, 2015, at 3:17 PM, JF Bastien notifications@github.com wrote, in AstSemantics.md #285 (comment):

> +In the MVP the size of linear memory is fixed: The initial size of linear
> +memory will remain unchanged for the life of that WebAssembly module. Later,
> +we will support a limited form of `mmap` which can:
> +
> + * Allocate pages of memory, in order to increase or decrease
> +   the amount of available memory to the WebAssembly module.
> + * Adjust the permissions of a page of memory, for example to make
> +   small effective addresses close to `0` behave as if they are
> +   out-of-bounds (see discussion).
> +
> +In addition to the `mmap` operation, applications will also have access to
> +
> + * `munmap`, to unmap `mmap`ed pages, and
> + * `mmap_get_page_size`, to detect the proper system page size.
>
> I'd keep the name vague for now, since it's related to feature detection.
> Also mention it's a "good enough" guess, not necessarily the actual value.
@jfbastien ok, I pushed an edit about the "good enough" aspect.
@pizlonator I might be confused between email and github's interface, but I think @jfbastien was commenting on the pull here, not responding to your question.
Regarding large reservations, I don't think those would be necessary (unless I am missing something?), but page-level protections would be (they were already necessary per the docs, which mentioned a way to protect the area around 0).
I also wrote a general response in the main thread of this pull.
On Jul 28, 2015, at 3:37 PM, Alon Zakai notifications@github.com wrote, quoting the same diff in AstSemantics.md #285 (comment):

> @jfbastien ok, I pushed an edit about the "good enough" aspect.
>
> @pizlonator I might be confused between email and github's interface, but I think @jfbastien was commenting on the pull here, not responding to your question.

You're right! Sorry.

-Filip

> Regarding large reservations, I don't think those would be necessary (unless I am missing something?), but page-level protections would be (but they already were necessary in the docs, as they mentioned a way to protect the area around 0).
>
> I also wrote a general response in the main thread of this pull.
The general idea is to enable the amount of memory used to be adjusted over time. Previously, the design docs mentioned
Force-pushed from b708c83 to aa38660.
It’s a shame I didn’t have time to follow along earlier. I have a philosophical objection to this approach: it precludes good implementations that don’t play OS tricks. On the other hand, I can see that this will basically just work.

For this PR, it seems like excluding mmap from the MVP is unnecessary. What about supporting just this:

`mmap(0, N, PROT_READ|PROT_WRITE, MAP_NORESERVE | MAP_PRIVATE | MAP_ANON, -1, 0)`

This can be implemented exactly the same way that we would have implemented sbrk.

-Filip
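As an illustration of Filip's point, the restricted anonymous-only `mmap` form above can indeed be implemented the same way as `sbrk`: as a bump allocator over one contiguous linear memory. This is only a sketch under that assumption; the names (`wasm_mmap_anon`, `LINEAR_MEMORY_MAX`) and the 64 KiB page size are hypothetical, not part of the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WASM_PAGE_SIZE    0x10000u             /* 64 KiB page, an assumption */
#define LINEAR_MEMORY_MAX (256u * 1024 * 1024) /* hypothetical module limit */

static size_t brk_offset = 0; /* current "break" within linear memory */

/* The restricted mmap(0, n, PROT_READ|PROT_WRITE,
 * MAP_NORESERVE|MAP_PRIVATE|MAP_ANON, -1, 0): always extends the
 * contiguous region, exactly like sbrk. Returns the offset of the
 * newly allocated pages, or -1 on exhaustion. */
long wasm_mmap_anon(size_t n) {
    if (n == 0 || n > LINEAR_MEMORY_MAX)
        return -1;
    /* round the request up to a whole number of pages */
    size_t rounded = (n + WASM_PAGE_SIZE - 1) & ~(size_t)(WASM_PAGE_SIZE - 1);
    if (LINEAR_MEMORY_MAX - brk_offset < rounded)
        return -1;
    long result = (long)brk_offset;
    brk_offset += rounded;
    return result;
}
```

Allocations come back contiguous and monotonically increasing, which is what makes this shape easy to bounds-check.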
That (or
That’s somewhat fair, I guess. I would have erred on the side of a more powerful MVP that has some features that may sometimes slow you down if you have to go to polyfill. -Filip
@pizlonator sorry, I think we raced on answers, or at least I didn't see your question :-) I think I've answered your question here: #227 (comment)

It sounds like you're pretty much thinking about the same thing I am when you suggest
First, let's put aside the feature of changing memory protection; that's almost orthogonal to the rest of this discussion and could be provided regardless of

Now, on the subject of memory allocation, I'm having trouble understanding what

Similarly,
I expect the root motivation here is an anticipated separate-process wasm impl (where you can use NaCl-like sandboxing). However:

Thus, it's important not to think of this issue as
On Wed, Jul 29, 2015 at 10:56 AM, Luke Wagner notifications@github.com wrote:

I agree with Luke. We're starting to wander into territory that is

In particular, we've floated a couple prototypes past the Chrome Security
I think address space management is a reality in modern software and we should design it in from the beginning. That's why I'm personally pushing for mmap. sbrk is sufficient for trivial scenarios (I want to make my allocated heap larger) but not for anything more complicated.

I guess the argument I'm hearing is that we want sbrk to be our fundamental address space model, and have modifying chunks of the sbrk region be the primitive you use to handle things that would have been mmap before? I assume one of the motives here is that the sbrk model is much easier to optimize bounds checks against and requires less effort to secure.

I think the sbrk model is reasonable from a compromise perspective, but we should be really sure that it won't prevent us from doing important things down the road. Will things like read-only pages, copy-on-write pages, and guard pages be viable with this model? It seems like all of those features could layer on top of sbrk, and you're just forcing the user to write their own mmap in user space. So maybe that's okay.

One way or another, address space management will be required once we have shared heaps and load-time dynamic linking - the hard-coded heap offsets in each module will conflict with each other, so you end up with each module having a separate base offset in the reserved address space for its statics, etc. It seems like this isn't necessarily incompatible with sbrk, because you can figure out how much reserved static space you need for all your modules at startup.

For run-time dynamic linking (which we can't really avoid having eventually), how would this work with the sbrk model? You won't be able to grow the static region because that space will potentially be in use by the heap already. The only option there I can think of would be to put the static region for the new module at some random available offset in the heap and then sbrk() to make more space, which is that same brand of nondeterminism you despise (also: super gross).

Is sbrk-only ultimately just saying that we want address space management to be done in user-space with malloc?
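For concreteness, "write your own mmap in user space" on top of sbrk could look something like the sketch below: a guest-side page allocator that recycles "unmapped" pages on a free list instead of returning them to the host. This is a hypothetical illustration, not anyone's proposal; all names (`user_mmap_page`, `user_munmap_page`) are invented, and the space cost is exactly the one discussed later in this thread - freed pages never leave the guest.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE      0x10000u  /* 64 KiB wasm page, an assumption */
#define MAX_PAGES 1024      /* arbitrary cap on the sbrk region */

static size_t brk_pages  = 0;         /* pages obtained so far via "sbrk" */
static size_t free_list[MAX_PAGES];   /* offsets of recycled pages */
static size_t free_count = 0;

/* "mmap" one page: reuse a previously freed page if any, otherwise
 * grow the contiguous region sbrk-style. Returns a page offset, or -1
 * when the region is exhausted. */
static long user_mmap_page(void) {
    if (free_count > 0)
        return (long)free_list[--free_count];
    if (brk_pages >= MAX_PAGES)
        return -1;
    return (long)(PAGE * brk_pages++);
}

/* "munmap": the host never gets the page back; it only goes onto the
 * guest's free list for later reuse. */
static void user_munmap_page(size_t offset) {
    free_list[free_count++] = offset;
}
```

Usage-wise, an unmap followed by a map hands back the same page, which is the behavior most mallocs layered on top would expect.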
I agree with this view. I think that an sbrk-only model is restrictive, and it would be nice to have something better. But because of other constraints, having an mmap/munmap-style model that actually does what it claims to do - allowing the wasm host to reuse memory freed by the wasm guest - will incur some memory overhead beyond what we might want.
Right. If we had an efficient way of enabling per-wasm-process page permissions in the host, then munmap would be viable. I actually don’t know if this is profitable.

OS page protections probably won’t work, since then calls into and out of wasm will get hammered with an enormous overhead - a bunch of syscalls to change page permissions. I’ve heard academic results on software-only page protections - I vaguely recall ~5% overhead results in some MS Research paper. I don’t remember the details or the citation. I also don’t have experience optimizing such checks.

Hypothetically, you could imagine that instead of a bounds check and offset math for each memory access:

`if (address <_{unsigned} limit)`

you’d have a page check and offset math:

`if (perms[address >> logPageSize])`

Here I’m assuming that “perms” is a byte array rather than a bitvector, just to reduce the number of cycles needed to do the check. I don’t know if anyone has experimented with this or attempted to optimize it. I’d be open to such an approach if we knew that it could be made to be fast enough. Personally, I’d support this even if it was slower than the simple bounds check, because I take it as a given that an elastic memory reuse model for wasm is a Good Thing. I’d probably support the software page check and mmap/munmap over bounds check with sbrk even if it meant 10% throughput overhead (versus the bounds check) on some reputable benchmark suite.

In the absence of a performant page permission check, munmap wouldn’t actually be able to return the memory to the wasm host. So the host would suffer the same virtual memory consumption as if the munmap was emulated on top of sbrk, with the wasm guest keeping the “unmapped” page on a free list.
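The two access-check models Filip contrasts can be written out side by side. In this sketch the 64 KiB page size and the table layout are assumptions; `perms` is one byte per page rather than a bitvector, as suggested, to keep the check to a load and a compare.

```c
#include <assert.h>
#include <stdint.h>

#define LOG_PAGE_SIZE 16   /* 64 KiB pages, an assumption */
#define NUM_PAGES     256

/* 0 = unmapped, nonzero = mapped; updated by a hypothetical mmap/munmap */
static uint8_t perms[NUM_PAGES];

/* sbrk model: a single unsigned comparison against the current limit. */
static inline int ok_bounds(uint32_t address, uint32_t limit) {
    return address < limit;
}

/* mmap/munmap model: a software page-permission check. A page that has
 * been munmap'ed has perms[page] == 0 and is rejected here, even if it
 * sits below the high-water mark of the address space. */
static inline int ok_pagecheck(uint32_t address) {
    return perms[address >> LOG_PAGE_SIZE] != 0;
}
```

The key difference: `ok_bounds` can only describe one contiguous live region, while `ok_pagecheck` can express holes left by `munmap` - at the cost of a memory load on every access.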
Yeah, though I think of it as mmap/munmap emulation in user-space - simply because we’ll probably end up literally implementing mmap/munmap since that’s what so much code expects.
@titzer @lukewagner I think you misunderstand what I propose. #227 (comment) should explain my proposal pretty concisely:

I don't know where the discussion about signals came from. That's not in my proposal.

@lukewagner said:

That's incorrect. The motivation comes from having a clean and non-magical API at MVP, especially since we know we'll do a more capable

@titzer said:

That's also incorrect, since what I propose is implemented pretty much the same way as

Let's not recreate POSIX by replaying its history. Why standardize

@pizlonator I think the sanitizers are a decent perf measurement of developer-side page-protection check overhead (shadow memory isn't quite the same, but almost).
@kg I definitely think we can separate the problems of acquiring virtual address space from doing interesting stuff with it (read-only, copy-on-write, etc). It's really only the acquiring step which has all the contiguity/bounds-checking concerns.

@pizlonator That manual protection-checking route seems like a pretty significant departure from our general approach of removing sources of sandboxing overhead in wasm by design.

@jfbastien So how do you envision implementing
It’s really not a departure. :-) I’m proposing removing a source of sandboxing space overhead by slightly increasing an already-existing source of time overhead.

It’s fair to say that we want to remove sources of time overhead at the expense of space overhead. It’s also fair to say that the space costs of an sbrk approach versus the mmap+pagechecks approach are small enough that it’s not worth slowing things down.

Do you believe that we should always reduce time overhead even when it means space overhead, or are you just of the view that the trade-off doesn’t make sense in this particular case?

-Filip
Exactly the same way you'd implement it for

You can implement
@pizlonator You're right, it's a balance and I don't think we should a priori favor one over the other. But in this case, I think it leans heavily toward time.

My biggest reason for thinking so is that mmap doesn't give any memory savings per se, but it avoids OOMs due to fragmentation, which is mainly a problem on 32-bit. 32-bit OOMs are a serious problem, but less so over time as 64-bit becomes prevalent, and I think there are alternate mitigations, like some sort of on-page-load hint (say, another tag) to pre-reserve a large contiguous range early and ensure the page load happens in a fresh process.
FWIW, on iOS there is a limit to the amount of virtual memory that a process can have. So 64-bit iOS behaves “like” a 32-bit OS in this regard.

Also, on 64-bit OSes that don’t have this restriction, reducing the total number of virtual address space slabs you’re using is a great way of reducing TLB misses. I don’t know if the benefits from TLB miss reduction in a software page-check mmap/munmap approach will outweigh the overhead of the page-check.

-Filip
@jfbastien Ah, that was not clear from the PR, which seems to hop from no-heap-resizing straight to noncontiguous

@pizlonator So does iOS have literally just 4GiB? Are there plans to loosen this over time? Given that it would avoid the runtime protection-checking penalty, what do you think about the "reserve up front hint" idea I mentioned?

Also, I don't quite follow the TLB argument: I thought TLBs operate at page granularity and thus wouldn't care whether our address spaces were contiguous or not, so long as they were page granularity and we were touching the same number of pages.
@lukewagner agreed, POSIX isn't perfect. I want something akin to

@pizlonator I'm not sure I understand your point about TLB. Are you suggesting we expose bigger (huge) pages if the OS has them, or that allocating too many virtual pages is bad? It'll only increase TLB misses if they're used (and it'll increase page-table walks a bit if not used), but that's not a real problem if you look at how NaCl works on x86-64: allocate 84GiB of virtual space. I'm not saying we should do this (we are in-process for wasm!), I'm saying that wasm's limit on virtual reservation can't possibly reach NaCl's, and therefore can't be as expensive.

So my point is that TLB cost is still just "pay for what you use", and that's irrespective of how much virtual space we allow developers to reserve, unless we let them reserve huge pages (and there may be security issues with that).
You might be right, but I think regardless of granularity, it would start to matter once you had multiple wasm guests inside a host. My dream is that starting a wasm guest should be cheap enough that you can start many of them. That stops being the case if each guest needs a large reservation. Even if the OS allows you to make many large reservations, it will surely put pressure on the TLB.

-Filip
If this is a use case we care about, we should consider it seriously - I assume this means things like every tab having one or more wasm guests, or a single host process running dozens of wasm applications in it. That host process is probably 32-bit or has address space constraints like the ones pizlo described. Do we want to address those use cases? Off the top of my head, this sounds like it would explicitly rule out address space reservations, and it would make heap growth considerably more difficult.
More on the "do we need to mimic POSIX's API" question: here are the set of APIs I've been imagining that cover the space of

Features requiring signal handlers for efficient operation:

Lastly, if we decide to go down the route of noncontiguous allocation:

Are there any use cases of
Wow, that's surprising. Are you sure that is the limit on virtual address space reservation (so PROT_NONE)? I could see that for PROT_READ|WRITE or RSS, but PROT_NONE seems surprising. That being said, I don't think there is a problem here. It's important to distinguish the limitations of:

For all iOS cases, where 1 isn't an issue, it seems like you could start out by

I share that dream. I'm hoping the above mremap strategy makes this work well on 64-bit.

I'm not a TLB expert, so maybe I'm out of date now, but if these are just PROT_NONE reservations, I don't see how they would influence the TLB, which gets populated on accesses, a page at a time.
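The mremap strategy Luke alludes to is not spelled out in this thread, so the following is only a hedged, Linux-specific sketch of one plausible reading: start with a modest anonymous mapping rather than a huge reservation, grow it in place when possible, and let the kernel relocate it (`MREMAP_MAYMOVE`) when the adjacent space is taken - cheap on 64-bit, where moving a mapping is just page-table surgery. The function name and sizes are invented for illustration.

```c
#define _GNU_SOURCE  /* for mremap */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Grow (or initially create) a linear-memory mapping. Returns the
 * possibly-relocated base address, or MAP_FAILED on error. */
static void *grow_linear_memory(void *base, size_t old_size, size_t new_size) {
    if (base == NULL)
        /* initial allocation: no over-reservation, MAP_NORESERVE so
         * untouched pages cost nothing physical */
        return mmap(NULL, new_size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    /* growth: extend in place if the kernel can, else let it move us */
    return mremap(base, old_size, new_size, MREMAP_MAYMOVE);
}
```

Because `mremap` preserves contents even when it moves the mapping, a bounds-checked engine only has to patch its base pointer after a grow.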
I think one big use case you're not building for is dynamic linking. We should avoid painting ourselves into a corner here.

Supporting a zero page, even without page protections, is easy because it's a single comparison on read/write. Supporting

I don't think we want an entire new set of memory APIs when we do add dynamic linking post-MVP.
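The "single comparison" claim for a zero page can be sketched as follows: with unsigned wraparound, one compare rejects both addresses inside a protected region near 0 and addresses at or beyond the limit. `GUARD_SIZE` here is a hypothetical size for the guarded region, and the trick assumes `limit >= GUARD_SIZE`.

```c
#include <assert.h>
#include <stdint.h>

#define GUARD_SIZE 0x1000u  /* hypothetical protected region near 0 */

/* One unsigned compare covering both bounds: if address < GUARD_SIZE,
 * the subtraction wraps to a huge value and the compare fails;
 * otherwise this is exactly address < limit. */
static inline int access_ok(uint32_t address, uint32_t limit) {
    return (uint32_t)(address - GUARD_SIZE) < (uint32_t)(limit - GUARD_SIZE);
}
```

This is the same folding trick commonly used for combined lower/upper range checks; it keeps the null-guard behavior without any page-protection machinery.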
@jfbastien I don't know who "you" is in "you're not building for". In general, .rodata (dynamic linking or no) doesn't seem any different than |
Agreed, but these per-access protections are way cheaper if you only have one
I believe that reservation is the thing that matters.
My point is that when you have this constraint, then multiple wasm guests in the same process start to be a problem, if free memory in one wasm guest cannot be used by a different wasm guest.
I don’t really know the details. I just vaguely recall from my GC days that having widely dispersed virtual memory reservations is less efficient than virtual memory reservations that are close together. -Filip
I think some of us are thinking of more than just browser embeddings here - specifically of IoT-type stuff - when it comes to restricted virtual address space. Let's leave that open, without going into details.

TLB is only affected when pages are accessed, so

So, TLBs aren't an issue for this discussion.
The issue isn’t multiple tabs - those get different processes anyway, at least in WebKit - but rather multiple wasm guests started in the same web page. If we restrict ourselves to gaming applications, then the entire web page is probably one monolithic wasm app. But web pages usually have many independent ES things going on, and I can imagine each of those things wanting a wasm module. Probably if two ES modules that know nothing about each other each have some wasm code, then those wasm codes will have independent sandboxes. That’s the case I’m interested in.

-Filip
With the scheme I described to you (no over-reservation, just
@jfbastien Sure, but what is it you are arguing for or against here based on these observations?
No, there is a difference. What would your scheme do if one wasm guest freed a lot of memory, leading the malloc to munmap that memory? Granted, many UNIXish mallocs don’t munmap, but then again many do (WebKit’s certainly does). In an environment where multiple guests are sharing the same memory space, being able to share pages that got freed is a nice property to have. It’s not as nice as a shared malloc, but it isn’t as hard to get right from a security standpoint.

-Filip
I strongly believe we should solve these cases by encouraging page authors to load multiple wasm modules into a single wasm runtime environment, much like how the vast majority of Windows (and presumably OS X, Linux?) applications are a single process that pulls in a diverse set of libraries. The libraries have their own code and statics and perhaps even threads, but they aren't separate runtimes. We've decided to solve dynamic linking Later, and I agree with that choice, but we shouldn't let that accidentally guide us towards a design that favors bad engineering :-)
This is not bad engineering. A web page is not a monolithic application consisting of code written by one person. Typically you’ll have some JS payload for the things you want to do, plus you’ll pull in many (tens of) frameworks for doing some service for you:

You may pull in many of each of these. It’s not reasonable to expect that users will come up with a JS-side protocol for loading multiple wasm modules into the same address space. Also, it would be insecure to do so - for example, if a social networking site wanted to run some wasm, then they probably wouldn’t like it too much if they had to share an address space with the wasm in a competing ad framework. Currently, it’s common to use frames to achieve some encapsulation, and imperfect as that may be, I don’t think we should make it easier for people to do shady things by enforcing a single shared address space for every web page.

On the other hand, I completely agree that using isolated wasm modules should not be the way that an app modularizes its own code. That’s a separate issue. My concern is about modules that are written by different authors, with no intent to share anything with each other.

-Filip
OK, I can see how this fits into the picture now. I still think this is a serious concern we should evaluate, then - if things like ad networks and social buttons move to wasm, we've increased the cost profile of those scripts from a few GC heap objects and jitted functions to an arbitrarily-sized reserved heap and compiled functions. What happens if every library author is a bad actor that makes their heap too big? Would that cause browser aborts on iOS? I can see how maybe this is just an unavoidable reality of how wasm will work in production, since we can't enforce anything resembling a same-origin policy on individual modules unless they have separate heaps.
@pizlonator Ah, so you're talking about pages in the middle being unused and then unmapped. FWIW, FF uses jemalloc, which appears (if I'm reading correctly) to

I can imagine pathological multi-guest situations where each guest consumes some peak amount of memory (claiming a bunch of virtual address space), then shrinks down, so you end up in a state where all your virtual address space is allocated while very little is in use. I just don't know that this case sufficiently motivates a design (non-contiguity) with such cross-cutting performance costs.
What do you think of this?
Right.
@pizlonator I'm not sure I understand: do you want to reclaim virtual pages, physical pages, or both?
I meant both. Hearing the arguments though, I agree that doing this would be prohibitive. -Filip
Reclaiming physical pages is cheap :-)
Trying to sum up the discussion to this point as I understand it:

Is that accurate?
Yes, linear memory it is, if we want to avoid software-based page protection to protect the virtual memory holes left around. Though I still dislike the

I think dynamic linking may throw more wrenches our way. @dschuff is working towards specifying this better.

It may also be good to quantify the overhead of software page protection, and figure out if / when it's useful. I think asan provides a good upper bound on cost? We definitely don't want to have these types of overheads by default, or as a not-so-obvious glass jaw.
Yeah, the
To try to wrap up this discussion, I created #288.
We can close this then.
Based on discussion in #227.
I didn't know what to call the `sysconf`-like method... ideas are welcome.