Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

current_memory doesn't seem useful #904

Closed
jfbastien opened this issue Dec 13, 2016 · 21 comments
Closed

current_memory doesn't seem useful #904

jfbastien opened this issue Dec 13, 2016 · 21 comments
Milestone

Comments

@jfbastien
Copy link
Member

I'm worried that when we add threads it'll be actively misleading because of TOCTOU (same mistake as the POSIX filesystem APIs). We'll have to make it a sequentially-consistent operation as well, to prevent IRIW problems, which means it's a tool to implement Dekker's (same as some library functions in C++).

It's probably better to drop it.

@binji
Copy link
Member

binji commented Dec 14, 2016

I assumed we were not going to allow grow_memory on shareable linear memories, because of various synchronization issues. If that's true, then current_memory is fine for both.

@jfbastien
Copy link
Member Author

I disagree, and think we'll allow growing SABs. It'll likely require threads checking in to avoid issues you mention, but disallowing it because it's hard seems to be the wrong tradeoffs for developers.

@gahaas
Copy link
Contributor

gahaas commented Dec 14, 2016

I think the only good reason to remove current_memory is that current_memory can be replaced by grow_memory(0). Other than that I think all problems we may get with current_memory we will also get with grow_memory(0).

@jfbastien
Copy link
Member Author

grow_memory already has the expectation that calling it concurrent will grow to at least what was requested, and it already has the implicit guarantee of being sequentially-consistent (because it's related to memory allocation, it's effectively a syscall).

To draw another parallel to C++: std::shared_ptr has methods use_count and unique. Those are error prone in multi-threaded environments, and shouldn't be used that way. unique is deprecated in C++17, and use_count has a warning that it's inexact when shared between threads, and has relaxed semantics.

Don't be that method!

@lukewagner
Copy link
Member

@binji Just to answer your question in a bit more detail: I think we only want to allow sharing a Memory when that memory has a declared maximum field and this allows an impl to reserve as much as it can (up to maximum once, up front) and never have to do a moving realloc.

@jfbastien That's a good question. Practically speaking, the only code that will use current_memory is the impl of malloc et al, and they will of course have plenty of other bookkeeping data structures that probably totally subsume current_memory that will be properly synchronized when accessed. So maybe it does make sense to remove current_memory.

@kripken Do you know if Binaryen ever emits it?

@binji
Copy link
Member

binji commented Dec 16, 2016

@lukewagner thanks, that makes sense.

Removing current_memory doesn't completely fix the situation, because if the memory is exported (or imported) you can just access the buffer's byteLength, right?

@jfbastien
Copy link
Member Author

Removing current_memory doesn't completely fix the situation, because if the memory is exported (or imported) you can just access the buffer's byteLength, right?

That's outside of wasm 😉
But yeah that's up to SAB.

@kripken
Copy link
Member

kripken commented Dec 16, 2016

@lukewagner: looks like asm2wasm doesn't (as you said, the malloc bookkeeping is done elsewhere) but the wasm backend might, s2wasm has code to translate it and it is in the s2wasm tests. Perhaps that's out of date though, @sunfishcode or @dschuff should know.

@sunfishcode
Copy link
Member

The s2wasm support is there because we added a way to emit current_memory through LLVM, however I'm not aware of anyone making use of it yet.

@lukewagner
Copy link
Member

Ok, so then this probably wouldn't be much of a breaking change, and one we could do in-place (noting the change on webassembly.org/roadmap).

One thing I just noticed is that technically, according to current semantics, grow_memory/Memory.grow, even with a delta of 0, detach the current ArrayBuffer and create a new one with the "new" size to alias Memory. Thus, grow_memory(0) isn't an exact replacement for current_memory as a way to query memory size.

@titzer
Copy link

titzer commented Dec 19, 2016

I don't see the value of removing current_memory. Since this is semantically equivalent to grow_memory 0, then it has no more race conditions than grow_memory 0 would. The original rationale for adding this, as I recall, was not to overload grow_memory (a state changing operator) with a mode where it acts like a query.

@lukewagner
Copy link
Member

(Well, not semantically equivalent to grow_memory 0, as I just mentioned, but I think that is more of a reason to keep it.)

@jfbastien
Copy link
Member Author

@titzer as I detailed above, when we add threads it will not have the same synchronization guarantees unless we make it more than a mere relaxed load.

The point @lukewagner makes is interesting though. It detaches the ArrayBuffer, but the WebAssembly.Memory is still intact right? It would be a bug if:

  1. Share WebAssembly.Memory across multiple WebAssembly.Instance
  2. One WebAssembly.Instance does grow_memory
  3. Sharing WebAssembly.Instances don't see the same WebAssembly.Memory anymore

That's not the case, right?

Further, the above sharing may be a valid usecase for current_memory which I hadn't thought about: when multiple WebAssembly.Instances share the same WebAssembly.Memory then they either need to invent their own current_memory or use the built-in one. This isn't the case for single WebAssembly.Instance because the implementation of mmap / malloc has to do the bookkeeping. That being said, when sharing a WebAssembly.Memory one has to have a shared implementation of mmap which does the bookkeeping, so I'm not sure this actually is a valid usecase for current_memory.

WDYT?

@lukewagner
Copy link
Member

@jfbastien The Memory is always valid/coherent between all sharing Instances, it's just that any extant ArrayBuffer s that are viewing the Memory get detached on resize. So in your example, there is no problem.

You're also right that when multiple Instances share the same Memory, their mallocs need to coordinate so that there is only one malloc heap; but this is just a fundamental requirement of dynamic linking (which is what multiple-Instance-single-Memory really is) and should just fall out of however the toolchain implements dynamic linking (statically-linked CRT vs. dynamically-linked CRT etc).

@pipcet
Copy link

pipcet commented Dec 19, 2016

Well, concurrent malloc implementations based on interleaved sbrk calls are a thing, so it might be better to say that sbrk needs to be co-ordinated across instances, probably by keeping the value of the current brk in a globally-known absolute location in memory (that's what I do).

I can think of a single use case that requires a reasonably fast current_memory opcode: testing whether a pointer is valid to write to, as, for example, syscalls are required to do on POSIX systems. I would like to suggest a check_access opcode for that purpose, with an argument specifying whether it is read or write access that's being tested, to replace current_memory in that one case.

@lukewagner
Copy link
Member

@pipcet Agreed that malloc impls will need their own synchronized internal bookkeeping. That's a good use case for which current_memory would be perfectly suitable.

@jfbastien
Copy link
Member Author

@jfbastien The Memory is always valid/coherent between all sharing Instances, it's just that any extant ArrayBuffer s that are viewing the Memory get detached on resize. So in your example, there is no problem.

Agreed that's what we want. I want to ensure that's what the spec says.

You're also right that when multiple Instances share the same Memory, their mallocs need to coordinate so that there is only one malloc heap; but this is just a fundamental requirement of dynamic linking (which is what multiple-Instance-single-Memory really is) and should just fall out of however the toolchain implements dynamic linking (statically-linked CRT vs. dynamically-linked CRT etc).

Right, that's why I'm asking: is current_memory useful in this case? I still think not.

@pipcet Agreed that malloc impls will need their own synchronized internal bookkeeping. That's a good use case for which current_memory would be perfectly suitable.

I don't think that's what was should be done: there should be a single, dynamically-linked, heap.wasm module which owns memory. It exposes mmap and friends and does all the bookkeeping.

@pipcet
Copy link

pipcet commented Dec 19, 2016

@jfbastien Can you explain in what sense heap.wasm would be dynamically-linked? It seems to me that any dynamic loader worthy of the name would need memory management facilities (not very complicated ones) to load heap.wasm, so there would no longer be a single module owning memory.

I also fail to see why mmap would be a good interface given our very limited memory model; it seems to me we can't do more than what malloc does.

To reiterate: I think the ABI should provide a canonical memory location which contains the current memory limit. If and when our view of memory becomes more complicated than a linear memory starting at 0 and being fully readable and writable, we can extend that, and maybe add a check_access opcode testing whether a range of memory is readable or writable. current_memory isn't needed, AFAICT, unless it would actually be faster than reading a word from memory.

@lukewagner I'm not sure I'm reading that right. If we have a synchronized variable containing the current memory limit, why bother with current_memory?

@jfbastien
Copy link
Member Author

@pipcet you need some module which tracks total heap size, and which blocks are allocated / free'd. heap.wasm could be that owner: all other dylibs ask it for memory (hence mmap, but call it malloc if you want) and it hands it out, calling grow_memory if needed.

Why even expose something in the ABI? I'd rather have heap.wasm own memory, it hands out memory ranges through its mmap / malloc export, and other dylibs aren't allowed to know the total size nor the allocation bitmap.

But all this is orthogonal to the issue: we seem to agree current_memory isn't useful. We agree there's a bunch of ways to better achieve what it does, we just don't agree on what it looks like. Let take that other discussion elsewhere.

@lukewagner
Copy link
Member

@pipcet Well if one wants to test whether a given pointer is in-bounds, one may not know/care what malloc (or other memory mgmt scheme) is being used. For that use current_memory would be ideal.

@jfbastien
Copy link
Member Author

Considering this has been shipped in FF and Chrome, I think it's too late to change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants