New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-1795: [Plasma] Create flag to make Plasma store use a single memory-mapped file. #1327
Conversation
cc @luchy0120 With this PR, it should be possible to do the following. First start a plasma store with the
Then import pyarrow.plasma as plasma
import numpy as np
c = plasma.connect('/tmp/store1', '', 64)
def create_and_release(size):
obj_id = plasma.ObjectID(np.random.bytes(20))
c.create(obj_id, size)
c.seal(obj_id)
create_and_release(5 * 10 ** 8)
create_and_release(6 * 10 ** 8) |
just curious , will it produce fragments when we evict objects ? |
Yeah, there can be fragmentation, but the malloc implementation we use (written by doug lea) is designed to minimize fragmentation if possible, see http://g.oswego.edu/dl/html/malloc.html |
then, we still have potential to get outofmemory after long run. But then our require_space() will try to |
Hey, this following instruction will make sure that the big mmap memory file will be reused and no new one will be created: Line 825 in cacbacd
If dlmemalign sees that there is not enough space and the mmap file is already at system_memory, it will return a null pointer and then we call the eviction policy to evict something (see the loop around dlmemalign). If hypothetically dlfree was called on the last object in the store, the mmap file might be unmapped and remapped when a new object is dlmemaligned; this would only happen in corner cases (i.e. if there is only one or very few large objects that are already released and they need to be evicted). We could prevent it by mallocing a very small object and the beginning and not freeing it. If you are concerned about this and can show it actually happens and want to submit a PR please go ahead (in this case it makes sense to increase the system_memory by the size of the small object), but it is not even clear that dlmalloc would unmap the last memory mapped file. And even if it happens, the behavior would still be correct... Hope that helps! |
Most recent build stalled in macOS, we should keep an eye on if it becomes a recurring thing
|
Thanks for letting us know. We'll keep an eye out for it. |
@pcmoritz , thanks for the explanation , I'll do more testing. |
This adds the
-f
flag which tells the plasma store to use a single memory-mapped file. This is accomplished by simply callingdlmemalign
/dlfree
on the entire space at startup.Question: Why does the test pass? Given that
plasma_client
is constructed with a release delay of 64, shouldn't the store by unable to evict some objects? Yet they all seem to get evicted just fine.cc @pcmoritz @atumanov