ARROW-1795: [Plasma] Create flag to make Plasma store use a single memory-mapped file. #1327

Closed · wants to merge 4 commits

Conversation

robertnishihara (Contributor)

This adds the -f flag, which tells the Plasma store to use a single memory-mapped file. This is accomplished by simply calling dlmemalign/dlfree on the entire space at startup.
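As a rough sketch of what that startup step amounts to (illustrative only, assuming Plasma's dl-prefixed dlmalloc is linked in; allocate_whole_space_up_front is a made-up name, not the store's actual code):

// Illustrative sketch of the -f startup path described above; not the real store code.
#include <cstddef>

// Entry points of the dl-prefixed dlmalloc that Plasma bundles.
extern "C" void* dlmemalign(size_t alignment, size_t bytes);
extern "C" void dlfree(void* mem);

// Hypothetical helper: request the entire budget once so that dlmalloc backs
// the heap with a single memory-mapped file, then hand the space back to its
// free lists so later object allocations are carved out of that one file.
void allocate_whole_space_up_front(size_t system_memory, size_t alignment) {
  void* whole_space = dlmemalign(alignment, system_memory);
  dlfree(whole_space);
}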

Question: Why does the test pass? Given that plasma_client is constructed with a release delay of 64, shouldn't the store be unable to evict some objects? Yet they all seem to get evicted just fine.

cc @pcmoritz @atumanov

@robertnishihara (Contributor, Author)

cc @luchy0120

With this PR, it should be possible to do the following.

First, start a plasma store with the -f flag:

plasma_store -m 800000000 -s /tmp/store1 -f

Then run:

import pyarrow.plasma as plasma
import numpy as np

# Connect to the store; the second argument is the (unused) manager socket
# and the third is the client's release delay.
c = plasma.connect('/tmp/store1', '', 64)

def create_and_release(size):
    # Create and seal an object of the given size. The buffer returned by
    # create is not kept, so the client releases the object (subject to the
    # release delay) and it becomes evictable.
    obj_id = plasma.ObjectID(np.random.bytes(20))
    c.create(obj_id, size)
    c.seal(obj_id)

# The store holds 800 MB, so creating the second (600 MB) object forces the
# first (500 MB) object to be evicted from the single memory-mapped file.
create_and_release(5 * 10 ** 8)
create_and_release(6 * 10 ** 8)

pcmoritz self-requested a review on November 17, 2017
pcmoritz closed this in cacbacd on November 17, 2017
@luchy0120 (Contributor)

Just curious, will it produce fragmentation when we evict objects?

@pcmoritz (Contributor)

Yeah, there can be fragmentation, but the malloc implementation we use (Doug Lea's dlmalloc) is designed to minimize fragmentation where possible; see http://g.oswego.edu/dl/html/malloc.html

@luchy0120 (Contributor)

Then we still have the potential to run out of memory after a long run. In that case our require_space() will try to evict small objects, and dlmalloc will coalesce the small free fragments into larger contiguous free space. But how do we guarantee that every call to dlmemalign reuses the big mmap'd region, and that every call to dlfree does not unmap it?

robertnishihara deleted the allocateupfront branch on November 17, 2017
@pcmoritz (Contributor)

pcmoritz commented Nov 17, 2017

Hey, the following call makes sure that the big memory-mapped file is reused and no new one is created:

plasma::dlmalloc_set_footprint_limit((size_t)system_memory);

If dlmemalign sees that there is not enough space and the mmap file is already at system_memory, it returns a null pointer, and we then call the eviction policy to evict something (see the loop around dlmemalign).
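That loop looks roughly like the following sketch (illustrative names only; evict_some_released_objects stands in for the store's eviction policy and is not a real function):

// Sketch of an allocate-then-evict loop around dlmemalign; illustrative only.
#include <cstddef>

extern "C" void* dlmemalign(size_t alignment, size_t bytes);

// Hypothetical hook into the eviction policy: frees some released objects
// and returns false once nothing more can be evicted.
bool evict_some_released_objects();

void* allocate_with_eviction(size_t alignment, size_t size) {
  void* pointer = dlmemalign(alignment, size);
  while (pointer == nullptr) {
    // The footprint limit is already at system_memory, so dlmalloc cannot
    // grow the file; make room by evicting released objects and retry.
    if (!evict_some_released_objects()) {
      return nullptr;  // out of memory: nothing left to evict
    }
    pointer = dlmemalign(alignment, size);
  }
  return pointer;
}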

If, hypothetically, dlfree were called on the last object in the store, the mmap file might be unmapped and remapped when a new object is allocated with dlmemalign; this would only happen in corner cases (i.e. if there are only one or very few large objects that are already released and need to be evicted). We could prevent it by mallocing a very small object at the beginning and never freeing it. If you are concerned about this, can show that it actually happens, and want to submit a PR, please go ahead (in that case it makes sense to increase system_memory by the size of the small object), but it is not even clear that dlmalloc would unmap the last memory-mapped file. And even if it did happen, the behavior would still be correct.
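That workaround could be as simple as the following sketch (pin_mmap_file and kSmallObjectSize are made-up names; as noted above, it is not clear this is even needed):

// Sketch of the "keep one tiny allocation alive" workaround; illustrative only.
#include <cstddef>

extern "C" void* dlmalloc(size_t bytes);

// Allocate a tiny object at startup and never free it, so the memory-mapped
// file always contains at least one live chunk and is never a candidate for
// unmapping. system_memory would be increased by kSmallObjectSize to compensate.
static const size_t kSmallObjectSize = 64;
static void* pinned_chunk = nullptr;

void pin_mmap_file() {
  pinned_chunk = dlmalloc(kSmallObjectSize);
}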

Hope that helps!

@wesm (Member)

wesm commented Nov 18, 2017

The most recent build stalled on macOS; we should keep an eye on whether it becomes a recurring thing.

2: [ RUN      ] TestPlasmaStore.MultipleClientTest
No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received
The build has been terminated

@robertnishihara (Contributor, Author)

Thanks for letting us know. We'll keep an eye out for it.

@luchy0120 (Contributor)

@pcmoritz, thanks for the explanation. I'll do more testing.
