Support directly mmap-ing datafiles #5242

Merged
danpat merged 5 commits into master from ghoshkaj_mmaperize on Oct 29, 2018

danpat (Member) commented Oct 20, 2018

Issue

This PR completes the vision initially described in #1947. It builds on the work done by @TheMarex in #4881.

It turns out #4881 was almost there; it only required the following changes:

  • ensuring the bit-vector on-disk layout matched the memory format expected by vector_view<bool>,
  • implementing a DataLayout override that understands data block offsets inside tarfiles directly, and
  • some special handling for the R-Tree .fileIndex.
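
To illustrate the first bullet, here is a minimal sketch of what "on-disk layout matches the in-memory format" can mean for a bit-vector. It is not OSRM's code: the 64-bit word type, the least-significant-bit-first ordering, and the names `pack_bits` and `BitView` are all assumptions made for illustration. The point is only that if the writer packs bits exactly the way the reader indexes them, the reader can point straight at the bytes in the file without a decoding pass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed word type; the real vector_view<bool> may use a different width.
using Word = std::uint64_t;
constexpr std::size_t BITS_PER_WORD = sizeof(Word) * 8;

// Writer side (hypothetical): bit i goes to bit (i % 64) of word (i / 64).
std::vector<Word> pack_bits(const std::vector<bool> &bits)
{
    std::vector<Word> words((bits.size() + BITS_PER_WORD - 1) / BITS_PER_WORD, 0);
    for (std::size_t i = 0; i < bits.size(); ++i)
    {
        if (bits[i])
            words[i / BITS_PER_WORD] |= Word{1} << (i % BITS_PER_WORD);
    }
    return words;
}

// Reader side (hypothetical): a non-owning view that indexes the packed
// words with the same rule, so it can sit directly on top of mapped memory.
class BitView
{
  public:
    BitView(const Word *words, std::size_t size) : words_(words), size_(size) {}

    std::size_t size() const { return size_; }

    bool operator[](std::size_t i) const
    {
        assert(i < size_);
        return ((words_[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & Word{1}) != 0;
    }

  private:
    const Word *words_;
    std::size_t size_;
};
```

If the writer used a different word width or bit order than the view, every lookup would need a conversion step, which is exactly what direct mmap-ing has to avoid.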

Fortunately, .tar files are block-aligned on 512-byte boundaries. Because mmap returns a page-aligned base address, this conveniently means the data is properly word-aligned if we directly mmap the entire .tar file.
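
The alignment argument can be checked with a little arithmetic. The sketch below is illustrative only and is not taken from the PR: it assumes a plain tar archive in which every member is one 512-byte header block followed by its payload padded up to the next 512-byte boundary, and it ignores extended headers. The names `padded_size`, `payload_offsets`, and `is_word_aligned` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t TAR_BLOCK_SIZE = 512;

// 512 is a multiple of 8, so any block boundary is also a 64-bit word boundary.
static_assert(TAR_BLOCK_SIZE % sizeof(std::uint64_t) == 0,
              "tar blocks must be a whole number of 64-bit words");

// Round a member's payload size up to a whole number of tar blocks.
constexpr std::size_t padded_size(std::size_t size)
{
    return (size + TAR_BLOCK_SIZE - 1) / TAR_BLOCK_SIZE * TAR_BLOCK_SIZE;
}

constexpr bool is_word_aligned(std::size_t offset)
{
    return offset % sizeof(std::uint64_t) == 0;
}

// Given the payload sizes of consecutive members, return the byte offset of
// each payload inside the archive (assumed layout: header block, then payload).
std::vector<std::size_t> payload_offsets(const std::vector<std::size_t> &sizes)
{
    std::vector<std::size_t> offsets;
    std::size_t cursor = 0;
    for (const std::size_t size : sizes)
    {
        cursor += TAR_BLOCK_SIZE; // skip this member's header block
        assert(is_word_aligned(cursor));
        offsets.push_back(cursor);
        cursor += padded_size(size); // skip payload plus padding
    }
    return offsets;
}
```

For payload sizes of 1, 513, and 1024 bytes, the payloads start at offsets 512, 1536, and 3072, all of which are multiples of 8.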

There is no change here to the pre-processing tooling. However, osrm-routed and the NodeJS bindings now support a --mmap option (mmap_memory: true for NodeJS). When enabled, OSRM directly mmaps the datafiles instead of loading all data into RAM.
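
For readers unfamiliar with the mechanism, the following is a minimal sketch of mapping a file read-only and handing out pointers to blocks at known offsets. It uses the POSIX `open`, `fstat`, `mmap`, and `munmap` calls directly and is an assumption-laden illustration, not the code in this PR; the class name `ReadOnlyMapping` and its `block` helper are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical RAII wrapper around a read-only, whole-file mapping.
class ReadOnlyMapping
{
  public:
    explicit ReadOnlyMapping(const std::string &path)
    {
        const int fd = ::open(path.c_str(), O_RDONLY);
        if (fd == -1)
            throw std::runtime_error("cannot open " + path);

        struct stat info;
        if (::fstat(fd, &info) == -1 || info.st_size == 0)
        {
            ::close(fd);
            throw std::runtime_error("cannot stat, or file is empty: " + path);
        }
        size_ = static_cast<std::size_t>(info.st_size);

        // Pages are faulted in lazily by the kernel; nothing is read up front.
        void *addr = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
        ::close(fd); // the mapping remains valid after the descriptor is closed
        if (addr == MAP_FAILED)
            throw std::runtime_error("mmap failed for " + path);
        data_ = static_cast<const char *>(addr);
    }

    ~ReadOnlyMapping() { ::munmap(const_cast<char *>(data_), size_); }

    ReadOnlyMapping(const ReadOnlyMapping &) = delete;
    ReadOnlyMapping &operator=(const ReadOnlyMapping &) = delete;

    const char *data() const { return data_; }
    std::size_t size() const { return size_; }

    // Return a pointer to a block at a known offset inside the mapped file.
    const char *block(std::size_t offset, std::size_t length) const
    {
        assert(offset + length <= size_);
        return data_ + offset;
    }

  private:
    const char *data_ = nullptr;
    std::size_t size_ = 0;
};
```

Because the kernel only loads the pages that are actually touched, resident memory tracks the working set rather than the size of the file, which is the trade-off the next paragraph describes.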

This means you can run OSRM in quite constrained memory environments if you're willing to sacrifice performance. If you supply enough memory for page caching to take effect and your data has some common access patterns, you can achieve quite good performance with significantly less runtime RAM. Realistic benchmarking for your particular scenario is necessary to decide how much RAM will give you acceptable performance, but at least this knob now exists to tune.

Tasklist

  • Rename the ContiguousDataFacade - it no longer depends on the data being contiguous
  • Rename DataLayout to ContiguousDataLayout - this is where the contiguous idea is encapsulated
  • CHANGELOG.md entry (How to write a changelog entry)
  • update relevant Wiki pages
  • add tests (see testing documentation)
  • review
  • adjust for comments
  • cherry pick to release branch

@danpat force-pushed the ghoshkaj_mmaperize branch 2 times, most recently from 5065ac0 to f0c72cc (October 27, 2018 06:53)
danpat and others added 3 commits October 26, 2018 23:53
…ut for vector_view<bool> so that data can be directly mmapped.
… data into separate mmapped block

Co-authored-by: Kajari Ghosh <ghoshkaj@gmail.com>
…est suite in this mode, as well as shared memory mode.
@danpat merged commit 535647e into master (Oct 29, 2018)
@DennisOSRM deleted the ghoshkaj_mmaperize branch (November 6, 2022 14:11)
3 participants