Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use tar-format to encapsulate data #4960

Merged
merged 53 commits into from
Mar 27, 2018
Merged

Use tar-format to encapsulate data #4960

merged 53 commits into from
Mar 27, 2018

Conversation

TheMarex
Copy link
Member

@TheMarex TheMarex commented Mar 15, 2018

Issue

This PR is another step towards implementing #4952. The goal is to make any .osrm.* file a tar file internally and address data by (file)name. As example the .mldgr file now looks like this:

/mld/multilevelgraph/node_array.meta
/mld/multilevelgraph/node_array
/mld/multilevelgraph/edge_array.meta
/mld/multilevelgraph/edge_array
/mld/multilevelgraph/node_to_edge_offset.meta
/mld/multilevelgraph/node_to_edge_offset
/mld/multilevelgraph/connectivity_checksum.meta
/mld/multilevelgraph/connectivity_checksum

Where every file contains raw binary data for data structures in OSRM that we can deserialize or read directly to memory. The .meta files contain additional information for setting up the data layout, e.g. the number of values for serialized std::vector, or the fingerprint.

Tasklist

@TheMarex TheMarex self-assigned this Mar 15, 2018
@TheMarex TheMarex force-pushed the refactor/tar_files branch 3 times, most recently from c72d85a to 24dc369 Compare March 22, 2018 16:43
@TheMarex TheMarex requested a review from oxidase March 22, 2018 18:19
@TheMarex TheMarex requested a review from danpat March 22, 2018 18:27
Copy link
Contributor

@oxidase oxidase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I have some comments to the last commit


namespace detail
{
template <typename T, typename BlockT = unsigned char>
inline unsigned char packBits(const T &data, std::size_t index, std::size_t count)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return value must be BlockT

{
static_assert(std::is_same<typename T::value_type, bool>::value, "value_type is not bool");
const unsigned char mask = 1 << (count - 1);
const BlockT mask = 1 << (count - 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here must be BlockT{1} << (count - 1)


const auto decode = [&](const std::uint64_t block) {
auto read_size = std::min<std::size_t>(count - index, WORD_BITS);
unpackBits(data, index, read_size, block);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here unpackBits<VectorT, std::uint64_t>


// FIXME on old boost version the function_input_iterator does not work with lambdas
// so we need to wrap it in a function here.
const std::function<std::uint64_t()> encode_function = [&]() -> char {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/-> char/-> std::uint64_t/

return packed;
};

std::uint64_t number_of_blocks = std::ceil((double)count / WORD_BITS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(count + WORD_BITS - 1) / WORD_BITS would be better to avoid floating-point arithmetic

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah nice trick. I know a few places were this would come in useful.

@oxidase
Copy link
Contributor

oxidase commented Mar 27, 2018

@TheMarex i pushed fixes for msvc to trigger CI builds ("fixes" are reducing c++ syntactic sugar for msvc). Feel free to kick-out or squash the last commit.

@TheMarex
Copy link
Member Author

unknown location(0): fatal error: in "route/test_manual_setting_of_annotations_property": class osrm::util::exception: File "C:/projects/osrm/test/data/ch/monaco.osrm.fileIndex" mapping failed: failed opening file: The process cannot access the file because it is being used by another process.

Some more problems with windows. I suspect this is because I had to change the r-tree to allow writing to an mmap files. Maybe I can work around this.

@TheMarex
Copy link
Member Author

TheMarex commented Mar 27, 2018

Wuhu finally! It compiles. 🚀 I need to merge this because we add the new dependency tree which doesn't work with rebase + merge.

@TheMarex TheMarex merged commit 2690dd0 into master Mar 27, 2018
@TheMarex TheMarex deleted the refactor/tar_files branch March 27, 2018 12:34
@TheMarex TheMarex mentioned this pull request Mar 28, 2018
@mloskot
Copy link
Member

mloskot commented Mar 28, 2018

Follow-up to @TheMarex -s #2242 (comment) here

One of #2242 requirements was "Only one final file". Does this implement it too. It's unclear form the description. I've failed to fish out from the commits if it is about tar-ifying individual files only or also about packaging into single master file.

Will the docs or wiki:Processing Flow wiki updated on how to use the new format?

@TheMarex
Copy link
Member Author

@mloskot you are right, this doesn't meet the single-file requirement yet.

However, since we now use named identifiers for data, osrm-datastore does not really need to care anymore in which files the data is stored, we could figure that out automatically. That would unlock just passing a single file to osrm-datastore that is just a bunch of merged tar files. I captured this in #4978 for further discussion.

@mloskot
Copy link
Member

mloskot commented Mar 28, 2018

@TheMarex It makes sense, thanks for the clarification.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants