New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor dictionary properties #109

Merged
merged 29 commits into from Jan 14, 2019

Conversation

Projects
None yet
3 participants
@hendrikmuhs
Copy link
Contributor

hendrikmuhs commented Nov 28, 2018

re-factor the file properties (header parts of a keyvi file to store entry points, type information etc.) into 2 property classes (for the FSA part and the value part), both for reading and writing.

json parsing/writing now uses rapidjson instead of boost property trees.

fixes #80

@coveralls

This comment has been minimized.

Copy link

coveralls commented Nov 28, 2018

Coverage Status

Coverage increased (+0.1%) to 95.337% when pulling 8ca2378 on hendrikmuhs:refactor-dictionary-properties into 4fcc8d8 on KeyviDev:master.

@hendrikmuhs

This comment has been minimized.

Copy link
Contributor Author

hendrikmuhs commented Dec 7, 2018

The re-factored code is now writing the same way as property trees, meaning it writes all numbers as strings. Looks ugly but keeps backwards compatibility, which is more important. It would be a good idea to avoid this string conversion, but if we do, we should revisit the whole format.

I plan to implement some more tests before this is ready.

@hendrikmuhs hendrikmuhs removed the wip label Dec 18, 2018

@@ -295,7 +290,12 @@ class Generator final {
}

stream << KEYVI_FILE_MAGIC;
WriteHeader(stream);

keyvi::dictionary::DictionaryProperties p(2, start_state_, number_of_keys_added_, number_of_states_,

This comment has been minimized.

@erik-cliqz

erik-cliqz Dec 19, 2018

Collaborator

The 2 here refers to the version. I think it makes sense to create a constant somewhere with the current version.

}

private:
size_t offset_ = 0;

This comment has been minimized.

@erik-cliqz

erik-cliqz Dec 19, 2018

Collaborator

What does offset_ refer to?

This comment has been minimized.

@hendrikmuhs

hendrikmuhs Dec 19, 2018

Author Contributor

a keyvi file has basically this format:

[MAGIC ("KEYVIFSA")]
[HEADER overall + keys ( -> dictionary_properties)]
[FSA labels (the labels of transitions in an FSA)]
[FSA transitions (basically pointers that connect the states)]
[HEADER values ( -> value_store_properties)]
[Values (defined by value store implementations)]

offset_ is the pointer/file offset where the last part [Values] start. The arithmetic is straight forward, if you want to get the value at position 10, offset_ + 10 would be the position in the file. (In practice we memory map the file starting at offset_ ending at offset_ + size_)

pt.put("values", std::to_string(number_of_values_));
pt.put("unique_values", std::to_string(number_of_unique_values_));
// todo: preserve compression
ValueStoreProperties properties(0, values_buffer_size_, number_of_values_, number_of_unique_values_, std::string());

This comment has been minimized.

@erik-cliqz

erik-cliqz Dec 19, 2018

Collaborator

Try to use either "" or std::string() consistently.

pt.put("size", std::to_string(values_buffer_size_));
pt.put("values", std::to_string(number_of_values_));
pt.put("unique_values", std::to_string(number_of_unique_values_));
// TODO(hendrik) write compressor

This comment has been minimized.

@erik-cliqz

erik-cliqz Dec 19, 2018

Collaborator

Just checking: This is not required before merging?

This comment has been minimized.

@hendrikmuhs

hendrikmuhs Dec 19, 2018

Author Contributor

no, I noticed this being broken, but it is broken already. We basically loose information about applied compression when merging 2 files. But that information has no functional impact, it's only meta-information to know whether compression has been enabled when compiling or not.

@hendrikmuhs hendrikmuhs merged commit f1f2bbf into KeyviDev:master Jan 14, 2019

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.1%) to 95.337%
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment