DONOTMERGE for Summer School (everything in memory, only minimally needed files) #135
Closed
- Implemented the prefix compression heuristic using a trie-based greedy algorithm.
- Integrated prefix compression into the Vocabulary class (templated for prefix compression / no prefix compression).
- Converted SPO and SOP to Mmap-based data.
- Faster startup time for ServerMain by not iterating over the complete Mmap vector for statistics.
- Added nlohmann/json as a submodule.
- IndexBuilder writes a configuration file (JSON) that contains the information needed for the prefix compression. It can be extended for other settings and statistics.
- It is also possible to externalize entities from the vocabulary (e.g. Wikidata statement IDs).
- It is possible to pass a settings JSON file to the index builder. Currently supported: prefixes that shall be externalized (see above).
- Statistics are stored in Mmap-based metadata.
- Expensive statistics are only calculated at index creation time.
- Added a simple versioning system to the metadata.
- Todo until merging:
  - Unit tests for the compressed vocabulary (they are currently a little sparse).
  - Better output of the converter (maybe report what was created and what has to be renamed).
  - Verify that the converter works as expected (maybe write a verification script).
- This boolean flag cannot be chosen by the user and has to match the settings chosen at index-build time. This information is passed via the .meta-data.json file.
We ALWAYS add a code prefix, even if there is nothing to compress. That way, the prefix " will also be compressed (saving one byte per literal that is not compressed otherwise).
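To illustrate the "always add a code prefix" idea, here is a minimal sketch. It is not the code from this PR: the function names, the choice of code byte 0 for "no prefix matched", and the linear scan over the prefix list (instead of a trie) are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch, not the implementation from this PR.
// Assumption: code byte 0 means "no prefix matched"; prefixes[i] gets code i + 1.
const unsigned char NO_PREFIX_CODE = 0;

// Replace the longest matching prefix of `word` by a one-byte code.
// A code byte is always emitted, even if no prefix matches.
std::string compressWord(const std::string& word,
                         const std::vector<std::string>& prefixes) {
  assert(prefixes.size() <= 255);  // codes 1..255 must fit into one byte
  std::size_t bestIdx = 0;
  std::size_t bestLen = 0;  // 0 means "no prefix found so far"
  for (std::size_t i = 0; i < prefixes.size(); ++i) {
    const std::string& p = prefixes[i];
    if (p.size() <= word.size() && p.size() > bestLen &&
        word.compare(0, p.size(), p) == 0) {
      bestIdx = i;
      bestLen = p.size();
    }
  }
  unsigned char code = bestLen > 0 ? static_cast<unsigned char>(bestIdx + 1)
                                   : NO_PREFIX_CODE;
  std::string result(1, static_cast<char>(code));
  result += word.substr(bestLen);  // remainder after the matched prefix
  return result;
}

// Inverse of compressWord: expand the leading code byte back into its prefix.
std::string decompressWord(const std::string& compressed,
                           const std::vector<std::string>& prefixes) {
  assert(!compressed.empty());  // every compressed word has a code byte
  unsigned char code = static_cast<unsigned char>(compressed[0]);
  if (code == NO_PREFIX_CODE) {
    return compressed.substr(1);
  }
  assert(code <= prefixes.size());
  return prefixes[code - 1] + compressed.substr(1);
}
```

Under these assumptions, a literal such as `"abc"` with `"` in the prefix list takes five bytes (one code byte plus `abc"`), whereas with the "no prefix" code it would take six. This is one way to read the one-byte saving mentioned above.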
We have disabled externalizing parts of an uncompressed vocabulary because there the method resolving IDs returns a reference, which also does not work with external literals.
The setting is determined automatically. Adapted the documentation to correspond to that behavior. Using -l while starting the server will cause a warning and be ignored. Changed the default parameters of the Dockerfile to match this behavior.
Not finished nor compiling
TODO: Fix, Debug and test regexes
next step : correct character classes
Some more rules tested
-- multiple backslashes before quote must be handled correctly
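As an illustration of the backslash rule: a quote only ends a literal if it is preceded by an even number of backslashes. The sketch below uses a hand-written scan rather than the regexes this branch works on, and the function name `findUnescapedQuote` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch, not the regex-based code from this branch.
// Returns the index of the first '"' at or after `start` that is not escaped
// by a backslash, or std::string::npos if there is none.
// Assumption: `start` does not point into the middle of an escape sequence.
std::size_t findUnescapedQuote(const std::string& s, std::size_t start) {
  assert(start <= s.size());
  for (std::size_t i = start; i < s.size(); ++i) {
    if (s[i] == '\\') {
      ++i;  // a backslash escapes the next character, so skip it
      continue;
    }
    if (s[i] == '"') {
      return i;
    }
  }
  return std::string::npos;
}
```

Because each backslash consumes the character after it, two backslashes cancel each other out and a following quote counts as unescaped, while three backslashes leave the quote escaped.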
…into f.everythingMerged
Parallel Sorting is now the default
Seems like an unnecessary optimization and overcomplicates things. Deactivate vocabularyGeneratorTest since it does not yet reflect the new status.
Also added utilization of HAVING for CountAvailablePredicates by automatically replacing filters on a ql:has-predicate triple with HAVING statements if the pattern trick is used.
…nd should not be merged into master.
Filter format had to be changed (addFilter still needs a GraphPattern because of the language filter).
@joka921 can this PR be closed now?