DONOTMERGE for Summer School (everything in memory, only minimally needed files) #135

Closed
wants to merge 78 commits into from

joka921
Member

@joka921 joka921 commented Sep 21, 2018

No description provided.

- Implemented the Prefix Compression Heuristic using a Trie-based
  greedy algorithm

- Integrated Prefix compression into the Vocabulary class
  (templated for prefix compression / no prefix compression)
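The greedy, trie-based choice of prefixes mentioned above can be illustrated with a small sketch. This is not the code from this PR: the scoring rule (number of words under a node times the bytes saved when the prefix is replaced by a one-byte code) and all names (`TrieNode`, `choose_prefixes`) are assumptions made for illustration only.

```python
class TrieNode:
    """One character position in the trie; count = words passing through it."""

    def __init__(self, parent=None, depth=0):
        self.children = {}
        self.count = 0
        self.parent = parent
        self.depth = depth


def build_trie(words):
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            child = node.children.get(ch)
            if child is None:
                child = TrieNode(node, node.depth + 1)
                node.children[ch] = child
            node = child
            node.count += 1
    return root


def iter_nodes(start):
    """Yield (node, path relative to start) for start and all its descendants."""
    stack = [(start, "")]
    while stack:
        node, path = stack.pop()
        yield node, path
        for ch, child in node.children.items():
            stack.append((child, path + ch))


def choose_prefixes(words, num_prefixes):
    """Greedily pick up to num_prefixes prefixes with the largest byte savings."""
    root = build_trie(words)
    chosen = []
    for _ in range(num_prefixes):
        best, best_prefix, best_gain = None, "", 0
        for node, prefix in iter_nodes(root):
            if node.depth < 2:
                continue  # a one-byte prefix replaced by a one-byte code saves nothing
            gain = node.count * (node.depth - 1)
            if gain > best_gain:
                best, best_prefix, best_gain = node, prefix, gain
        if best is None:
            break  # no remaining prefix saves anything
        chosen.append(best_prefix)
        covered = best.count
        # Words covered by the chosen prefix no longer count for its ancestors ...
        anc = best.parent
        while anc.parent is not None:
            anc.count -= covered
            anc = anc.parent
        # ... nor for any longer prefix below it.
        for node, _ in iter_nodes(best):
            node.count = 0
    return chosen
```

For example, `choose_prefixes(["abcdX", "abcdY", "abcdZ", "pqrA", "pqrB"], 2)` picks `abcd` first (three words, three bytes saved each) and then `pqr`.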

- Converted SPO and SOP to Mmap based data

- faster startup time for ServerMain by not iterating over the complete
  Mmap Vector for statistics

- Added nlohmann/json as a submodule
- IndexBuilder writes a configuration file (JSON) that contains
  information needed for the prefix compression. Can be extended for
  other settings and statistics

- it is also possible to externalize entities from the vocabulary (e.g.
  Wikidata statement ids)

- it is possible to pass a settings-json file to the index builder.
  Currently supported: Prefixes that shall be externalized (see above)

- Storing statistics in Mmap-based meta data
  - only calculate expensive statistics at index creation time
- Also added a simple versioning system to the meta data
- TODO before merging:
  - Unit tests for the compressed vocabulary (they are currently a little
    sparse)
  - Better output of the converter (maybe tell what was created and what
    has to be renamed)
  - Verify that the converter works as expected (maybe write a
    verification script)
- this boolean flag can not be chosen by the user and has to match the
  settings chosen at index-build time. This information is passed via
  the .meta-data.json file
We ALWAYS add a code prefix, even if there is nothing to compress.

That way, the prefix " (the opening quote of a literal) will also be
compressed (saving one byte per literal that is not compressed otherwise).
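A minimal sketch of this encoding, assuming a one-byte code where `0` means "no prefix" and codes `1..n` index into a prefix list. The function names, the code layout and the example prefixes are hypothetical and not taken from the PR:

```python
def compress(word, prefixes):
    """Replace the longest matching prefix by a one-byte code.

    Every word gets a leading code byte; code 0 means "no prefix matched".
    """
    best_code, best_len = 0, 0
    for code, prefix in enumerate(prefixes, start=1):
        if len(prefix) > best_len and word.startswith(prefix):
            best_code, best_len = code, len(prefix)
    return bytes([best_code]) + word[best_len:].encode("utf-8")


def decompress(data, prefixes):
    """Inverse of compress: expand the leading code byte back into its prefix."""
    code = data[0]
    prefix = "" if code == 0 else prefixes[code - 1]
    return prefix + data[1:].decode("utf-8")


# Hypothetical prefix table: the bare quote is a prefix of its own.
PREFIXES = ['"', "<http://example.org/"]
```

With this table, a literal such as `"hello"@en` is stored as the code byte for `"` followed by `hello"@en`, so the mandatory code byte costs nothing extra for literals, while a word that matches no prefix pays one byte for code `0`.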
We have disabled externalizing parts of an uncompressed vocabulary,
because there the method that resolves IDs returns a reference, which
does not work with external literals.
Setting is determined automatically
Adapted documentation to correspond to that behavior
Using -l while starting the Server will cause a warning and be ignored
Changed default parameters of Dockerfile to match this behavior
Not finished nor compiling
TODO:
Fix, debug and test regexes
Next step: correct character classes
joka921 and others added 27 commits September 7, 2018 11:46
-- multiple backslashes before quote must be handled correctly
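The backslash rule can be sketched as follows: a quote terminates the literal only if it is preceded by an even number of backslashes (including zero), because each pair of backslashes is itself an escaped backslash. The helper name `find_closing_quote` is hypothetical, and this sketch scans by hand instead of using the regexes this PR works on:

```python
def find_closing_quote(text, start=1):
    """Return the index of the first unescaped '"' at or after start, or -1.

    text is expected to begin with the opening quote at index 0.
    """
    pos = text.find('"', start)
    while pos != -1:
        # Count the run of backslashes directly before this quote.
        backslashes = 0
        i = pos - 1
        while i >= start and text[i] == "\\":
            backslashes += 1
            i -= 1
        if backslashes % 2 == 0:
            return pos  # even count: the quote itself is not escaped
        pos = text.find('"', pos + 1)
    return -1
```

For the input `"a\\"` (two backslashes before the quote) the last quote closes the literal, while in `"a\"b"` the quote after the single backslash is skipped.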
Parallel Sorting is now the default
Seems like an unnecessary optimization and overcomplicates things

Deactivate vocabularyGeneratorTest since it does not yet reflect the new
status
Also added utilization of HAVING for CountAvailablePredicates by
automatically replacing filters on a ql:has-predicate triple with
HAVING statements if the pattern trick is used.
Filter format had to be changed (addFilter still needs a GraphPattern
because of the language filter)
@niklas88
Member

@joka921 can this PR be closed now?

@niklas88 niklas88 closed this Oct 31, 2018
@joka921 joka921 deleted the f.singlePassMemoryPermuts branch August 23, 2022 20:40
3 participants