DONOTMERGE for Summer School (everything in memory, only minimally needed files) #135

joka921 · 2018-09-21T15:54:44Z

No description provided.

- Implemented the prefix compression heuristic using a Trie-based greedy algorithm.
- Integrated prefix compression into the Vocabulary class (templated for prefix compression / no prefix compression).
- Converted SPO and SOP to Mmap-based data.
- Faster startup time for ServerMain by not iterating over the complete Mmap vector for statistics.
- Added nlohmann/json as a submodule.
- IndexBuilder writes a configuration file (JSON) that contains the information needed for the prefix compression. It can be extended for other settings and statistics.
- It is also possible to externalize entities from the vocabulary (e.g. Wikidata statement IDs).
- It is possible to pass a settings JSON file to the index builder. Currently supported: prefixes that shall be externalized (see above).
- Statistics are stored in Mmap-based metadata.
- Expensive statistics are only calculated at index creation time.
- Added a simple versioning system to the metadata.

Todo until merging:
- Unit tests for the compressed vocabulary (they are currently a little sparse).
- Better output of the converter (maybe tell what was created and what has to be renamed).
- Verify that the converter works as expected (maybe write a verification script).
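
For readers unfamiliar with the idea, here is a minimal sketch of how a Trie-based greedy prefix selection could work. It is an illustration under stated assumptions, not necessarily the heuristic implemented in this PR: the names `TrieNode`, `insertWord`, `findBest`, `cover` and `choosePrefixes` are hypothetical, and the saving of a prefix is assumed to be its length times the number of not-yet-covered words that start with it.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical trie node; `count` is the number of words passing through
// this node that are not yet covered by an already chosen prefix.
struct TrieNode {
  std::size_t count = 0;
  std::map<char, std::unique_ptr<TrieNode>> children;
};

void insertWord(TrieNode& root, const std::string& word) {
  TrieNode* node = &root;
  for (char c : word) {
    auto& child = node->children[c];
    if (!child) child.reset(new TrieNode());
    node = child.get();
    ++node->count;
  }
}

// Depth-first search for the prefix with the largest assumed saving
// (prefix length * number of uncovered words sharing it).
void findBest(const TrieNode& node, std::string& current,
              std::size_t& bestSaving, std::string& best) {
  for (const auto& kv : node.children) {
    current.push_back(kv.first);
    const std::size_t saving = current.size() * kv.second->count;
    if (saving > bestSaving) {
      bestSaving = saving;
      best = current;
    }
    findBest(*kv.second, current, bestSaving, best);
    current.pop_back();
  }
}

// Mark every word starting with `prefix` as covered: lower the counts of
// all proper ancestors and drop the subtree rooted at the prefix.
void cover(TrieNode& root, const std::string& prefix) {
  assert(!prefix.empty());
  TrieNode* node = &root;
  for (char c : prefix) node = node->children.at(c).get();
  const std::size_t covered = node->count;
  node = &root;
  for (std::size_t i = 0; i + 1 < prefix.size(); ++i) {
    node = node->children.at(prefix[i]).get();
    node->count -= covered;
  }
  node->children.erase(prefix.back());
}

// Greedily pick up to `numPrefixes` prefixes, best saving first.
std::vector<std::string> choosePrefixes(const std::vector<std::string>& words,
                                        std::size_t numPrefixes) {
  TrieNode root;
  for (const auto& w : words) insertWord(root, w);
  std::vector<std::string> result;
  for (std::size_t i = 0; i < numPrefixes; ++i) {
    std::string current, best;
    std::size_t bestSaving = 0;
    findBest(root, current, bestSaving, best);
    if (bestSaving == 0) break;  // nothing left worth compressing
    result.push_back(best);
    cover(root, best);
  }
  return result;
}
```

The sketch rescans the whole trie for every chosen prefix, which is fine for illustration but would likely be too slow for a full vocabulary.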

niklas88 · 2018-10-11T09:35:11Z

@joka921 can this PR be closed now?

joka921 added 30 commits August 28, 2018 13:48

Eliminated -l flag for ServerMain

d3f7cce

- this boolean flag can not be chosen by the user and has to match the settings chosen at index-build time. This information is passed via the .meta-data.json file

Prefix Compression now also takes into account that

0025d28

we ALWAYS add a code prefix, even if there is nothing to compress. That way, the prefix " is also compressed (saving one byte per literal that is not compressed otherwise).
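
To make the one-byte argument concrete, here is a minimal sketch of such an encoding. It assumes (this is not confirmed by the PR) that every stored word starts with exactly one code byte and that code 0 means "no prefix matched"; `kNoPrefix`, `compress` and `expand` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical code byte meaning "no prefix matched" (an assumption).
constexpr unsigned char kNoPrefix = 0;

// Encode `word` as one code byte followed by the uncompressed remainder.
// prefixes[i] is assigned the code i + 1; the longest matching prefix wins.
std::string compress(const std::string& word,
                     const std::vector<std::string>& prefixes) {
  assert(prefixes.size() < 255);       // codes must fit into a single byte
  std::size_t best = prefixes.size();  // sentinel: nothing matched yet
  std::size_t bestLen = 0;
  for (std::size_t i = 0; i < prefixes.size(); ++i) {
    const std::string& p = prefixes[i];
    if (p.size() > bestLen && word.compare(0, p.size(), p) == 0) {
      best = i;
      bestLen = p.size();
    }
  }
  std::string out;
  out.push_back(best == prefixes.size() ? static_cast<char>(kNoPrefix)
                                        : static_cast<char>(best + 1));
  out.append(word, bestLen, std::string::npos);
  return out;
}

// Inverse of compress(): look up the prefix for the code byte and
// prepend it to the remainder.
std::string expand(const std::string& encoded,
                   const std::vector<std::string>& prefixes) {
  assert(!encoded.empty());
  const auto code = static_cast<unsigned char>(encoded[0]);
  std::string out = (code == kNoPrefix) ? std::string() : prefixes[code - 1];
  out.append(encoded, 1, std::string::npos);
  return out;
}
```

With `"` in the prefix table, a literal that matches no longer prefix is stored as code byte plus remainder, which has the same length as the original. Without it, the mandatory code byte would make the stored form one byte longer.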

Better comment on the range of prefix characters

d51b3b3

Separation between different types of vocabulary.

da0ac97

We have disabled externalizing parts of an uncompressed vocabulary because the method that resolves IDs returns a reference there, which does not work with external literals either.

Fix: Removed -l from server

3ccb9ec

The setting is determined automatically. Adapted the documentation to correspond to that behavior. Using -l while starting the server will cause a warning and be ignored. Changed the default parameters of the Dockerfile to match this behavior.

First draft of some parsing rules

7625409

some first regexes for tokenizing turtle inputs

87562cd

added many more regexes. completely untested yet. changed to wstring

691ce39

Raw tokenizer version, seems to work at least to some extent

16402d3

First draft of wstring based parser, not finished nor tested

f6ee5b0

Some more rules for parser

3e2d22a

finished beta for nonTerminals

c1c12e0

Added many more regexes

7b608a8

Not finished nor compiling

Parser is compiling

d3a095f

Before conversion to google's re2

84cbf93

Integrated re2 in Makefiles

fe8fc8f

Compiling with re2

b5aadfb

TODO: Fix, Debug and test regexes

Removed unused Regexes from Tokenizer

d331648

Unit tests for some of the string literal regexes

eb5e3db

Added many more unit tests

d277610

next step : correct character classes

Character classes working now

3d28807

Tests and updates for prefixed names and blank nodes

a017d3b

First tests for Parser

3b33561

Bugfix for getNextToken with multiple candidates

1fd33ed

Some more rules tested

Implemented blankNodePropertyList and additional tests

9858058

Unit tests for all lists etc.

ab80452

Finished for today

61d48ae

Mmap and getline for TurtleParser + bugfix

bf7e8f8

joka921 and others added 27 commits September 7, 2018 11:46

No testing of RE2 lib

fe7854d

Fixed StringParse Bug

16034eb

-- multiple backslashes before quote must be handled correctly
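
The rule behind this fix can be sketched as follows: a quote closes a string literal only if it is preceded by an even number of consecutive backslashes (zero included). The code below is a hand-written scanner for illustration only, not the regex-based tokenizer of this PR; `isEscaped` and `findClosingQuote` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if the character at `pos` is preceded by an odd number of
// consecutive backslashes, i.e. the character itself is escaped.
bool isEscaped(const std::string& s, std::size_t pos) {
  assert(pos <= s.size());
  std::size_t backslashes = 0;
  while (pos > backslashes && s[pos - backslashes - 1] == '\\') {
    ++backslashes;
  }
  return backslashes % 2 == 1;
}

// Index of the quote that closes the literal whose opening quote is at
// `open`, or std::string::npos if the literal is unterminated.
std::size_t findClosingQuote(const std::string& s, std::size_t open) {
  for (std::size_t i = open + 1; i < s.size(); ++i) {
    if (s[i] == '"' && !isEscaped(s, i)) return i;
  }
  return std::string::npos;
}
```

A check that only looks at the single character before the quote would misread `"a\\"`, where the two backslashes form an escaped backslash and the quote really does end the literal.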

Log output when parser is not successful

7a88bbf

Merge branch 'f.everythingMerged' of https://github.com/joka921/QLever …

5497854

…into f.everythingMerged

Parallel Vocabulary Generation

14a06bf

Status message

7eb7d77

Bugfix for last Vocabulary entry

c998268

Last (partial) buffer also has to be handled

5b01950
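
The kind of bug named in this commit title typically looks like the following. This is a generic sketch of batch processing with a final flush, assuming a fixed batch size; `processInBatches` and its parameters are hypothetical and not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Feed `words` to `flush` in batches of at most `batchSize` elements.
void processInBatches(
    const std::vector<std::string>& words, std::size_t batchSize,
    const std::function<void(const std::vector<std::string>&)>& flush) {
  assert(batchSize > 0);
  std::vector<std::string> buffer;
  buffer.reserve(batchSize);
  for (const auto& w : words) {
    buffer.push_back(w);
    if (buffer.size() == batchSize) {
      flush(buffer);
      buffer.clear();
    }
  }
  // The last batch is usually only partially filled. Without this step
  // its entries would be silently dropped.
  if (!buffer.empty()) flush(buffer);
}
```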

Reduced Memory Footprint

223438d

Moved Items to inner loop. There seems to be memory leaking

219651d

This compiles. Let's see if it also works

2194cc2

Bugfix: writer for ExtVec has to be finished

45bd1bd

Fixed Usage of OPENMP

ff82ec9

Parallel Sorting is now the default

LanguagePredicate is no longer at beginning of vocab

f774a52

Seems like an unnecessary optimization and overcomplicates things. Deactivated vocabularyGeneratorTest since it does not yet reflect the new status.

Compilation bugfix

5e322dc

Added support for HAVING. Fixes ad-freiburg#104.

7f509f8

Also added use of HAVING for CountAvailablePredicates by automatically replacing filters on a ql:has-predicate triple with HAVING statements if the pattern trick is used.

Replaced resultSortedBy column with a vector of columns.

f2a5c8f

Added group by operations to the optimizer.

255e17b

Fixed GroupBy::computeSortColumns returning wrong columns

9e9ebb5

Current version for Hannah

ed601a7

Added multithreading as default to Dockerfile

912780a

Added a prefix filter

e688afb

HACK: added support for filters on verbatim columns. This is a hack a…

53280d0

…nd should not be merged into master.

Merge branch 'florians-prefix-filter' into f.singlePass

2590586

Filter format had to be changed (addFilter still needs a GraphPattern because of the language filter).

Reverted everything to in-memory metadata and removed all tmp files

fd325f9

Also delete partial files

d89d59d

No more "unnecessary files"

b8a85a2

niklas88 closed this Oct 31, 2018

joka921 deleted the f.singlePassMemoryPermuts branch August 23, 2022 20:40
