WikidataFull for Panarea-Demo. Do NOT MERGE #134
Closed
- Implemented the Prefix Compression Heuristic using a Trie-based greedy algorithm
- Integrated Prefix compression into the Vocabulary class (templated for prefix compression / no prefix compression)
- Converted SPO and SOP to Mmap based data
- faster startup time for ServerMain by not iterating over the complete Mmap Vector for statistics
- Added nlohmann/json as a submodule
- IndexBuilder writes a configuration file (json) that contains information needed for the prefix compression. Can be extended for other settings and statistics
- it is also possible to externalize entities from the vocabulary (e.g. Wikidata statement ids)
- it is possible to pass a settings-json file to the index builder. Currently supported: Prefixes that shall be externalized (see above)
- Storing statistics in MMap based Meta data
- only calculate expensive statistics at index creation time
- also add a simple versioning system to the meta data
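To illustrate what a Trie-based greedy prefix selection can look like, here is a minimal sketch. It is not the code from this PR: the names `TrieNode`, `insertWord`, `findBest`, `clearCounts`, `pathTo` and `choosePrefixes` are hypothetical, and the gain function (number of covered words times prefix length minus one byte for the code) is an assumption about how savings might be estimated.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical trie node. `count` is the number of words that pass through
// this node and are not yet covered by an already chosen prefix.
struct TrieNode {
  std::map<char, std::unique_ptr<TrieNode>> children;
  TrieNode* parent = nullptr;
  char label = '\0';
  std::size_t depth = 0;
  std::size_t count = 0;
};

// Insert a word and increment the count of every node on its path.
void insertWord(TrieNode& root, const std::string& word) {
  TrieNode* node = &root;
  ++node->count;
  for (char c : word) {
    std::unique_ptr<TrieNode>& child = node->children[c];
    if (!child) {
      child = std::make_unique<TrieNode>();
      child->parent = node;
      child->label = c;
      child->depth = node->depth + 1;
    }
    node = child.get();
    ++node->count;
  }
}

// Assumed gain: a prefix of length `depth` is replaced by a one-byte code,
// so each covered word saves `depth - 1` bytes.
void findBest(TrieNode& node, TrieNode*& best, std::size_t& bestGain) {
  if (node.depth > 1) {
    const std::size_t gain = node.count * (node.depth - 1);
    if (gain > bestGain) {
      bestGain = gain;
      best = &node;
    }
  }
  for (auto& entry : node.children) findBest(*entry.second, best, bestGain);
}

// Mark all words below `node` as covered.
void clearCounts(TrieNode& node) {
  node.count = 0;
  for (auto& entry : node.children) clearCounts(*entry.second);
}

// Reconstruct the prefix string that leads to `node`.
std::string pathTo(const TrieNode* node) {
  std::string path;
  for (; node->parent != nullptr; node = node->parent) {
    path.push_back(node->label);
  }
  std::reverse(path.begin(), path.end());
  return path;
}

// Greedily pick up to `maxPrefixes` prefixes with the highest remaining gain.
std::vector<std::string> choosePrefixes(const std::vector<std::string>& words,
                                        std::size_t maxPrefixes) {
  TrieNode root;
  for (const std::string& w : words) insertWord(root, w);
  std::vector<std::string> prefixes;
  while (prefixes.size() < maxPrefixes) {
    TrieNode* best = nullptr;
    std::size_t bestGain = 0;
    findBest(root, best, bestGain);
    if (best == nullptr) break;  // nothing left that saves any bytes
    prefixes.push_back(pathTo(best));
    // Words below `best` are now covered: remove them from all ancestors.
    const std::size_t covered = best->count;
    for (TrieNode* a = best->parent; a != nullptr; a = a->parent) {
      a->count -= covered;
    }
    clearCounts(*best);
  }
  return prefixes;
}
```

In this sketch every word is assigned to at most one prefix, which keeps the greedy step simple but is only a heuristic; a real implementation may weigh candidates differently.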
- Todo until Merging:
  - Unit tests for compressed vocabulary (they are currently a little bit sparse)
  - Better output of the converter (maybe tell what was created and what has to be renamed)
  - Verify that the converter works as expected (maybe write verification script)
- this boolean flag can not be chosen by the user and has to match the settings chosen at index-build time. This information is passed via the .meta-data.json file
We ALWAYS add a code prefix, even if there is nothing to compress. That way, the prefix " will also be compressed (saving one byte per literal that is not being compressed otherwise).
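The effect of always writing a code byte can be sketched as follows. This is only an illustration under assumptions: `NO_PREFIX`, `compress` and `decompress` are hypothetical names, the prefix table is passed in explicitly, and it is assumed to hold at most 255 entries so that every code fits into one byte.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical reserved code meaning "no prefix was replaced".
constexpr char NO_PREFIX = 0;

// Replace the longest matching prefix by a one-byte code (index + 1).
// A code byte is always written, even if no prefix matches.
std::string compress(const std::string& word,
                     const std::vector<std::string>& prefixes) {
  std::size_t bestIdx = prefixes.size();
  std::size_t bestLen = 0;
  for (std::size_t i = 0; i < prefixes.size(); ++i) {
    const std::string& p = prefixes[i];
    if (p.size() > bestLen && word.compare(0, p.size(), p) == 0) {
      bestIdx = i;
      bestLen = p.size();
    }
  }
  if (bestIdx == prefixes.size()) {
    return std::string(1, NO_PREFIX) + word;
  }
  return std::string(1, static_cast<char>(bestIdx + 1)) + word.substr(bestLen);
}

// Inverse of compress: look at the code byte and restore the prefix.
std::string decompress(const std::string& compressed,
                       const std::vector<std::string>& prefixes) {
  assert(!compressed.empty());
  const auto code = static_cast<unsigned char>(compressed[0]);
  if (code == NO_PREFIX) return compressed.substr(1);
  return prefixes[code - 1] + compressed.substr(1);
}
```

With `"` in the prefix table, a literal that matches no longer prefix has its opening quote replaced by the code byte, so the mandatory code byte costs nothing for it. Only words that match no prefix at all grow by one byte.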
We have disabled externalizing parts of an uncompressed vocabulary, because there the method that resolves IDs returns a reference, which does not work with external literals.
Setting is determined automatically
Adapted documentation to correspond to that behavior
Using -l while starting the Server will cause a warning and be ignored
Changed default parameters of dockerfile to match this behavior
Not finished nor compiling
TODO: Fix, Debug and test regexes
next step : correct character classes
Some more rules tested
-- multiple backslashes before quote must be handled correctly
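The backslash rule mentioned above can be illustrated with a small sketch that does not use regexes: a quote is escaped only if it is preceded by an odd number of backslashes. The helper names `isEscaped` and `findClosingQuote` are hypothetical and not taken from the tokenizer in this PR.

```cpp
#include <cstddef>
#include <string>

// True if the character at `pos` is preceded by an odd number of
// consecutive backslashes, i.e. it is escaped.
bool isEscaped(const std::string& s, std::size_t pos) {
  std::size_t numBackslashes = 0;
  while (pos > numBackslashes && s[pos - numBackslashes - 1] == '\\') {
    ++numBackslashes;
  }
  return numBackslashes % 2 == 1;
}

// Find the unescaped quote that closes the string literal opened at `open`.
// Returns std::string::npos if the literal is not terminated.
std::size_t findClosingQuote(const std::string& s, std::size_t open) {
  for (std::size_t i = open + 1; i < s.size(); ++i) {
    if (s[i] == '"' && !isEscaped(s, i)) return i;
  }
  return std::string::npos;
}
```

For example, in `"a\\"` the two backslashes form one escaped backslash, so the following quote closes the literal, whereas in `"a\"b"` the quote after the single backslash does not.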
…into f.everythingMerged
Parallel Sorting is now the default
Seems like an unnecessary optimization and overcomplicates things
Deactivate vocabularyGeneratorTest since it does not yet reflect the new status
Also added utilization of HAVING for CountAvailablePredicates by automatically replacing filters on a ql:has-predicate triple with HAVING statements if the pattern trick is used.
Tried to build it on panarea, but got this:
…-- Performing Test STXXL_HAVE_SYNC_ADD_AND_FETCH - Success
CMake Warning (dev) at
/usr/share/cmake-3.10/Modules/FindOpenMP.cmake:310 (if):
if given arguments:
"TRUE"
An argument named "TRUE" appears in a conditional statement. Policy
CMP0012 is not set: if() recognizes numbers and boolean constants. Run
"cmake --help-policy CMP0012" for policy details. Use the cmake_policy
command to set the policy and suppress this warning.
Call Stack (most recent call first):
/usr/share/cmake-3.10/Modules/FindOpenMP.cmake:425
(_OPENMP_GET_SPEC_DATE)
third_party/stxxl/CMakeLists.txt:499 (include)
This warning is for project developers. Use -Wno-dev to suppress it.
-- Found OpenMP_C: -fopenmp
CMake Warning (dev) at
/usr/share/cmake-3.10/Modules/FindOpenMP.cmake:310 (if):
if given arguments:
"TRUE"
An argument named "TRUE" appears in a conditional statement. Policy
CMP0012 is not set: if() recognizes numbers and boolean constants. Run
"cmake --help-policy CMP0012" for policy details. Use the cmake_policy
command to set the policy and suppress this warning.
Call Stack (most recent call first):
/usr/share/cmake-3.10/Modules/FindOpenMP.cmake:425
(_OPENMP_GET_SPEC_DATE)
third_party/stxxl/CMakeLists.txt:499 (include)
This warning is for project developers. Use -Wno-dev to suppress it.
-- Found OpenMP_CXX: -fopenmp
-- Found OpenMP: TRUE
-- OpenMP found, enabling built-in parallel algorithms.
-- Looking for C++ include parallel/algorithm
-- Looking for C++ include parallel/algorithm - found
-- Using POSIX pthread library functions.
-- Looking for mallinfo
-- Looking for mallinfo - found
-- Looking for mlock
-- Looking for mlock - found
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- OpenMP found, enabling built-in parallel algorithms.
-- ---
-- CXX_FLAGS are : -Wall -Wextra -Wno-missing-field-initializers
-DGTEST_HAS_TR1_TUPLE=0 -DGTEST_USE_OWN_TR1_TUPLE=0 -fopenmp -fopenmp
-- CXX_FLAGS_RELEASE are : -O3 -DNDEBUG -O3 -DLOGLEVEL=4
-- CXX_FLAGS_DEBUG are : -g -DLOGLEVEL=4
-- IMPORTANT: Make sure you have selected the desired CMAKE_BUILD_TYPE
-- ---
CMake Error at CMakeLists.txt:101 (add_subdirectory):
The source directory
/app/third_party/re2
does not contain a CMakeLists.txt file.
-- Configuring incomplete, errors occurred!
See also "/app/build/CMakeFiles/CMakeOutput.log".
See also "/app/build/CMakeFiles/CMakeError.log".
The command '/bin/sh -c cmake -DCMAKE_BUILD_TYPE=Release .. && make -j
$(nproc)' returned a non-zero code: 1
On 18.09.2018 22:56, joka921 wrote:
Throws together all current features.
------------------------------------------------------------------------
You can view, comment on, or merge this pull request online at:
#134
Commit Summary
* Prefix Compression and faster startup time
* Updated MetaDataConverter to handle basically any old format
* Eliminated -l flag for ServerMain
* Prefix Compression now also takes into account that
* Better comment on the range of prefix characters
* Separation between different types of vocabulary.
* Fix: Removed -l from server
* First draft of some parsing rules
* some first regexes for tokenizing turtle inputs
* added many more regexes. completely untested yet. changed to wstring
* Raw tokenizer version, seems to work at least to some extent
* First draft of wstring based parser, not finished nor tested
* Some more rules for parser
* finished beta for nonTerminals
* Added many more regexes
* Parser is compiling
* Before conversion to google's re2
* Integrated re2 in Makefiles
* Compiling with re2
* Removed unused Regexes from Tokenizer
* Unit tests for some of the string literal regexes
* Added many more unit tests
* Character classes working now
* Tests and updates for prefixed names and blank nodes
* First tests for Parser
* Bugfix for getNextToken with multiple candidates
* Implemented blankNodePropertyList and additional tests
* Unit tests for all lists etc.
* Finished for today
* Mmap and getline for TurtleParser + bugfix
* Bugfix in statement rule
* Turtle Parser Beta (parses from uncompressed files)
* Possibly working TurtleParser
* Beta-Integration of Turtle parser into indexBuilder
* included bzip2 in dockerfile
* Eliminated Regexes in expensive places
* Added code that inserts triples for language relation
* Completed Functionality
* Fixed language filter for externalized languages
* Fixed the externalization character output
* Fixed test (unfortunately manually)
* Removed some unused code
* Efficient Permutation creation
* Removed outcommented code
* Added special triples like <me> rdfs:label.en ***@***.***
* Changed the format of languagePredicates
* Output every 10 million lines only for second pass
* Merge branch 'f.turtleParser' into f.tmp
* Merge branch 'f.efficientLanguageFilter' into f.everythingMerged
* Merge branch 'f.EfficientMultiplicities' into f.everythingMerged
* Minor fix for stringParse
* No testing of RE2 lib
* Fixed StringParse Bug
* Log output when parser is not successful
* Merge branch 'f.everythingMerged' of
https://github.com/joka921/QLever into f.everythingMerged
* Parallel Vocabulary Generation
* Status message
* Bugfix for last Vocabulary entry
* Last (partial) buffer also has to be handled
* Reduced Memory Footprint
* Moved Items to inner loop. There seems to be memory leaking
* This compiles. Let's see if it also works
* Bugfix: writer for ExtVec has to be finished
* Fixed Usage of OPENMP
* LanguagePredicate is no longer at beginning of vocab
* Compilation bugfix
* Current version for Hannah
File Changes
* *M* .dockerignore
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-0> (1)
* *M* .gitmodules
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-1> (3)
* *M* CMakeLists.txt
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-2> (41)
* *M* Dockerfile
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-3> (11)
* *A* src/TurtleParserMain.cpp
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-4> (21)
* *M* src/global/Constants.h
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-5> (11)
* *M* src/index/ConstantsIndexCreation.h
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-6> (3)
* *M* src/index/Index.cpp
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-7> (376)
* *M* src/index/Index.h
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-8> (73)
* *M* src/index/IndexBuilderMain.cpp
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-9> (8)
* *M* src/index/IndexMetaData.h
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-10> (2)
* *M* src/index/MetaDataHandler.h
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-11> (47)
* *M* src/index/Vocabulary.h
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-12> (6)
* *M* src/index/VocabularyGenerator.cpp
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-13> (109)
* *M* src/index/VocabularyGenerator.h
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-14> (14)
* *M* src/index/VocabularyImpl.h
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-15> (11)
* *A* src/parser/Bzip2Wrapper.cpp
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-16> (8)
* *A* src/parser/Bzip2Wrapper.h
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-17> (137)
* *A* src/parser/Bzip2WrapperMain.cpp
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-18> (22)
* *M* src/parser/CMakeLists.txt
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-19> (5)
* *M* src/parser/ParsedQuery.cpp
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-20> (15)
* *M* src/parser/SparqlParser.cpp
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-21> (37)
* *A* src/parser/Tokenizer.cpp
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-22> (52)
* *A* src/parser/Tokenizer.h
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-23> (249)
* *A* src/parser/TurtleParser.cpp
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-24> (409)
* *A* src/parser/TurtleParser.h
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-25> (353)
* *M* src/util/Conversions.h
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-26> (27)
* *M* src/util/File.h
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-27> (5)
* *M* src/util/MmapVector.h
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-28> (3)
* *M* test/CMakeLists.txt
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-29> (7)
* *M* test/IndexTest.cpp
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-30> (236)
* *A* test/TokenTest.cpp
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-31> (230)
* *A* test/TurtleParserTest.cpp
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-32> (243)
* *M* test/VocabularyGeneratorTest.cpp
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-33> (2)
* *M* test/VocabularyTest.cpp
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-34> (2)
* *A* third_party/re2
<https://github.com/ad-freiburg/QLever/pull/134/files#diff-35> (1)
Patch Links:
* https://github.com/ad-freiburg/QLever/pull/134.patch
* https://github.com/ad-freiburg/QLever/pull/134.diff
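I just sent you a workflow that works (via the mailing list). I think you have to call `git submodule init` or `git submodule update` or both again, since the google/re2 is new in this branch.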
Runs from panarea:7001 now, wow :o
http://qlever.informatik.uni-freiburg.de/?backend=8
…On 18.09.2018 23:28, joka921 wrote:
I just sent you a workflow that works (via the mailing list). I think
you have to call |git submodule init| or |git submodule update| or
both again, since the google/re2 is new in this branch.
…nd should not be merged into master.
Filter format had to be changed (addFilter still needs a GraphPattern because of the language filter)
@joka921 can this PR be closed now?