F.temp optimizer and sort order merged (DONT MERGE, but Hannah may test it) #326

Closed
93 commits
cc17525
Treat Graph Patterns in Query Bodies (More) Correctly
joka921 Apr 16, 2020
797613d
fix submodule
joka921 Apr 17, 2020
a7254e4
Moved Filters back into the Graph pattern and made VALUES a separate …
joka921 Apr 18, 2020
0c410ea
This version passes e2e tests and does some things more correctly.
joka921 Apr 18, 2020
14a07b0
Made the standalone JOIN and the MERGE method of QueryPlanner really …
joka921 Apr 19, 2020
fb24636
The join and the merge method are now symmetric and work.
joka921 Apr 19, 2020
f7f6ad9
removed the duplication between join and merge. merge now calls join.
joka921 Apr 19, 2020
d0dea1e
Commented the changes in the optimize function in an exhaustive way.
joka921 Apr 19, 2020
0593eb5
Merge branch 'f.UnifySortOrderBy' into f.TempOptimizerAndSortOrderMerged
joka921 Apr 19, 2020
4bf37a0
merged the SortFix and the Optimizer FIX into one PR for Hannah to test.
joka921 Apr 19, 2020
d8a22fb
first stub of a BIND class. TODO: finish ...
joka921 Apr 19, 2020
cfb81d5
Started copying some code for the actual computeResult implementation…
joka921 Apr 20, 2020
6cc4fb0
Fixed a bug in the query planner that prevented the usage of the patt…
joka921 Apr 20, 2020
b8e3cc5
Merge branch 'f.FixOptionalAndOptimizer2' into f.TempOptimizerAndSort…
joka921 Apr 20, 2020
e6bc34d
Added debug output for the upper and lower bound of prefix_range to t…
joka921 Apr 21, 2020
7b65a12
Make this compile
joka921 Apr 21, 2020
0954ead
Fixed the case of the empty transformation string.
joka921 Apr 21, 2020
1b73464
fixed string literal.
joka921 Apr 21, 2020
2c5c474
Implemented disjunction (or) via '|| ' for prefix filters.
joka921 Apr 21, 2020
94ae97c
Merge branch 'f.prefixFilterOr' into f.TempOptimizerAndSortOrderMerged
joka921 Apr 21, 2020
5f50545
Implemented disjunction (or) via '|| ' for prefix filters.
joka921 Apr 21, 2020
04e3ca7
This is more serious with less severe bugs when there are different l…
joka921 Apr 22, 2020
e39575b
Merge branch 'f.prefixFilterOr' into f.TempOptimizerAndSortOrderMerged
joka921 Apr 22, 2020
4fed78b
Filters after the last OPTIONAl are no longer ignored.
joka921 Apr 23, 2020
a72eeee
Merge branch 'f.FixOptionalAndOptimizer2' into f.TempOptimizerAndSort…
joka921 Apr 23, 2020
4d8a7ed
A little non-compiling stub in the SparqlParser,
joka921 Apr 23, 2020
6110cd0
Unbound selected variables now appear in the correct order of the out…
joka921 Apr 23, 2020
28a3d94
Merge branch 'f.fix329' into f.TempOptimizerAndSortOrderMerged
joka921 Apr 23, 2020
f749338
Merge branch 'f.FixOptionalAndOptimizer2' into f.AS
joka921 Apr 23, 2020
7d701ce
Started to make this whole stuf
joka921 Apr 24, 2020
af6a247
Fixed the next bug.
joka921 Apr 24, 2020
d62c829
Merge branch 'f.prefixFilterOr' into f.TempOptimizerAndSortOrderMerged
joka921 Apr 24, 2020
6412bd9
Does this help the performance and is correct to optimize them?
joka921 Apr 24, 2020
444a699
Implemented BIND(<constant> as ?variable) and BIND(?variable as ?othe…
joka921 Apr 26, 2020
cc589fa
Merge branch 'f.AS' into f.TempOptimizerAndSortOrderMerged
joka921 Apr 26, 2020
e0e8493
Fixed two small inaccuracies
joka921 Apr 26, 2020
fd54533
Merge branch 'f.AS' into f.TempOptimizerAndSortOrderMerged
joka921 Apr 26, 2020
0cf9bef
Implemented a limiting allocator.
joka921 Apr 27, 2020
e83819e
Made the IdTable class use the LimitedAllocator
joka921 Apr 27, 2020
7db3b58
Integrated the memory Limit into the complete project.
joka921 Apr 27, 2020
c29922e
Merge branch 'f.memoryLimit' into f.TempOptimizerAndSortOrderMerged
joka921 Apr 27, 2020
218baee
Disabled clang-format for this dirty hacking branch.
joka921 Apr 27, 2020
ad04d48
Fixed an error int the getopt for ServerMain.cpp
joka921 Apr 27, 2020
cfde6a4
Option for the memory setting
joka921 Apr 27, 2020
1b6788e
Bugfix + logging of free bytes.
joka921 Apr 27, 2020
bbeeadd
Merge remote-tracking branch 'remotes/origin/f.memoryLimit' into f.Te…
joka921 Apr 27, 2020
2979221
Dockerfile with memory limit
joka921 Apr 27, 2020
ea4f604
Merge branch 'f.TempOptimizerAndSortOrderMerged' of https://github.co…
joka921 Apr 27, 2020
7217cf0
wrong number of bytes!
joka921 Apr 27, 2020
6d848df
Merge branch 'f.TempOptimizerAndSortOrderMerged' of https://github.co…
joka921 Apr 27, 2020
b4b31de
Started the shenanigans s.t. the vocabulary also can hold floats dire…
joka921 May 5, 2020
22ac680
This seems to work.
joka921 May 5, 2020
c2bb193
Merge branch 'f.fasterNumbers' into f.TempOptimizerAndSortOrderMerged
joka921 May 5, 2020
bd2b253
Currently HardCode a memory Limit and the size of the Wikidata vocabu…
joka921 May 5, 2020
eae6d0f
Try debugging.
joka921 May 5, 2020
a6157c3
local dockerfile
joka921 May 5, 2020
e4d9f8d
Merge branch 'f.TempOptimizerAndSortOrderMerged' of https://github.co…
joka921 May 5, 2020
3a86ea3
small fix for the inherently wrong and troublesom id-float conversion.
joka921 May 5, 2020
77d998b
Merge branch 'f.TempOptimizerAndSortOrderMerged' of https://github.co…
joka921 May 5, 2020
5145e52
small fix for the inherently wrong and troublesom id-float conversion.
joka921 May 8, 2020
5850300
Merge branch 'f.selectStar' into f.TempOptimizerAndSortOrderMerged
joka921 May 8, 2020
5bee7bf
Merge branch 'f.TempOptimizerAndSortOrderMerged' of https://github.co…
joka921 May 8, 2020
4486a88
the one decisive change
joka921 May 8, 2020
3d4c40c
First step: Force the input of the TransitivePath to be sorted.
joka921 May 10, 2020
f8890a7
Alternative Operation for the bindLeft stuff.
joka921 May 10, 2020
49fcc09
Merge remote-tracking branch 'remotes/origin/f.fasterTranspath' into …
joka921 May 10, 2020
8e354be
Fixed two small bugs, now it could do something useful.
joka921 May 10, 2020
dc2d6a7
Debug output for transitive path (why was ist stuck with the mosquitos)
joka921 May 11, 2020
96749a0
Perform caching and respect min/max distances
joka921 May 11, 2020
2978bd2
got rid of the unreadable logs.
joka921 May 11, 2020
fac3987
Inefficiency when using IdTable.resize() hopefully fixed + additional…
joka921 May 11, 2020
ee13238
Build fail fixed.
joka921 May 11, 2020
6069eda
Mixed up LeftSubCol and RightSubCol
joka921 May 11, 2020
0fdb714
Mix the two modes, to make small stuff and big stuff fast.
joka921 May 11, 2020
03774bb
Stupid stupid me.
joka921 May 11, 2020
50b4aa2
Track the stuck transpath...
joka921 May 12, 2020
3551180
Track the stuck transpath...
joka921 May 12, 2020
76d68cd
Use a parallel taskloop for the CountPredicates operation
joka921 May 14, 2020
4c8ee61
Merge branch 'f.parallelPatternTrick' into f.TempOptimizerAndSortOrde…
joka921 May 14, 2020
0f10fac
Updated dockerfile etc, continuing on local machine
joka921 May 14, 2020
1d77bbc
Better parallelization and better logging.
joka921 May 14, 2020
15be583
Make the csv and tsv work again
joka921 May 19, 2020
ca37a07
Fixed an off-by-one-error in the Floating point conversions when read…
joka921 Jun 20, 2020
a57da97
Fixed a tricky relocation error when a new batch is read.
joka921 Jun 21, 2020
ebe7a85
Better loggin on parsing errors.
joka921 Jun 21, 2020
98aba4b
Allow externalization of Literals (untested, let hannah try it out)
joka921 Jun 29, 2020
0403ab2
Literals with datatypes that are not "converted" can now also be exte…
joka921 Jun 30, 2020
83fbfb8
added file from hannah
joka921 Aug 6, 2020
e52a052
Merge remote-tracking branch 'remotes/upstream/master' into f.AS
joka921 Aug 11, 2020
7c340d6
clang-format
joka921 Aug 11, 2020
799fb5a
Integrated binding of String Constants
joka921 Aug 19, 2020
592484a
Merge branch 'f.AS' into f.TempOptimizerAndSortOrderMerged
joka921 Aug 19, 2020
5e67ad3
Only binding of positive integers works, but therefore as expected.
joka921 Aug 20, 2020
12 changes: 8 additions & 4 deletions Dockerfile
@@ -1,22 +1,25 @@
FROM ubuntu:18.04 as base
FROM ubuntu:20.04 as base
LABEL maintainer="Niklas Schnelle <schnelle@informatik.uni-freiburg.de>"
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV LC_CTYPE C.UTF-8

FROM base as builder
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y build-essential cmake clang-format-8 libsparsehash-dev libicu-dev
COPY . /app/

# Check formatting with the .clang-format project style
WORKDIR /app/
RUN misc/format-check.sh
# disable for this dirty merging branch
#RUN misc/format-check.sh

WORKDIR /app/build/
RUN cmake -DCMAKE_BUILD_TYPE=Release -DLOGLEVEL=DEBUG -DUSE_PARALLEL=true .. && make -j $(nproc) && make test
RUN cmake -DCMAKE_BUILD_TYPE=Release -DLOGLEVEL=DEBUG -DUSE_PARALLEL=true .. && make -j $(nproc)

FROM base as runtime
WORKDIR /app
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1

ARG UID=1000
@@ -33,8 +36,9 @@ EXPOSE 7001
VOLUME ["/input", "/index"]

ENV INDEX_PREFIX index
ENV MEMORY_FOR_QUERIES 70
# Need the shell to get the INDEX_PREFIX environment variable
ENTRYPOINT ["/bin/sh", "-c", "exec ServerMain -i \"/index/${INDEX_PREFIX}\" -p 7001 \"$@\"", "--"]
ENTRYPOINT ["/bin/sh", "-c", "exec ServerMain -i \"/index/${INDEX_PREFIX}\" -j 8 -m ${MEMORY_FOR_QUERIES} -M 650000000 -p 7001 \"$@\"", "--"]

# docker build -t qlever-<name> .
# # When running with user namespaces you may need to make the index folder accessible
59 changes: 45 additions & 14 deletions e2e/scientists_queries.yaml
@@ -381,20 +381,6 @@ queries:
- contains_row: ["<Albert_Einstein>", "<Nobel_Prize_in_Physics>"]
- contains_row: ["<Albert_Fert>", "<Wolf_Prize_in_Physics>"]
- contains_row: ["<Albert_Overhauser>", "<National_Medal_of_Science_for_Physical_Science>"]
- query : having-height
type: no-text
sparql: |
SELECT (COUNT(?profession) as ?count) ?height WHERE {
?x <Profession> ?profession .
?x <Height> ?height
}
GROUP BY ?height
HAVING (?height > 1.7)
checks:
- num_rows: 32
- num_cols: 2
- selected: ["?count", "?height"]
- contains_row: ["5", "1.803"]
- query : having-predicate-religion
type: no-text
sparql: |
@@ -765,3 +751,48 @@ queries:
- contains_row: ["<Al_Gore>", "<Nobel_Peace_Prize>"]
- contains_row: ["<Dennis_Gabor>", "<Nobel_Prize_in_Physics>"]

- query : having-height
type: no-text
sparql: |
SELECT (COUNT(?profession) as ?count) ?height WHERE {
?x <Profession> ?profession .
?x <Height> ?height
}
GROUP BY ?height
HAVING (?height > 1.7)
checks:
- num_rows: 32
- num_cols: 2
- selected: ["?count", "?height"]
- contains_row: ["5", "1.803"]
- query : prefix-filter-disjunction
type: no-text
sparql: |
SELECT ?s WHERE {
?s <is-a> <Scientist> .
FILTER ((regex(?s, "^<Albert") || regex(?s, "^<Marie"))) .
}
checks:
- num_rows: 106
- num_cols: 1
- selected: ["?s"]
- contains_row: ["<Albert_Einstein>"]
- contains_row: ["<Albert_Fert>"]
- contains_row: ["<Albert_Overhauser>"]
- contains_row: ["<Marie_Curie>"]
- query : prefix-filter-disjunction-different-lhs
type: no-text
sparql: |
SELECT ?s ?a WHERE {
?s <is-a> <Scientist> .
?s <Award_Won> ?a .
FILTER (regex(?s, "^<Albert") || regex(?a, "^<Nobel"))
}
checks:
- num_rows: 579
- num_cols: 2
- selected: ["?s", "?a"]
- contains_row: ["<Albert_Einstein>", "<Nobel_Prize_in_Physics>"]
- contains_row: ["<Albert_Fert>", "<Wolf_Prize_in_Physics>"]
- contains_row: ["<Albert_Overhauser>", "<National_Medal_of_Science_for_Physical_Science>"]
- contains_row: ["<Andre_Geim>", "<Nobel_Prize_in_Physics>"]
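Reviewer note on the `prefix-filter-disjunction` test: one way to think about a disjunction of prefix filters over a single variable is as a union of index ranges in a sorted vocabulary. The sketch below is only an illustration under that assumption. `prefixRange` and `filterByAnyPrefix` are hypothetical names, and this is not the code from this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not QLever code): half-open index range [first, last)
// of all entries in `sortedVocab` that start with `prefix`.
// Assumes `sortedVocab` is sorted with std::string's operator<.
std::pair<std::size_t, std::size_t> prefixRange(
    const std::vector<std::string>& sortedVocab, const std::string& prefix) {
  // First entry that is not smaller than the prefix itself.
  auto lower = std::lower_bound(sortedVocab.begin(), sortedVocab.end(), prefix);
  // From there on, entries with the prefix are contiguous.
  auto upper = std::partition_point(
      lower, sortedVocab.end(), [&prefix](const std::string& s) {
        return s.compare(0, prefix.size(), prefix) == 0;
      });
  return {static_cast<std::size_t>(lower - sortedVocab.begin()),
          static_cast<std::size_t>(upper - sortedVocab.begin())};
}

// Hypothetical helper: all entries that match at least one prefix, i.e. the
// union of the per-prefix ranges, without duplicates and in sorted order.
std::vector<std::string> filterByAnyPrefix(
    const std::vector<std::string>& sortedVocab,
    const std::vector<std::string>& prefixes) {
  std::vector<std::pair<std::size_t, std::size_t>> ranges;
  for (const auto& p : prefixes) {
    ranges.push_back(prefixRange(sortedVocab, p));
  }
  std::sort(ranges.begin(), ranges.end());
  std::vector<std::string> result;
  std::size_t next = 0;  // first index that has not been emitted yet
  for (const auto& range : ranges) {
    for (std::size_t i = std::max(range.first, next); i < range.second; ++i) {
      result.push_back(sortedVocab[i]);
    }
    next = std::max(next, range.second);
  }
  return result;
}
```

With a vocabulary like `<Albert_Einstein>`, `<Albert_Fert>`, `<Marie_Curie>`, `<Niels_Bohr>`, the prefixes `<Albert` and `<Marie` select the first three entries. This range trick only applies when both sides of the `||` filter the same variable; the `prefix-filter-disjunction-different-lhs` case needs a row-wise check instead.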
24 changes: 21 additions & 3 deletions src/ServerMain.cpp
@@ -27,11 +27,13 @@ using std::vector;
struct option options[] = {{"help", no_argument, NULL, 'h'},
{"index", required_argument, NULL, 'i'},
{"worker-threads", required_argument, NULL, 'j'},
{"memory-for-queries", required_argument, NULL, 'm'},
{"on-disk-literals", no_argument, NULL, 'l'},
{"port", required_argument, NULL, 'p'},
{"no-patterns", no_argument, NULL, 'P'},
{"no-pattern-trick", no_argument, NULL, 'T'},
{"text", no_argument, NULL, 't'},
{"max-vocab-size", required_argument, NULL, 'M'},
{NULL, 0, NULL, 0}};

void printUsage(char* execName) {
@@ -48,6 +50,11 @@ void printUsage(char* execName) {
<< "The location of the index files." << endl;
cout << " " << std::setw(20) << "p, port" << std::setw(1) << " "
<< "The port on which to run the web interface." << endl;
cout << " " << std::setw(20) << "m, memory-for-queries" << std::setw(1)
<< " "
<< "The number of GB that may be used by query (intermediate) results, "
"including the cache"
<< endl;
cout << " " << std::setw(20) << "no-patterns" << std::setw(1) << " "
<< "Disable the use of patterns. This disables ql:has-predicate."
<< endl;
@@ -59,6 +66,8 @@ void printUsage(char* execName) {
<< "Enables the usage of text." << endl;
cout << " " << std::setw(20) << "j, worker-threads" << std::setw(1) << " "
<< "Sets the number of worker threads to use" << endl;
cout << " " << std::setw(20) << "M, max-vocab-size" << std::setw(1) << " "
<< "Must be bigger than wc -l on the vocabulary file, else will crash" << endl;
cout.copyfmt(coutState);
}

@@ -79,11 +88,14 @@ int main(int argc, char** argv) {
int numThreads = 1;
bool usePatterns = true;
bool enablePatternTrick = true;
size_t maxVocabSize = 1000000;

size_t memLimit = MAX_MEM_FOR_QUERIES_IN_GB;

optind = 1;
// Process command line arguments.
while (true) {
int c = getopt_long(argc, argv, "i:p:j:tauhmlT", options, NULL);
int c = getopt_long(argc, argv, "i:p:j:tauhm:lTM:", options, NULL);
if (c == -1) break;
switch (c) {
case 'i':
@@ -104,6 +116,12 @@ int main(int argc, char** argv) {
case 'j':
numThreads = atoi(optarg);
break;
case 'm':
memLimit = atoi(optarg);
break;
case 'M':
maxVocabSize = atoi(optarg);
break;
case 'h':
printUsage(argv[0]);
exit(0);
@@ -142,8 +160,8 @@ int main(int argc, char** argv) {
cout << "Set locale LC_CTYPE to: " << locale << endl;

try {
Server server(port, numThreads);
server.initialize(index, text, usePatterns, enablePatternTrick);
Server server(port, numThreads, memLimit * 1 << 30u);
server.initialize(index, text, usePatterns, enablePatternTrick, maxVocabSize);
server.run();
} catch (const std::exception& e) {
// This code should never be reached as all exceptions should be handled
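Reviewer note on the new `-m` and `-M` options: with `getopt_long`, an option that takes a value needs a trailing colon in the short-option string and `required_argument` in the long-option table, otherwise `optarg` is not set for the long form. The following is a minimal standalone sketch of that pattern. `MemoryOptions`, `parseMemoryOptions`, `gigabytesToBytes`, and the default of 70 GB are assumptions made for illustration, not code from this PR.

```cpp
#include <getopt.h>

#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical container for the two settings; not part of the PR.
struct MemoryOptions {
  std::size_t memLimitGb = 70;         // illustrative default only
  std::size_t maxVocabSize = 1000000;  // same default as in the diff
};

// Parses only -m/--memory-for-queries and -M/--max-vocab-size.
MemoryOptions parseMemoryOptions(int argc, char** argv) {
  // Both options take a value, so both use required_argument here and a
  // trailing ':' in the short-option string below.
  static const option longOptions[] = {
      {"memory-for-queries", required_argument, nullptr, 'm'},
      {"max-vocab-size", required_argument, nullptr, 'M'},
      {nullptr, 0, nullptr, 0}};
  MemoryOptions result;
  optind = 1;  // restart scanning, as the original main() does
  while (true) {
    int c = getopt_long(argc, argv, "m:M:", longOptions, nullptr);
    if (c == -1) break;
    switch (c) {
      case 'm':
        // strtoull avoids the int range limit of atoi for large values.
        result.memLimitGb = std::strtoull(optarg, nullptr, 10);
        break;
      case 'M':
        result.maxVocabSize = std::strtoull(optarg, nullptr, 10);
        break;
      default:
        break;  // unknown options are ignored in this sketch
    }
  }
  return result;
}

// Converts gigabytes to bytes; assumes a 64-bit std::size_t.
std::size_t gigabytesToBytes(std::size_t gb) { return gb << 30u; }
```

Called with `-m 4 --max-vocab-size 650000000`, this yields a limit of 4 GB and a vocabulary size of 650000000. A production version would also want to reject non-numeric input, which `strtoull` silently maps to 0 here.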
5 changes: 4 additions & 1 deletion src/SparqlEngineMain.cpp
@@ -149,7 +149,10 @@ int main(int argc, char** argv) {
index.addTextFromOnDiskIndex();
}

QueryExecutionContext qec(index, engine, &cache, &pinnedSizes);
ad_utility::LimitedAllocator<Id> allocator{
ad_utility::makeAllocationState(MAX_MEM_FOR_QUERIES_IN_GB)};

QueryExecutionContext qec(index, engine, &cache, &pinnedSizes, allocator);
if (costFactosFileName.size() > 0) {
qec.readCostFactorsFromTSVFile(costFactosFileName);
}
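Reviewer note on the `LimitedAllocator` passed to `QueryExecutionContext`: the general idea of a limiting allocator is to share one byte budget among all containers and to refuse allocations that would exceed it. The sketch below shows that idea in a minimal form. `SketchAllocationState` and `SketchLimitedAllocator` are hypothetical names, the sketch is not thread-safe, and it makes no claim about how `ad_utility::LimitedAllocator` is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

// Hypothetical shared budget: the number of bytes that may still be allocated.
struct SketchAllocationState {
  explicit SketchAllocationState(std::size_t limitBytes)
      : bytesFree(limitBytes) {}
  std::size_t bytesFree;
};

// Minimal allocator that draws from a shared budget (illustration only).
template <typename T>
class SketchLimitedAllocator {
 public:
  using value_type = T;

  explicit SketchLimitedAllocator(std::shared_ptr<SketchAllocationState> state)
      : state_(std::move(state)) {}

  // Rebinding copy: allocators for other types share the same budget.
  template <typename U>
  SketchLimitedAllocator(const SketchLimitedAllocator<U>& other)
      : state_(other.state()) {}

  T* allocate(std::size_t n) {
    // The division avoids overflow in n * sizeof(T).
    if (n > state_->bytesFree / sizeof(T)) throw std::bad_alloc();
    state_->bytesFree -= n * sizeof(T);
    return std::allocator<T>().allocate(n);
  }

  void deallocate(T* p, std::size_t n) noexcept {
    std::allocator<T>().deallocate(p, n);
    state_->bytesFree += n * sizeof(T);  // hand the bytes back to the budget
  }

  const std::shared_ptr<SketchAllocationState>& state() const { return state_; }

 private:
  std::shared_ptr<SketchAllocationState> state_;
};

// Two allocators are interchangeable if they share the same budget.
template <typename T, typename U>
bool operator==(const SketchLimitedAllocator<T>& a,
                const SketchLimitedAllocator<U>& b) {
  return a.state() == b.state();
}
template <typename T, typename U>
bool operator!=(const SketchLimitedAllocator<T>& a,
                const SketchLimitedAllocator<U>& b) {
  return !(a == b);
}
```

Such an allocator can be handed to a standard container, for example `std::vector<int, SketchLimitedAllocator<int>>`, so that a query which grows its intermediate results past the budget fails with `std::bad_alloc` instead of exhausting the machine's memory.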
4 changes: 3 additions & 1 deletion src/WriteIndexListsMain.cpp
@@ -89,7 +89,9 @@ int main(int argc, char** argv) {
Engine engine;
SubtreeCache cache(NOF_SUBTREES_TO_CACHE);
PinnedSizes pinnedSizes;
QueryExecutionContext qec(index, engine, &cache, &pinnedSizes);
ad_utility::LimitedAllocator<Id> allocator{
ad_utility::makeAllocationState(MAX_MEM_FOR_QUERIES_IN_GB)};
QueryExecutionContext qec(index, engine, &cache, &pinnedSizes, allocator);
ParsedQuery q;
if (!freebase) {
q = SparqlParser("SELECT ?x WHERE {?x <is-a> <Scientist>}").parse();