New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat optional #43
Feat optional #43
Conversation
Will start reviewing now, in the meantime can you rebase against master? This should pull in the
Note that for rebasing |
…er by for simple queries
2b29478
to
39e2425
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this work, I'm really impressed how fast you got a grasp of the code base. I don't really have the expertise to review this adequately to be honest. So added only some nitpicking comments.
Do I see this right, that we don't have any end-to-end tests that actually parse⇒plan⇒execute some queries? Maybe we could add those as an external script using SparqlEngineMain. I found some query lists that look like an attempt of this in misc/
but some look like they still use an older text search syntax so they are definitely not run
@@ -88,20 +123,34 @@ int main(int argc, char** argv) { | |||
case 'a': | |||
allPermutations = true; | |||
break; | |||
case 'h': | |||
printUsage(argv[0]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for cleaning this up!
cerr << "ERROR: No index specified, but an index is required." << endl; | ||
} | ||
if (port == -1) { | ||
cerr << "ERROR: No port specified, but the port is required." << endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
src/parser/ParsedQuery.cpp
Outdated
os << "{\n"; | ||
for (size_t i = 0; i < _whereClauseTriples.size(); ++i) { | ||
os << "\n\t" << _whereClauseTriples[i].asString(); | ||
if (i + 1 < _whereClauseTriples.size()) { os << ','; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't like the if in the loop I personally prefer the following pattern.
for (size_t i = 0; i < vec.size() - 1; i++) {
out << vec[i] << ", ";
}
out << vec.back();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I switched to your definitely nicer style in the latest commit, and added some proper indentation to the output.
@@ -121,7 +125,46 @@ void SparqlParser::parseWhere(const string& str, ParsedQuery& query) { | |||
while (start < inner.size()) { | |||
size_t k = start; | |||
while (inner[k] == ' ' || inner[k] == '\t' || inner[k] == '\n') { ++k; } | |||
if (inner[k] == 'F') { | |||
if (inner[k] == 'O' || inner[k] == 'o') { | |||
if (inner.substr(k, 8) == "OPTIONAL" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we case insensitve or only all-upper or all-lower? Would maybe prefer a toLower() and then check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently only all upper and all lower case are supported. The standard says most keywords are case insensitive (apart from the keyword 'a' which may only be lower case). It may be cleaner to change that in another pull request /. commit though.
src/engine/Engine.h
Outdated
jcls_a |= (1 << jc[0]); | ||
jcls_b |= (1 << jc[1]); | ||
} | ||
// avoid compiler warnings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which warning exactly and how?
src/engine/Engine.h
Outdated
template<typename A, typename B, typename R, int K> | ||
static void optionalJoin(const A& a, const B& b, | ||
bool aOptional, bool bOptional, | ||
const vector<array<size_t, 2>>& jcls, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you provide more desriptive names here? I have no idea what jcls
stands for
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I replaced all mentions of jcl and jcls with joinColumn and joinColumns respectively.
src/engine/Engine.h
Outdated
for (size_t ib = 0; ib < b.size(); ib++) { | ||
R res = newOptionalResult<R, K>()(resultSize); | ||
createOptionalResult<A, B, R, true, false>( | ||
static_cast<typename A::value_type*>(0), &b[ib], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nullptr
should work without a cast, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally relied on the compiler deducing the template arguments, thats why I had to cast the nullptrs to be pointers of the right type. Replaced them with actual nullptrs now.
@@ -60,11 +94,12 @@ int main(int argc, char** argv) { | |||
bool interactive = false; | |||
bool onDiskLiterals = false; | |||
bool allPermutations = false; | |||
bool optimizeOptionals = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hannah said this currently slows down queries? I assume this only happens if there is an OPTIONAL? Is this the right default at the moment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The slowdown happens due to the query optimizer effectively having to optimize a query with another triple, representing the optional part. For queries with many triples (around 9 or more) the slowdown starts becoming noticeable. For queries with fewer triples the change in speed is minmal, and using the optimizer can of course lead to significantly faster execution time. If the default use case includes queries which, on average, have many triples changing the setting to false by default would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, this case is actually more non-trivial than I thought.
src/engine/Engine.h
Outdated
} | ||
|
||
// add sentinels | ||
A& l1 = const_cast<A&>(a); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment should explain why it's ok to cast away the const here
return os.str(); | ||
} | ||
|
||
// Used to generate all up to 125 combinations of left, right and result size. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we allowed C++14/17 constexpr
with loops this could be done much nicer, right? If yes, then we probably should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are right. I'll have to look into that later today or tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a bit of googling for the uses of constexpr in c++ 14 and 17, as I've never used it extensively before.
As far as I could find out, there is no way, apart from the same recursion that is implemented with the templates right now, to create a constexpr loop. As the problem with finding the right template parameters is, that we need to transform a runtime value with a limited number of values into a compile time constant, I didn't find a nicer solution using constexpr, as any assignment of a constexpr would depend on a runtime value.
Was there a specific trick you were thinking of, or is there a feature of constexpr I didn't find, that would help with these problems?
src/engine/Engine.h
Outdated
@@ -389,25 +389,25 @@ class Engine { | |||
static void createOptionalResult(const typename A::value_type* a, | |||
const typename B::value_type* b, | |||
size_t sizeA, | |||
int jcls_a, int jcls_b, | |||
const std::vector<size_t>& jclAToB, | |||
int joinColumBitmap_a, int joinColumBitmap_b, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's "column" so use either "..Col" or "..Column"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
Added support for OPTIONAL query parts. Fixes #22
I changed the query parsing to be able to recursively parse optional parts, storing them in a new GraphPattern struct. The QueryPlanner then adds optional parts to the optimiations dp table during the seeding phase. There is a new OptionalJoin Operation, that joins two ResultTables on any number of columns, adding new ID_NO_VALUE constants to represent fields empty due to optional sides.