-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat union #151
Feat union #151
Conversation
Since this adds End-to-End Tests, you could give #150 a quick review and we could merge this and change the new tests on a rebase. |
So it looks like we have a test failure and also the following compiler warning still persits:
|
2f48760
to
698855b
Compare
The test failure is really strange. I suddenly got errors in that test locally due to the order of nodes of the triple graph being permuted, which then broke the string based comparison used in the test. |
@floriankramer I've seen something similar when changing the implementation of the hash map. When hash map iteration is randomized (as for example in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Being able to add a complex new feature like UNION
now and with relatively manageable changes is a really great sign that we're on a good track with QLever. Well done! I'm still a bit confused about a few details but overall this already looks quite good.
e2e/scientists_queries.yaml
Outdated
sparql: | | ||
SELECT ?a ?t WHERE { | ||
{?a <Height> ?t .} | ||
uNiON |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if one of the UNION
keywords is written in weird case that's nice to test case insensitivity but then the other one should use the normal all uppercase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also since I find UNION
in SPARQL a bit weird maybe a comment to the spec could be added in a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the case of the second e2e test UNION keyword and added an explanation for UNIONS and OPTIONAL to the README. I can add a comment into the code as well, if you think that's necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@floriankramer I think you might have forgotten to push those changes or are you still working on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was still working on the changes, I pushed them just now.
@@ -72,8 +76,40 @@ QueryExecutionTree QueryPlanner::createExecutionTree(ParsedQuery& pq) const { | |||
|
|||
LOG(DEBUG) << "Creating execution plan.\n"; | |||
childPlans.clear(); | |||
for (const ParsedQuery::GraphPattern* child : pattern->_children) { | |||
childPlans.push_back(&patternPlans[child->_id]); | |||
std::vector<SubtreePlan> unionPlans; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is only done at the top level I'd guess we currently don't support nested UNION
, though these are also part of the spec? How about OPTIONAL
inside UNION
? Honestly by now I'm quite confused about what actually happens in createExecutionTree()
and how ParsedQuery::GraphPatternOperation
and SubtreePlan
interact. Any chance we can split this in a couple of more manageable methods? I mean it's obviously not your fault that this code is now quite hard to understand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said I think we could also do the simplification in a separate PR but we should probably do something to make these methods more understandable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The query is parsed recursively, so any nesting of unions and optionals is parsed corectly. The result of that are ParsedQueries
, which contain a tree of alternating GraphPatterns and GraphPatternOperations, where the GraphPattern represents a set of triples, filters and further oeprations while the GraphPatternOperation represents either an OPTIONAL or a UNION (and in the future possibly also a subquery). I added the additional layer of GraphPatternOperations as UNIONs require two GraphPatterns a children while OPTIONAL only requires one, but I wanted a unified abstraction for the two, to keep the query planning simpler.
The optimizer then creates a topological ordering of that tree, effectively skipping over GraphPatternOperations and only looking at the layers containing GraphPatterns. Every element of that ordering is then optimized separately, starting at the back. This way, whenever a GraphPattern is optimized all of its children have already been optimized. When optimizing a GraphPattern all GraphPatternOperations SubtreePlans are added to the childPlans
vector, and then effectively treated as if they were another node in the triple graph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some more comments for now, but I can definitely try to make the method easier to read and understand in a later PR.
src/engine/Union.cpp
Outdated
} | ||
|
||
size_t Union::getResultWidth() const { | ||
// the width depends on how many columns use the same variable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you mean the width is the number of unique columns? Maybe reformulate that comment. Also what's a columnOrigin
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the comment. The columnOrigin vector stores from what column in each of the subtrees the values for a column in the result come.
src/engine/Union.cpp
Outdated
vector<size_t> Union::resultSortedOn() const { return {}; } | ||
|
||
std::unordered_map<string, size_t> Union::getVariableColumns() const { | ||
std::unordered_map<string, size_t> variableColumns( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ad_utility::HashMap
though that also depends on whether we want to merge #137
src/engine/Union.h
Outdated
vector<Res>* res, const vector<L>* left, const vector<R>* right, | ||
const std::vector<std::array<size_t, 2>>& columnOrigins); | ||
|
||
/*template <typename T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove unused code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/engine/Union.h
Outdated
|
||
virtual vector<size_t> resultSortedOn() const override; | ||
|
||
std::unordered_map<string, size_t> getVariableColumns() const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see above about ad_utility::HashMap
currentPattern->_children.back()->_id = query._numGraphPatterns; | ||
size_t cb = ad_utility::findClosingBracket(inner, ob, '{', '}'); | ||
currentPattern->_children.push_back( | ||
new ParsedQuery::GraphPatternOperation( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where are these freed? Can't find it in GraphPattern
and that has std::vector
of raw pointers not std::shared_ptr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GraphPattern destructor deletes them.
currentPattern->_children.push_back( | ||
new ParsedQuery::GraphPatternOperation( | ||
ParsedQuery::GraphPatternOperation::Type::OPTIONAL, | ||
{new ParsedQuery::GraphPattern()})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and where are these freed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GraphPatternOperation destructor deletes them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but I propose we merge the HashMap/HashSet changes first, then rebase this on them and also change the unordered_map
uses
Also added a Union operation that is not yet used.
The changes are necessary for the test to complete locally but seem to break it for travis.
0f6d746
to
6d9edb5
Compare
This pr adds support for the UNION feature.