Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat union #151

Merged
merged 6 commits into from
Nov 18, 2018
Merged

Feat union #151

merged 6 commits into from
Nov 18, 2018

Conversation

floriankramer
Copy link
Member

This pr adds support for the UNION feature.

@niklas88
Copy link
Member

Since this adds End-to-End Tests, you could give #150 a quick review and we could merge this and change the new tests on a rebase.

@niklas88
Copy link
Member

So it looks like we have a test failure and also the following compiler warning still persits:

/app/src/parser/ParsedQuery.cpp: In member function ‘std::__cxx11::string SparqlFilter::asString() const’:
/app/src/parser/ParsedQuery.cpp:82:10: warning: enumeration value ‘PREFIX’ not handled in switch [-Wswitch]
   switch (_type) {

@floriankramer floriankramer force-pushed the feat_union branch 2 times, most recently from 2f48760 to 698855b Compare November 14, 2018 14:27
@floriankramer
Copy link
Member Author

The test failure is really strange. I suddenly got errors in that test locally due to the order of nodes of the triple graph being permuted, which then broke the string based comparison used in the test.
I've reverted the test to its current master branch state which will hopefully stop the tests from failing for travis, but the better solution would probably to make them order invariant.

@niklas88
Copy link
Member

@floriankramer I've seen something similar when changing the implementation of the hash map. When hash map iteration is randomized (as for example in abseil::flat_map) the QueryPlannerTest fails sometimes.

Copy link
Member

@niklas88 niklas88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Being able to add a complex new feature like UNION now and with relatively manageable changes is a really great sign that we're on a good track with QLever. Well done! I'm still a bit confused about a few details but overall this already looks quite good.

sparql: |
SELECT ?a ?t WHERE {
{?a <Height> ?t .}
uNiON
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if one of the UNION keywords is written in weird case that's nice to test case insensitivity but then the other one should use the normal all uppercase

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also since I find UNION in SPARQL a bit weird maybe a comment to the spec could be added in a comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the case of the second e2e test UNION keyword and added an explanation for UNIONS and OPTIONAL to the README. I can add a comment into the code as well, if you think that's necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@floriankramer I think you might have forgotten to push those changes or are you still working on this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was still working on the changes, I pushed them just now.

@@ -72,8 +76,40 @@ QueryExecutionTree QueryPlanner::createExecutionTree(ParsedQuery& pq) const {

LOG(DEBUG) << "Creating execution plan.\n";
childPlans.clear();
for (const ParsedQuery::GraphPattern* child : pattern->_children) {
childPlans.push_back(&patternPlans[child->_id]);
std::vector<SubtreePlan> unionPlans;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is only done at the top level I'd guess we currently don't support nested UNION, though these are also part of the spec? How about OPTIONAL inside UNION? Honestly by now I'm quite confused about what actually happens in createExecutionTree() and how ParsedQuery::GraphPatternOperation and SubtreePlan interact. Any chance we can split this in a couple of more manageable methods? I mean it's obviously not your fault that this code is now quite hard to understand

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said I think we could also do the simplification in a separate PR but we should probably do something to make these methods more understandable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The query is parsed recursively, so any nesting of unions and optionals is parsed corectly. The result of that are ParsedQueries, which contain a tree of alternating GraphPatterns and GraphPatternOperations, where the GraphPattern represents a set of triples, filters and further oeprations while the GraphPatternOperation represents either an OPTIONAL or a UNION (and in the future possibly also a subquery). I added the additional layer of GraphPatternOperations as UNIONs require two GraphPatterns a children while OPTIONAL only requires one, but I wanted a unified abstraction for the two, to keep the query planning simpler.
The optimizer then creates a topological ordering of that tree, effectively skipping over GraphPatternOperations and only looking at the layers containing GraphPatterns. Every element of that ordering is then optimized separately, starting at the back. This way, whenever a GraphPattern is optimized all of its children have already been optimized. When optimizing a GraphPattern all GraphPatternOperations SubtreePlans are added to the childPlans vector, and then effectively treated as if they were another node in the triple graph.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some more comments for now, but I can definitely try to make the method easier to read and understand in a later PR.

}

size_t Union::getResultWidth() const {
// the width depends on how many columns use the same variable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you mean the width is the number of unique columns? Maybe reformulate that comment. Also what's a columnOrigin?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the comment. The columnOrigin vector stores from what column in each of the subtrees the values for a column in the result come.

vector<size_t> Union::resultSortedOn() const { return {}; }

std::unordered_map<string, size_t> Union::getVariableColumns() const {
std::unordered_map<string, size_t> variableColumns(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ad_utility::HashMap though that also depends on whether we want to merge #137

vector<Res>* res, const vector<L>* left, const vector<R>* right,
const std::vector<std::array<size_t, 2>>& columnOrigins);

/*template <typename T>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove unused code

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


virtual vector<size_t> resultSortedOn() const override;

std::unordered_map<string, size_t> getVariableColumns() const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above about ad_utility::HashMap

currentPattern->_children.back()->_id = query._numGraphPatterns;
size_t cb = ad_utility::findClosingBracket(inner, ob, '{', '}');
currentPattern->_children.push_back(
new ParsedQuery::GraphPatternOperation(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where are these freed? Can't find it in GraphPattern and that has std::vector of raw pointers not std::shared_ptr

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GraphPattern destructor deletes them.

currentPattern->_children.push_back(
new ParsedQuery::GraphPatternOperation(
ParsedQuery::GraphPatternOperation::Type::OPTIONAL,
{new ParsedQuery::GraphPattern()}));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and where are these freed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GraphPatternOperation destructor deletes them

Copy link
Member

@niklas88 niklas88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but I propose we merge the HashMap/HashSet changes first, then rebase this on them and also change the unordered_map uses

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants