New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make the optimizer (more) correct #325
Make the optimizer (more) correct #325
Conversation
@joka921 seems the other PR introduced a conflict |
- The previous version of the QueryPlanner put all triples and filters that appeared directly in a WHERE clause together. - This is correct for the filters, but not for all other patterns - Joins are commutative, but Optional Joins are Left Joins, where first all the patterns appearing before them have to be evaluated and then the optional join has to be performed. - To correct this, implemented a BasicGraphPattern (triples that are "directly" in the Body of another graph pattern or a WHERE clause) class. BasicGraphPatterns are also children of their parent body. - This version of the QueryPlanner does some degree of optimization, however one could optimize multiple adjacent GraphPatterns that are not Optional, if their scopes are respected (e.g. Move Joins across Optional Joins that don't interfer with the OPTIONAL - TODO: There still are bugs in the handling of optionals (in the actual Optional Join class) as the SPARQL standard knows "unbound values" which can be rebound, e.g. SELECT ?x ?name WHERE { ?x <is-a> <human> OPTIONAL {?x <TwitterAccount> ?name} OPTIONAL {?x rdfs:label ?name } } which should only add the names only for humans that don't have twitter accounts and the handling of the empty query SELECT ?x WHERE {} (1 result where ?x is unbound/NULL)
…type of GraphPatternOperation.
…uncions - there is now a single function that creates all possible permutations of commutative joins that is consistently used.
4fed78b
to
230d4f4
Compare
* This is a bit similar to sorts: they can be applied after each step | ||
* and will filter on one variable. | ||
* Cycles have to be avoided (by previously removing a triple and using | ||
* it as a filter later on). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting this together in a function comment! I really like this overview!
ASSERT_EQ("?y", filters[0]._rhs); | ||
ASSERT_EQ(SparqlFilter::FilterType::NE, filters[0]._type); | ||
ASSERT_EQ(2u, triples.size()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea with the additional scoping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Previously, QLever was allowed to optimize all Graph patterns in an arbitrary order on the same nesting level
This is generally speaking not correct, since they have to be evaluated top to bottom.
Reordering Joins is no problem, since it does not change the result, but reordering across
OPTIONAL
joins is dangerous and can lead to errors.This PR forces the optimizer to first optimize all the graph patterns before the first optional join, then perform the optional join (leading to a non-optional pattern) and then treat this result as an input to optimize together with the rest of the query in the same manner.
In some cases it could be legal to optimize across optionals, but currently we simply ignore those.
The prize for this is currently some "Joins of the full index with itself are not allowed" errors which could be prevented, but those have been there before due to the internals and limitations of the current QueryPlanner.
TODO: there is actually a lot more stuff to fix with optional joins and treating columns where some values are defined and some are not, but that is a different story, that will be fixed later.