optional property paths (*, ?) are very slow #695
Comments
The reason is that
Some more thoughts on implementation. First compute path nature (POS, NON) (see prop paths):
Then the following rewrite rules are executed repeatedly.
I think that's all that's needed.
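The nature computation mentioned above can be sketched roughly as follows. The original rules are not reproduced in this thread, so this is an assumption: `NON` is taken to mean "the path can match a zero-length path" and `POS` to mean "the path always needs at least one edge". The tuple-based path representation and the `nature` function are hypothetical and are not RDF4J code.

```python
# Hypothetical path AST (not RDF4J's algebra): tuples tagged by kind.
#   ("pred", iri)            a single predicate
#   ("opt", path)            path?
#   ("star", path)           path*
#   ("plus", path)           path+
#   ("inv", path)            ^path
#   ("seq", path, path, ...) path/path/...
#   ("alt", path, path, ...) path|path|...

def nature(path):
    """Return "NON" if the path can match zero edges, else "POS" (assumed meaning)."""
    kind = path[0]
    if kind == "pred":
        return "POS"                      # one edge is always required
    if kind in ("opt", "star"):
        return "NON"                      # zero repetitions are allowed
    if kind in ("plus", "inv"):
        return nature(path[1])            # inherits from the inner path
    if kind == "seq":
        # A sequence is zero-length only if every step can be zero-length.
        return "NON" if all(nature(p) == "NON" for p in path[1:]) else "POS"
    if kind == "alt":
        # An alternative is zero-length if any branch can be zero-length.
        return "NON" if any(nature(p) == "NON" for p in path[1:]) else "POS"
    raise ValueError(f"unknown path kind: {kind!r}")


# rdf:type/rdfs:subClassOf* is POS because rdf:type must always be traversed.
TYPE_CLOSURE = ("seq", ("pred", "rdf:type"), ("star", ("pred", "rdfs:subClassOf")))
print(nature(TYPE_CLOSURE))
```

Under this reading, a `NON` path on its own has to consider every node, while a `POS` path always has at least one concrete edge to start from.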
Eg if you try the rewrites on Use Case 5 above, you get this sequence:
This is correct and fast enough. But the most natural way to write it is the fastest, because it finds the shortest paths first (and returns elements in the desired order):
So I think we need these changes:
and then:
Possibly related is issue #689, which turned out to be partly caused by a bug in binding retention, but also by a related query optimization issue. We've improved query optimization for arbitrary-length paths in RDF4J 2.1.4, to the effect that the optimizer will almost always execute the zero-or-more part last (so most variables are already bound and there are far fewer options to search through). I have not yet looked at your analysis in sufficient detail to figure out whether that corresponds with your findings, but in practical terms it might be good to try your slow queries with the new planner and see if there's an improvement.
In addition to performance, it's important to return the shortest-path results first,
I confirmed the issue exists in RDF4J 2.1.3, 2.1.6, and 2.2.
It's been a while since I looked at this, but we should investigate how the planner orders execution of the path elements and how it ensures that, as much as possible, it executes paths with bound start or end points first. As for output ordering: a SPARQL query result is unordered by nature, so any ordering that you rely on (other than an externally imposed one using an ORDER BY clause) is bound to give you problems in the future.
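To illustrate why a bound start point matters, here is a minimal sketch that evaluates `rdf:type/rdfs:subClassOf*` over a toy in-memory graph. It is not RDF4J's evaluation strategy; the graph, the `ex:` names, and both functions are made up for illustration. The zero-or-more step is expanded breadth-first from nodes that are already bound, so only reachable nodes are visited, and nearer results happen to come out first (which, as noted above, SPARQL itself does not guarantee).

```python
from collections import deque

# Toy graph (hypothetical data): (subject, predicate) -> list of objects.
GRAPH = {
    ("ex:rex", "rdf:type"): ["ex:Dog"],
    ("ex:Dog", "rdfs:subClassOf"): ["ex:Mammal"],
    ("ex:Mammal", "rdfs:subClassOf"): ["ex:Animal"],
    ("ex:tweety", "rdf:type"): ["ex:Bird"],
    ("ex:Bird", "rdfs:subClassOf"): ["ex:Animal"],
}


def zero_or_more(graph, start, pred):
    """Yield nodes reachable from a bound start via pred*, nearest first."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        yield node  # the zero-length match (start itself) comes out first
        for nxt in graph.get((node, pred), []):
            if nxt not in seen:  # guards against cycles
                seen.add(nxt)
                queue.append(nxt)


def type_closure(graph, instance):
    """Evaluate rdf:type/rdfs:subClassOf* starting from a bound instance."""
    results = []
    for cls in graph.get((instance, "rdf:type"), []):
        for sup in zero_or_more(graph, cls, "rdfs:subClassOf"):
            if sup not in results:
                results.append(sup)
    return results


print(type_closure(GRAPH, "ex:rex"))
```

Because the search starts from `ex:rex`, the `ex:Bird` branch is never touched. Evaluating `rdfs:subClassOf*` with both ends unbound would instead have to start from every node in the graph.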
Also surprised that the issue still exists post 2.1.4 (and the fix for #689). Might be worth checking if that fix was somehow accidentally not included in later releases.
It won't hurt to have a sensible order, esp for
Fix #695: Don't treat path modifiers as subqueries
Signed-off-by: James Leigh <james.leigh@ontotext.com>
…4j#695-join-projection Fix eclipse-rdf4j#695: Don't treat path modifiers as subqueries
Fix #695: Don't treat path modifiers as subqueries
Signed-off-by: James Leigh <james.leigh@ontotext.com> Signed-off-by: Heshan Jayasinghe <shanujse@gmail.com>
…4j#695-join-projection Fix eclipse-rdf4j#695: Don't treat path modifiers as subqueries Signed-off-by: Heshan Jayasinghe <shanujse@gmail.com>
(Created from OWLIM-1104).
Property paths using optional constructs (`p?` and `p*`) are extremely slow.
These paths alone are necessarily slow because they must return all nodes.
But nobody uses them alone; people always use them in combination with a non-optional path.
So one has to use workarounds, eg rewrite `q/p?` to `q|q/p`, to get good performance.
Use cases:
SHACL uses `rdf:List` for many of its constructs (sometimes unnecessarily) and uses `rdf:rest*/rdf:first` to unroll the list, eg see ClosedConstraintComponent. Unless this is fixed, one can't use the SHACL SPARQL Definitions to implement SHACL in rdf4j.
Another typical example is poor man's RDFS inference: `rdf:type/rdfs:subClassOf*`
American Art Collaborative discusses using `rdf:List` for "people depicted, with order". In this case a natural query would be `crm:P62_depicts/(rdf:rest*/rdf:first)?` but it should be rewritten to `crm:P62_depicts | crm:P62_depicts/rdf:first | crm:P62_depicts/rdf:rest+/rdf:first`
It seems to me that this can be fixed by internally rewriting to an alternative (union) path: one without the optional property and another with it.
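The union rewrite suggested here can be sketched as a small function that expands every optional step of a sequence path into "without it" and "with it". This is only an illustration of the idea, not a proposed RDF4J patch: the `(predicate, is_optional)` representation and the `expand_optional` name are hypothetical, and rendering the empty path as `()` is a notation chosen for this sketch.

```python
from itertools import product


def expand_optional(steps):
    """Rewrite a sequence path with optional steps into a union of plain sequences.

    steps: list of (path_text, is_optional) pairs, eg [("q", False), ("p", True)]
    for q/p?. Returns the alternatives joined with " | ".
    """
    choices = []
    for text, optional in steps:
        # An optional step contributes either nothing or the step itself.
        choices.append([(), (text,)] if optional else [(text,)])

    alternatives = []
    for combo in product(*choices):
        flat = [t for part in combo for t in part]
        # An alternative with no steps left is the zero-length path.
        alternatives.append("/".join(flat) if flat else "()")
    return " | ".join(alternatives)


print(expand_optional([("q", False), ("p", True)]))
print(expand_optional([("crm:P62_depicts", False), ("rdf:rest*/rdf:first", True)]))
```

Only a path whose steps are all optional produces the expensive `()` alternative; as soon as one non-optional step is present, every alternative starts from a concrete edge.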
Implementation suggestion:
More examples:
This alone will be slow since it's non-negative (`()` is the bad part), but when combined with a positive path, it can be made fast again, eg: