FILTER ?x regex ^<https://en.wikipedia.org/wiki/Albert_Ein
Size: 29 x 1
Cols: ?x
Time: 162,932ms
INDEX SCAN ?x <Article>
Size: 67,677,598 x 1
Cols: ?x
Time: 248ms
I thought that a prefix regex FILTER is implemented by doing one or two binary searches on the sorted IDs and then materializing strings only for the result IDs (only 29 in this case).
However, the high query time indicates that the strings are looked up for all 67,677,598 IDs. Why?
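For reference, the expected fast path can be sketched as follows. This is a minimal illustration in Python, not QLever's actual code: the in-memory list `sorted_vocab` and the helper name `prefix_range` are assumptions made for the example. Two binary searches locate the contiguous ID range of all entries that start with the prefix, so only the entries in that range would ever need to be turned into strings.

```python
from bisect import bisect_left


def prefix_range(sorted_vocab, prefix):
    """Return (lo, hi) so that sorted_vocab[lo:hi] are exactly the entries
    starting with `prefix`.

    Hypothetical helper: assumes `sorted_vocab` is sorted by code point,
    `prefix` is non-empty, and its last character is not the largest
    possible code point.
    """
    # First binary search: first entry that is >= the prefix itself.
    lo = bisect_left(sorted_vocab, prefix)
    # Smallest string that sorts after every string with this prefix:
    # increment the last character of the prefix.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    # Second binary search: first entry that is >= that upper bound.
    hi = bisect_left(sorted_vocab, upper)
    return lo, hi


# Toy vocabulary standing in for the sorted IDs of the index.
vocab = sorted(["<a>", "<ab>", "<abc>", "<abd>", "<b>"])
lo, hi = prefix_range(vocab, "<ab")
matches = vocab[lo:hi]  # only these entries are looked at as strings
```

The cost is two binary searches plus the size of the result, independent of how many IDs the scanned column contains.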
@joka921 I don't understand how this is fixed by #295 (which was about the pattern trick not being used in some cases). The problem for the query above is that the prefix FILTER takes forever, although it could be fast.
I just tried the query again on the current version (where #295 has been incorporated) and the problem is still there.
The actual problem is simple: "^<https://en.wikipedia.org/wiki/Albert_Ein" is not a simple prefix regex, because it contains the character ".", which means "match any character". So the behavior in your case is correct.
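To make the distinction concrete, here is a small sketch of how a regex could be classified as a plain prefix or not. The function `literal_prefix` and the metacharacter set are assumptions made for illustration; the real check in the engine may differ.

```python
# Characters treated as regex metacharacters in this sketch
# (an assumption; the exact set depends on the regex engine).
REGEX_METACHARS = set(".^$*+?()[]{}|\\")


def literal_prefix(pattern):
    """Return the literal prefix if `pattern` is '^' followed only by
    literal characters or backslash-escaped metacharacters, else None."""
    if not pattern.startswith("^"):
        return None
    out = []
    i = 1
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            # A backslash followed by a metacharacter is a literal character.
            if i + 1 < len(pattern) and pattern[i + 1] in REGEX_METACHARS:
                out.append(pattern[i + 1])
                i += 2
                continue
            # Anything else (a class such as \d, or a trailing backslash)
            # is not a plain prefix.
            return None
        if ch in REGEX_METACHARS:
            # An unescaped metacharacter such as "." rules out the fast path.
            return None
        out.append(ch)
        i += 1
    return "".join(out)


with_bare_dots = literal_prefix("^<https://en.wikipedia.org/wiki/Albert_Ein")
with_escaped_dots = literal_prefix(r"^<https://en\.wikipedia\.org/wiki/Albert_Ein")
```

With the unescaped dots the pattern is rejected as a prefix, so a general regex evaluation over all candidate strings is the only option left; with the dots escaped, the literal prefix can be extracted and used for a range lookup.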
You probably wanted to escape the ".". To my understanding, this is done with two backslashes, one for SPARQL and one for the regex engine: FILTER regex(?x, "^<https://en\\.wikipedia\\.org/wiki/Albert_Ein")
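The two levels of escaping can be illustrated with a short Python sketch. Python's `re` module stands in for the regex engine here, and `unescape_double_backslash` is a hypothetical, heavily simplified substitute for SPARQL string unescaping; neither is taken from the QLever code base.

```python
import re


def unescape_double_backslash(literal_body: str) -> str:
    # Simplified stand-in for SPARQL string unescaping: it only collapses
    # a doubled backslash into a single one. Full ECHAR handling covers
    # more escape sequences than this.
    return literal_body.replace("\\\\", "\\")


# Text as typed between the quotes in the query (raw string, so the
# backslashes are kept exactly as written).
as_typed = r"^<https://en\\.wikipedia\\.org/wiki/Albert_Ein"

# Level 1: string unescaping leaves a single backslash before each dot.
regex_source = unescape_double_backslash(as_typed)

# Level 2: the regex engine reads backslash-dot as a literal dot.
real_iri = "<https://en.wikipedia.org/wiki/Albert_Einstein>"
lookalike = "<https://enXwikipediaYorg/wiki/Albert_Einstein>"

escaped_hits_real = re.match(regex_source, real_iri) is not None
escaped_hits_lookalike = re.match(regex_source, lookalike) is not None

# Without escaping, each "." matches any character, so the lookalike matches too.
unescaped_regex = "^<https://en.wikipedia.org/wiki/Albert_Ein"
unescaped_hits_lookalike = re.match(unescaped_regex, lookalike) is not None
```

The lookalike string shows why the unescaped pattern cannot be answered by a prefix range: its matches are not guaranteed to be contiguous in the sorted vocabulary.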
This escaping is broken on many levels in the current parsing: the actual lexing regex is wrong, the handling of the escapes in the regex filter parser is wrong, and the SPARQL escape handling is currently nonexistent. I will have a closer look at this.
joka921 changed the title from "Prefix FILTER query takes very long although it shouldn't" to "Handling Escaped characters (ECHAR in Sparql/Turtle Grammar) is wrong for the SparqlParser, the TurtleParser and the Regex Filter Parser" on Jan 3, 2020
The following query takes 164 seconds on http://qlever.informatik.uni-freiburg.de/Wikidata_Full :
Here are the specs from the execution tree: