Better lexer, better parser, better tests and VALUES #258
Conversation
The following query on Wikidata currently seems to result in an infinite loop during parsing.
Adding a space before the closing bracket results in the following error:
Sounds like you missed a prefix expansion step.
@floriankramer correction with the missing …
@niklas88 Both the namespace expansion and the infinite loop when terminating VALUES without a whitespace before the closing bracket should be fixed now.
@floriankramer something is still really broken in the VALUES operation. (OT: 🤯 GitHub supports SPARQL syntax highlighting.) Consider the following queries:
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>
SELECT ?city ?cityLabel ?citytype WHERE {
VALUES ?citytype { wd:Q515}
?city wdt:P31 ?citytype .
?city rdfs:label ?cityLabel .
FILTER (lang(?cityLabel) = "en") .
}
SELECT ?city ?cityLabel ?citytype WHERE {
VALUES ?citytype { wd:Q515}
?city wdt:P31 ?citytype .
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
On WQS this gives 9209 results; on QLever it matches only one.
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>
SELECT ?city ?cityLabel ?citytype WHERE {
?city wdt:P31 wd:Q515 .
?city rdfs:label ?cityLabel .
FILTER (lang(?cityLabel) = "en") .
}
However, this returns 7862 results, which is realistically about the same as on WQS given that our dataset is older. Also, adding a space after …
e2e/scientists_queries.yaml
@@ -403,7 +403,7 @@ queries:
     ?x ql:has-predicate ?predicate .
   }
   GROUP BY ?predicate
- HAVING (?predicate < <Z) (?predicate = <Religion>)
+ HAVING (?predicate < <Z-Cars>) (?predicate = <Religion>)
I think this is actually issue #249, and specifically the need to convert the predicate with str() to allow URLs to be treated as strings.
@niklas88 If we implement the filters the way #249 suggests (using the same scheme for all types of filters), we will run into a problem with the way our vocabulary is stored. In the following example we cannot return the correct value with a single greater-than filter:
SELECT ?a WHERE {
VALUES ?a {<cake3> <cake>}
FILTER(str(?a) > "cake")
}
The string <cake3> is lexicographically smaller than <cake>, but we want to return <cake3> and not <cake>, as "cake3" > "cake". The same problem occurs with literals, as the '!' character is smaller than '"'.
@niklas88 I did a quick and dirty fix of the problem by treating filters on literals that contain either < or " differently. While that is not standards-conformant, the proper fix seems to be a more significant change that might require changes to the way the vocabulary is stored, and should probably be done in its own PR.
@niklas88 The parser and lexer should now support the full syntax the old parser supported (minus invalid literals without quotation marks). I've also fixed the QueryPlanner tests that depended on the iteration order of a HashMap, which caused the tests to fail on my system and prevented me from finding an actual bug in the test. This should fix all parsing problems with …
@floriankramer oh wow, this is pretty amazing (as always), and that non-determinism fix is something that's really good long term. For one, we can finally pass that test while using Abseil hash maps and figure out if those are faster. I'll test your changes first thing tomorrow, but from what I've already seen the code looks great.
I'm currently testing this with our Wikidata Full index and the queries from the crawled Wikidata examples that were previously working. It now fails for 6 of them, but it looks like all of those use features that we currently don't actually support; they just got through parsing and gave some result (I currently don't have code to check the results against Blazegraph). Interestingly, the following use of … :
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX psn: <http://www.wikidata.org/prop/statement/value-normalized/>
SELECT ?city ?cityLabel ?country ?countryLabel ?lang ?langLabel ?langCode ?population WHERE {
?city wdt:P1082 ?population .
FILTER (?population>1000000) .
?city wdt:P31 wd:Q515 .
?city wdt:P17 ?country .
?city rdfs:label ?cityLabel .
?country wdt:P37 ?lang .
?country rdfs:label ?countryLabel .
?lang wdt:P424 ?langCode .
?lang rdfs:label ?langLabel .
FILTER (lang(?cityLabel)=?langCode) .
FILTER (lang(?countryLabel)=?langCode) .
FILTER (lang(?langLabel)=?langCode) .
}
LIMIT 100
As for the previously reported problem with VALUES:
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>
SELECT ?city ?cityLabel ?citytype WHERE {
VALUES ?citytype { wd:Q515 }
?city wdt:P31 ?citytype .
?city rdfs:label ?cityLabel .
FILTER (lang(?cityLabel) = "en") .
}
This no longer seems to be a parsing problem, but it still gives only one result on QLever versus >7000 results on WQS.
@floriankramer ok, another finding: there seems to be a problem with the following query:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX psn: <http://www.wikidata.org/prop/statement/value-normalized/>
SELECT * WHERE {
?disease ?p ?o .
?disease wdt:P699 ?doid .
?o owl:sameAs ?sa .
} The following log output suggests that it hangs during parsing.
@niklas88 There was a bug in the galloping join code which made it terminate after the last element of the right side of the join had been used once for computing the cross product. I've fixed that and also turned the galloping join into a (hopefully) proper galloping join, instead of just using binary search on the remainder of the larger list.
@niklas88 regarding the …
} else {
  _lexer.accept();
  throw ParseException("Error in SELECT: unexpected token: " +
                       _lexer.current().raw);
Argh, you beat me to it; my fix was almost the same ;-)
@niklas88 I'm currently fixing whatever I broke in the galloping join, and also fixing the normal join, which had the same bug as the galloping case.
How did you even find that?
Something had to be broken, and the VALUES statement returned the correct number of rows, so the likely culprit was the join operation (as without the VALUES, the scan and join become a single scan).
src/engine/Engine.h
      return l[jc2] < needle;
    }) -
    l2.begin();
size_t step = 1;
I guess you can remove the TODO then, if it actually gallops now.
src/engine/QueryPlanner.cpp
}
if (!hasMatch) {
  std::cout << asString() << std::endl;
  std::cout << other.asString() << std::endl;
as above
src/engine/QueryPlanner.cpp
// This method is very verbose as it is currently only intended for
// testing
if (_nodeStorage.size() != other._nodeStorage.size()) {
  std::cout << asString() << std::endl;
Even for testing I would use the LOG macros.
@floriankramer I added the …
This functionality was merged via the separately mentioned PRs; all of them are now merged.
This PR adds support for inline VALUES statements.