ql:has-predicate yields incorrect results in conjunction with ps: predicate #289

Closed
hannahbast opened this issue Oct 30, 2019 · 4 comments

hannahbast commented Oct 30, 2019

In the current version of QLever, the following query yields only 5 rows, and wdt:P31 is not among them although it should be:

PREFIX ps: <http://www.wikidata.org/prop/statement/>
SELECT ?p WHERE {
  ?s ps:P1913 ?m .
  ?m ql:has-predicate ?p
}
GROUP BY ?p

If the second triple is replaced by ?m ?p ?o, the number of rows increases to 164 and wdt:P31 is included.

The result sets of the two queries should be the same. So it seems that something is wrong with the PREDICATE SCAN operation.
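To make the expected equivalence concrete, here is a minimal Python sketch on a toy triple list. The data and both function names are hypothetical and only illustrate the semantics assumed in this report, namely that ql:has-predicate relates a subject to each of its distinct predicates. It does not reflect how QLever implements the operation.

```python
# Toy triples (hypothetical data, not taken from Wikidata).
TRIPLES = [
    ("s1", "ps:P1913", "m1"),
    ("s2", "ps:P1913", "m2"),
    ("m1", "wdt:P31", "o1"),
    ("m1", "rdfs:label", "o2"),
    ("m2", "wdt:P31", "o3"),
]


def predicates_via_full_scan(triples, link):
    """Emulates `?s <link> ?m . ?m ?p ?o` followed by GROUP BY ?p."""
    subjects = {o for (_, p, o) in triples if p == link}
    return {p for (s, p, _) in triples if s in subjects}


def predicates_via_has_predicate(triples, link):
    """Emulates `?s <link> ?m . ?m ql:has-predicate ?p` followed by GROUP BY ?p."""
    # Assumed semantics: a precomputed map from each subject to its predicates.
    has_predicate = {}
    for s, p, _ in triples:
        has_predicate.setdefault(s, set()).add(p)
    subjects = {o for (_, p, o) in triples if p == link}
    result = set()
    for m in subjects:
        result |= has_predicate.get(m, set())
    return result


full = predicates_via_full_scan(TRIPLES, "ps:P1913")
short = predicates_via_has_predicate(TRIPLES, "ps:P1913")
print(full == short)
```

On a correct implementation both sets agree for any input, which is why the difference between 5 and 164 rows points to a bug rather than to a difference in semantics.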

The problem also occurs if we add (COUNT(?m) AS ?count) to the SELECT clause.

The problem also occurs when replacing ps:P1913 by wdt:P1913. In that case, however, wdt:P31 is present in the results. So maybe this bug has been around for quite a while without us noticing it, because only less "interesting" predicates were missing when no ps: predicates were involved in the query.

@hannahbast
Member Author

I have investigated this a bit more and found another very natural query with the same problem:

PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?x ?p WHERE {
  ?x wdt:P646 ?freebase_id .
  ?x ql:has-predicate ?p
}

If wdt:P646 ("Freebase ID") is replaced by wdt:P31 ("instance of"), the query works (at least, the results look plausible).

Maybe what makes the difference here is whether the predicate in the first triple has objects whose names are part of the externalized vocabulary (stored on disk). At least, that is true for objects of the ps: predicates and (I guess) also for a predicate like wdt:P646. The problem also occurs for the predicate schema:name.

It's just an educated guess, since I don't understand how the nature of the predicate in the first triple affects the PREDICATE SCAN operation. But maybe this is an interesting piece of information anyway.

@floriankramer
Member

The query

PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT ?p WHERE {
  <http://www.wikidata.org/entity/Q1000001> ql:has-predicate ?p
}

returns only 34 lines, vs

PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT ?p WHERE {
  <http://www.wikidata.org/entity/Q1000001> ?p ?o
}

returning 127. The difference here seems to be the language predicates, though (e.g. @ar@<http://schema.org/description>). The entity used for the queries is one of the subjects of the wdt:P646 triples. ql:has-predicate does include the P646 predicate, though, which indicates that the bug might be in the predicate scan when it is combined with a subquery.

floriankramer added a commit to floriankramer/QLever that referenced this issue Nov 18, 2019
@floriankramer
Member

I've found a bug in the assignment of the column of the subtree that a predicate scan uses as its subjects. If the predicate scan was the right side of the join, it would always use column 0 of its subqueries. Thus, when the optimizer chose an ordering for the scan where the subtree's column 0 held the triples' subjects, the query would work, but if the optimizer chose any other order (which it could, given that the order doesn't affect performance in this case), the query results would be arbitrary.
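The effect of the wrong column can be illustrated with a small Python sketch. This is not QLever's code: `has_predicate_join`, the row layout and the sample values are hypothetical, and the subject-to-predicates map stands in for the predicate scan.

```python
def has_predicate_join(subtree_rows, subject_col, has_predicate):
    """Append each predicate of the value found in column `subject_col`."""
    out = []
    for row in subtree_rows:
        # Sorted only to make the toy output deterministic.
        for p in sorted(has_predicate.get(row[subject_col], ())):
            out.append(row + (p,))
    return out


# Hypothetical subject -> predicates map.
HAS_PREDICATE = {"Q1": {"wdt:P31", "wdt:P646"}}

# The same subtree result for `?x wdt:P646 ?freebase_id` in two column orders.
ROWS_SUBJECT_FIRST = [("Q1", "fb1")]  # ?x is column 0
ROWS_SUBJECT_LAST = [("fb1", "Q1")]   # ?x is column 1

# Hard-coded column 0 happens to work when the subject is in column 0 ...
ok = has_predicate_join(ROWS_SUBJECT_FIRST, 0, HAS_PREDICATE)
# ... but looks up the wrong values for the other ordering.
buggy = has_predicate_join(ROWS_SUBJECT_LAST, 0, HAS_PREDICATE)
# Using the column that actually holds ?x gives the expected rows again.
fixed = has_predicate_join(ROWS_SUBJECT_LAST, 1, HAS_PREDICATE)

print(len(ok), len(buggy), len(fixed))
```

In this toy the wrong column simply finds no predicates. In a real index the values in column 0 are IDs that may or may not match some subject, which would explain why the results looked arbitrary rather than empty.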

niklas88 added a commit that referenced this issue Nov 18, 2019
Fixed the subtree column of has pedicate scans. Fixes #289
@hannahbast
Member Author

@floriankramer @niklas88 Thank you, Florian, for finding and fixing this bug, and thank you, Niklas, for the code review. I have updated the backend behind http://qlever.informatik.uni-freiburg.de/Wikidata_Full to the latest version of the master, and the problematic queries now work like a charm!

In particular, you can now try to find the three movies for which Meryl Streep won an Oscar. It's not an easy SPARQL query, but it can be constructed reasonably well with what we have now.
