Applying a filter on a bound variable is not working #1208

Open
lpugin opened this issue Dec 18, 2023 · 5 comments

Comments

@lpugin

lpugin commented Dec 18, 2023

Applying a FILTER on a variable defined through BIND does not work and returns an empty dataset. I could not find anything about this limitation in the documentation, sorry if I overlooked it.

This returns an empty set:

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?name ?population WHERE {
  ?city wdt:P31/wdt:P279* wd:Q515 .
  ?city wdt:P17 wd:Q183 .
  ?city wdt:P1082 ?population .
  ?city rdfs:label ?name .
  BIND(STRBEFORE(STR(?name), " ") AS ?first_word) .
  FILTER(REGEX(?first_word, "^F"))
  FILTER (LANG(?name) = "de")
}
ORDER BY DESC(?population)

https://qlever.cs.uni-freiburg.de/wikidata/FkvZCT

Applying the filter on ?name does work:

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?name ?population WHERE {
  ?city wdt:P31/wdt:P279* wd:Q515 .
  ?city wdt:P17 wd:Q183 .
  ?city wdt:P1082 ?population .
  ?city rdfs:label ?name .
  BIND(STRBEFORE(STR(?name), " ") AS ?first_word) .
  FILTER(REGEX(?name, "^F"))
  FILTER (LANG(?name) = "de")
}
ORDER BY DESC(?population)

https://qlever.cs.uni-freiburg.de/wikidata/dGjnEU

Any ideas?

@hannahbast
Member

@lpugin Thanks for the report. The problem with the first query is that ?first_word is a literal that is not part of the original input. QLever stores such IRIs or literals in its so-called "local vocabulary". The local vocabulary still has some (known) bugs and you have identified one of them. We should mention this on https://github.com/ad-freiburg/qlever/wiki/Current-deviations-from-the-SPARQL-1.1-standard .

There are two workarounds. One is to avoid applying the REGEX to the variable from the local vocabulary. Your second query does that, but of course with a different result (because ?first_word is empty when the city label does not contain a space). You could fix that by changing the regex, for example: https://qlever.cs.uni-freiburg.de/wikidata/WvJmaI

Another workaround is the following: https://qlever.cs.uni-freiburg.de/wikidata/zE74aX . It adds STR around ?first_word (which should not be necessary, but currently is for terms from the local vocabulary) and turns the REGEX into an equivalent one that does not just search for matches for a fixed prefix. QLever has a particularly efficient implementation for matches with a fixed prefix, but that currently does not work with terms from the local vocabulary.
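
For readers who cannot open the link, here is a minimal sketch of what the second workaround might look like when applied to the first query from the report. It is reconstructed from the description above and from the "^F.*" pattern mentioned later in this thread, not copied from the linked query, so the exact form of the linked query is an assumption.

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?name ?population WHERE {
  ?city wdt:P31/wdt:P279* wd:Q515 .
  ?city wdt:P17 wd:Q183 .
  ?city wdt:P1082 ?population .
  ?city rdfs:label ?name .
  BIND(STRBEFORE(STR(?name), " ") AS ?first_word) .
  # Assumed form of the workaround: wrap the bound variable in STR(...)
  # and use "^F.*" instead of the plain fixed-prefix pattern "^F".
  FILTER(REGEX(STR(?first_word), "^F.*"))
  FILTER (LANG(?name) = "de")
}
ORDER BY DESC(?population)
```

In standard SPARQL, "^F" and "^F.*" accept the same strings; the only intended difference here is that the second pattern is not a plain fixed prefix.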

@lpugin
Author

lpugin commented Dec 18, 2023

Thanks for the quick and extensive reply, @hannahbast. The example I gave was made up to be able to share it here, so the first workaround would not work because I need to perform a string operation before applying the filter. However, the second one works great!

For the record, I came across this example when trying to find a replacement for a FILTER(?value IN("C", "F", "G")), for which there is a well-documented recommended solution in the wiki, namely VALUES ?value { "C" "F" "G" }. That did not work when ?value was a bound variable, which is why I tried the REGEX. I have now tried VALUES STR(?value) { "C" "F" "G" }, following your second workaround, in order to avoid the REGEX, but here I get:

Error processing query
Invalid SPARQL query: Token "STR": extraneous input 'STR' expecting {'(', VAR1, VAR2, NIL}

I guess it would be similar to this: https://qlever.cs.uni-freiburg.de/wikidata/bMOzB9 (while that works https://qlever.cs.uni-freiburg.de/wikidata/VzVYF6)

@hannahbast
Member

@lpugin Thanks for the explanation. The token after VALUES has to be a variable, so you can't have VALUES STR(...).

As an alternative, you can use a REGEX with a disjunction, as in https://qlever.cs.uni-freiburg.de/wikidata/yrqnhx . Of course, the best solution would be for us to implement IN and fix the local vocabulary, but it is unlikely that this will happen before Christmas.
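
As a rough illustration of the disjunction idea, the following sketch filters a bound variable against the values "C", "F" and "G" with a single REGEX. It is not the linked query: the variable ?value, the SUBSTR expression and the pattern are assumptions chosen to match the made-up Wikidata example from this thread, and the query has not been run against QLever.

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?name ?value WHERE {
  ?city wdt:P31/wdt:P279* wd:Q515 .
  ?city wdt:P17 wd:Q183 .
  ?city rdfs:label ?name .
  FILTER (LANG(?name) = "de")
  # Hypothetical bound variable: the first character of the label.
  BIND(SUBSTR(STR(?name), 1, 1) AS ?value)
  # Stand-in for FILTER(?value IN ("C", "F", "G")): anchor on both
  # sides so that only the exact values C, F or G match.
  FILTER(REGEX(STR(?value), "^(C|F|G)$"))
}
```

VALUES cannot be used this way because, as the parser error quoted above shows, VALUES must be followed by a variable, a parenthesized list of variables, or NIL, never by an expression such as STR(?value).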

I am curious: which query or kind of query do you actually want to ask?

@lpugin
Author

lpugin commented Dec 19, 2023

> As an alternative, you can use a REGEX with a disjunction, as in https://qlever.cs.uni-freiburg.de/wikidata/yrqnhx .

Sure. I actually have one question for the REGEX. Why does "^F.*" work, and not "^F"?

> Of course, the best solution would be for us to implement IN and fix the local vocabulary, but it is unlikely that this will happen before Christmas.

That is fully understandable, but thanks for looking into it.

> I am curious: which query or kind of query do you actually want to ask?

It is on a local dataset, but I'll be happy to share it once we have it online. This is why I made up another one on Wikidata. Basically, I want to filter and group counts of musical clefs coded as G-1, G-2, C-1, C-2, etc. by clef type (letter). That is why I need to perform some substring extraction before applying the filter and the count.
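
To make the use case concrete, a query of this kind could look roughly like the sketch below. The dataset is not public, so the prefix ex: and the predicate ex:clef are purely hypothetical placeholders, and the STR(...) wrapper and the disjunction pattern follow the workarounds suggested earlier in this thread rather than anything confirmed for this dataset.

```sparql
# Hypothetical namespace and predicate; the real dataset is not shown here.
PREFIX ex: <http://example.org/>
SELECT ?clef_type (COUNT(?item) AS ?count) WHERE {
  ?item ex:clef ?clef .
  # Extract the letter before the hyphen, e.g. "G" from "G-2".
  BIND(STRBEFORE(STR(?clef), "-") AS ?clef_type)
  # Keep only the clef types of interest.
  FILTER(REGEX(STR(?clef_type), "^(C|F|G)$"))
}
GROUP BY ?clef_type
ORDER BY DESC(?count)
```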

@lpugin
Author

lpugin commented Dec 21, 2023

Sorry, my question was probably not clear. Why is the REGEX behaving differently when applied to a predicate as opposed to a bound variable?
https://qlever.cs.uni-freiburg.de/wikidata/ITRzua
https://qlever.cs.uni-freiburg.de/wikidata/WTROpD
