In using SERVICE, "string" variables get retrieved as NULL #1278

Closed
chapai2021 opened this issue Mar 12, 2021 · 6 comments · Fixed by #1894

@chapai2021

I was testing federated SPARQL queries using "service" and ran into two issues:

  • only lowercase "service" is accepted, i.e., a query with "SERVICE" fails. This has been fixed for the next release.
  • "string" variables get retrieved as null. This is new.

Sorry, I do not have time to create a small reproducible example.
Below are two examples of querying the same SPARQL endpoint: one through rdflib with a service clause, and one sent directly to the endpoint with sparql_dataframe.

from pandas import DataFrame
from rdflib.plugins.sparql.processor import SPARQLResult

def sparql_results_to_df(results: SPARQLResult) -> DataFrame:
    return DataFrame(
        data=([None if x is None else x.toPython() for x in row] for row in results),
        columns=[str(x) for x in results.vars],
    )

from rdflib import Graph

qf = """
PREFIX tsmodel: <http://siemens.com/tsmodel/>
PREFIX vobj: <http://siemens.com/tsmodel/VAROBJECTS#>

SELECT ?x ?node_name ?idx
{
    service <http://localhost:8080/sparql>
    {
        ?x a tsmodel:VAROBJECTS .
        ?x vobj:idx ?idx .
        ?x vobj:node_name ?node_name .
    }
}
"""

# Assumption: any local Graph works here; the service clause fetches the triples remotely.
graph = Graph()
results = graph.query(qf)
df = sparql_results_to_df(results)
df.head(20)

x node_name idx
0 http://siemens.com/tsmodel/VAROBJECTS/idx=1 None 1
1 http://siemens.com/tsmodel/VAROBJECTS/idx=2 None 2
2 http://siemens.com/tsmodel/VAROBJECTS/idx=3 None 3
3 http://siemens.com/tsmodel/VAROBJECTS/idx=4 None 4
4 http://siemens.com/tsmodel/VAROBJECTS/idx=5 None 5

import sparql_dataframe

endpoint = "http://localhost:8080/sparql"

q = """
PREFIX tsmodel: <http://siemens.com/tsmodel/>
PREFIX vobj: <http://siemens.com/tsmodel/VAROBJECTS#>

SELECT ?x ?node_name ?idx {
    ?x a tsmodel:VAROBJECTS .
    ?x vobj:idx ?idx .
    ?x vobj:node_name ?node_name .
}
"""

df = sparql_dataframe.get(endpoint, q)
df.head()

x node_name idx
0 http://siemens.com/tsmodel/VAROBJECTS/idx=1 ns=3;s="ultrasonicLeft" 1
1 http://siemens.com/tsmodel/VAROBJECTS/idx=2 ns=3;s="voltageL1-N" 2
2 http://siemens.com/tsmodel/VAROBJECTS/idx=3 ns=3;s="voltageL2-N" 3
3 http://siemens.com/tsmodel/VAROBJECTS/idx=4 ns=3;s="voltageL3-N" 4
4 http://siemens.com/tsmodel/VAROBJECTS/idx=5 ns=3;s="currentL1" 5

I have more examples like this. From what I have checked, "int", "float", and "datetime" values work fine, but "string" variables always come back as NULL.

@nickkpope

Thanks @chapai2021! I've been meaning to file an issue for these exact same issues.

After some debugging with pdb, I saw that the regex for the service clause only matches lowercase 'service'. I'm not sure whether making it case-insensitive would have unintended side effects.
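A minimal sketch of what the case-insensitive switch amounts to, assuming a pattern of the same shape as the service-clause regex discussed in this thread (`SERVICE_RE` and `match_service` are hypothetical names, not rdflib's code). For this pattern, `re.IGNORECASE` only changes how the literal word `service` is compared, so it should not alter which clauses match in any other way.

```python
import re

# Pattern modeled on the service-clause regex discussed in this thread
# (an assumption, not a verbatim copy of rdflib's source).
SERVICE_RE = r"^service <(.*)>[ \n]*\{(.*)\}[ \n]*$"


def match_service(clause: str, ignore_case: bool):
    """Return the match object for a service clause, or None if it does not match."""
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.match(SERVICE_RE, clause, flags)


upper = "SERVICE <http://localhost:8080/sparql> { ?x ?p ?o }"
print(match_service(upper, ignore_case=False))  # None: only lowercase 'service' matches
print(match_service(upper, ignore_case=True).group(1))  # http://localhost:8080/sparql
```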

But the missing string literals are quite strange. I checked the rdflib.plugins.sparql.processor.SPARQLResult bindings and can see that the values are missing even though the vars are declared. CONSTRUCT queries also miss string literals.

In addition to those two points (happy to re-submit this as a separate issue), I found that extracting the URL between '<' and '>' goes out of bounds when there are comments within the service clause's scope. For example:

SELECT *
WHERE {
    service <http://foo/sparql> {
#     service <http://foo/sparql?blah=foo> {
        ?s ?p ?o
    }
}

will result in the following url being used for the federated request:

'http://foo/sparql> { # '

The regex for this seems to be pretty permissive: ^service <(.*)>[ \n]*{(.*)}[ \n]*$ (the same in both 5.0.0 and master). Because .* is greedy, it consumes everything up to the last occurrence of >[ \n]*{, regardless of comments. (Somewhat less important: [ \n] does not cover all whitespace characters, tabs in particular.)
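A small reproduction of the greedy capture, assuming the clause text reaches the regex with the commented line still in it (how rdflib builds that string is an assumption here, which is why the captured URL differs from the one quoted above). `TIGHT` is a hypothetical alternative, not rdflib's fix: SPARQL's IRIREF production does not allow `>` inside an IRI, so stopping at the first `>` loses nothing, and `\s` also covers tabs.

```python
import re

# Greedy pattern of the shape quoted above; flags chosen for illustration.
GREEDY = re.compile(r"^service <(.*)>[ \n]*\{(.*)\}[ \n]*$", re.DOTALL | re.IGNORECASE)

# Hypothetical tighter pattern: the IRI stops at the first '>', and \s accepts tabs.
TIGHT = re.compile(r"^service\s*<([^>]*)>\s*\{(.*)\}\s*$", re.DOTALL | re.IGNORECASE)

clause = (
    "service <http://foo/sparql> {\n"
    "#     service <http://foo/sparql?blah=foo> {\n"
    "        ?s ?p ?o\n"
    "    }"
)

# The greedy group runs on to the '>' of the commented-out line.
print(repr(GREEDY.match(clause).group(1)))
# 'http://foo/sparql> {\n#     service <http://foo/sparql?blah=foo'

# The tighter group stops at the first '>'.
print(repr(TIGHT.match(clause).group(1)))
# 'http://foo/sparql'
```

The body captured by `TIGHT` still contains the comment line; this sketch only addresses where the URL ends.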

@nicholascar
Member

The SERVICE / service issue was fixed long ago (i.e., uppercase SERVICE is supported), but there is a problem with SERVICE and HTTPS; see Issue #1295.

@vemonet

vemonet commented May 31, 2021

Hi @nicholascar

Could you explain more about the SERVICE / service issue? I still see it on RDFLib 5.0.0 (only lowercase service works).

Is it fixed in 6.0.0? Do I need to upgrade RDFLib to a specific version?

@vemonet

vemonet commented Jun 1, 2021

Answering my own question:

The current master branch of RDFLib has the fix that makes service parsing case-insensitive: https://github.com/RDFLib/rdflib/blob/master/rdflib/plugins/sparql/evaluate.py#L307

Version 5.0.0 does not: https://github.com/RDFLib/rdflib/blob/5.0.0/rdflib/plugins/sparql/evaluate.py#L280

@nicholascar
Member

Hi @vemonet, glad you got to this quickly, and thanks very much for coming back here to post your answer!

ghost added the SPARQL label Dec 23, 2021
@gitmpje
Contributor

gitmpje commented May 1, 2022

I encountered the same issue with literals being returned as NULL, specifically literals that have neither a datatype nor a language tag (simple literals). The current implementation does not cover that case:

if r[var]["type"] == "uri":
    res_dict[Variable(var)] = URIRef(r[var]["value"])
elif r[var]["type"] == "bnode":
    res_dict[Variable(var)] = BNode(r[var]["value"])
elif r[var]["type"] == "literal" and "datatype" in r[var]:
    res_dict[Variable(var)] = Literal(
        r[var]["value"], datatype=r[var]["datatype"]
    )
elif r[var]["type"] == "literal" and "xml:lang" in r[var]:
    res_dict[Variable(var)] = Literal(
        r[var]["value"], lang=r[var]["xml:lang"]
    )
# No branch covers a literal with neither "datatype" nor "xml:lang",
# so such a variable is never added to res_dict.

I propose using the same logic as the JSON results parser (or even reusing its functions):

def parseJsonTerm(d):
    """rdflib object (Literal, URIRef, BNode) for the given json-format dict.
    input is like:
    { 'type': 'uri', 'value': 'http://famegame.com/2006/01/username' }
    { 'type': 'literal', 'value': 'drewp' }
    """
    t = d["type"]
    if t == "uri":
        return URIRef(d["value"])
    elif t == "literal":
        return Literal(d["value"], datatype=d.get("datatype"), lang=d.get("xml:lang"))
    elif t == "typed-literal":
        return Literal(d["value"], datatype=URIRef(d["datatype"]))
    elif t == "bnode":
        return BNode(d["value"])
    else:
        raise NotImplementedError("json term type %r" % t)
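To make the difference concrete, here is a dependency-free sketch comparing the two branch layouts on a single result row. It uses plain tuples `(kind, value, datatype, lang)` as stand-ins for rdflib's `URIRef`/`BNode`/`Literal` (an assumption so the sketch runs without rdflib), and `term_old`/`term_new` are hypothetical names. The row follows the SPARQL 1.1 JSON results layout; the `node_name` value comes from the output above, while the `xsd:integer` datatype on `idx` is assumed for illustration.

```python
from typing import Dict, Optional, Tuple

# Stand-in for an rdflib term: (kind, value, datatype, lang).
Term = Tuple[str, str, Optional[str], Optional[str]]


def term_old(d: Dict[str, str]) -> Optional[Term]:
    """Mirrors the branch order of the evaluate.py snippet quoted above."""
    t = d["type"]
    if t == "uri":
        return ("uri", d["value"], None, None)
    if t == "bnode":
        return ("bnode", d["value"], None, None)
    if t == "literal" and "datatype" in d:
        return ("literal", d["value"], d["datatype"], None)
    if t == "literal" and "xml:lang" in d:
        return ("literal", d["value"], None, d["xml:lang"])
    return None  # simple literal: no branch matches, so nothing gets bound


def term_new(d: Dict[str, str]) -> Term:
    """Follows parseJsonTerm: one literal branch with optional datatype and lang."""
    t = d["type"]
    if t == "uri":
        return ("uri", d["value"], None, None)
    if t == "bnode":
        return ("bnode", d["value"], None, None)
    if t == "literal":
        return ("literal", d["value"], d.get("datatype"), d.get("xml:lang"))
    if t == "typed-literal":
        return ("literal", d["value"], d["datatype"], None)
    raise NotImplementedError(f"json term type {t!r}")


# Hypothetical binding row, shaped like one entry of results.bindings.
row = {
    "node_name": {"type": "literal", "value": 'ns=3;s="currentL1"'},
    "idx": {
        "type": "literal",
        "value": "5",
        "datatype": "http://www.w3.org/2001/XMLSchema#integer",
    },
}

old = {var: term_old(v) for var, v in row.items()}
new = {var: term_new(v) for var, v in row.items()}
print(old["node_name"])  # None
print(new["node_name"])  # ('literal', 'ns=3;s="currentL1"', None, None)
```

This matches what the reporter saw: typed values such as the integer `idx` survive either way, and only the simple string literal is lost.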

aucampia added a commit that referenced this issue May 15, 2022
Fixes #1278: simple literals returned as NULL. The resolution uses the same logic as here: https://github.com/RDFLib/rdflib/blob/6f2c11cd2c549d6410f9a1c948ab3a8dbf77ca00/rdflib/plugins/sparql/results/jsonresults.py#L89-L107

Co-authored-by: Mark van der Pas <mark.van.der.pas@semaku.com>
Co-authored-by: Iwan Aucamp <aucampia@gmail.com>