The # character is erased from the URI of a NAMED graph while querying it #1160

anatoly-scherbakov · 2020-09-07T17:10:08Z

Please consider the following Python snippet.

import rdflib

print(f'RDFLib version: {rdflib.__version__}')

response = rdflib.ConjunctiveGraph().query('''
SELECT DISTINCT ?g
FROM NAMED <http://www.w3.org/2000/01/rdf-schema#>
WHERE {
  GRAPH ?g {
    ?s ?p ?o .
  }
}
''')

print(list(response))

On my machine, this outputs:

RDFLib version: 5.0.0
[(rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema'),)]

Essentially, we create one named graph from http://www.w3.org/2000/01/rdf-schema# and then query the RDF dataset for all named graphs that contain at least one triple. The query yields http://www.w3.org/2000/01/rdf-schema, without the # character.

I found this while debugging a larger SPARQL query, where a statement like

GRAPH rdfs: {
  ...
}

did not yield anything.

Is this expected behavior?
If it is, could you please point me to the place in the SPARQL spec which describes it?
If not, could you point me to where in the code this might happen? I dug into the code, but I am not yet familiar enough with rdflib; I could not find where the IRI could have been corrupted.

By the way, when constructing a ConjunctiveGraph with parse() calls that specify IRIs for the named graphs, all of this works perfectly.

anatoly-scherbakov · 2020-09-08T16:21:20Z

The source of the problem appears to be here:

https://github.com/RDFLib/rdflib/blob/master/rdflib/parser.py#L262

absolute_location = URIRef(location, base=base).defrag()
...
input_source = URLInputSource(absolute_location, format)

The URIRef.defrag() method does exactly this – it removes the URL fragment:

rdflib/rdflib/term.py

Line 262 in 5e06430

    def defrag(self):
        if "#" in self:
            url, frag = urldefrag(self)
            return URIRef(url)
        else:
            return self

The problematic line 262 in parser.py looks to be from 08.03.2009 according to git blame – this is quite old 🙂

Preparing a PR for this...

anatoly-scherbakov · 2020-09-12T15:51:26Z

After adding this, I realized that rdflib follows redirects and, when a redirect occurs, changes the IRI of the named graph. This means that the name of a named graph is generally not predictable in a SPARQL query. This could be solved if SPARQL supported something like

FROM NAMED <iri> AS <another_iri>

But this construct does not appear to exist?

anatoly-scherbakov added a commit to anatoly-scherbakov/rdflib that referenced this issue Sep 12, 2020

RDFLib#1160 create_input_source() conflicting arguments and a test for them

65987a3

anatoly-scherbakov added a commit to anatoly-scherbakov/rdflib that referenced this issue Sep 12, 2020

RDFLib#1160 removed .defrag() and added a test

7c7f05c

anatoly-scherbakov mentioned this issue Sep 12, 2020

Issue 1160 missing url fragment #1163

Merged

nicholascar closed this as completed in #1163 Sep 17, 2020
