The # character is erased from the URI of a NAMED graph while querying it #1160

Closed
anatoly-scherbakov opened this issue Sep 7, 2020 · 2 comments · Fixed by #1163

@anatoly-scherbakov
Contributor

anatoly-scherbakov commented Sep 7, 2020

Please consider the following Python snippet.

import rdflib

print(f'RDFLib version: {rdflib.__version__}')

response = rdflib.ConjunctiveGraph().query('''
SELECT DISTINCT ?g
FROM NAMED <http://www.w3.org/2000/01/rdf-schema#>
WHERE {
  GRAPH ?g {
    ?s ?p ?o .
  }
}
''')

print(list(response))

On my machine, this outputs:

RDFLib version: 5.0.0
[(rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema'),)]

Essentially, the query loads one named graph from http://www.w3.org/2000/01/rdf-schema# and then asks the RDF dataset for all named graphs that contain at least one triple. The result is http://www.w3.org/2000/01/rdf-schema, without the # character.

I found this when debugging a larger SPARQL query, where the statement like

GRAPH rdfs: {
  ...
}

did not yield anything.

  1. Is this expected behavior?
  2. If it is, could you please point me to the place in SPARQL spec which describes it?
  3. If not, could you point me to where this might happen? I dug into the code, but I am not yet familiar enough with rdflib and could not find where the IRI gets corrupted.

By the way, when the ConjunctiveGraph is built with parse() calls that specify the IRIs of the named graphs, everything works as expected.

@anatoly-scherbakov
Contributor Author

The source of the problem appears to be here:

https://github.com/RDFLib/rdflib/blob/master/rdflib/parser.py#L262

absolute_location = URIRef(location, base=base).defrag()
...
input_source = URLInputSource(absolute_location, format)

The URIRef.defrag() method does exactly this – it removes the URL fragment:

def defrag(self):
    if "#" in self:
        url, frag = urldefrag(self)
        return URIRef(url)
    else:
        return self
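
To make the effect concrete, here is a minimal stdlib-only sketch of the same logic. It assumes, as the excerpt above suggests, that defrag() simply delegates to urllib.parse.urldefrag; the helper name strip_fragment is hypothetical and not part of rdflib.

```python
from urllib.parse import urldefrag

# The IRI used in FROM NAMED, and the one the rdfs: prefix expands to.
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def strip_fragment(iri: str) -> str:
    """Hypothetical helper mirroring the defrag() logic quoted above."""
    if "#" in iri:
        # urldefrag returns (url_without_fragment, fragment).
        url, _frag = urldefrag(iri)
        return url
    return iri


# The name the graph would be stored under after the fragment is stripped.
loaded_name = strip_fragment(RDFS)

print(loaded_name)          # the IRI without the trailing "#"
print(loaded_name == RDFS)  # False: GRAPH rdfs: { ... } no longer matches
```

If the graph name is derived from the defragmented location, this would explain why a GRAPH rdfs: { ... } pattern finds nothing: the stored name and the queried IRI differ by the trailing # character.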

The problematic line 262 in parser.py looks to be from 08.03.2009 according to git blame – this is quite old 🙂

Preparing a PR for this...

anatoly-scherbakov added a commit to anatoly-scherbakov/rdflib that referenced this issue Sep 12, 2020
@anatoly-scherbakov
Contributor Author

After adding this, I realized that rdflib follows redirects and changes the IRI of the named graph whenever a redirect takes place. This means that the name of a named graph is generally not predictable in a SPARQL query. This could be solved if SPARQL supported something like

FROM NAMED <iri> AS <another_iri>

But as far as I can tell, SPARQL has no such construct?
