fix: allow registered prefix use and namespaces which are URIs closes #632 #660

Closed
satra wants to merge 2 commits into RDFLib:master from satra:fix/use_bound_namespace_prefix

satra
Contributor

@satra satra commented Oct 19, 2016

cache namespaces during the bind process.

closes #632

@coveralls

coveralls commented Oct 19, 2016

Coverage Status

Coverage increased (+0.02%) to 62.847% when pulling 63f2953 on satra:fix/use_bound_namespace_prefix into 11e835a on RDFLib:master.

@gromgull
Member

Two things:

  • this won't actually fix #632 (Handle namespaces that end in an underscore in turtle serializer): the cache only contains the namespace (http://purl.obolibrary.org/obo/RO_), not the URL that will be looked up: http://purl.obolibrary.org/obo/RO_0002200
  • fixing it in the cache is dirty :) If I move the triples to another graph, or close and reopen my store, I will have a new NamespaceManager and the cache is lost.

A possible fix would be for the compute_qname method to instead loop through all registered prefixes, find the longest prefix match, and then check whether the remainder is a valid localname. That is much more work though.
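A minimal sketch of the longest-prefix idea described above, written in plain Python rather than against rdflib. The `longest_prefix_qname` function, the `bindings` dict, and the ASCII-only `LOCAL_NAME` pattern are assumptions made for illustration. They are not rdflib's API, and the real Turtle local-name grammar is broader than this pattern.

```python
import re

# Simplified, ASCII-only approximation of a Turtle local name (PN_LOCAL).
# Assumption: non-ASCII letters and escape sequences are left out for brevity.
# The empty string is accepted, so a namespace may equal the full URI.
LOCAL_NAME = re.compile(
    r"(?:[A-Za-z0-9_:](?:[A-Za-z0-9_.:\-]*[A-Za-z0-9_:\-])?)?"
)


def longest_prefix_qname(uri, bindings):
    """Return (prefix, namespace, local_name) for the longest bound namespace
    that starts `uri` and leaves a valid local name, or None if none fits.

    `bindings` is a hypothetical dict mapping prefix -> namespace string.
    """
    best = None
    for prefix, namespace in bindings.items():
        if not uri.startswith(namespace):
            continue
        local = uri[len(namespace):]
        if LOCAL_NAME.fullmatch(local) is None:
            continue
        # Keep the candidate with the longest namespace seen so far.
        if best is None or len(namespace) > len(best[1]):
            best = (prefix, namespace, local)
    return best


bindings = {
    "obo": "http://purl.obolibrary.org/obo/",
    "RO": "http://purl.obolibrary.org/obo/RO_",
    "GENO": "http://purl.obolibrary.org/obo/GENO_",
}
result = longest_prefix_qname(
    "http://purl.obolibrary.org/obo/RO_0002200", bindings
)
print(result)
```

Because the scan visits every binding on each lookup, a real implementation would probably want an index (for example a trie or a list sorted by namespace length) instead of a linear loop.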

I find those URLs quite ugly - RDF is WEB technology, / means directory and # means fragment. _ SHOULDN'T be meaningful in a URL. I would maybe be in favour of leaving it as it is :)

@gromgull
Member

#649 is the same discussion

@satra
Contributor Author

satra commented Oct 19, 2016

@gromgull - let me know if we should move the discussion to #649, but i'll respond here for now!

this hack does fix #632 as the cache would contain the full URI.

i completely agree that this is a hack and should be replaced with sth better. however, if you do move the triples and bind the namespace there, you would regenerate the cache. if you simply copied the namespaces internally without going through a bind then you would not have the new cache.

the key here is that the cache generation happens on bind, and that's the point. the namespace prefix is a serialization concept not an RDF concept. RDF simply says URIs. turtle/trig on the other hand allows prefix-es.

therefore, if i explicitly bind a prefix to a valid string, rdflib shouldn't parse it down to a prefix, URI, and part. so both of the following prefixes would be completely valid:

@prefix ex: <http://example.org/> .
@prefix ex_foo: <http://example.org/#foo> .

the problem right now is that the prefix generator in rdflib takes a decision that the prefix cannot be: <http://example.org/#foo> even though it's been instructed to by the user and is in full compliance with turtle specs.

import rdflib as rl
g = rl.Graph()
g.bind('ex', 'http://example.org/#foo')
g.compute_qname('http://example.org/#foo')

returns

Out[8]: ('ns1', rdflib.term.URIRef('http://example.org/#'), 'foo')

whereas it should have returned:

Out[8]: ('ex', rdflib.term.URIRef('http://example.org/#foo'), '')

i'm happy to propose a non-hacked solution if there is agreement that rdflib should keep the prefix as determined by the bind call

@gromgull
Member

From #632:

   from rdflib import Graph, URIRef

   graph = Graph()
   graph.bind('GENO', 'http://purl.obolibrary.org/obo/GENO_')
   graph.bind('RO', 'http://purl.obolibrary.org/obo/RO_')

   graph.add((URIRef('http://example.org'),
              URIRef('http://purl.obolibrary.org/obo/RO_0002200'),
              URIRef('http://purl.obolibrary.org/obo/GENO_0000385')))

   print(graph.serialize(format='turtle'))

so this calls bind with the namespace URL: http://purl.obolibrary.org/obo/GENO_

this gets put in the cache table.

Then when we serialize the graph, rdflib will call compute_qname on the actual URL: http://purl.obolibrary.org/obo/RO_0002200 and it will NOT be in the table.
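The lookup miss described here can be illustrated with a plain dictionary. This is only a sketch: the `cache` dict and the `bind` helper below are hypothetical stand-ins, assuming the cache is keyed by the exact namespace string passed to bind. They are not the PR's actual code.

```python
# Hypothetical stand-in for a cache filled during bind(): it maps the bound
# namespace URI to a (prefix, namespace, local_name) tuple.
cache = {}


def bind(prefix, namespace):
    # Assumption: only the namespace string itself is recorded at bind time.
    cache[namespace] = (prefix, namespace, "")


bind("GENO", "http://purl.obolibrary.org/obo/GENO_")
bind("RO", "http://purl.obolibrary.org/obo/RO_")

namespace_uri = "http://purl.obolibrary.org/obo/RO_"
term_uri = "http://purl.obolibrary.org/obo/RO_0002200"

namespace_hit = cache.get(namespace_uri)  # the bound namespace itself is cached
term_hit = cache.get(term_uri)            # the full term URI was never bound

print(namespace_hit)
print(term_hit)
```

Under that assumption an exact-match cache only helps when the URI being serialized is identical to a bound namespace, which is the case where the whole term URI was bound as a prefix.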

Calling compute_qname on just the namespace URL, as you do, and expecting a tuple whose third part is the empty string isn't something you would ever do?

@gromgull
Member

I didn't test it btw - I am just speculating from how I remember it working - I could be wrong :)

@satra
Contributor Author

satra commented Oct 19, 2016

@gromgull - i think we are saying the same thing :)

the point of disagreement is that i believe the way it works is incorrect. simply, if i bind something, i don't expect rdflib to interpret that bind for me. validate yes, interpret, no

so using your example, and to make the turtle file readable to humans, i'm doing

from rdflib import Graph, URIRef

graph = Graph()
graph.bind('GENO', 'http://purl.obolibrary.org/obo/GENO_')
graph.bind('RO_has_phenotype', 'http://purl.obolibrary.org/obo/RO_0002200')
graph.add((URIRef('http://example.org'),
           URIRef('http://purl.obolibrary.org/obo/RO_0002200'),
           URIRef('http://purl.obolibrary.org/obo/GENO_0000385')))
print(graph.serialize(format='turtle'))

i will now see,

@prefix GENO: <http://purl.obolibrary.org/obo/GENO_> .
@prefix RO_has_phenotype: <http://purl.obolibrary.org/obo/RO_0002200> .
@prefix ns1: <http://purl.obolibrary.org/obo/> .

<http://example.org> ns1:RO_0002200 ns1:GENO_0000385 .

i would like to see this (which is valid turtle):

@prefix RO_has_phenotype: <http://purl.obolibrary.org/obo/RO_0002200> .
@prefix GENO: <http://purl.obolibrary.org/obo/GENO_> .

<http://example.org/> RO_has_phenotype: GENO:0000385 .

@gromgull
Member

@gromgull - i think we are saying the same thing :)

there are two things:

  1. Should rdflib allow namespaces ending with arbitrary characters?

And I am not totally convinced it should - and I am not sure what will break in other serialisations that rely on the same code, but are not turtle.

In any case, it certainly should not make them up by itself, but MAYBE if explicitly instructed with bind, it should.

  2. Does this PR solve that issue?

And my point is that "no", this PR does not solve it, not even in a hacky way?

@gromgull
Member

<http://example.org/> RO_has_phenotype: GENO:0000385 .

is NOT valid turtle. The localname needs to be at least one character long.

@gromgull
Member

Wait - maybe it IS valid. That is so ugly :)

https://www.w3.org/TR/turtle/#grammar-production-PrefixedName
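For reference, the linked production reads `PrefixedName ::= PNAME_LN | PNAME_NS`, where `PNAME_NS` is just an optional prefix followed by a colon, so a prefixed name with an empty local part is allowed. The sketch below checks candidates against an ASCII-only approximation of those productions. The `is_prefixed_name` helper and the simplified character classes are assumptions for illustration; the real grammar also covers non-ASCII letters and escape sequences.

```python
import re

# ASCII-only approximations of the Turtle 1.1 productions (assumption:
# non-ASCII letters, percent-encoding and backslash escapes are omitted).
PN_PREFIX = r"[A-Za-z](?:[A-Za-z0-9_.\-]*[A-Za-z0-9_\-])?"
PN_LOCAL = r"[A-Za-z0-9_:](?:[A-Za-z0-9_.:\-]*[A-Za-z0-9_:\-])?"

# PrefixedName ::= PNAME_LN | PNAME_NS
# PNAME_NS     ::= PN_PREFIX? ':'
# PNAME_LN     ::= PNAME_NS PN_LOCAL
PREFIXED_NAME = re.compile(rf"(?:{PN_PREFIX})?:(?:{PN_LOCAL})?")


def is_prefixed_name(text):
    """Return True if `text` matches the simplified PrefixedName pattern."""
    return PREFIXED_NAME.fullmatch(text) is not None


for candidate in ["RO_has_phenotype:", "GENO:0000385", ":", "ex:foo", "_bad:foo"]:
    print(candidate, is_prefixed_name(candidate))
```

Under this approximation both `RO_has_phenotype:` (empty local part) and `GENO:0000385` (local part starting with a digit) are accepted, while a prefix starting with an underscore is rejected.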

@satra
Contributor Author

satra commented Oct 20, 2016

@gromgull - ok now that we agree that it is valid :) i've updated the fix.

are there other serializers that use the namespaces like turtle/trig?

from rdflib import Graph, URIRef

graph = Graph()
graph.bind('GENO', 'http://purl.obolibrary.org/obo/GENO_')
graph.bind('RO_has_phenotype', 'http://purl.obolibrary.org/obo/RO_0002200')
graph.add((URIRef('http://example.org'),
           URIRef('http://purl.obolibrary.org/obo/RO_0002200'),
           URIRef('http://purl.obolibrary.org/obo/GENO_0000385')))
print(graph.serialize(format='turtle').decode())

returns:

@prefix GENO: <http://purl.obolibrary.org/obo/GENO_> .
@prefix RO_has_phenotype: <http://purl.obolibrary.org/obo/RO_0002200> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://example.org> RO_has_phenotype: GENO:0000385 .

@satra
Contributor Author

satra commented Oct 20, 2016

will be looking at the N3 failures.

@satra
Contributor Author

satra commented Oct 20, 2016

i found a bunch of other things i need to address. will fix and update.

Successfully merging this pull request may close these issues.

Handle namespaces that end in an underscore in turtle serializer