Namespace shortening and invalid RDF outputs #115

Closed
ianmillard opened this issue Jan 21, 2013 · 7 comments

@ianmillard

In some instances the built-in Turtle serialiser can output invalid documents due to incorrect namespace shortening.

Consider four types of animal:

animals/dog
animals/cat
animals/bird
animals/reptiles/snake

Then take the following example, which parses some triples and outputs them as Turtle.
Note that we have also defined a namespace prefix for animals.

    set_include_path(get_include_path() . PATH_SEPARATOR . 'lib/');
    require_once "EasyRdf.php";

    $triples  = '
      <http://example.com/id/1> <http://www.w3.org/2000/01/rdf-schema#type> <http://example.com/ns/animals/dog> .
      <http://example.com/id/2> <http://www.w3.org/2000/01/rdf-schema#type> <http://example.com/ns/animals/cat> .
      <http://example.com/id/3> <http://www.w3.org/2000/01/rdf-schema#type> <http://example.com/ns/animals/bird> .
      <http://example.com/id/4> <http://www.w3.org/2000/01/rdf-schema#type> <http://example.com/ns/animals/reptiles/snake> .
    ';

    EasyRdf_Namespace::set('id',      'http://example.com/id/');
    EasyRdf_Namespace::set('animals', 'http://example.com/ns/animals/');

    //  parse graph
    $graph = new EasyRdf_Graph();
    $graph->parse($triples, 'ntriples');

    //  dump as text/turtle
    print $graph->serialise('turtle');

This will produce the following output, which contains the invalid animals:reptiles/snake:

    @prefix id: <http://example.com/id/> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix animals: <http://example.com/ns/animals/> .

    id:1 rdfs:type animals:dog .
    id:2 rdfs:type animals:cat .
    id:3 rdfs:type animals:bird .
    id:4 rdfs:type animals:reptiles/snake .

In the last triple, the namespace shortening has been applied, but only matched against part of the URI.

There is a simple fix for the Turtle serialiser: change line 62 and line 141 to

    if ($short && strpos($short, '/') === false) {

However, the problem persists in the RDF/XML serialiser. If, say, I had both http://example.com/ns/property and http://example.com/ns/property/subproperty, then output such as the following would be produced:

    <ns0:property/subproperty rdf:resource="..." />

So a better fix is probably required within EasyRdf_Namespace.
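To illustrate what moving the slash check into the shortening step might look like, here is a minimal standalone sketch. It does not use EasyRdf: the `$namespaces` array and the `shortenStrict()` function are hypothetical stand-ins, not the library's internals. A caller that receives null would fall back to writing the full URI.

```php
<?php
// Hypothetical stand-in for the prefix map; not EasyRdf's internal structure.
$namespaces = [
    'id'      => 'http://example.com/id/',
    'animals' => 'http://example.com/ns/animals/',
];

/**
 * Return "prefix:local" for $uri, or null when no declared namespace
 * leaves a local part that is free of '/' characters.
 */
function shortenStrict(string $uri, array $namespaces): ?string
{
    foreach ($namespaces as $prefix => $ns) {
        if (strpos($uri, $ns) !== 0) {
            continue; // the namespace is not a leading substring of the URI
        }
        $local = (string) substr($uri, strlen($ns));
        if (strpos($local, '/') !== false) {
            continue; // only part of the path matched, so reject this prefix
        }
        return $prefix . ':' . $local;
    }
    return null;
}
```

With this map, `http://example.com/ns/animals/dog` shortens to `animals:dog`, while `http://example.com/ns/animals/reptiles/snake` is left unshortened because its remainder still contains a slash.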

Also, in attempting to fix the "snake" issue, I discovered that the namespace shortenings are applied in the order in which they are declared. In the above example code, if you declare reptiles before animals then all is fine:

    EasyRdf_Namespace::set('reptile', 'http://example.com/ns/animals/reptiles/');
    EasyRdf_Namespace::set('animal',  'http://example.com/ns/animals/');

However, if you add reptiles after animals...

    EasyRdf_Namespace::set('animal',  'http://example.com/ns/animals/');
    EasyRdf_Namespace::set('reptile', 'http://example.com/ns/animals/reptiles/');

then you will never get reptile, because animal will match first, causing the problem I described at the outset.

I suggest that each time a new namespace is declared or generated, the internal map of prefixes and namespaces should be sorted by length of namespace, longest first.
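A rough sketch of the sorting idea, again standalone rather than against EasyRdf itself. The `$namespaces` array and both function names are hypothetical; the sketch sorts on every lookup for brevity, whereas sorting once when a namespace is declared would avoid the repeated work.

```php
<?php
// Hypothetical prefix map, declared in the problematic order from the report.
$namespaces = [
    'animal'  => 'http://example.com/ns/animals/',
    'reptile' => 'http://example.com/ns/animals/reptiles/',
];

/** Sort the map so that longer (more specific) namespaces come first. */
function sortByNamespaceLength(array $namespaces): array
{
    // uasort() keeps the prefix keys attached to their namespace values.
    uasort($namespaces, function (string $a, string $b): int {
        return strlen($b) <=> strlen($a);
    });
    return $namespaces;
}

/** Return "prefix:local" using the most specific matching namespace, or null. */
function shortenLongestFirst(string $uri, array $namespaces): ?string
{
    foreach (sortByNamespaceLength($namespaces) as $prefix => $ns) {
        if (strpos($uri, $ns) === 0) {
            return $prefix . ':' . substr($uri, strlen($ns));
        }
    }
    return null;
}
```

Even with animal declared first, the snake URI now shortens to `reptile:snake`, and the dog URI still shortens to `animal:dog`.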

thanks, and sorry this is so long!

Ian

@njh
Collaborator

njh commented Jan 23, 2013

Yes, I like the idea of making sure the most specific namespace is matched.

I think I need to read the CURIE specification in detail:
http://www.w3.org/TR/curie/

And check that my implementation is compliant.

The serialiser implementations probably need different rules to ensure that the local part is valid for each serialisation.
For example, the local part of an XML QName isn't allowed to start with a number.
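As a sketch of what a per-serialisation rule could look like for RDF/XML, the check below tests whether a local part is usable as an XML element name. It is deliberately conservative: the function name `isSafeXmlLocalName()` is hypothetical, and the pattern accepts only an ASCII subset of what XML allows, so some valid non-ASCII names would be rejected.

```php
<?php
/**
 * Conservative, ASCII-only approximation of the XML local-name rule:
 * the first character must be a letter or underscore, and the rest may
 * be letters, digits, '.', '-' or '_'. No colon or slash is allowed.
 */
function isSafeXmlLocalName(string $local): bool
{
    // \A and \z anchor the whole string, so a trailing newline is rejected too.
    return preg_match('/\A[A-Za-z_][A-Za-z0-9._-]*\z/', $local) === 1;
}
```

Under this check `subproperty` passes, while `1` (leading digit) and `property/subproperty` (slash) fail, so a serialiser would have to pick a different namespace split or fall back to another representation for those.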

@indeyets
Contributor

Any news on this one? I just got output with slashes in the "local" part too.

@scor
Contributor

scor commented Jan 13, 2014

This issue is causing the RDFa test 0316 from the official RDFa test suite to fail. I initially filed #156 which is a duplicate of this one.

indeyets added a commit that referenced this issue Jul 22, 2014
indeyets added a commit that referenced this issue Jul 22, 2014
@indeyets
Contributor

I implemented a simple fix in EasyRdf_Namespace. What do you think? Can we close this issue?

@indeyets indeyets self-assigned this Jul 22, 2014
@njh
Collaborator

njh commented Jul 22, 2014

Looks good to me :)

Can you add a test to EasyRdf_NamespaceTest too please?

indeyets added a commit that referenced this issue Jul 22, 2014
@indeyets
Contributor

@njh done

@scor
Contributor

scor commented Jul 22, 2014

Thanks @indeyets!

@njh let me know when you deploy that to http://www.easyrdf.org/converter and I'll test the RDFa test which was failing, it should work now...

hackerceo added a commit to hackerceo/easyrdf that referenced this issue Jul 27, 2015
A more robust fix for closed problem easyrdf#115:

"Namespace shortening and invalid RDF outputs"
easyrdf#115
hackerceo added a commit to hackerceo/easyrdf that referenced this issue Jul 27, 2015
See easyrdf#115

I ran into this problem last month.  Here is my solution.