json-ld serialisation and URIs/predicate overlapping #1254

Closed
fatzh opened this issue Apr 9, 2022 · 5 comments


fatzh commented Apr 9, 2022

First off, thanks for the great software, we use it a lot and it's brilliant ;)

I stumbled upon something today while trying to parse a json-ld response from Jena/Fuseki. I have a test store with a few books, their URIs look like this: <http://onbetween.ch/3ms/cms#book_1>.

If there's a custom predicate <http://onbetween.ch/3ms/cms#book>, there's an overlap with the book URIs and the JSON-LD serialisation is no longer valid :-/ as I get book URIs like this: book:_1.

Here's the very simple CONSTRUCT query that I send to Fuseki (version 4.4.0):

In [67]: response = requests.post('http://localhost:3030/threems_example', data={'query': """
    ...: PREFIX cms: <http://onbetween.ch/3ms/cms#>
    ...:
    ...: CONSTRUCT {
    ...: ?a cms:book ?c
    ...: }
    ...: FROM <http://example/cmstest_data>
    ...: WHERE {
    ...: ?a a cms:Collection.
    ...: VALUES ?a { cms:collection_1 }.
    ...: ?c a cms:Book.
    ...: }
    ...: """})

In [68]: print(response.content.decode())
@prefix schema: <http://schema.org/> .
@prefix threems: <http://onbetween.ch/3ms/core#> .
@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix cms:   <http://onbetween.ch/3ms/cms#> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix skos:  <http://www.w3.org/2004/02/skos/core#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix cmsapi: <http://onbetween.ch/3ms/cmsapi#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xml:   <http://www.w3.org/XML/1998/namespace> .
@prefix cmsui: <http://onbetween.ch/3ms/cmsui#> .
@prefix dc:    <http://purl.org/dc/elements/1.1/> .

cms:collection_1  cms:book  cms:book_B , cms:book_C , cms:book_2 , cms:book_1 , cms:book_A .

Which is correct, got a simple collection with 5 books.

Now if I request JSON-LD from Fuseki (just adding the header Accept: application/ld+json to the query):

In [73]: response = requests.post('http://localhost:3030/threems_example', data={'query': """
    ...: PREFIX cms: <http://onbetween.ch/3ms/cms#>
    ...:
    ...: CONSTRUCT {
    ...: ?a cms:book ?c
    ...: }
    ...: FROM <http://example/cmstest_data>
    ...: WHERE {
    ...: ?a a cms:Collection.
    ...: VALUES ?a { cms:collection_1 }.
    ...: ?c a cms:Book.
    ...: }
    ...: """}, headers={'Accept':  'application/ld+json'})

In [74]: print(response.content.decode())
{
  "@id" : "cms:collection_1",
  "book" : [ "book:_B", "book:_C", "book:_2", "book:_1", "book:_A" ],
  "@context" : {
    "book" : {
      "@id" : "http://onbetween.ch/3ms/cms#book",
      "@type" : "@id"
    },
    "schema" : "http://schema.org/",
    "threems" : "http://onbetween.ch/3ms/core#",
    "owl" : "http://www.w3.org/2002/07/owl#",
    "cms" : "http://onbetween.ch/3ms/cms#",
    "xsd" : "http://www.w3.org/2001/XMLSchema#",
    "skos" : "http://www.w3.org/2004/02/skos/core#",
    "rdfs" : "http://www.w3.org/2000/01/rdf-schema#",
    "cmsapi" : "http://onbetween.ch/3ms/cmsapi#",
    "xml" : "http://www.w3.org/XML/1998/namespace",
    "rdf" : "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "cmsui" : "http://onbetween.ch/3ms/cmsui#",
    "dc" : "http://purl.org/dc/elements/1.1/"
  }
}

Fuseki serializes the books like this:

"book" : [ "book:_B", "book:_C", "book:_2", "book:_1", "book:_A" ],

I wasn't sure if that's actually correct json-ld serialisation, but when I try it on the json-ld playground I get this interpretation:

<http://onbetween.ch/3ms/cms#collection_1> <http://onbetween.ch/3ms/cms#book> <book:_1> .
<http://onbetween.ch/3ms/cms#collection_1> <http://onbetween.ch/3ms/cms#book> <book:_2> .
<http://onbetween.ch/3ms/cms#collection_1> <http://onbetween.ch/3ms/cms#book> <book:_A> .
<http://onbetween.ch/3ms/cms#collection_1> <http://onbetween.ch/3ms/cms#book> <book:_B> .
<http://onbetween.ch/3ms/cms#collection_1> <http://onbetween.ch/3ms/cms#book> <book:_C> .

Which is incorrect. Should be:

<http://onbetween.ch/3ms/cms#collection_1> <http://onbetween.ch/3ms/cms#book> <http://onbetween.ch/3ms/cms#book_1> .
<http://onbetween.ch/3ms/cms#collection_1> <http://onbetween.ch/3ms/cms#book> <http://onbetween.ch/3ms/cms#book_2> .
<http://onbetween.ch/3ms/cms#collection_1> <http://onbetween.ch/3ms/cms#book> <http://onbetween.ch/3ms/cms#book_A> .
<http://onbetween.ch/3ms/cms#collection_1> <http://onbetween.ch/3ms/cms#book> <http://onbetween.ch/3ms/cms#book_B> .
<http://onbetween.ch/3ms/cms#collection_1> <http://onbetween.ch/3ms/cms#book> <http://onbetween.ch/3ms/cms#book_C> .

It's a bit of an edge case, but actually for us this may happen when working with organisation specific ontologies.

I'm more of a python dev nowadays but if I can help let me know. If you can confirm it's a bug, I can also look into it, but I'm actually not 100% sure if that's not a JSON-LD spec issue.

Also, if the predicate were serialised with its prefix, i.e. cms:book, this wouldn't happen: the books would come out as cms:book_1. Something like:

{
  "@id": "cms:collection_1",
  "cms:book": [
    "cms:book_B",
    "cms:book_C",
    "cms:book_2",
    "cms:book_1",
    "cms:book_A"
  ],
  "@context": {
    "cms:book": {
      "@id": "http://onbetween.ch/3ms/cms#book",
      "@type": "@id"
    },
    "schema": "http://schema.org/",
    "threems": "http://onbetween.ch/3ms/core#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "cms": "http://onbetween.ch/3ms/cms#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "cmsapi": "http://onbetween.ch/3ms/cmsapi#",
    "xml": "http://www.w3.org/XML/1998/namespace",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "cmsui": "http://onbetween.ch/3ms/cmsui#",
    "dc": "http://purl.org/dc/elements/1.1/"
  }
}

What do you think ?

Member

afs commented Apr 16, 2022

Hi @fatzh,

Jena uses jsonld-java for reading JSON-LD 1.0 and uses titanium-json-ld to parse JSON-LD 1.1.

Jena uses jsonld-java for writing JSON-LD (so JSON-LD 1.0). Note - your data does not have "@ version" (space added to not name a GH user!)

When I parse the [ "book:_B", "book:_C", "book:_2", "book:_1", "book:_A" ] array I get different RDF for JSON-LD 1.0 and 1.1, i.e. between jsonld-java and titanium-json-ld. The JSON-LD playground shows the same difference between 1.0 and 1.1.

See a users@jena thread:
https://lists.apache.org/thread/zl0c6jgxnc9ckmc5pvhcoy72ypyr41fp

Suggestion - could you add a prefix to the data for book:? The writer tries to use the prefixes to build the context.

It looks like a difference between JSON-LD 1.0 and 1.1.

Author

fatzh commented Apr 16, 2022

hi @afs ! thanks, indeed it seems ok when parsing with @version: 1.0. I guess at some point jsonld-java will support 1.1, they seem to be on it.

Note - your data does not have "@ version"

the data is what I get back from Jena, I guess I can live with it for now, but good to know.

fatzh closed this as completed Apr 16, 2022
Member

afs commented Apr 18, 2022

@gkellogg -- Hi Gregg, the 1.0 and 1.1 playgrounds confirm this difference in behaviour for the "book:YYY".

If you have a moment, could you point to which of the items in https://www.w3.org/TR/json-ld11/#changes-from-10 is causing this?

Simplified version:

{
  "@id" : "http://example/collection",
  "http://example/p" : [ "book:ZZZ" ],
  "book" : [ "book:YYY" ],
  "@context" : {
    "book" : {
      "@id" : "http://onbetween.ch/3ms/cms#",
      "@type" : "@id"
    }
  }
}

gives 1.0 playground:

<http://example/collection> <http://example/p> "book:ZZZ" .
<http://example/collection> <http://onbetween.ch/3ms/cms#> <http://onbetween.ch/3ms/cms#YYY> .

or 1.1 playground:

<http://example/collection> <http://example/p> "book:ZZZ" .
<http://example/collection> <http://onbetween.ch/3ms/cms#> <book:YYY> .

@gkellogg

Yes, this was intentional, as terms were used too liberally as prefixes, which caused unintended consequences. The note in the Changes since 1.0 Recommendation of 16 January 2014 says the following:

In JSON-LD 1.1, terms will be used as compact IRI prefixes when compacting only if a simple term definition is used where the value ends with a URI gen-delim character, or if their expanded term definition contains an @prefix entry with the value true. The 1.0 algorithm has been updated to only consider terms that map to a value that ends with a URI gen-delim character.

The operative step is step 10 in the Create Term Definition Algorithm:

Create a new term definition, definition, initializing prefix flag to false, protected to protected, and reverse property to false.

And step 14.2.5:

If term contains neither a colon (:) nor a slash (/), simple term is true, and if the IRI mapping of definition is either an IRI ending with a gen-delim character, or a blank node identifier, set the prefix flag in definition to true.

The operative bit is that this is not a simple term.
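For readers who think better in code, here is a rough Python sketch of the rule quoted above, applied to the context from the original report. It is not the spec algorithm or any library's API: `is_prefix` and `expand_value` are hypothetical helper names, and the `legacy_1_0` switch only approximates the 1.0 behaviour, where any defined term could act as a prefix.

```python
# Characters that RFC 3986 classifies as gen-delims.
GEN_DELIMS = ":/?#[]@"


def is_prefix(term, definition):
    """Approximate the JSON-LD 1.1 prefix flag for one context entry (sketch)."""
    if isinstance(definition, str):
        # Simple term definition: the value is just an IRI string.
        simple_term, iri, explicit = True, definition, None
    else:
        # Expanded term definition: a dict with "@id", "@type", ...
        simple_term = False
        iri = definition.get("@id", "")
        explicit = definition.get("@prefix")
    if explicit is not None:
        # An explicit "@prefix" entry decides the matter.
        return bool(explicit)
    if ":" in term or "/" in term:
        return False
    ends_ok = iri.endswith(tuple(GEN_DELIMS)) or iri.startswith("_:")
    return simple_term and ends_ok


def expand_value(value, context, legacy_1_0=False):
    """Expand a compact IRI such as 'book:_1' against a flat context dict."""
    prefix, sep, suffix = value.partition(":")
    if not sep or prefix not in context or suffix.startswith("//"):
        return value  # not a compact IRI we can resolve
    definition = context[prefix]
    iri = definition if isinstance(definition, str) else definition.get("@id", "")
    if legacy_1_0 or is_prefix(prefix, definition):
        return iri + suffix
    return value  # left as-is, which a parser then reads as an absolute IRI


context = {
    "book": {"@id": "http://onbetween.ch/3ms/cms#book", "@type": "@id"},
    "cms": "http://onbetween.ch/3ms/cms#",
}
print(expand_value("book:_1", context))                   # 1.1-style
print(expand_value("book:_1", context, legacy_1_0=True))  # 1.0-style
print(expand_value("cms:book_1", context))
```

Under these assumptions the first call leaves `book:_1` untouched, while the other two both yield `http://onbetween.ch/3ms/cms#book_1`, which matches the difference seen between the two playgrounds.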

This can be changed by adding "@prefix": true to the term definition (playground link):

{
  "@id" : "http://example/collection",
  "http://example/p" : [ "book:ZZZ" ],
  "book" : [ "book:YYY" ],
  "@context" : {
    "book" : {
      "@id" : "http://onbetween.ch/3ms/cms#",
      "@type" : "@id",
      "@prefix": true
    }
  }
}
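If the JSON-LD comes from a writer you cannot change, one possible stopgap is to patch the context on the client before handing the document to a JSON-LD 1.1 parser. The sketch below is only that: `mark_term_as_prefix` is a hypothetical helper, it assumes `@context` is a single inline object (not a list or a remote URL), and whether the result parses as intended still depends on the parser being a 1.1 processor.

```python
import copy
import json


def mark_term_as_prefix(document, term):
    """Return a copy of `document` whose context entry for `term` has "@prefix": true."""
    patched = copy.deepcopy(document)
    context = patched["@context"]  # assumption: one inline context object
    definition = context[term]
    if isinstance(definition, str):
        # Promote a simple term definition to an expanded one.
        definition = {"@id": definition}
    definition["@prefix"] = True
    context[term] = definition
    return patched


# The simplified document from earlier in the thread.
doc = {
    "@id": "http://example/collection",
    "http://example/p": ["book:ZZZ"],
    "book": ["book:YYY"],
    "@context": {
        "book": {"@id": "http://onbetween.ch/3ms/cms#", "@type": "@id"},
    },
}

patched = mark_term_as_prefix(doc, "book")
print(json.dumps(patched["@context"], indent=2))
```

The original `doc` is left unmodified, so the unpatched and patched versions can be compared side by side in the playground.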

Member

afs commented Apr 19, 2022

@gkellogg -- thanks for the details.


3 participants