@Panaetius Panaetius commented Oct 9, 2019

Closes #749

JSON-LD compaction goes through two phases, which we let pyld handle for us, namely:

  • Expansion: short-form IRIs like schema:isPartOf get translated to full-form IRIs like http://schema.org/isPartOf. Only values/nodes that have a corresponding entry in the @context, or in scoped @contexts inside the nodes/values themselves, get expanded; other values are dropped.
  • Compaction: A new, separate context is supplied and used to compact the JSON-LD from the previous step to a simpler form, replacing the absolute IRIs with the short form where possible and removing values/nodes not found in the new context. This step ONLY pays attention to the supplied context, not to the one(s) found in the original JSON-LD document.
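As a rough illustration of the prefix part of the expansion step only, here is a toy sketch in plain Python. It is not how pyld implements expansion (pyld handles term definitions, @vocab, scoped contexts and much more); the function name and the prefix map are made up for this example.

```python
def expand_compact_iri(term, prefixes):
    """Expand a compact IRI such as 'schema:isPartOf' using a prefix map.

    Toy illustration only: `prefixes` is a hypothetical dict mapping a
    prefix to its namespace IRI.
    """
    prefix, sep, suffix = term.partition(":")
    # Leave the term alone if it has no prefix, already looks like an
    # absolute IRI ("http://..."), or uses a prefix we do not know.
    if not sep or suffix.startswith("//") or prefix not in prefixes:
        return term
    return prefixes[prefix] + suffix


prefixes = {"schema": "http://schema.org/"}
print(expand_compact_iri("schema:isPartOf", prefixes))
```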

For instance, the following JSON-LD document:

{
  "@context": {
    "@version": 1.1,
    "name": "http://schema.org/name",
    "interest": {
      "@id": "http://xmlns.com/foaf/0.1/interest"
    }
  },
  "name": "Manu Sporny",
  "interest": {
    "@id": "https://www.w3.org/TR/json-ld/",
    "@context": {"@vocab": "http://xmlns.com/foaf/0.1/"},
    "name": "JSON-LD",
    "topic": "Linking Data"
  }
}

together with using the top-level context of the document

"@context": {
    "@version": 1.1,
    "name": "http://schema.org/name",
    "interest": {
      "@id": "http://xmlns.com/foaf/0.1/interest"
    }
  }

while being perfectly valid JSON-LD and a perfectly valid context, would delete the topic entry on compaction, since only interest and name are specified in the context. The "@context": {"@vocab": "http://xmlns.com/foaf/0.1/"} line does define topic when the document is read, but contexts inside the document are not applied when compacting.

So to not lose the topic field, we'd either have to manually add all fields of child nodes to the context supplied to the compaction function, which is what we were doing until now. That leads to problems if two fields have the same name but come from different ontologies, and it makes it quite hacky to know what has to be added to the context.

Or we can use scoped (sub-)contexts, such as:

{
  "@version": 1.1,
  "name": "http://schema.org/name",
  "interest": {
    "@id": "http://xmlns.com/foaf/0.1/interest",
    "@context": {"@vocab": "http://xmlns.com/foaf/0.1/"}
  }
}

(See the @context added in interest, which is a scoped context that only applies to interest entries)

If we supply this context to compaction, all fields survive. Since we don't want two methods to calculate contexts, it makes sense to add this enhanced, nested context (instead of the current flat one) to the metadata files etc. as well. Since this puts the whole context at the top level of a document, we no longer need the contexts inside the values (as in the original example), as that would just be duplication.

This PR does exactly that, automatically expanding entries in the top-level context with their respective child-contexts, removing the contexts inside values.
Collection types already had code to propagate their contexts up, though the implementation was broken and the code was never reached.
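The idea can be sketched in plain Python as follows. This is a minimal sketch under assumptions, not the code in this PR: both function names and the shape of `child_contexts` (a dict from property name to that child's context) are hypothetical.

```python
import copy


def nest_child_contexts(parent_context, child_contexts):
    """Return a copy of parent_context with child contexts attached as scoped @contexts."""
    merged = copy.deepcopy(parent_context)
    merged["@version"] = 1.1  # scoped contexts need JSON-LD 1.1
    for prop, child in child_contexts.items():
        definition = merged.get(prop)
        if definition is None:
            continue  # the parent does not define this property
        if isinstance(definition, str):
            # Turn the "term": "iri" shorthand into an expanded term definition.
            definition = {"@id": definition}
        definition["@context"] = copy.deepcopy(child)
        merged[prop] = definition
    return merged


def strip_inline_contexts(node, top_level=True):
    """Drop @context entries from nested values, keeping only the top-level one."""
    if isinstance(node, list):
        return [strip_inline_contexts(item, top_level=False) for item in node]
    if not isinstance(node, dict):
        return node
    return {
        key: value if key == "@context"
        else strip_inline_contexts(value, top_level=False)
        for key, value in node.items()
        if key != "@context" or top_level
    }


parent = {
    "name": "http://schema.org/name",
    "interest": {"@id": "http://xmlns.com/foaf/0.1/interest"},
}
children = {"interest": {"@vocab": "http://xmlns.com/foaf/0.1/"}}
print(nest_child_contexts(parent, children))
```

Both helpers return new objects rather than mutating their input, so the flat context and the original document stay available for comparison.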

jsonld.ib now has a new type parameter that allows the type (or a fully namespaced string representation of the type, to help with dependency hell) to be set for a property, which will automatically add its context to the top-level @context on load.
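Accepting a fully namespaced string means the class can be looked up lazily instead of being imported at module load. One way this could work is sketched below; `resolve_type` is a hypothetical helper for illustration, not the function used in this PR, and it assumes the string is a dotted path of the form module.ClassName.

```python
import importlib


def resolve_type(type_or_name):
    """Return a class given either the class itself or its dotted path.

    Hypothetical helper: assumes strings look like 'package.module.ClassName'.
    """
    if not isinstance(type_or_name, str):
        return type_or_name  # already a class, nothing to resolve
    module_name, _, attr = type_or_name.rpartition(".")
    # The import happens only when the type is actually needed, which avoids
    # circular imports between modules that reference each other.
    module = importlib.import_module(module_name)
    return getattr(module, attr)


print(resolve_type("collections.OrderedDict"))
```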

So for a collection, it's still (example is for the Dataset class):

creator = jsonld.container.list(
        Creator,
        converter=_convert_dataset_creator,
        context='schema:creator',
        kw_only=True
    )

with Creator being the class whose @context will get added to the Dataset's @context (if Dataset is the top-level element).
And for single-valued properties (one-to-one relations), like project.creator, it is now

creator = jsonld.ib(
        default=None, kw_only=True, context='schema:creator', type=Creator
    )

with type=Creator being new.

Old contexts get automatically adjusted on loading (since the code that caused all this now actually does its job) and, if they're persisted, will be written in the new JSON-LD form. Child-level property names don't have to be manually added to the top-level context anymore.

As this is a rather drastic rewrite of how we handle JSON-LD, we should be really sure it works before merging.

Other changes:

  • Creator has been moved to its own file, to prevent circular dependencies.
  • All JSON-LD now specifies that it's version 1.1, as otherwise scoped @context are not available.
  • Fixed a bug due to JSON-LD compacting single-element arrays to just the element (Screwing with DatasetTag collections if there's only 1 tag)
  • Fixed a bug where raising an exception itself caused another exception
  • Added schema.org definition to Activity as it's needed for CommitMixin (This should not have to be done on every class inheriting from CommitMixin! We should fix this). This change is also in renku log to generate single identifier for dataset imports of the same resource #719, as it was needed in both places.
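On the single-element array issue: compaction can turn a one-item array into the bare item, so code that expects a collection has to cope with both shapes. A minimal sketch of one way to guard against this on load follows; `as_list` is a hypothetical helper and not necessarily how this PR fixes it.

```python
def as_list(value):
    """Normalise a compacted JSON-LD value to a list.

    Hypothetical helper: a missing value becomes an empty list and a
    single item is wrapped, so callers can always iterate.
    """
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


# A single tag that compaction collapsed to a bare object.
print(as_list({"name": "v1"}))
```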

@Panaetius Panaetius requested a review from a team as a code owner October 9, 2019 15:37
jsam previously approved these changes Oct 15, 2019

@jsam jsam left a comment

Looks great. I've played with it a bit and it looks that nothing broke. Thanks a lot for looking into this. Let's resolve conflicts and get this merged! 🚀


@jsam jsam left a comment

Thank you.

@Panaetius Panaetius merged commit 2b1948d into master Oct 16, 2019
@Panaetius Panaetius deleted the 749-json-ld-compacting branch October 16, 2019 06:33
Successfully merging this pull request may close these issues.

Failure loading dataset tag commits from metadata with jsonld compaction (and others)