ror file as defined in INSPIRE #5

Closed
sgrellet opened this issue Dec 18, 2018 · 20 comments

Comments

@sgrellet
Member

Dynamically generate .ror files as specified by INSPIRE group on register federation:

  • registry ror file
  • ror descriptor for each register you want to share in the federation.

More details and example files here: https://ies-svn.jrc.ec.europa.eu/projects/inspire-registry/wiki/Registry_federation_requirements

Running examples (hand-made for BRGM registry)

The idea would be to have the required 'descriptors' natively included in the description of the registry and the registers. Static files would no longer be needed, but a new 'download format' (like RDF/XML.ROR) would be required to trigger the appropriate response/serialization.

@der

der commented Dec 20, 2018

Adding metadata-only endpoints for registers and a metadata endpoint for the registry itself is feasible and may be a generally useful extension.

Such endpoints could deliver the usual RDF formats, including RDF/XML.

If the ROR specification allows ROR consumers to ignore properties they are not interested in, then once there are metadata-only endpoints you would just need to add the metadata corresponding to ROR on top of the normal registry metadata.

If the specification (which isn't precise on this point) requires "these properties and only these properties" then that would be a different situation and would require a ROR specification solution that seems less appropriate for the main code line.

@sgrellet
Member Author

sgrellet commented Jan 7, 2019

The validation process used (https://ies-svn.jrc.ec.europa.eu/projects/inspire-registry/wiki/Registry_federation_xsl_validators) excludes several prefixes.
Testing on our files, having additional properties also seems to be OK.

@sgrellet
Member Author

sgrellet commented Jan 7, 2019

Testing the example files with the INSPIRE XSL, the .ror for the register (litho.ror) initially did not pass the register descriptor validators, for the reasons listed below. The validator:

  1. considers the file just as an XML file and not a graph -> the XSL template looks directly for a ConceptScheme typed resource/tag at the first level (see /rdf:RDF/*)
  2. almost only works byRef and not inLine: it expects a shallow representation of the register and its content. Thus, if a resource linked to another one describes it inline instead of pointing to it using rdf:resource=, the referred-to resource is ignored.
  3. however, for the description of the ConceptScheme, it expects dct:publisher and dct:isPartOf to have their inline description
  4. a skos:definition @en is expected for the ConceptScheme (dct:description seems to be ignored)
    -> mid-term solution: update the XSLT or try SHACL?
  • ldregistry download takes a first 'information seed' and pulls out the data graph from it. This is clean from a semantic web perspective but does not match the validation script expectations
    -> short-term: describe the SPARQL CONSTRUCT needed to enable the additional metadata elements for INSPIRE registry federation (or have them added for each register in the registry), pass the output TTL through the rdf-translator API to get a flat XML structure, and do some manual edits for points 3 and 4 above so that it validates properly
    -> mid-term: allow a download structure conforming to the INSPIRE registry federation expectations (validation + harvest)
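
Points 1 and 2 can be illustrated with a minimal sketch. It is not the INSPIRE validator itself, only a Python approximation of a check that inspects the direct children of rdf:RDF the way /rdf:RDF/* does. The example.org URIs and the function name are hypothetical. Both inputs describe the same graph, but only the flat one exposes the ConceptScheme at the first level.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"


def top_level_concept_schemes(rdf_xml):
    """Return rdf:about of skos:ConceptScheme elements that are direct children of rdf:RDF."""
    root = ET.fromstring(rdf_xml)
    return [
        el.get(f"{{{RDF}}}about")
        for el in root  # direct children only, comparable to /rdf:RDF/*
        if el.tag == f"{{{SKOS}}}ConceptScheme"
    ]


# Flat (byRef) serialization: the scheme is a first-level typed element.
FLAT = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <skos:ConceptScheme rdf:about="http://example.org/def/litho"/>
  <skos:Concept rdf:about="http://example.org/def/litho/granite">
    <skos:inScheme rdf:resource="http://example.org/def/litho"/>
  </skos:Concept>
</rdf:RDF>"""

# Nested (inLine) serialization of the same graph: the scheme is buried
# inside skos:inScheme, so a first-level lookup does not see it.
NESTED = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <skos:Concept rdf:about="http://example.org/def/litho/granite">
    <skos:inScheme>
      <skos:ConceptScheme rdf:about="http://example.org/def/litho"/>
    </skos:inScheme>
  </skos:Concept>
</rdf:RDF>"""

flat_result = top_level_concept_schemes(FLAT)
nested_result = top_level_concept_schemes(NESTED)
```

With these inputs the flat file yields one scheme URI and the nested file yields none, which matches the behaviour described above where inline resources are ignored.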

@der

der commented Nov 7, 2019

Comments and clarification questions ...

So there are several parts to this problem: the need to include the full publisher information and isPartOf links to the registry, the constrained XML serialization, and the need for additional terms such as skos:definition.

The publisher and registry link information can't be included in a register definition without modifications to the codebase. You can upload embedded information by using blank nodes, but the ROR validator doesn't accept those, and indeed the ROR spec explicitly asks for URIs as well as inline descriptions of them. It would be possible to have separate registers (e.g. under structure) to hold the publisher descriptions and the overall registry description and then automatically pull those into the register payload when responding to a ROR request.

After some experimentation I think the XML serialization issue itself is solvable. The validators have a very narrow view of RDF/XML but we can configure the Jena RDF/XML writer to pull skos:Concept and skos:ConceptScheme to the top level. With that then a sample register, with manually added publisher metadata and registry metadata, passes the validator.

The ROR specification includes a separate media type for ROR files so the right approach is probably to add a _format=ror option which would both trigger the enrichment of the return payload (to pull in the publisher and registry links) and then set the media type to application/x-ror-rdf+xml. A new marshaller could then handle that media type by setting the right parameters for the RDF/XML writer.
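
As a rough client-side illustration of that proposal, the sketch below builds (but does not send) a request that adds _format=ror to a register URL and asks for the ROR media type. The register path /def/litho and the helper name are hypothetical, and whether the server honours the Accept header in addition to the query parameter is an assumption, not something confirmed in this thread.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import Request

ROR_MEDIA_TYPE = "application/x-ror-rdf+xml"


def build_ror_request(register_url):
    """Return a urllib Request for the ROR rendering of a register (nothing is sent)."""
    parts = urlsplit(register_url)
    query = dict(parse_qsl(parts.query))
    query["_format"] = "ror"  # proposed format selector from the comment above
    url = urlunsplit(parts._replace(query=urlencode(query)))
    return Request(url, headers={"Accept": ROR_MEDIA_TYPE})


# Hypothetical register URL on a local test instance.
request = build_ror_request("http://localhost:8080/ldregistry/def/litho")
```

Passing the result to urllib.request.urlopen would perform the actual download against a running registry.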

The ROR requirements on using skos:prefLabel instead of rdfs:label for registers, on providing a skos:definition for a register, etc. can be met by simply including those properties in your register definitions; no code change is required to support them.

Some questions:

  1. The ROR specification limits cardinality of dct:title, skos:prefLabel etc to 1..1. This means that you can not have multilingual labels on a ROR registry, register or individual concepts. However, the validators do not check this and payloads with multilingual labels do pass validation. Can we rely on this? Can we assume that whatever software consumes this can cope with multilingual labels?

  2. Is your requirement to federate all registers (or at least all under /def) or do you want selective control so that you explicitly enumerate the registers to be included in the federation? The controlled option would be easier to implement since we would then just need a system register to represent the overall catalog to be federated. That would also be the place to include the top level registry metadata, which would be convenient. However, you would then need to manually add each new register to be federated into the catalog register.
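
Regarding question 1, since the validators do not check the 1..1 cardinality, a separate check could flag resources that carry more than one skos:prefLabel. The following is a minimal sketch under the assumption that the payload is flat RDF/XML with typed first-level elements; the function name and example URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
RDF_ABOUT = f"{{{RDF_NS}}}about"
PREF_LABEL = f"{{{SKOS_NS}}}prefLabel"


def resources_with_multiple_pref_labels(rdf_xml):
    """Return the sorted URIs of first-level resources with more than one skos:prefLabel."""
    root = ET.fromstring(rdf_xml)
    flagged = []
    for node in root:
        about = node.get(RDF_ABOUT)
        if about is None:
            continue  # skip blank nodes in this sketch
        if len(node.findall(PREF_LABEL)) > 1:
            flagged.append(about)
    return sorted(flagged)


SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:skos="{SKOS_NS}">
  <skos:ConceptScheme rdf:about="http://example.org/def/litho">
    <skos:prefLabel xml:lang="en">Lithology</skos:prefLabel>
    <skos:prefLabel xml:lang="fr">Lithologie</skos:prefLabel>
  </skos:ConceptScheme>
  <skos:Concept rdf:about="http://example.org/def/litho/granite">
    <skos:prefLabel xml:lang="en">Granite</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""

multilingual = resources_with_multiple_pref_labels(SAMPLE)
```

Here only the concept scheme is flagged, because it has both an English and a French label.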

@afeliachi
Collaborator

afeliachi commented Nov 7, 2019

Thank you very much Dave. The solution of separate registers (e.g. under structure) seems the most suitable to me. We agree with your proposal of selective control using this separate register. I think it's easier for development and better for us, since we want individual control over which registers are pushed through ROR. It's even better this way since we can have different properties (e.g. producers and different periodicities) from one register to another.
Concerning your question about dct:title, skos:prefLabel, it seems more logical to us that the software consuming the ROR payload should handle multilingual labels. We'll check with the JRC people why the cardinality is limited to 1..1.
We also agree with the _format=ror and application/x-ror-rdf+xml media type solutions.

@sgrellet
Member Author

JRC reply: Currently the RoR is just using the English language as first choice. If this is not found, it tries to read the first occurrence without specifying the language.
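
The selection rule in the JRC reply could be sketched as follows. This is an interpretation, not the RoR code: it assumes "the first occurrence without specifying the language" means the first label in document order regardless of its language tag (another reading would be the first label with no language tag at all). It also matches only the exact tag "en". The function name and sample labels are hypothetical.

```python
def pick_label(labels):
    """Pick a display label from (text, lang) pairs given in document order.

    lang is None for labels without a language tag.
    """
    # First choice: an English label.
    for text, lang in labels:
        if lang == "en":
            return text
    # Fallback (assumed reading): the first label, whatever its language.
    return labels[0][0] if labels else None


english_present = pick_label([("Lithologie", "fr"), ("Lithology", "en")])
english_missing = pick_label([("Lithologie", "fr"), ("Litologia", "it")])
```

Under this reading, multilingual labels are tolerated by the consumer, and a register without an English label is shown with whichever label happens to come first.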

@afeliachi
Collaborator

@der have there been any advances on this issue?
I see that the dedicated dev branch hasn't moved since December, so I assumed not. I tried it, though, with the content negotiation as explained in the wiki preview, and it didn't work.
I think I missed something in our exchanges. Do you have an idea of when this will be finished, please?
Thanks a lot

@simonoakesepimorphics

@afeliachi I am currently working on this issue now that the internationalisation feature appears to be in a good state. I expect that it will be ready to test by the end of this week.

@afeliachi
Collaborator

Thanks Simon. We would really appreciate that.

@simonoakesepimorphics

The branches for this feature on registry-core and registry-config-base are ready to test (5-ror-format).
They are documented here.

I had to differ from some of the plans discussed above in order to make sure that the produced XML would conform to the given templates (the default serialisation tends to produce valid RDF but not in the required structure). In particular, the registry descriptor and the root ConceptScheme of the register descriptor will be rendered with only the properties that are relevant to the INSPIRE format. This is currently a hard coded list, so please let me know if you need this to be configurable or if there are specific properties you would like to include.

@afeliachi afeliachi moved this from Priority1a to To test in Registry-core whishlist prioritization Jul 6, 2020
@simonoakesepimorphics

Have you made any progress in testing this?

@afeliachi
Collaborator

Sorry Simon, I thought I had responded to this.
Actually, there are a few issues with the payload returned:

  1. The content doesn't meet the conformance scheme of having rdf:description markups instead of typed markups (skos:ConceptScheme & skos:Concept).

  2. Registry descriptor: when requesting the ROR description of the root (the registry), the response doesn't provide the registry descriptor as specified by the conformance class.

  3. When loading data for the first time, inScheme properties are lost (this looks like an old bug, not specific to this version).

Thank you

@simonoakesepimorphics

@afeliachi OK, I will try to look into this in the next week or so.

@simonoakesepimorphics

@afeliachi

  1. I'm not aware of the specification requiring those resources to be written as rdf:descriptions, and the XSL validators succeed even if they are not - could you point me to which part of the spec requires this?
  2. As mentioned in Dave's comment on 7/11/2019, the registry descriptor should be located under /structure and signified by having the dcat:Catalog type. The suggested setup is documented here.
  3. I haven't encountered this bug before, could you provide an example of an RDF payload where this occurs?

@afeliachi
Collaborator

Hi Simon
I am currently retesting this. We'll get back to you ASAP.
Sorry for the delay

@afeliachi
Collaborator

Hi Simon

  1. regarding the rdf:description pattern you can see it in the examples provided in the requirement, but you are right the XSL validator accepts type markups also, so no problem with that, sorry.

  2. I was aware of the configuration recommendation of where to locate the register with the dcat:Catalog type. But we hoped that the payload would expose the registry (root) URI as the URI of the catalog
    so instead of having

<dcat:Catalog rdf:about="http://localhost:8080/ldregistry/structure/catalog">

It is necessary to have

<dcat:Catalog rdf:about="http://localhost:8080/ldregistry/">

So I tried to edit the root directly, by adding the dcat:Catalog type and the necessary conformance class properties, but the problem is that I am not able to edit the root registry description. I encounter the following message:

Proposed notation for item is not a legal pchar or starts with '_' -

I get the same response with a PATCH request to update the root description. Do you have any suggestions on that?

  3. Regarding the skos:inScheme loss problem, I think it's a general problem and worth a separate issue with examples; I will do that. We can disregard it in this ticket.

@simonoakesepimorphics

Thanks @afeliachi , I will look into point 2 this week.

@afeliachi
Collaborator

Thank you @simonoakesepimorphics

@simonoakesepimorphics

simonoakesepimorphics commented Nov 18, 2020

I've updated the branch to satisfy the requirement that the registry descriptor should have the root register as its root element rather than the /structure/catalog register.

The registry description and dataset catalog resides on the /structure/catalog register as before, and you can access it from either the root register (ROR format) or the catalog register itself. However, the result will always use the root register as the root of the graph and transpose the registry description from /structure/catalog onto it. This means that you will not need to modify the root register in any way.
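
The effect of that transposition on the payload can be approximated with the sketch below. It is not the registry-core implementation, just a Python illustration that rewrites the subject of the dcat:Catalog element from the /structure/catalog URI to the root URI, using the two URIs quoted earlier in this thread. The dct:title value and the function name are hypothetical, and references to the catalog URI elsewhere in the graph are not handled.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCAT_NS = "http://www.w3.org/ns/dcat#"
DCT_NS = "http://purl.org/dc/terms/"
RDF_ABOUT = f"{{{RDF_NS}}}about"


def transpose_catalog_to_root(rdf_xml, catalog_uri, root_uri):
    """Return RDF/XML where the dcat:Catalog described at catalog_uri is described at root_uri."""
    # Keep readable prefixes in the serialized output.
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dcat", DCAT_NS)
    ET.register_namespace("dct", DCT_NS)
    root = ET.fromstring(rdf_xml)
    for node in root.iter(f"{{{DCAT_NS}}}Catalog"):
        if node.get(RDF_ABOUT) == catalog_uri:
            node.set(RDF_ABOUT, root_uri)  # only the subject URI changes
    return ET.tostring(root, encoding="unicode")


STORED = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dcat="{DCAT_NS}" xmlns:dct="{DCT_NS}">
  <dcat:Catalog rdf:about="http://localhost:8080/ldregistry/structure/catalog">
    <dct:title xml:lang="en">Example registry</dct:title>
  </dcat:Catalog>
</rdf:RDF>"""

served = transpose_catalog_to_root(
    STORED,
    "http://localhost:8080/ldregistry/structure/catalog",
    "http://localhost:8080/ldregistry/",
)
```

The description itself stays where it is maintained; only the served document presents it with the root register as its subject.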

Register descriptors should work the same as before.

I've updated the wiki page to reflect this.

@afeliachi
Copy link
Collaborator

Hi @simonoakesepimorphics
I just finished the tests. Everything looks perfect. Thank you for this last edit.
@sgrellet I am closing the ticket


4 participants