how to specify identifier for the metadata record #245

Open
smrgeoinfo opened this issue Jun 22, 2023 · 13 comments

@smrgeoinfo
Contributor

smrgeoinfo commented Jun 22, 2023

Building on #210, here's a discussion that has come up in a CODATA WG. SOSO should weigh in on this issue in the guidelines.

A metadata record has two parts; one part is about the metadata record itself, the other part is the content about the resource that the metadata documents. The part about the record specifies the identifier for the metadata record, agents with responsibility for the record, when it was last updated, what specification or profiles the metadata serialization conforms to, and other optional properties that are deemed useful. The metadata about the resource has properties about the resource like title, description, responsible parties, spatial or temporal extent (as outlined in the Metadata Content Requirements section).

Schema.org includes several properties that can be used to embed information about the metadata record in the resource metadata (sdDatePublished, sdLicense, sdPublisher), but it lacks a way to provide an identifier for the metadata record distinct from the resource it describes, to specify agents responsible for the metadata other than the publisher, or to assert specification or profile conformance for the metadata record itself.
There are two patterns that could be used to structure the two parts of the metadata record:

Option 1. The root object is the described resource:

{
    "@context": "https://schema.org",
    "@id": "ex:URIforDescribedResource",
    "@type": "ImageObject",
    "title": "Picture of analytical setup",
    "description": "Description of the resource",
    "subjectOf": {
        "@id": "ex:URIforTheMetadata",
        "@type": "DigitalDocument",
        "dateModified": "2017-05-23",
        "encoding": {
            "@type": "MediaObject",
            "dcterms:conformsTo": "https://example.org/cdif-metadataSpec"
        },
        "about": {"@id": "ex:URIforDescribedResource"}
    }
}

Option 2. The root object is the metadata record:

{
    "@context": "https://schema.org",
    "@id": "ex:URIforTheMetadata",
    "@type": "DigitalDocument",
    "dateModified": "2017-05-23",
    "encoding": {
        "@type": "MediaObject",
        "dcterms:conformsTo": "https://example.org/cdif-metadataSpec"
    },
    "about": {
        "@id": "ex:URIforDescribedResource",
        "@type": "ImageObject",
        "title": "Picture of analytical setup",
        "description": "Description of the resource",
        "subjectOf": {"@id": "ex:URIforTheMetadata"}
    }
}

The RDF triples generated by these two approaches are identical, so if the metadata are always harvested to a triple store it makes no difference. However, allowing either approach would create interoperability problems for harvesters that parse the metadata as plain JSON, because the paths to the same metadata elements differ between the two approaches. It is our judgment that option 1 above (root object is the described resource) is the more widely used serialization, commonly either without the properties specific to the metadata record, or with the schema.org 'sd...' properties providing some of the metadata about the metadata.
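
To make the path problem concrete, here is a minimal Python sketch that treats trimmed-down versions of the two options as plain JSON. The helper name `get_path` and the variable names are illustrative assumptions, not part of any SOSO tooling.

```python
# Trimmed versions of the two serializations discussed above.
option_1 = {
    "@context": "https://schema.org",
    "@id": "ex:URIforDescribedResource",
    "@type": "ImageObject",
    "description": "Description of the resource",
    "subjectOf": {
        "@id": "ex:URIforTheMetadata",
        "@type": "DigitalDocument",
        "dateModified": "2017-05-23",
        "about": {"@id": "ex:URIforDescribedResource"},
    },
}
option_2 = {
    "@context": "https://schema.org",
    "@id": "ex:URIforTheMetadata",
    "@type": "DigitalDocument",
    "dateModified": "2017-05-23",
    "about": {
        "@id": "ex:URIforDescribedResource",
        "@type": "ImageObject",
        "description": "Description of the resource",
        "subjectOf": {"@id": "ex:URIforTheMetadata"},
    },
}


def get_path(doc, path):
    """Follow a list of keys through nested dicts; return None if a key is missing."""
    node = doc
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# The record's modification date sits at a different JSON path in each option.
print(get_path(option_1, ["subjectOf", "dateModified"]))  # 2017-05-23
print(get_path(option_2, ["dateModified"]))               # 2017-05-23
print(get_path(option_2, ["subjectOf", "dateModified"]))  # None
```

A harvester hard-coded to one path silently finds nothing in the other serialization, which is the interoperability concern raised here.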

What should be the recommended serialization?

@datadavev
Collaborator

JSON-LD is a serialization of RDF, so it describes a graph. I'm not sure there's a definitive "root object" in these examples. Instead there's a graph of related nodes, any of which could be considered a root from which to start traversal (except perhaps the anonymous MediaObject node). So it is pretty much always incorrect to treat a JSON-LD document as a plain JSON document: semantics inferred from JSON, such as list ordering, don't apply, and conversely, semantics provided by the JSON-LD parsing rules, such as graph structure, are unknown to a JSON parser. It's much the same as trying to use XPath to process RDF/XML documents: it will work for specific examples but fails in the general case.
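
As a rough illustration of the "graph, not tree" point, the Python sketch below walks nested node objects and collects subject/predicate/object tuples. It is deliberately naive and is not a JSON-LD processor: it ignores `@context`, keeps compact property names, does not distinguish IRIs from string literals, and labels blank nodes by visit order (so comparisons only hold when blank nodes are visited in the same order). The function name `to_triples` is an assumption made for this example.

```python
import itertools


def to_triples(doc):
    """Collect (subject, predicate, object) tuples from nested node objects."""
    triples = set()
    labels = itertools.count()

    def visit(node):
        # Nodes without @id are blank nodes; label them in visit order.
        subject = node["@id"] if "@id" in node else f"_:b{next(labels)}"
        for key, value in node.items():
            if key in ("@context", "@id"):
                continue
            predicate = "rdf:type" if key == "@type" else key
            values = value if isinstance(value, list) else [value]
            for item in values:
                if isinstance(item, dict):
                    triples.add((subject, predicate, visit(item)))
                else:
                    triples.add((subject, predicate, item))
        return subject

    visit(doc)
    return triples


record = {
    "@id": "ex:URIforTheMetadata",
    "@type": "DigitalDocument",
    "dateModified": "2017-05-23",
    "encoding": {
        "@type": "MediaObject",
        "dcterms:conformsTo": "https://example.org/cdif-metadataSpec",
    },
}
resource = {
    "@id": "ex:URIforDescribedResource",
    "@type": "ImageObject",
    "title": "Picture of analytical setup",
    "description": "Description of the resource",
}

# Option 1: resource on top, record nested under subjectOf.
option_1 = dict(resource, subjectOf=dict(record, about={"@id": resource["@id"]}))
# Option 2: record on top, resource nested under about.
option_2 = dict(record, about=dict(resource, subjectOf={"@id": record["@id"]}))

print(to_triples(option_1) == to_triples(option_2))  # True
print(len(to_triples(option_1)))                     # 10
```

Both nestings reduce to the same ten statements, so which node appears at the top of the JSON is a serialization choice rather than a property of the graph.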

@datadavev
Collaborator

I believe this is the type of issue that is addressed by JSON-LD Framing [1]. So instead of suggesting a preferred serialization pattern (generally cumbersome when serializing from an RDF source), it may be more appropriate to suggest a frame document to apply when inspecting the graph from a particular perspective. If the desired form is option 1 above, then use a frame like:

{
  "@context": {
    "@vocab":"http://schema.org/",
    "subjectOf":{"@reverse":"about"}
  },
  "@type": "ImageObject",
  "subjectOf": {}
}

This places the ImageObject at the top level which may make parsing with plain JSON a little more tractable.
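
For readers without a JSON-LD processor at hand, the following Python sketch approximates the effect of such a frame for the two specific shapes in this thread. It is a hand-rolled re-nesting, not an implementation of JSON-LD Framing: it only understands a root node that either already has the wanted type or embeds such a node under `about`. The function name `reroot_on_type` is an assumption.

```python
import copy


def reroot_on_type(doc, wanted_type):
    """Return a copy of doc nested so that the node of wanted_type is on top."""
    if doc.get("@type") == wanted_type:
        return copy.deepcopy(doc)

    about = doc.get("about")
    if not (isinstance(about, dict) and about.get("@type") == wanted_type):
        raise ValueError(f"no embedded node of type {wanted_type!r} under 'about'")

    resource = copy.deepcopy(about)
    # Everything on the old root except the context and the embedded node
    # becomes the nested description of the metadata record.
    record = {
        key: copy.deepcopy(value)
        for key, value in doc.items()
        if key not in ("@context", "about")
    }
    record["about"] = {"@id": resource["@id"]}
    resource["subjectOf"] = record

    result = {}
    if "@context" in doc:
        result["@context"] = doc["@context"]
    result.update(resource)
    return result


option_2 = {
    "@context": "https://schema.org",
    "@id": "ex:URIforTheMetadata",
    "@type": "DigitalDocument",
    "dateModified": "2017-05-23",
    "about": {
        "@id": "ex:URIforDescribedResource",
        "@type": "ImageObject",
        "title": "Picture of analytical setup",
        "subjectOf": {"@id": "ex:URIforTheMetadata"},
    },
}

rerooted = reroot_on_type(option_2, "ImageObject")
print(rerooted["@type"])                      # ImageObject
print(rerooted["subjectOf"]["dateModified"])  # 2017-05-23
```

A real frame applied by a conforming processor handles arbitrary graphs; this sketch only shows why a consumer that wants the ImageObject on top does not need publishers to agree on one nesting.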

Footnotes

  1. https://www.w3.org/TR/json-ld11-framing/

@smrgeoinfo
Contributor Author

It makes sense to address the alternate serializations this way; the choice only matters if you're parsing the metadata as JSON. What I should have emphasized is that the metadata digital object needs an identifier distinct from that of the thing it describes.

@datadavev
Collaborator

datadavev commented Aug 17, 2023

Reading through the examples above, it strikes me that neither option 1 nor option 2 provides an identifier for the described resources. So although this issue seems to be about the preferred serialization pattern for nested documents, it may be helpful to make the example a little more complete by adding schema:identifier properties to the DigitalDocument and the ImageObject. That makes a clear statement that the entities described by the graph have those identifiers. Otherwise one might infer that the @id values (i.e. the graph node identifiers) are equivalent to the identifiers for the objects being described, which seems incorrect. The @id value is an identifier for a node in the graph; schema:identifier is an identifier for the thing being described by the graph.

So rewriting option 1:

{
    "@context": "https://schema.org",
    "@id": "ex:URIforImageObjectNode",
    "@type": "ImageObject",
    "title": "Picture of analytical setup",
    "description": "Description of the resource",
    "identifier": "ex:URIforDescribedResource",
    "subjectOf": {
        "@id": "ex:URIforDigitalDocumentNode",
        "@type": "DigitalDocument",
        "dateModified": "2017-05-23",
        "identifier": "ex:URIforTheMetadata",
        "encoding": {
            "@type": "MediaObject",
            "dcterms:conformsTo": "https://example.org/cdif-metadataSpec"
        },
        "about": {"@id": "ex:URIforImageObjectNode"}
    }
}

makes it clear that ex:URIforImageObjectNode is the node identifier for the graph about the ImageObject with identifier ex:URIforDescribedResource, and that ImageObject is the subject of a DigitalDocument. That DigitalDocument is in turn described by the graph with node identifier ex:URIforDigitalDocumentNode and the document itself has an identifier ex:URIforTheMetadata.

@datadavev
Collaborator

Note that conceptually, the graph ex:URIforImageObjectNode and the document identified by ex:URIforTheMetadata (i.e. the document itself, not the ex:URIforDigitalDocumentNode node in the graph) fill the same role. They both describe the image identified by ex:URIforDescribedResource. In fact, the contents of the graph ex:URIforImageObjectNode would perhaps ideally be generated from the content of the document ex:URIforTheMetadata, since that document is presumably the authoritative source of information about the image ex:URIforDescribedResource.

@smrgeoinfo
Contributor Author

I don't get the distinction between the 'graph' and the 'document'. In my understanding the DigitalDocument (in my example) is the metadata describing the resource (the image object in the example). This metadata is represented using RDF, a logical graph. The document and the graph are the same thing.

The resource (image object) is described by RDF triples: statements in which the image is the subject, some property is the predicate, and a value is the object. I expect the URI for the described resource to be the subject of these statements.

Converting the JSON-LD in @datadavev's example:

<ex:URIforDigitalDocumentNode> sdo:about <ex:URIforImageObjectNode> .
<ex:URIforDigitalDocumentNode> sdo:dateModified "2017-05-23"^^sdo:Date .
<ex:URIforDigitalDocumentNode> sdo:encoding _:b0 .
<ex:URIforDigitalDocumentNode> sdo:identifier "ex:URIforTheMetadata" .
<ex:URIforDigitalDocumentNode> rdf:type sdo:DigitalDocument .
<ex:URIforImageObjectNode> sdo:description "Description of the resource" .
<ex:URIforImageObjectNode> sdo:identifier "ex:URIforDescribedResource" .
<ex:URIforImageObjectNode> sdo:subjectOf <ex:URIforDigitalDocumentNode> .
<ex:URIforImageObjectNode> sdo:title "Picture of analytical setup" .
<ex:URIforImageObjectNode> rdf:type sdo:ImageObject .
_:b0 dcterms:conformsTo "https://example.org/cdif-metadataSpec" .
_:b0 rdf:type sdo:MediaObject .

What does ex:URIforImageObjectNode actually identify? Do the statements about the ImageObjectNode make sense? The DigitalDocumentNode is the metadata record, which is a digital document.

@smrgeoinfo
Contributor Author

Here are the triples for my option 2 example (syntax abbreviated to make the lines shorter):

<ex:URIforDescribedResource> sdo:description "Description of the resource" .
<ex:URIforDescribedResource> sdo:subjectOf <ex:URIforTheMetadata> .
<ex:URIforDescribedResource> sdo:title "Picture of analytical setup" .
<ex:URIforDescribedResource> rdf:type sdo:ImageObject .
<ex:URIforTheMetadata> sdo:about <ex:URIforDescribedResource> .
<ex:URIforTheMetadata> sdo:dateModified "2017-05-23"^^sdo:Date .
<ex:URIforTheMetadata> rdf:type sdo:DigitalDocument .
<ex:URIforTheMetadata> sdo:encoding _:b0 .
_:b0 dcterms:conformsTo "https://example.org/cdif-metadataSpec" .
_:b0 rdf:type sdo:MediaObject .

This seems a lot clearer to me: the graph nodes have the same identifier as the thing they represent.

@datadavev
Collaborator

Sure, you could do that, but I think it is incorrect to always infer that @id has the same purpose as schema:identifier. To me at least it is much clearer that @id refers to statements about a thing and schema:identifier refers specifically to the thing.

@datadavev
Collaborator

@smrgeoinfo In your example, what is returned when resolving ex:URIforDescribedResource?

@smrgeoinfo
Contributor Author

smrgeoinfo commented Aug 18, 2023

I'd argue that what you get when you resolve a URI depends on what it identifies and on the conventions of the identifier scheme.
In the example above, the resource is typed as an sdo:ImageObject, defined as "An image file", so the resource the URI identifies is a digital object. In general, I'd expect the default URI resolution to get that digital object. Content negotiation or signposting links might provide access to metadata. The metadata example doesn't include any distribution information, but the convention I like is that if the metadata is about a digital object, then the sdo:url in the 'about' section would get that digital object.

Things are much more interesting if the metadata is about a non-digital object that might have multiple digital representations. Then the distribution section is critical.

@datadavev
Collaborator

So I managed to get myself confused about a subject of the graph (i.e. ex:URIforDescribedResource) and the graph itself (i.e. the entire JSON-LD document). Steve is correct that the @id property (the Node ID) is the subject of the various statements contained therein.

Perhaps one approach is to consider that we are making statements about the graph ex:URIforDescribedResource. That is, we want to make statements about the graph that describes the described resource. The JSON-LD spec § 4.9 Named Graphs describes this scenario, and following that pattern the structure would be like:

{
    "@context": "https://schema.org",
    "@id": "ex:URIforTheMetadata",
    "@type": "DigitalDocument",
    "dateModified": "2017-05-23",
    "encoding": {
        "@type": "MediaObject",
        "dcterms:conformsTo": "https://example.org/cdif-metadataSpec"
    },
    "about": {"@id": "ex:URIforDescribedResource"},
    "@graph": [
        {
            "@id": "ex:URIforDescribedResource",
            "@type": "ImageObject",
            "title": "Picture of analytical setup",
            "description": "Description of the resource",
            "subjectOf": {"@id": "ex:URIforTheMetadata"}
        }
    ]
}

This results in quad statements like:

<ex:URIforDescribedResource> <http://schema.org/description> "Description of the resource" <ex:URIforTheMetadata> .
<ex:URIforDescribedResource> <http://schema.org/subjectOf> <ex:URIforTheMetadata> <ex:URIforTheMetadata> .
<ex:URIforDescribedResource> <http://schema.org/title> "Picture of analytical setup" <ex:URIforTheMetadata> .
<ex:URIforDescribedResource> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/ImageObject> <ex:URIforTheMetadata> .
<ex:URIforTheMetadata> <http://schema.org/about> <ex:URIforDescribedResource> .
<ex:URIforTheMetadata> <http://schema.org/dateModified> "2017-05-23"^^<http://schema.org/Date> .
<ex:URIforTheMetadata> <http://schema.org/encoding> _:b0 .
<ex:URIforTheMetadata> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/DigitalDocument> .
_:b0 <http://purl.org/dc/terms/conformsTo> "https://example.org/cdif-metadataSpec" .
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/MediaObject> .

This expresses that the statements about ex:URIforDescribedResource are being made in the context of the named graph ex:URIforTheMetadata, and that the named graph itself has some properties describing it (e.g. http://schema.org/dateModified).
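
To see that split mechanically, here is a small Python sketch that groups a few of the quads above by graph name. The regular expression is a simplified tokenizer that covers only the term forms appearing in this thread (IRIs, blank nodes, and quoted literals with an optional datatype or language tag); it is an assumption for illustration and not a full N-Quads parser.

```python
import re
from collections import defaultdict

# Simplified term pattern: <iri>, _:bnode, or "literal" with optional ^^<datatype> / @lang.
TERM = re.compile(r'<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?')


def group_by_graph(nquads):
    """Map each graph name to its (subject, predicate, object) statements."""
    graphs = defaultdict(list)
    for line in nquads.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        terms = TERM.findall(line)
        if len(terms) not in (3, 4):
            raise ValueError(f"cannot parse statement: {line}")
        # A fourth term names the graph; three-term statements are in the default graph.
        graph = terms[3] if len(terms) == 4 else "@default"
        graphs[graph].append(tuple(terms[:3]))
    return dict(graphs)


QUADS = """
<ex:URIforDescribedResource> <http://schema.org/title> "Picture of analytical setup" <ex:URIforTheMetadata> .
<ex:URIforDescribedResource> <http://schema.org/subjectOf> <ex:URIforTheMetadata> <ex:URIforTheMetadata> .
<ex:URIforTheMetadata> <http://schema.org/about> <ex:URIforDescribedResource> .
<ex:URIforTheMetadata> <http://schema.org/dateModified> "2017-05-23"^^<http://schema.org/Date> .
"""

for graph, statements in group_by_graph(QUADS).items():
    print(graph, len(statements))
```

Statements about the described resource land under the graph named `<ex:URIforTheMetadata>`, while statements about the metadata record itself stay in the default graph.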

I don't think there are defined semantics for the intent of a named graph other than providing a context for statements to be made about the container of the graphs. Using the so:about and so:subjectOf statements makes the relationship between the two subjects clearer.

Does this approach provide any benefit over the arguably simpler alternative construct below?

{
    "@context": "https://schema.org",
    "@id": "ex:URIforTheMetadata",
    "@type": "DigitalDocument",
    "dateModified": "2017-05-23",
    "encoding": {
        "@type": "MediaObject",
        "dcterms:conformsTo": "https://example.org/cdif-metadataSpec"
    },
    "about": {
        "@id": "ex:URIforDescribedResource",
        "@type": "ImageObject",
        "title": "Picture of analytical setup",
        "description": "Description of the resource",
        "subjectOf": {"@id": "ex:URIforTheMetadata"}
    }
}

This results in triple statements like:

<ex:URIforDescribedResource> <http://schema.org/description> "Description of the resource" .
<ex:URIforDescribedResource> <http://schema.org/subjectOf> <ex:URIforTheMetadata> .
<ex:URIforDescribedResource> <http://schema.org/title> "Picture of analytical setup" .
<ex:URIforDescribedResource> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/ImageObject> .
<ex:URIforTheMetadata> <http://schema.org/about> <ex:URIforDescribedResource> .
<ex:URIforTheMetadata> <http://schema.org/dateModified> "2017-05-23"^^<http://schema.org/Date> .
<ex:URIforTheMetadata> <http://schema.org/encoding> _:b0 .
<ex:URIforTheMetadata> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/DigitalDocument> .
_:b0 <http://purl.org/dc/terms/conformsTo> "https://example.org/cdif-metadataSpec" .
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/MediaObject> .

@ksonda

ksonda commented Aug 25, 2023

The latter approach is, for the most part, what we do in https://geoconnex.us, which is based on SELFIE.

@smrgeoinfo
Contributor Author

smrgeoinfo commented Aug 25, 2023

After discussion at the monthly group meeting, I'll edit the guide text and create a PR based on the option 2 approach (the same as the second approach in @datadavev's comment above). I think it should go in GETTING-STARTED.md because it applies to any SOSO metadata.
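
As a sketch of what a publisher-side helper for the option 2 pattern might look like, the Python function below wraps a resource description in a metadata record node with its own identifier. The function name, its parameters, and the check that the two identifiers differ are assumptions made for illustration; they are not taken from the SOSO guidelines.

```python
import json


def build_metadata_record(metadata_id, resource, date_modified, conforms_to):
    """Wrap a resource description in a DigitalDocument node (option 2 pattern)."""
    if "@id" not in resource:
        raise ValueError("the described resource needs its own @id")
    if resource["@id"] == metadata_id:
        raise ValueError("the metadata record and the resource need distinct identifiers")

    described = dict(resource)
    # Point back from the resource to the record that describes it.
    described["subjectOf"] = {"@id": metadata_id}

    return {
        "@context": "https://schema.org",
        "@id": metadata_id,
        "@type": "DigitalDocument",
        "dateModified": date_modified,
        "encoding": {
            "@type": "MediaObject",
            "dcterms:conformsTo": conforms_to,
        },
        "about": described,
    }


doc = build_metadata_record(
    "ex:URIforTheMetadata",
    {
        "@id": "ex:URIforDescribedResource",
        "@type": "ImageObject",
        "title": "Picture of analytical setup",
        "description": "Description of the resource",
    },
    "2017-05-23",
    "https://example.org/cdif-metadataSpec",
)
print(json.dumps(doc, indent=4))
```

Rejecting identical identifiers up front reflects the point emphasized earlier in the thread: the metadata record needs an identifier distinct from the thing it describes.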


3 participants