Shapes and hierarchical metadata schemes #65

Open
kcoyle opened this issue Jan 21, 2022 · 7 comments

@kcoyle
Collaborator

kcoyle commented Jan 21, 2022

First, a shape as we see it in RDF, and how it is rendered in the TAP:
[Image: valueshape]

@kcoyle
Collaborator Author

kcoyle commented Jan 21, 2022

A schema that uses a hierarchy with nested elements, like XML, does not have an arc with a property that connects the shapes - it instead uses the nesting. To connect them correctly in the same way they are linked in the metadata schema, we may need an additional column that indicates "sub" or "super" relationships.
[Image: valueshapexml]

@philbarker
Collaborator

If your top diagram represents instance data, I think the ovals should be «Book» and «Author» (instances of the Book and Author classes), and the shapeIDs should be BookShape and AuthorShape. We have no way in TAP to say how we know to apply BookShape to instances of Book or AuthorShape to instances of Author. Though I have thought about trying to draw a TAP as if it were paper with bits cut out where the arrows and ovals of an RDF instance diagram should be. To my mind, then, there's no difficulty in XML data not having shapes, because RDF data does not have them either.

To find the commonality between XML, RDF and TAPs it might be worth thinking of something more like a UML diagram: I think this one mostly works for the same example (not exactly, it's one I made earlier)
[Image: Simple book application profile (1)]
Roughly speaking, the shapes are the boxes (shapeID not shown, but how to relate them to data classes is shown), and statement constraints are either arrows joining the boxes or attributes listed in the lower section of the box.

So whereas shapes in the TAP define rules for statements in RDF (the property to use as the predicate, and possible values for the objects), for XML, shapes define rules for the content and attributes of elements. Where the TAP represents a relationship between two objects, that relationship is represented in XML as nested elements, i.e. the content of one element is other elements.

More fully:

  • StatementConstraint: defines rules for a part of the content of an XML element or attribute.
  • Shape: set of rules for element and attribute content.
  • propertyID: the element or attribute for which the rules are defined: this might be in the form of an XPath to where you would find it.
  • valueNodeType: content type, simple text or nested element for elements, text for attributes.
  • valueDataType: the XSD datatype for simple text content.
  • valueConstraint: other rules for values of simple text.
  • valueShape: the shape to use when a value is nested XML elements.
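
As an illustration of the mapping listed above, here is a minimal Python sketch, not an agreed design. It assumes that propertyID is simply a child element name (rather than a full XPath), that valueNodeType takes the hypothetical values "text" and "element", and that the shapes live in a hand-written dictionary; the sample XML is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical shapes: each maps a child element name (propertyID) to its rules.
SHAPES = {
    "bookShape": {
        "author": {"valueNodeType": "element", "valueShape": "authorShape"},
    },
    "authorShape": {
        "name": {"valueNodeType": "text"},
    },
}


def check(element, shape_id, shapes=SHAPES):
    """Return a list of problems found when applying a shape to an XML element."""
    problems = []
    for property_id, rule in shapes[shape_id].items():
        for child in element.findall(property_id):
            has_children = len(child) > 0
            if rule["valueNodeType"] == "text" and has_children:
                problems.append(f"{property_id}: expected simple text")
            elif rule["valueNodeType"] == "element":
                if not has_children:
                    problems.append(f"{property_id}: expected nested elements")
                else:
                    # valueShape names the shape that governs the nested content.
                    problems.extend(check(child, rule["valueShape"], shapes))
    return problems


# Invented sample data, for illustration only.
book = ET.fromstring("<book><author><name>Jane Austen</name></author></book>")
print(check(book, "bookShape"))
```

In this sketch the nesting itself plays the role that the connecting property plays in RDF: the recursion follows valueShape whenever an element's content is other elements.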

There is a slightly gnarly point around StatementConstraint and Shape being very similar, which results from XML relying on nesting rather than other relationships, so statements become individual parts of the content of elements.

BTW, the diagram above is drawn in Lucidchart, which allows export as CSV; I drew it because I'm pondering whether it would be possible to convert that export into a TAP, SHACL and so on. I guess if we could agree on how to map such a diagram to XML Schema (or Schematron) and how to map it to TAP, then we would be winning.

@kcoyle
Collaborator Author

kcoyle commented Jan 24, 2022

@philbarker Thanks. Looking at what I did, above, even I don't agree with it. So here's my second attempt. First, RDF instance data followed by a TAP:
[Image: valueshape]
Then XML data followed by a TAP. What I intended with this is to show that the authorShape is not a value for any properties/elements in the bookShape, so calling it valueShape is odd.
[Image: valueshapexml]

@philbarker
Collaborator

"authorShape is not a value for any properties/elements in the bookShape, so calling it valueShape is odd."

I could live with saying that

  • authorShape defines the content of the author (or creator) element
  • the content of an element is its value
  • the elements are "nodes" in the hierarchy

but perhaps I am missing the point. Also it's a long time since I did anything in XML so I've lost some of the idiom. I'm interested in what @johnhuck thinks.

@johnhuck
Collaborator

johnhuck commented Feb 2, 2022

I'm not sure if I've absorbed all of the issues here, but I'm not seeing why you couldn't apply the same approach we use for regular shapes to nested structures:

bookShape | author | authorShape
authorShape | name |

Maybe there's something I don't understand. However, I don't find it odd to call authorShape a valueShape, because a TAP shape is only an informal entity that exists in the context of a TAP model. It isn't an RDF class, although we probably intuitively think of shapes as being like that. So for me, the purpose of the shapeID and valueShape columns are to reference each other, but that's it.
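
To make the "reference each other" reading concrete, here is a small Python sketch under stated assumptions: the two rows above are written as CSV with a header of shapeID, propertyID and valueShape, and the helper names `load_shapes` and `dangling_value_shapes` are hypothetical. It only groups rows by shapeID and checks that every valueShape names a shapeID that exists in the same table.

```python
import csv
import io
from collections import defaultdict

# The two rows from the comment, with an assumed header row added.
TAP_CSV = """shapeID,propertyID,valueShape
bookShape,author,authorShape
authorShape,name,
"""


def load_shapes(text):
    """Group statement-constraint rows by the shapeID that names them."""
    shapes = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        shapes[row["shapeID"]].append(row)
    return dict(shapes)


def dangling_value_shapes(shapes):
    """List valueShape references that do not match any shapeID in the table."""
    referenced = {
        row["valueShape"]
        for rows in shapes.values()
        for row in rows
        if row["valueShape"]
    }
    return sorted(referenced - set(shapes))


shapes = load_shapes(TAP_CSV)
print(sorted(shapes))
print(dangling_value_shapes(shapes))
```

Nothing in this sketch depends on whether the underlying data is RDF or XML, which is the point being made: the shape is only a named group of rows.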

The problem I ran into (and tried to solve) with my other XML modelling attempt was that each element in a nested structure can potentially have a cardinality or other restriction, including wrapper elements, so in my solution I made sure each element could have its own row. That's maybe tangential to this question, but that's the background on my thinking.

@tombaker
Collaborator

tombaker commented Feb 3, 2022

@johnhuck

I don't find it odd to call authorShape a valueShape, because a TAP shape is only an informal entity that exists in the context of a TAP model. It isn't an RDF class, although we probably intuitively think of shapes as being like that. So for me, the purpose of the shapeID and valueShape columns are to reference each other, but that's it.

+1 - especially: "the purpose of the shapeID and valueShape columns are to reference each other, but that's it."

Like @philbarker, I could live with the notion that the content of an element is its value; or rather, I defer to XML experts as to whether this is a reasonable thing to say.

The Shape ID really just names a set of statement constraints for the purpose of making that set of statement constraints referenceable as a Value Shape. Full stop.

@kcoyle
Collaborator Author

kcoyle commented Feb 3, 2022

@johnhuck suggests using the XML element "author" as a property. That seems to retain the XML structure. (It also seems obvious! doh!) Presumably the cardinality of the shape would be given on the row with the propertyID -- at least, that's how we've done it so far. But is there a better way? Also, would there be other restrictions on the shape that would need to be expressed?
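
One way to picture cardinality carried on the propertyID row is the Python sketch below. It is an assumption-laden illustration, not a proposal: it uses the TAP columns mandatory and repeatable as Python booleans, treats propertyID as a child element name, and the function name `cardinality_problems` and the sample rows are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows: cardinality sits on the row that carries the propertyID.
ROWS = [
    {"shapeID": "bookShape", "propertyID": "author",
     "mandatory": True, "repeatable": True, "valueShape": "authorShape"},
    {"shapeID": "authorShape", "propertyID": "name",
     "mandatory": True, "repeatable": False, "valueShape": ""},
]


def cardinality_problems(element, shape_id, rows=ROWS):
    """Check mandatory/repeatable for each row of a shape, following valueShape."""
    problems = []
    for row in rows:
        if row["shapeID"] != shape_id:
            continue
        children = element.findall(row["propertyID"])
        if row["mandatory"] and not children:
            problems.append(f"{shape_id}: missing {row['propertyID']}")
        if not row["repeatable"] and len(children) > 1:
            problems.append(f"{shape_id}: repeated {row['propertyID']}")
        if row["valueShape"]:
            # Apply the referenced shape to the content of each nested element.
            for child in children:
                problems.extend(cardinality_problems(child, row["valueShape"], rows))
    return problems


# Invented sample data: the first author has two name elements.
doc = ET.fromstring(
    "<book><author><name>A</name><name>B</name></author>"
    "<author><name>C</name></author></book>"
)
print(cardinality_problems(doc, "bookShape"))
```

Here the cardinality of the author element (and so, in effect, of the authorShape within a book) comes from the bookShape row; restrictions on the shape as a whole would need some other mechanism.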
