Value type, value, and more #61
Comments
Question: How many columns are needed? That is, do we need to designate separate columns for valueType type - the type of the value type, such as if pick lists - do these need their own column? concept schemes, aka URI stems - there is no standard RDF value type for URI stems, although this does exist in both SHACL and ShEx; does this need its own column? Or can we designate sx:URIstem as a valueType even though that is not standard? node type - RDF has (Note: feel free to add other questions below - these are the ones from my meeting notes.) |
Where does a statement that the value should meet a definition of an entity shape (or one from a list of entity shapes) defined in the AP fit? |
I don't understand how one would define an entity shape other than what we have in the table. Can you give an example of what you are looking for? |
yes, the entity shape is defined in the table. We have examples that refer to entity shapes as constraints on values in the Value Space column, e.g. in bookclub
When processing the AP, such a constraint needs different treatment to a pick list of literal or URI values (I think). That doesn't necessarily mean we need a different column, but it does require consideration. |
OK, I think I get what you mean. The Type = URI would mean that the type of the value in instance data is a URI, thus @author would need to be a URI (or a bnode). Maybe what we need to do is play with some pseudo code or simply natural language statements of what these types and values mean to clarify our thinking. I also think that we should create some instance data examples to make this all more concrete, maybe even working back from some plausible instance data to a profile that would define it. |
Better late than never: I am chiming in after this group has been running for some time. The background is that I started reading about APs and this group while preparing a talk about an LRMI profile we are developing for German-speaking implementors. See the slides for the talk (in German) at http://slides.lobid.org/kim-ws-2020/. I am also contributing to the LRMI task group that is chaired by @philbarker.
I recommend also gathering examples that are a bit off and thus invalid, so that you have to make sure the AP catches them. For the above-mentioned profile, which is currently embodied in a JSON(-LD) schema, we gather valid and invalid examples, and the schema is automatically tested against those with every commit (see the test.sh which is executed by Travis). This setup (which we copied from reconciliation-api/specs@6b5985d) makes it easy to iteratively develop the schema and make it more and more restrictive/verbose. Whenever I encounter/think of an invalid example that is not caught by the profile, I add it to the |
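The valid/invalid testing setup described in this comment might be sketched roughly as follows. This is only an illustration: `check_examples`, `toy_validate`, and the sample records are hypothetical stand-ins, not the actual schema or the test.sh script mentioned above.

```python
from typing import Callable, Iterable, List


def check_examples(
    validate: Callable[[dict], bool],
    valid: Iterable[dict],
    invalid: Iterable[dict],
) -> List[str]:
    """Return failure messages; an empty list means the profile behaved as expected."""
    failures = []
    for record in valid:
        if not validate(record):
            failures.append(f"valid example rejected: {record}")
    for record in invalid:
        if validate(record):
            failures.append(f"invalid example accepted: {record}")
    return failures


# Hypothetical toy validator: a record needs a non-empty string under "name".
def toy_validate(record: dict) -> bool:
    name = record.get("name")
    return isinstance(name, str) and name != ""


failures = check_examples(
    toy_validate,
    valid=[{"name": "Moby Dick"}],
    invalid=[{"name": ""}, {"title": "no name key"}],
)
print(failures)
```

Running such a check on every commit means that each newly discovered invalid example becomes a regression test for the profile.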
There are now instance examples for bookclub (data.ttl) and recipe (guided_recipe.json). Neither is meant to illustrate anything specific, they were just what I had to hand. Both should conform to the relevant AP. The recipe example is interesting in that, as is typical of schema.org instances for Google, it's bnodes from top to bottom. |
Might some use case based requirements also help? E.g. "I need to know whether something is a BNode or a URI because ...." "If you give me the information that something is a Literal and that its datatype is xsd:date separately it will allow me to ...." |
In terms of Bnode v URI, I think that happens this way: BNODE Instance data with BNODE: using bookclub.csv ex:book1 _:author1 a sdo:Person ; URI Instance data with a URI, using this profile [Tom: corrected format]:
Instance: ex:book1 In natural language, the internal links using "@" are BNODES in RDF; if the value is to be a URI then you either do:
|
oof, that's a big extra assumption, and is not in line with the example that I provided, e.g.:
I would think it is a quite common case that you would want to maintain your own data for more than one entity type, and would want to allow people to submit data as graphs covering all the relevant entities. |
Phil, in your case, then "author:" is a prefix that has to be defined in your prefix declarations as standing in for some URI, right? Which would mean that |
Yup, Another case is where I want to use data from a service (something like wikidata) but need to check that it is sufficiently complete in order to decide whether to use it as is or to supplement it. Sorry, but the assertion that "@"-referenced node is always a BNode seems sudden and arbitrary. |
Last night I realized that my example is actually perfect for why a separate entity link may be needed. And it's a real use case. The library world has URIs for agents of various types, but they are all under a single URI scheme. If you want only one type of agent, say "Person", you have to look beyond the URI to the data associated with that URI. However, the "link" of a URI stem in the profile could be ambiguous. Here's an attempt to map this [Tom: corrected format]:
What this says to me is that the URI stem used as a value might not work when linking entities within the profile. If your value is a URI stem, how can you reuse that within the profile? Should you? Can anyone think of a way to work around this? Also, should entities be exclusively "@" names? Are there instances where a URI could be used for an entity name? |
@philbarker What you are describing to me is a URI and you show it as a URI (author:001), not an "@" entity. As a URI it seems fine. The "@" entities, as I have seen them, are internal and we haven't developed a way to associate them with URIs. It seems that if you are making use of URIs then you have a defined URI scheme in your prefix declaration. Your example above does not use "@" notation. Where do you see "@" notation fitting in to your example? If it represents a URI, how would that work? |
@kcoyle That example above is instance data, I don't see why there would be any the example shows the value for |
I see the following two things as fundamentally different (even "disjoint"):
Taking @philbarker 's example:
One might read this as meaning that we should expect to find an entity shape ID as a value in the instance data, whereas I think the intent is to say two quite different things:
In other words, the value (a URI) associated with
which I intend to mean:
|
@kcoyle I took the liberty of fixing the format in two of your examples above (the mandatory and repeatable columns did not align). If I correctly understand what you intend the book example to mean, one might express it in the template as follows:
|
I think @tombaker and I are saying the same thing. I would go one step further, and suggest that saying a literal used as a value must conform to |
I too believe that it makes sense to put it in the same column unless there is a use case where filling out both columns makes sense. @tombaker provides one in #61 (comment) but I am not sure whether this is necessary. Thinking about this, I realize I see two – somehow related – problems in the current profile examples:
Re. 1.) I would rather like to see something like
|
That's why I like the node kinds as mentioned by Karen (who I think meant "ShEx", not "RDF", which has just three, the fourth in ShEx, "non-literal" meaning, in effect, "iri or bnode"). |
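To make the node-kind point concrete, here is a minimal sketch of how the four node kinds mentioned in this comment relate to the three RDF term kinds. The names `NODE_KINDS` and `satisfies_node_kind` are hypothetical and not taken from any ShEx library.

```python
# The four node kinds named in the comment above; "nonliteral" means "iri or bnode".
NODE_KINDS = {
    "iri": {"iri"},
    "bnode": {"bnode"},
    "literal": {"literal"},
    "nonliteral": {"iri", "bnode"},
}


def satisfies_node_kind(actual: str, required: str) -> bool:
    """Check whether a node of kind `actual` (iri, bnode, or literal) meets `required`."""
    return actual in NODE_KINDS[required]


print(satisfies_node_kind("bnode", "nonliteral"))
print(satisfies_node_kind("literal", "nonliteral"))
```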
Yes, sorry, I got the wrong meta level! But I do wonder what the advantage is to using an "@" node in the profile rather than the entity URI, since one exists. This gets us back to what our value column represents, which I think we need to hash out. I'll try to create some examples. |
(I also sent this as an email to the list, in slightly different formatting) Tom and I had an impromptu brainstorming session this morning, based on the GitHub comment that I made where I realized that the use of URIstems was problematic for our table structure. Tom responded with the thought that we might be using the value space for significantly different things. (Tom, speak up if I mis-characterize your view!) Adrian also weighed in on this. During our chat Tom and I fiddled around with a Google spreadsheet trying out various ideas. We did NOT solve the problem but at least agreed on what we thought was unresolved. Here are some things you may notice about the spreadsheet (which I based on a section of Phil's example):
Problems remain in this table. My main concern is that it might seem odd to profile developers to have the key information about the author (line 6) separate from the author entity information on lines 8 and 9. We tried to come up with a way to have an author entity that has all of the author information, somehow moving the value type of author entity and the constraint to the @author entity node of the table. What we came up with was an unattractive kludge. If you have ideas on how to solve this, please speak up and/or copy what is here and make the modifications that you think will work. kc |
@kcoyle Excellent summary! Naming issues aside, I think this iteration makes three key improvements:
One other detail: Instead of using |
I agree that this cannot be expressed in any of the variants of the CSV model we are discussing, but in general, I think we should not try to overload the model with too many features and, specifically in this case, I think the omission is justified. In the current model, constraints on nodes can only be expressed for nodes in the object position, but to support the specification of a URI stem for the subject URI, one would need to add another two columns to the model:
But would the gain in expressivity be worth the loss of simplicity? If the CSV model were so complex, why not just learn ShEx (which distinguishes between node constraints and triple constraints)? The "unattractive kludge" to which Karen alludes above would be to use, say, |
I agree with keeping it simple.
I am not convinced. Don't you think there will be use cases where people want to add constraints on the URI of the top-level node (which in the example is
I am not sure it has to be this way. Maybe we could find a way to put it in the current model. In #61 (comment), I used the
This could easily mean, the URI for an instance node of |
@acka47 That's an interesting twist! In this case, I find it confusing that We did decide awhile ago to "develop an RDF-specific model, back it up with a variety of example profiles, each with some instance data that we can validate with ShEx. Once that is solid, we can go back and see if the same template can work with other data types, like XML, JSON". My first reaction is that treating the identifier as a property, as you suggest above, should fit nicely into the model (apart from the question of punctuation) without compromising the use the CSV model for RDF-based profiles. Is there anything else in the way you use JSON-LD that is not already supported by the CSV model and that you think should be? |
If the table structure makes the use of URIstems difficult, fine: don't do string hacks on URIs, treat them as opaque identifiers with no internal semantics, but I guess the boat has sailed on that. I still don't understand why the emphasis is on APs being about properties. I know that a focus on properties is in the DCMI heritage, but an Application Profile mixes and matches existing Classes, Properties and value encoding schemes, and I believe we should treat them equally. I think the table should primarily identify which existing vocabularies are being used and which terms from these vocabularies are being used. So it should include namespaces, with local and global identifiers; classes being used, with local and global identifiers; properties being used, with local and global identifiers; and encoding schemes (concept schemes, syntax encoding schemes), with local and global identifiers. (The local identifiers are only needed if we want to make cross references within the AP.) |
Directly to my mind come the following two:
Re. 2.) you have to distinguish – from a JSON-LD view – a simple array ( |
Thinking about this, you can easily define a value shape
So it might make sense to add some syntactic sugar for ordered lists. |
Check.
Check (I think), as in the example above:
Check.
Taking the two separately:
I am assuming that the more common way to use a concept scheme in modern metadata would be simply to use the URI of a concept as value URI. I dunno - maybe there could be a Value Type like, say, |
@acka47 The shape IDs in rows 6 and 7 were in the "prefix" column (and mandatory/repeatable were not aligned) so I edited your post to move them under Value Shape ID.
Maybe so. Is this a common use case? |
Good question - I'm not sure we have covered those. Maybe |
With regard to bibliographic data, the use cases are contributors (e.g. |
@philbarker I would love to find another solution to URI stems! One that doesn't disrupt our model. I would appreciate hearing more detail around your:
To that end, here is an expression of the use cases.
The third one is an add-on that we haven't discussed but that I know is in use in at least one metadata application. If it doesn't fit with solutions to 1 and 2 we can discuss it another time. I'll also note a possibly dangerous thought that came up in earlier discussions, which is to allow the value column to include regex-type formulas. This is not something that we would expect from our most beginner profile developers, but might provide a passage from the simplest template to one that can express useful value rules like "date = > than 2000 but < than 2021". URI stems could be "ex:www.something.org*". But then we'd have to have a way to indicate that the value field contains a formula, rather like the "SUM=" in spreadsheets. Thanks! |
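The wildcard idea floated in this comment could be sketched as below. This assumes, purely for illustration, that a trailing `*` in the value column marks a URI stem and that anything else requires an exact match; the function name `matches_value_constraint` is hypothetical, and nothing here reflects an agreed syntax.

```python
def matches_value_constraint(value: str, constraint: str) -> bool:
    """Treat a trailing '*' as a URI-stem wildcard; otherwise require an exact match."""
    if constraint.endswith("*"):
        # Drop the '*' and compare the remaining stem against the start of the value.
        return value.startswith(constraint[:-1])
    return value == constraint


print(matches_value_constraint("ex:www.something.org/people/1", "ex:www.something.org*"))
print(matches_value_constraint("ex:www.other.org/people/1", "ex:www.something.org*"))
```

A prefix test like this avoids full regular expressions, which may suit beginner profile developers, but it cannot express range rules such as the date example above.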
I added a second sheet to the google spreadsheet to show the "solution" that gathers all of the author information in a single "shape". I do like the idea of saying that PropertyX has as its object ShapeY and all of the value constraints are in the ShapeY rather than giving value constraints on the PropertyX row. (That'll be clearer when you look at the spreadsheet.) This solution is really unrelated to the URIstem problem, and you can see it in the @Series shape as well. In short, this suggests: A property either has a value as its object, or it has a shape as its object, but not both. |
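The "value or shape, but not both" rule suggested here lends itself to a mechanical check. The sketch below assumes hypothetical column names (`value_type`, `value_constraint`, `value_shape`) and made-up rows; the actual spreadsheet may be laid out differently.

```python
import csv
import io

# Hypothetical column names and rows, for illustration only.
SAMPLE = """shape_id,property,value_type,value_constraint,value_shape
@book,dct:creator,,,@author
@book,dct:title,Literal,,
@author,foaf:name,Literal,,
@book,dct:subject,URI,ex:topics,@topic
"""


def rows_with_both(csv_text: str) -> list:
    """Return the properties whose row gives both value information and a value shape."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["property"]
        for row in reader
        if row["value_shape"].strip()
        and (row["value_type"].strip() or row["value_constraint"].strip())
    ]


print(rows_with_both(SAMPLE))
```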
@kcoyle I think for those requirements it is better to resolve the identifier and check for statements like |
@philbarker Is this what you mean?
@kcoyle This would perhaps address:
|
How about |
@kcoyle @acka47 To summarize, I'm seeing two possible modeling patterns for recording the identifier of the subject described by a given shape, both of which put the URI in the object position:
Aside from the unfortunate use of |
Joining both the discussions on Regex and URIStems together: I think one could completely get along without the
|
I completely understand that you don't want to use I think it makes sense to use such a specific keyword that means "using this keyword means all statements are made about the subject URI", but I don't mind using another token for the keyword – as long as it is clearly distinguished from the RDF properties. That's where I see the problem with |
Interesting idea! So it could be something like Other than this keyword, and aside from a controlled "starter vocabulary" of Value Types (which we clearly need), can we think of other cases where such a keyword might provide some syntactic sugar for edge cases? I wouldn't want to see us get too fancy with special keywords, but I'd be curious if we do see any. For example, how strong is the requirement for a profile (or its shapes) to be able to reference themselves (e.g., |
@philbarker you asked:
and the answer is a resounding "yes!" - those use cases would be very helpful. Thanks. |
Adrian, we talked about this early on, before we decided to work first on the very simplest of cases. In that early thinking this would indeed be in the value space, and it could be fed pretty directly into a ShEx schema. We could "allow" this in our simple schema and give a few simple examples of common needs like constraining dates or numbers. I'm thinking that we may want to include features like this as extensions of the simpleAP model, not as part of the very simple base. And I am still hoping that we can go beyond the simplest model, at least as extensions and examples of more complex cases. Anyone else have comments on this? |
@kcoyle I fully agree that we need to keep the model simple but think we have some wiggle room between saying that the To follow the former (stricter) interpretation would be to limit ourselves to things such as:
because the URI However, if we were to say that the nature of the value constraints is specific to a given value type, then we could accommodate things like:
In our planned "starter" vocabulary of value types, then, we would need to clarify, for each value type, the nature of expected value constraints (e.g., actual URIs and literals in the former examples; base URIs, datatype URIs, and regular expressions in the latter examples, along with any formatting rules such as "enclose lists with square brackets" or "enclose regular expressions in parentheses"). We would need to make a number of somewhat arbitrary decisions about such details (e.g., can any value type be turned into a picklist by enclosing the set of alternative value constraints in square brackets?). And I still see no elegant way to accommodate more complex expressions such as "URI or Literal |
oh, I don't think that helps. I don't need the square brackets to tell me it is a picklist if the Value Type does that, and I don't want the square brackets if I am processing the Value Constraint string in python because without them I can just use str.split() on the value. |
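As a small illustration of the processing concern raised here, the following sketch splits a whitespace-separated picklist with `str.split()` and shows the extra step that square brackets would require. The function name `parse_picklist` and the sample values are hypothetical.

```python
def parse_picklist(value_constraint: str) -> list:
    """Split a whitespace-separated picklist, tolerating optional square brackets."""
    text = value_constraint.strip()
    # Brackets, if present, have to be removed before splitting.
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    return text.split()


print(parse_picklist("red blue green"))
print(parse_picklist("[red blue green]"))
```

Note that splitting on whitespace only works when no individual value contains a space, so literals with spaces would need quoting or a different separator.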
It's fine with me to separate multiple items in a |
Taken up in dcmi/dctap#5, which links to here for discussion |
One of the key things that we have to decide is how to define the rules or constraints for values that will be applied to instance data created according to the profile. Our current template has:
value type: Akin to rdf:type, this designates the general data type expected for the value, such as xsd:date or xsd:anyURI
value: This column contains further constraints on the value itself. An example could be a pick list of literal values ("red" "blue" "green"). If there are no more specific constraints on values beyond the value type, this column is left blank.
Some questions follow.