Value type, value, and more #61

Closed
kcoyle opened this issue Jun 3, 2020 · 49 comments

Comments

@kcoyle
Collaborator

kcoyle commented Jun 3, 2020

One of the key things that we have to decide is how to define the rules or constraints for values that will be applied to instance data created according to the profile. Our current template has:

value type: Akin to rdf:type, this designates the general data type expected for the value, such as xsd:date or xsd:anyURI

value: This column contains further constraints on the value itself. An example could be a pick list of literal values ("red" "blue" "green"). If there are no more specific constraints on values beyond the value type, this column is left blank.
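As a rough illustration of how the two columns might work together, the Python sketch below checks a single literal against a value type and an optional pick list. The function name `check_value`, the two value types handled and the looseness of the URI check are assumptions made for illustration; none of this is defined by the template.

```python
from datetime import date
from urllib.parse import urlparse

def check_value(value, value_type, pick_list=None):
    """Return True if `value` satisfies the value type and the optional pick list."""
    if value_type == "xsd:date":
        try:
            date.fromisoformat(value)  # accepts "YYYY-MM-DD"
        except ValueError:
            return False
    elif value_type == "xsd:anyURI":
        # Deliberately loose: only require that a scheme is present.
        if not urlparse(value).scheme:
            return False
    # A blank "value" column (no pick list) means no further constraint.
    if pick_list and value not in pick_list:
        return False
    return True

print(check_value("2020-06-02", "xsd:date"))
print(check_value("purple", "xsd:string", ["red", "blue", "green"]))
```

Other value types fall through to the pick-list check only, which mirrors the idea that the value column narrows whatever the value type already allows.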

Some questions follow.

@kcoyle
Collaborator Author

kcoyle commented Jun 3, 2020

Question: How many columns are needed? That is, do we need to designate separate columns for

valueType type - the type of the value type itself: for example, if the value type is xsd:date, the type of the value type would be URL, whereas the expected instance data value is a formatted date ("2020-06-02")

pick lists - do these need their own column?

concept schemes, aka URI stems - there is no standard RDF value type for URI stems, although this does exist in both SHACL and ShEx; does this need its own column? Or can we designate sx:URIstem as a valueType even though that is not standard?

node type - RDF has nodeKind: ("iri" | "bnode" | "nonliteral" | "literal"). Are these needed in the template, and if so how are they to be used?

(Note: feel free to add other questions below - these are the ones from my meeting notes.)

@philbarker
Collaborator

Where does a statement that the value should meet a definition of an entity shape (or one from a list of entity shapes) defined in the AP fit?
(My initial thought was that these constraints on entity-like values are similar to constraints on literal values such as xsd:date, and so could go in the same column.)

@kcoyle
Collaborator Author

kcoyle commented Jun 3, 2020

I don't understand how one would define an entity shape other than what we have in the table. Can you give an example of what you are looking for?

@philbarker
Collaborator

yes, the entity shape is defined in the table.

We have examples that refer to entity shapes as constraints on values in the Value Space column, e.g. in bookclub

| ID | URI | Label | Type | Value Space |
|---|---|---|---|---|
| | sdo:author | author | URI | @author |
| | wdt:P127 | owner | URI | @owner |

When processing the AP, such a constraint needs different treatment to a pick list of literal or URI values (I think). That doesn't necessarily mean we need a different column, but it does require consideration.

@kcoyle
Collaborator Author

kcoyle commented Jun 4, 2020

OK, I think I get what you mean. The Type = URI would mean that the type of the value in instance data is a URI, thus @author would need to be a URI (or a bnode). Maybe what we need to do is play with some pseudo code or simply natural language statements of what these types and values mean to clarify our thinking. I also think that we should create some instance data examples to make this all more concrete, maybe even working back from some plausible instance data to a profile that would define it.

@acka47

acka47 commented Jun 5, 2020

Better late than never, I am chiming in after this group has been running for some time now. The background being that I picked up reading about APs and this group when preparing a talk about an LRMI profile we are developing for German-speaking implementors. See the slides to the talk (in German) at http://slides.lobid.org/kim-ws-2020/. I am also contributing to the LRMI task group that is chaired by @philbarker.

I also think that we should create some instance data examples to make this all more concrete, maybe even working back from some plausible instance data to a profile that would define it.

I recommend also gathering examples that are a bit off and thus invalid, so that you can make sure the AP catches them. For the above-mentioned profile, which is currently embodied in a JSON(-LD) schema, we gather valid and invalid examples, and the schema is automatically tested against them with every commit (see the test.sh, which is executed by Travis). This setup (which we copied from reconciliation-api/specs@6b5985d) makes it easy to develop the schema iteratively and make it more and more restrictive/verbose. Whenever I encounter or think of an invalid example that is not caught by the profile, I add it to the invalid folder and then adjust the schema so that it catches the error.
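The valid/invalid workflow described here can be sketched in a few lines of Python. This is not the test.sh mentioned above: the `validate` function is a hypothetical stand-in for a real schema or profile validator, and the example records are made up for illustration.

```python
def validate(record):
    """Stand-in validator (hypothetical): a book needs a name and an author."""
    return bool(record.get("name")) and bool(record.get("author"))

VALID = [{"name": "Moby Dick", "author": "http://viaf.org/viaf/27068555/"}]
INVALID = [
    {"name": "Moby Dick"},                         # author missing
    {"author": "http://viaf.org/viaf/27068555/"},  # name missing
]

def run_examples(validator, valid, invalid):
    """Return the examples the validator gets wrong, as (problem, record) pairs."""
    failures = [("valid example rejected", r) for r in valid if not validator(r)]
    failures += [("invalid example accepted", r) for r in invalid if validator(r)]
    return failures

# An empty list means the validator agrees with both folders of examples.
print(run_examples(validate, VALID, INVALID))
```

Adding a new record to `INVALID` and then tightening `validate` until the failure list is empty again corresponds to the loop of adding a file to the invalid folder and adjusting the schema.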

@philbarker
Collaborator

There are now instance examples for bookclub (data.ttl) and recipe (guided_recipe.json).

Neither is meant to illustrate anything specific, they were just what I had to hand. Both should conform to the relevant AP.

The recipe example is interesting in that, as is typical of schema.org instances for Google, it's bnodes from top to bottom.

@philbarker
Collaborator

Might some use case based requirements also help? E.g. "I need to know whether something is a BNode or a URI because ...." "If you give me the information that something is a Literal and that its datatype is xsd:date separately it will allow me to ...."

@kcoyle
Collaborator Author

kcoyle commented Jun 5, 2020

In terms of Bnode v URI, I think that happens this way:

BNODE

Instance data with BNODE: using bookclub.csv

ex:book1
    a sdo:Book ;
    sdo:name "Moby Dick" ;
    sdo:author _:author1 .

_:author1 a sdo:Person ;
    sdo:givenName "Herman" ;
    sdo:familyName "Melville" .

URI

Instance data with a URI, using this profile [Tom: corrected format]:

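| ID | URI | label | M | O | VT | V |
|---|---|---|---|---|---|---|
| @book | | Book | y | y | | |
| | rdf:type | instance of | y | n | URI | sdo:Book |
| | rdf:type | instance of | y | n | URI | wd:Q571 |
| | sdo:name | title | y | n | Literal | xsd:string |
| | sdo:author | author | y | y | URIstem | http://viaf.org |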

Instance:

ex:book1
    a sdo:Book ;
    sdo:name "Moby Dick" ;
    sdo:author <http://viaf.org/viaf/27068555/> .

In natural language, the internal links using "@" are BNODES in RDF; if the value is to be a URI then you either do:

  • a specific URI (as in the Wikidata case)
  • a URI stem
  • or you leave value blank and you will accept any URI.
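The three options in this list can be written as one small check. The sketch below is only an illustration: the function name `uri_allowed` is hypothetical, the value type labels follow the table above, and prefixed names such as wd:Q571 are assumed to have been expanded to full URIs beforehand.

```python
def uri_allowed(uri, value_type, value=None):
    """Check a URI from the instance data against one profile row."""
    if not value:
        # Value column left blank: any URI is accepted.
        return True
    if value_type == "URI":
        # A specific URI: the value must match exactly.
        return uri == value
    if value_type == "URIstem":
        # A URI stem: the value must begin with the stem.
        return uri.startswith(value)
    raise ValueError(f"unsupported value type: {value_type}")

print(uri_allowed("http://viaf.org/viaf/27068555/", "URIstem", "http://viaf.org"))
print(uri_allowed("http://example.org/melville", "URIstem", "http://viaf.org"))
```

A plain prefix test is the simplest reading of "URI stem"; a stem without a trailing slash, such as http://viaf.org, would also match an unrelated host that merely begins with the same characters, so a real implementation may want a stricter comparison.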

@philbarker
Collaborator

the internal links using "@" are BNODES in RDF

oof, that's a big extra assumption, and is not in line with the example that I provided, e.g.:

book:002 a sdo:Book, wd:Q571 ;
    sdo:name "The Comedians" ;
    sdo:author author:001 ;
    wdt:P127 member:002 .                # owned by

author:001 a sdo:Person ;
    sdo:givenName "Graham" ;
    sdo:familyName "Greene" .

I would think it is a quite common case that you would want to maintain your own data for more than one entity type, and would want to allow people to submit data as graphs covering all the relevant entities.

@kcoyle
Collaborator Author

kcoyle commented Jun 5, 2020

Phil, in your case, then "author:" is a prefix that has to be defined in your prefix declarations as standing in for some URI, right? Which would mean that author:001 is an entity that has been defined with a URI prior to the creation of the instance data. Or are you saying that the local data mints URIs "on the fly" for entities?

@philbarker
Collaborator

Yup, author: is in the prefixes. The URI might be minted as part of the submission process (a workflow such as: when entering data, the user enters the name, the system checks whether an author with this name exists, and if not it provides a new URI for the author).

Another case is where I want to use data from a service (something like wikidata) but need to check that it is sufficiently complete in order to decide whether to use it as is or to supplement it.

Sorry, but the assertion that an "@"-referenced node is always a BNode seems sudden and arbitrary.

@kcoyle
Collaborator Author

kcoyle commented Jun 6, 2020

Last night I realized that my example is actually perfect for why a separate entity link may be needed. And it's a real use case.

The library world has URIs for agents of various types, but they are all under a single URI scheme. If you want only one type of agent, say "Person", you have to look beyond the URI to the data associated with that URI. However, the "link" of a URI stem in the profile could be ambiguous. Here's an attempt to map this [Tom: corrected format]:

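| ID | URI | label | M | O | VT | V |
|---|---|---|---|---|---|---|
| @book | | Book | y | y | | |
| | rdf:type | instance of | y | n | URI | sdo:Book |
| | rdf:type | instance of | y | n | URI | wd:Q571 |
| | sdo:name | title | y | n | Literal | xsd:string |
| | sdo:author | author | y | y | URIstem | http://viaf.org |
| http://viaf.org | | | | | | |
| | rdf:type | instance of | Y | n | URI | bf:Person |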

What this says to me is that the URI stem used as a value might not work when linking entities within the profile. If your value is a URI stem, how can you reuse that within the profile? Should you? Can anyone think of a way to work around this? Also, should entities be exclusively "@" names? Are there instances where a URI could be used for an entity name?

@kcoyle
Collaborator Author

kcoyle commented Jun 6, 2020

@philbarker What you are describing to me is a URI and you show it as a URI (author:001), not an "@" entity. As a URI it seems fine. The "@" entities, as I have seen them, are internal and we haven't developed a way to associate them with URIs. It seems that if you are making use of URIs then you have a defined URI scheme in your prefix declaration. Your example above does not use "@" notation. Where do you see "@" notation fitting in to your example? If it represents a URI, how would that work?

@philbarker
Collaborator

@kcoyle That example above is instance data, I don't see why there would be any @ notation in the instance data?

the example shows the value for sdo:author is provided by a URI that identifies an entity, the description for which is provided and conforms to the constraints defined by the AP entity shape @author.

@tombaker
Collaborator

tombaker commented Jun 8, 2020

I see the following two things as fundamentally different (even "disjoint"):

  • URIs or BNodes used to identify subjects and objects in the instance data
  • identifiers for entity shapes, which are constructs of the application profile that are about the instance data but do not actually appear in the instance data.

Taking @philbarker 's example:

| ID | URI | Label | Type | Value Space |
|---|---|---|---|---|
| | sdo:author | author | URI | @author |
| | wdt:P127 | owner | URI | @owner |

One might read this as meaning that we should expect to find an entity shape ID as a value in the instance data, whereas I think the intent is to say two quite different things:

  • that the value associated with sdo:author must be a URI that identifies that author, and
  • the author identified by that URI must be described using properties and constraints as specified in the entity shape identified by @author.

In other words, the value (a URI) associated with sdo:author does not actually identify the construct in the profile that we are calling an entity or entity shape. Until now I had pictured that we might say: "Type: Entity, Value space: @author", as in the Wikidata "painting" example, but the example above makes it clear that this will not do. I hesitate to propose an extra column but think this distinction could be made a lot more cleanly in something like the following:

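| Entity Shape ID | URI | Label | Type | Value Space | Entity Shape Ref |
|---|---|---|---|---|---|
| @book | | | | | |
| | sdo:author | author | URI | | @author |
| | wdt:P127 | owner | URI | | @owner |
| @author | | | | | |
| | foaf:name | name | Literal | | |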

which I intend to mean:

  • the object of the sdo:author statement is a URI (which identifies an author), and the fact that its Value Space is empty means that the URI is not further constrained (for example, to a specific URI or to a URI stem such as http://viaf.org).
  • the object of the sdo:author statement is described using a set of properties and constraints as specified by the entity shape @author.
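The two readings of such a row can be kept apart in code as two separate checks. The Python sketch below is an illustration only: triples are plain tuples, the `Lit` wrapper marks literals, and `SHAPES` lists just the required properties of each shape. All of these names, and the sample data, are assumptions and not part of the proposed template.

```python
from typing import NamedTuple

class Lit(NamedTuple):
    """Marks a literal value; anything else in object position is a URI."""
    text: str

TRIPLES = [
    ("book:002", "sdo:author", "author:001"),
    ("author:001", "foaf:name", Lit("Graham Greene")),
]

# Shape ID -> properties that a node described by the shape must have.
SHAPES = {"@author": {"foaf:name"}}

def check_shape_ref(triples, subject, prop, shape_id, shapes=SHAPES):
    """Check a row such as: sdo:author | Type: URI | Entity Shape Ref: @author."""
    problems = []
    for s, p, o in triples:
        if s != subject or p != prop:
            continue
        # 1. The value in the instance data must be a URI, not a literal.
        if isinstance(o, Lit):
            problems.append(f"{prop} value {o.text!r} is not a URI")
            continue
        # 2. The node identified by that URI must be described as the shape requires.
        described = {p2 for s2, p2, _ in triples if s2 == o}
        for missing in sorted(shapes[shape_id] - described):
            problems.append(f"{o} lacks {missing} required by {shape_id}")
    return problems

print(check_shape_ref(TRIPLES, "book:002", "sdo:author", "@author"))
```

Note that the shape ID `@author` is only ever a key into `SHAPES`; it never appears among the triples, which is the distinction being drawn here.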

@tombaker
Collaborator

tombaker commented Jun 8, 2020

@kcoyle I took the liberty of fixing the format in two of your examples above (the mandatory and repeatable columns did not align). If I correctly understand what you intend the book example to mean, one might express it in the template as follows:

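| ESID | URI | Label | Type | Value Space | ESRef | Comment |
|---|---|---|---|---|---|---|
| @book | | | | | | |
| | sdo:author | author | URIStem | http://viaf.org | @author | |
| @author | | | | | | |
| | rdf:type | is instance of | URI | bf:Person | | Must be a BIBFRAME person |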

@philbarker
Collaborator

I think @tombaker and I are saying the same thing.

I would go one step further, and suggest that saying a literal used as a value must conform to xsd:string is very similar to saying that an entity [description] used as a value must conform to @author. I would (& have in my example) put them in the same column. I'm happy for a value space to be defined by conformance to a standard/spec/profile, but if that's not what you have in mind, maybe it's a distinct column.

@acka47

acka47 commented Jun 8, 2020

I would (& have in my example) put them in the same column.

I too believe that it makes sense to put it in the same column unless there is a use case where filling out both columns makes sense. @tombaker provides one in #61 (comment) but I am not sure whether this is necessary.

Thinking about this, I realize I see two – somehow related – problems in the current profile examples:

  1. using URI as type for nodes that can also be blank nodes
  2. not being able to define a URI stem for the subject URI (or am I overlooking something?) but only for object URIs/nodes

Re. 1.) I would rather like to see something like node as value which basically means something like: "another entity for which at least one statement is included in the data". Together with 2.), the result could look something like this (where I use @id for defining the subject URI and make it optional so that it basically means: bnodes are ok but if you provide a URI it should be from VIAF):

| ID | URI | Label | Mandatory | Repeatable | Type | Value Space | Comment |
|---|---|---|---|---|---|---|---|
| sdo: | http://schema.org/ | schema.org | | | | | |
| xsd: | http://www.w3.org/2001/XMLSchema# | XML Schema | | | | | |
| rdf: | http://www.w3.org/1999/02/22-rdf-syntax-ns# | RDF | | | | | |
| @book | | Book | | | | | |
| | sdo:author | author | y | n | Node | @author | |
| @author | | Author | | | | | |
| | @id | URI | n | n | URIStem | http://viaf.org | |
| | rdf:type | instance of | y | n | URI | sdo:Person sdo:Organization | must be person or organization |
| | sdo:givenName | given name | y | n | Literal | xsd:string | |
| | sdo:familyName | family name | n | n | Literal | xsd:string | |

@tombaker
Collaborator

tombaker commented Jun 8, 2020

@acka47

Thinking about this, I realize I see two – somehow related – problems in the current profile examples:

  1. using URI as type for nodes that can also be blank nodes

That's why I like the node kinds as mentioned by Karen (who I think meant "ShEx", not "RDF": RDF has just three, and the fourth in ShEx, "nonliteral", means, in effect, "iri or bnode").
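To make the four node kinds concrete, here is a small Python sketch. It assumes terms are written in an N-Triples-like notation (`<...>` for IRIs, `_:` for blank nodes, a leading double quote for literals); the function names are hypothetical.

```python
def node_kind(term):
    """Classify a term written in N-Triples-like notation."""
    if term.startswith("_:"):
        return "bnode"
    if term.startswith("<") and term.endswith(">"):
        return "iri"
    if term.startswith('"'):
        return "literal"
    raise ValueError(f"cannot classify term: {term}")

def matches_node_kind(term, wanted):
    """True if the term satisfies the wanted kind; "nonliteral" covers iri and bnode."""
    kind = node_kind(term)
    if wanted == "nonliteral":
        return kind in ("iri", "bnode")
    return kind == wanted

print(matches_node_kind("_:author1", "nonliteral"))
print(matches_node_kind('"Moby Dick"', "nonliteral"))
```

Under this reading, a profile row whose type is "nonliteral" would accept both the blank-node and the URI variants of the author examples earlier in the thread.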

@kcoyle
Collaborator Author

kcoyle commented Jun 8, 2020

the example shows the value for sdo:author is provided by a URI that identifies an entity, the description for which is provided and conforms to the constraints defined by the AP entity shape @author.

Yes, sorry, I got the wrong meta level! But I do wonder what the advantage is to using an "@" node in the profile rather than the entity URI, since one exists. This gets us back to what our value column represents, which I think we need to hash out. I'll try to create some examples.

@kcoyle
Collaborator Author

kcoyle commented Jun 8, 2020

(I also sent this as an email to the list, in slightly different formatting)

Tom and I had an impromptu brainstorming session this morning, based on the GitHub comment that I made where I realized that the use of URIstems was problematic for our table structure. Tom responded with the thought that we might be using the value space for significantly different things. (Tom, speak up if I mis-characterize your view!) Adrian also weighed in on this. During our chat Tom and I fiddled around with a Google spreadsheet, trying out various ideas. We did NOT solve the problem but at least agreed on what we thought was unresolved. Here are some things you may notice about the spreadsheet (which I based on a section of Phil's example):

  1. Line 6 has the URI stem http://viaf.org as the "value constraint" (the column we've often called "value")
  2. We have added another column for the value shape linking identifier (@author)
  3. The statement in line 6 holds the property sdo:author, its cardinality, and value type and value
  4. The statement in line 9 gives the information that the author shape entity must be of type bf:c_Person (I'm making this up; just assume it is a legitimate RDF class for this entity)
  5. To show a use of the entity @book I added a new entity for the series in which a book may appear.
  6. The table has columns for namespace and namespace prefix (both Tom and I prefer that we not mix data types within a single column) and I moved it to the bottom of the table for purely editorial reasons; I think some human readers may be better served by presenting the "meat" of the profile before this type of detail.

Problems remain in this table. My main concern is that it might seem odd to profile developers to have the key information about the author (line 6) separate from the author entity information on lines 8 and 9. We tried to come up with a way to have an author entity that has all of the author information, somehow moving the value type of author entity and the constraint to the @author entity node of the table. What we came up with was an unattractive kludge. If you have ideas on how to solve this, please speak up and/or copy what is here and make the modifications that you think will work.

kc

Here's a nearly-readable screen shot of the table:
[Image: Screenshot 2020-06-08 16 18 10]

@tombaker
Collaborator

tombaker commented Jun 9, 2020

@kcoyle Excellent summary! Naming issues aside, I think this iteration makes three key improvements:

  • As Karen points out, shifting the namespace declarations to the bottom and off to the side helps the reader by putting the spotlight where it belongs: on the main part of the profile.
  • Removing namespace declarations from columns A and B means that the columns no longer need to do double duty ("ID" both for shape IDs and namespace prefixes, "URI" both for namespace URIs and for property URIs) and can have more precise headings: "Subject Shape ID" and "Property URI".
  • Splitting out "value shape ID" from "value constraints" (aka "value space") cleanly separates "things in the instance data" (e.g., URIs that identify people; dates; strings...) from pointers to the application profile construct known as "[entity] description [template]" or "[entity] shape". In other words, it separates things about (but not in) the instance data -- the shape IDs -- from things that are actually in the instance data (URIs and literals).

One other detail: Instead of using xsd:string to mean "string", it seems just a bit more user-friendly to call it simply String and rely on the template conversion script to map this to xsd:string if so desired.
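A conversion script could handle this with a simple lookup. The sketch below is hypothetical: the mapping table and the function name are assumptions, and only `String` comes from the suggestion above (`Date` is added as a second, assumed entry).

```python
# Friendly names a profile author might type, mapped to datatype names.
FRIENDLY_TYPES = {
    "String": "xsd:string",
    "Date": "xsd:date",  # assumed additional entry, for illustration
}

def normalize_value_type(cell):
    """Map a friendly value type to its datatype; pass anything else through."""
    cell = cell.strip()
    return FRIENDLY_TYPES.get(cell, cell)

print(normalize_value_type("String"))
print(normalize_value_type("xsd:anyURI"))
```

Passing unknown cells through unchanged keeps the door open for authors who prefer to write the datatype directly.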

@tombaker
Collaborator

tombaker commented Jun 9, 2020

@acka47

  2. not being able to define a URI stem for the subject URI (or am I overlooking something?) but only for object URIs/nodes

I agree that this cannot be expressed in any of the variants of the CSV model we are discussing, but in general, I think we should not try to overload the model with too many features and, specifically in this case, I think the omission is justified.

In the current model, constraints on nodes can only be expressed for nodes in the object position, but to support the specification of a URI stem for the subject URI, one would need to add another two columns to the model:

  • Subject Type (because one of those types would need to be URI stem) and
  • Subject constraints (which is where one would put http://viaf.org).

But would the gain in expressivity be worth the loss of simplicity? If the CSV model were so complex, why not just learn ShEx (which distinguishes between node constraints and triple constraints)?

The "unattractive kludge" to which Karen alludes above would be to use, say, dc:identifier to put the subject URI into the object position, where it could be constrained, as in the triple http://viaf.org/1234 dc:identifier http://viaf.org/1234. This is not really a solution...

@acka47

acka47 commented Jun 9, 2020

I think we should not try to overload the model with too many features

I agree with keeping it simple.

specifically in this case, I think the omission is justified.

I am not convinced. Don't you think there will be use cases where people want to add constraints on the URI of the top-level node (which in the example is @book)? Maybe there won't be and I argue for something people don't need but I am not sure about this.

to support the specification of a URI stem for the subject URI, one would need to add another two columns to the model

I am not sure it has to be this way. Maybe we could find a way to put it in the current model. In #61 (comment), I used the @id key – as JSON-LD does – for the subject URI. In the book example, this could look like this:

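| Subject Shape | Property | Display Label | Mandatory | Repeatable | Value Type | Value Constraints | Value Shape ID | Prefix | Namespace |
|---|---|---|---|---|---|---|---|---|---|
| @book | | Book | | | | | | | |
| | @id | | y | n | URIStem | http://openlibrary.org/ | | | |
| | rdf:type | instance of | y | n | URI | sdo:Book | | | |
| | rdf:type | instance of | y | n | URI | wd:Q571 | | | |
| | sdo:name | title | y | n | String | | | | |
| | sdo:author | author | y | y | URIStem | http://viaf.org | @author | | |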

This could easily mean that the URI for an instance node of @book must be in the http://openlibrary.org/ namespace. Note that I renamed the column to "property" instead of "propertyURI" to take this case into account. This leads to another question I have (sorry, I have not followed the whole process until now): Is DCAP aimed at RDF data only, i.e. at data that identifies keys/properties with a URI? Or does it also cover non-RDF data like plain CSV or JSON?
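One way a processor might treat such an `@id` row is sketched below in Python. The CSV text is a cut-down version of the table above with only four columns, and the function names and the sample subject URIs are hypothetical; this is a sketch of the idea, not an agreed interpretation of the template.

```python
import csv
import io

PROFILE_CSV = """\
Subject Shape,Property,Value Type,Value Constraints
@book,,,
,@id,URIStem,http://openlibrary.org/
,sdo:name,String,
"""

def subject_stems(profile_csv):
    """Map each shape ID to the URI stem required by its @id row, if it has one."""
    stems, current = {}, None
    for row in csv.DictReader(io.StringIO(profile_csv)):
        if row["Subject Shape"]:
            current = row["Subject Shape"]  # a new shape starts here
        if row["Property"] == "@id" and row["Value Type"] == "URIStem":
            stems[current] = row["Value Constraints"]
    return stems

def subject_ok(subject_uri, shape_id, stems):
    """True if the subject URI starts with the stem declared for the shape."""
    stem = stems.get(shape_id)
    return stem is None or subject_uri.startswith(stem)

stems = subject_stems(PROFILE_CSV)
print(stems)
print(subject_ok("http://example.org/book1", "@book", stems))
```

The `@id` row is read as a constraint on the subject of every statement in the shape rather than as a property that appears in the data, which is how the sketch differs from the handling of ordinary property rows.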

@tombaker
Collaborator

tombaker commented Jun 9, 2020

@acka47 That's an interesting twist! In this case, I find it confusing that @id and @book both use the @ prefix with different meanings, but I'm sure there are ways around that.

We did decide a while ago to "develop an RDF-specific model, back it up with a variety of example profiles, each with some instance data that we can validate with ShEx. Once that is solid, we can go back and see if the same template can work with other data types, like XML, JSON".

My first reaction is that treating the identifier as a property, as you suggest above, should fit nicely into the model (apart from the question of punctuation) without compromising the use of the CSV model for RDF-based profiles. Is there anything else in the way you use JSON-LD that is not already supported by the CSV model and that you think should be?

@philbarker
Collaborator

If the table structure makes the use of URIstems difficult, fine: don't do string hacks on URIs, treat them as opaque identifiers with no internal semantics. But I guess the boat has sailed on that.

I still don't understand why the emphasis is on APs being about properties. I know that a focus on properties is in the DCMI heritage, but an Application Profile mixes and matches existing Classes, Properties and value encoding schemes, and I believe we should treat them equally.

I think the table should primarily identify which existing vocabularies are being used and which terms from these vocabularies are being used. So it should include namespaces, with local and global identifiers; classes being used, with local and global identifiers; properties being used, with local and global identifiers; and encoding schemes (concept schemes, syntax encoding schemes), with local and global identifiers. (The local identifiers are only needed if we want to make cross references within the AP.)

@acka47

acka47 commented Jun 9, 2020

Is there anything else in the way you use JSON-LD that is not already supported by the CSV model and that you think should be?

Directly to my mind come the following two:

  1. constraining literals by regular expression: I guess that you have it already covered but could not find it quickly
  2. define property value as ordered list: In which way do I define usage of an ordered list (in JSON-LD "@container": "@list", in RDF rdf:List) for a specific property?

Re. 2.) you have to distinguish – from a JSON-LD view – a simple array ("@container": "@set") from an ordered list ("@container": "@list"). Re. an array, it is often important for a JSON representation to know which properties can generally have many values (=array), and it often makes sense to then coerce those values to an array even if only one value exists. I can already derive this behaviour from repeatable: yes but do not see a solution for the ordered list.
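The array coercion that can be derived from "repeatable: yes" might look like the following Python sketch. The function name and the sample record are made up for illustration, and the set of repeatable properties is assumed to have been read from the profile already.

```python
def coerce_repeatable(record, repeatable_props):
    """Wrap single values of repeatable properties in a list; leave the rest alone."""
    out = {}
    for key, value in record.items():
        if key in repeatable_props and not isinstance(value, list):
            out[key] = [value]
        else:
            out[key] = value
    return out

record = {"name": "Moby Dick", "author": "http://viaf.org/viaf/27068555/"}
print(coerce_repeatable(record, {"author"}))
```

Nothing in this sketch says anything about order, which is the gap noted above: a plain repeatable flag yields an array, but not the guarantee that its sequence is meaningful.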

@acka47

acka47 commented Jun 9, 2020

define property value as ordered list: In which way do I define usage of an ordered list (in JSON-LD "@container": "@list", in RDF rdf:List) for a specific property?

Thinking about this, you can easily define a value shape @list with rdf:type rdf:List, so that does not seem a general problem. If I want to define the nodes that are items of this list it becomes more complex as I have to add another value shape. Example:

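| Subject Shape | Property | Display Label | Mand. | Rep. | Value Type | Value Constraints | Value Shape ID | Prefix | Namespace |
|---|---|---|---|---|---|---|---|---|---|
| @book | | Book | | | | | | | |
| | rdf:type | instance of | y | n | URI | sdo:Book | | | |
| | ex:chapters | chapters | n | n | | | @list | | |
| @chapterList | | chapter list | | | | | | | |
| | rdf:first | | y | n | | | @chapter | | |
| | rdf:rest | | n | n | | | @chapterList | | |
| @chapter | | Chapter | | | | | | | |
| | rdf:type | | y | n | URI | sdo:Chapter | | | |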

So it might make sense to add some syntactic sugar for ordered lists.
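For readers less familiar with RDF lists, the Python sketch below shows the rdf:first / rdf:rest chain that the @chapterList shape describes and that any syntactic sugar would have to expand to. Triples are plain tuples of prefixed names; the function name and the sample data are assumptions for illustration.

```python
RDF_NIL = "rdf:nil"

def list_items(triples, head):
    """Follow rdf:first / rdf:rest from `head` and return the items in order."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    items, node, seen = [], head, set()
    while node != RDF_NIL:
        if node in seen or node not in first:
            raise ValueError(f"malformed list at {node}")
        seen.add(node)
        items.append(first[node])
        # A missing rdf:rest is treated here as the end of the list.
        node = rest.get(node, RDF_NIL)
    return items

triples = [
    ("ex:book1", "ex:chapters", "_:l1"),
    ("_:l1", "rdf:first", "ex:chapter1"),
    ("_:l1", "rdf:rest", "_:l2"),
    ("_:l2", "rdf:first", "ex:chapter2"),
    ("_:l2", "rdf:rest", "rdf:nil"),
]
print(list_items(triples, "_:l1"))
```

Each list cell is itself a node, which is why the table needs one shape for the cells and another for the items they point to.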

@tombaker
Collaborator

tombaker commented Jun 9, 2020

@philbarker

think the table ... should include namespaces, with local and global identifiers;

Check.

classes being used, with local and global identifiers;

Check (I think), as in the example above: rdf:type sdo:Chapter. Not that rdf:type is the only way to express class membership, but I cannot think of any way to do this that could not in principle be accommodated in the model, especially if we were to relax the requirement that properties be URIs, as per Adrian's example from JSON-LD.

properties being used, with local and global identifiers;

Check.

and encoding schemes (concept schemes, syntax encoding schemes),

Taking the two separately:

  • Might syntax encoding schemes, aka datatypes, such as xsd:string, be accommodated as Value Types? Perhaps the user guide could encourage the use of String or Date but I see no obvious reason not to use any arbitrary datatype URI as a value type. Use of a datatype URI from the xsd namespace would of course imply that the value is a literal; I'm not sure if it would be safe to assume that any URI used as a Value Type would be a datatype.
  • Concept schemes - hmm, this is trickier. I'm not sure how to accommodate the early-2000s notion of "vocabulary encoding scheme" (as a URI meaning, for example, that the string "China -- History" is taken from LCSH). I'd be interested to know whether this style of "qualifying" literal values is still in wide use, and I do not see an obvious way to express a VES in our simple model without going in the direction of DSP, which created an extra box in the model for Vocabulary Encoding Scheme Constraints.

I am assuming that the more common way to use a concept scheme in modern metadata would be simply to use the URI of a concept as value URI. I dunno - maybe there could be a Value Type like, say, Value URI Source, defined as a pointer to a list or set of value URIs? If a concept scheme consists of concepts that share a base URI, such as http://www.fao.org/aims/aos/agrovoc/, then URIStem could be used. (However, concept schemes do not necessarily have just one base URI.)

@tombaker
Collaborator

tombaker commented Jun 9, 2020

@acka47 The shape IDs in rows 6 and 7 were in the "prefix" column (and mandatory/repeatable were not aligned) so I edited your post to move them under Value Shape ID.

So it might make sense to add some syntactic sugar for ordered lists.

Maybe so. Is this a common use case?

@tombaker
Collaborator

tombaker commented Jun 9, 2020

@acka47

constraining literals by regular expression: I guess that you have it already covered but could not find it quickly

Good question - I'm not sure we have covered those. Maybe Regex Match could be a value type, the value constraints of which would be the regex?
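If something like Regex Match were adopted as a value type, the check itself would be small. The Python sketch below is an assumption-laden illustration: the function name is hypothetical, the pattern is a simplified 13-digit ISBN shape chosen only as an example, and matching the whole value (rather than any substring) is a design choice the thread has not settled.

```python
import re

def check_literal(value, value_type, constraint):
    """Apply a regex constraint when the value type asks for one."""
    if value_type == "Regex Match":
        # fullmatch: the entire value must match, not just part of it.
        return re.fullmatch(constraint, value) is not None
    return True

ISBN13_LIKE = r"97[89]\d{10}"  # illustrative pattern, not a full ISBN check
print(check_literal("9780140184945", "Regex Match", ISBN13_LIKE))
print(check_literal("12345", "Regex Match", ISBN13_LIKE))
```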

@acka47

acka47 commented Jun 9, 2020

Is this a common use case?

With regard to bibliographic data, the use cases are contributors (e.g. bf:contribution) and subjects (e.g. mads:componentList). At least, that's where we use RDF lists in lobid.

@kcoyle
Collaborator Author

kcoyle commented Jun 9, 2020

@philbarker I would love to find another solution to URI stems! One that doesn't disrupt our model. I would appreciate hearing more detail around your:

don't do string hacks on URIs, treat them as opaque identifiers with no internal semantics

To that end, here is an expression of the use cases.

  1. My data has a property that will take as its object a member of an external vocabulary. The external vocabulary members have URIs. Any member is acceptable/valid.

  2. My data has a property that will take as its object a member of an external vocabulary. This member must meet certain criteria to be valid - specifically, it must itself be a member of a specific class as defined in that vocabulary (e.g. a SKOS concept)

  3. My data has a property that will take as its object ANY URI that is a member of a SKOS concept scheme.

The third one is an add-on that we haven't discussed but that I know is in use in at least one metadata application. If it doesn't fit with solutions to 1 and 2 we can discuss it another time.

I'll also note a possibly dangerous thought that came up in earlier discussions, which is to allow the value column to include regex-type formulas. This is not something that we would expect from our most beginner profile developers, but might provide a passage from the simplest template to one that can express useful value rules like "date = > than 2000 but < than 2021". URI stems could be "ex:www.something.org*". But then we'd have to have a way to indicate that the value field contains a formula, rather like the "SUM=" in spreadsheets.

Thanks!

@kcoyle
Collaborator Author

kcoyle commented Jun 9, 2020

I added a second sheet to the google spreadsheet to show the "solution" that gathers all of the author information in a single "shape". I do like the idea of saying that PropertyX has as its object ShapeY and all of the value constraints are in the ShapeY rather than giving value constraints on the PropertyX row. (That'll be clearer when you look at the spreadsheet.) This solution is really unrelated to the URIstem problem, and you can see it in the @Series shape as well. In short, this suggests:

A property either has a value as its object, or it has a shape as its object, but not both.
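Stated as a rule over profile rows, this could be checked mechanically. The Python sketch below assumes each row is a dict keyed by the column headings used in the spreadsheet ("Value Constraints" and "Value Shape ID"); the function name and the sample rows are made up for illustration.

```python
def rows_mixing_value_and_shape(rows):
    """Return the property rows that fill in both a value constraint and a shape ID."""
    return [
        row for row in rows
        if row.get("Value Constraints") and row.get("Value Shape ID")
    ]

rows = [
    {"Property": "sdo:name", "Value Constraints": "", "Value Shape ID": ""},
    {"Property": "sdo:author", "Value Constraints": "", "Value Shape ID": "@author"},
    {"Property": "dc:subject", "Value Constraints": "http://viaf.org", "Value Shape ID": "@subject"},
]
print([row["Property"] for row in rows_mixing_value_and_shape(rows)])
```

Under the proposed rule, any row this function returns would need to move its value constraints into the shape it points to.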

@philbarker
Collaborator

@kcoyle I think for those requirements it is better to resolve the identifier and check for statements like <whatever> rdf:type skos:Concept and <whatever> skos:inScheme http://example.org/requiredVocab. I think you can fit that in the model by defining the required properties in an @entityShape and referring to it.

@tombaker
Collaborator

tombaker commented Jun 10, 2020

@philbarker Is this what you mean?

| SSID | Prop | VType | VConstraints | Value ShapeID |
|---|---|---|---|---|
| @book | dc:subject | URI | | @subject |
| @subject | rdf:type | URI | skos:Concept | |
| | skos:inScheme | URI | http://ex... | |

@kcoyle This would perhaps address:

  1. My data has a property that will take as its object ANY URI that is a member of a SKOS concept scheme.

@tombaker
Collaborator

@kcoyle

  1. My data has a property that will take as its object a member of an external vocabulary. The external vocabulary members have URIs. Any member is acceptable/valid.

How about ValueTypeSource as a value type (defined as a pointer to a list or set of value URIs)? I'm not convinced by my own suggestion but do not see any obvious problem with it.

@tombaker
Collaborator

@kcoyle @acka47 To summarize, I'm seeing two possible modeling patterns for recording the identifier of the subject described by a given shape, both of which put the URI in the object position:

  1. Karen's spreadsheet uses dc:identifier and constrains it to be based on a URI stem. This is very readable and easy to understand, though it does not actually constrain the subject URI of the triples about (in this case) the author.
  2. Adrian's example uses a non-URI property in a statement that, by JSON-LD rules, does constrain the subject URI of triples.

| SSID | Prop | VType | VConstraints | Value ShapeID |
|---|---|---|---|---|
| @thing | @id | URIStem | http://... | |
| | dc:identifier | URI | http://ex... | |

Aside from the unfortunate use of @ with two meanings, both patterns seem valid, and with known limitations: the first does not constrain the subject URI, while the second relies on a JSON-LD interpretation.

@acka47

acka47 commented Jun 10, 2020

I would love to find another solution to URI stems! One that doesn't disrupt our model.

Joining both the discussions on Regex and URIStems together: I think one could completely get along without the URIStem value type and replace it with a regex. I added a third table to the spreadsheet that defines a constraint on a URI by regular expression. Here is the relevant snippet.

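| Subject Shape ID | PropertyURI | Display Label | Mandatory | Repeatable | Value Type | Value Constraints | Value Shape ID | Prefix | Namespace |
|---|---|---|---|---|---|---|---|---|---|
| @author | | Author | | | | | | | |
| | rdf:type | instance of | y | n | URI | bf:c_Person | | | |
| | @id | author | y | y | URI | regex(^http:\/\/viaf.org\/viaf\/[1-9]\d{0,21}) | | | |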
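As a sketch of how a processor might apply such a cell, the Python below unwraps the regex(...) notation and tests a URI against it. The function names are hypothetical. Because the pattern has no end anchor, the sketch uses `re.fullmatch` so that trailing characters are rejected; that is a choice made here, not something the table specifies.

```python
import re


def parse_regex_constraint(cell: str):
    """Return a compiled pattern if the cell looks like regex(...), else None."""
    text = cell.strip()
    if text.startswith("regex(") and text.endswith(")"):
        return re.compile(text[len("regex("):-1])
    return None


def uri_matches(uri: str, cell: str) -> bool:
    """True if the whole URI matches the regex held in the constraint cell."""
    pattern = parse_regex_constraint(cell)
    return pattern is not None and pattern.fullmatch(uri) is not None


VIAF_CELL = r"regex(^http:\/\/viaf.org\/viaf\/[1-9]\d{0,21})"
```

Note that the unescaped dot in viaf.org matches any character, so a stricter profile might write it as viaf\.org.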

@acka47

acka47 commented Jun 10, 2020

Aside from the unfortunate use of @ with two meanings, both patterns seem valid, and with known limitations: the first does not constrain the subject URI, while the second relies on a JSON-LD interpretation.

I completely understand that you don't want to use @s in two different senses. I used @id because it is known from JSON-LD and I did not see a better choice. I don't see it as a JSON-LD keyword in this context, though, but as a specific DCAP keyword (like all the @ terms in JSON-LD are JSON-LD-specific keywords, all the @s in DCAP could be DCAP-specific).

I think it makes sense to use such a specific keyword that means "using this keyword means all statements are made about the subject URI", but I don't mind using another token for the keyword – as long as it is clearly distinguished from the RDF properties. That's where I see the problem with dct:identifier, as it cannot be used and interpreted like a specific DCAP keyword.

@tombaker
Collaborator

@acka47

I think it makes sense to use such a specific keyword that means "using this keyword means all statements are made about the subject URI", but I don't mind using another token for the keyword – as long as it is clearly distinguished from the RDF properties.

Interesting idea! So it could be something like SubjectID -- no prefix or http://, and perhaps uppercased?

Other than this keyword, and aside from a controlled "starter vocabulary" of Value Types (which we clearly need), can we think of other cases where such a keyword might provide some syntactic sugar for edge cases? I wouldn't want to see us get too fancy with special keywords, but I'd be curious if we do see any.

For example, how strong is the requirement for a profile (or its shapes) to be able to reference themselves (e.g., InProfile, analogous to skos:inScheme or rdfs:isDefinedBy)?

@philbarker
Collaborator

@tombaker yes, something like that

@kcoyle
Collaborator Author

kcoyle commented Jun 13, 2020

@philbarker you asked:

"I need to know whether something is a BNode or a URI because ...." "If you give me the information that something is a Literal and that its datatype is xsd:date separately it will allow me to ...."

and the answer is a resounding "yes!" - those use cases would be very helpful. Thanks.

@kcoyle
Collaborator Author

kcoyle commented Jun 13, 2020

@acka47

constraining literals by regular expression: I guess that you have it already covered but could not find it quickly

Adrian, we talked about this early on, before we decided to work first on the very simplest of cases. In that early thinking this would indeed be in the value space, and it could be fed pretty directly into a ShEx schema. We could "allow" this in our simple schema and give a few simple examples of common needs like constraining dates or numbers. I'm thinking that we may want to include features like this as extensions of the simpleAP model, not as part of the very simple base. And I am still hoping that we can go beyond the simplest model, at least as extensions and examples of more complex cases. Anyone else have comments on this?

@tombaker
Collaborator

@kcoyle I fully agree that we need to keep the model simple, but I think we have some wiggle room between two readings: that the Value Constraints column (naming issue aside) only holds things that are actually "in the value space", in a strict sense, or that it also holds things that meaningfully constrain the set of possible values with respect to a given Value Type, in a looser and more flexible sense.

To follow the former (stricter) interpretation would be to limit ourselves to things such as:

| Value Type | Value Constraint |
|---|---|
| URI | sdo:Person |
| Literal | "confidential" |

because the URI sdo:Person (or http://schema.org/Person) and the string "confidential" are expected to appear in the instance data.

However, if we were to say that the nature of the value constraints is specific to a given value type, then we could accommodate things like:

| Value Type | Value Constraint |
|---|---|
| URIStem | http://schema.org |
| TypedLiteral | xsd:string |
| URIRegex | regex(^http:\/\/viaf.org\/viaf\/[1-9]\d{0,21}) |
| LiteralPicklist | ["animal" "vegetable" "mineral"] |
| URIPicklist | [http://purl.org/example http://schema.org] |

In our planned "starter" vocabulary of value types, then, we would need to clarify, for each value type, the nature of expected value constraints (eg, actual URIs and literals in the former examples; base URIs, datatype URIs, and regular expressions in the latter examples, along with any formatting rules such as "enclose lists with square brackets" or "enclose regular expressions in parentheses").

We would need to make a number of somewhat arbitrary decisions on such details (e.g., can any value type be turned into a picklist by enclosing the set of alternative value constraints in square brackets?). And I still see no elegant way to accommodate more complex expressions such as "URI or Literal lastname". Creeping featurism is a slippery slope, but such a model could accommodate quite a few of the list cases we have discussed.
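One way to picture "the nature of the value constraint depends on the value type" is a small dispatch function. The Python below is only a sketch under assumptions: the value type names come from the examples above, the formatting rules (square brackets for picklists, regex(...) for patterns) are the ones floated in this comment rather than settled conventions, and TypedLiteral is deliberately left out because checking a datatype needs typed instance data.

```python
import re


def check_value(value_type: str, constraint: str, value: str) -> bool:
    """Interpret the constraint cell according to the value type."""
    if value_type in ("URI", "Literal"):
        # Strict reading: the constraint is the expected value itself.
        return value == constraint.strip('"')
    if value_type == "URIStem":
        return value.startswith(constraint)
    if value_type in ("LiteralPicklist", "URIPicklist"):
        # Drop the enclosing brackets, then split on whitespace.
        items = constraint.strip("[] ").split()
        return value in [item.strip('"') for item in items]
    if value_type == "URIRegex":
        # Unwrap regex(...) and require the whole value to match.
        body = constraint[len("regex("):-1]
        return re.fullmatch(body, value) is not None
    raise ValueError(f"unsupported value type: {value_type}")
```

Each new value type adds a branch, which is where the "creeping featurism" risk shows up in practice.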

@philbarker
Collaborator

@tombaker

a picklist by enclosing the set of alternative value constraints in square brackets

oh, I don't think that helps. I don't need the square brackets to tell me it is a picklist if the Value Type does that, and I don't want the square brackets if I am processing the Value Constraint string in python because without them I can just use str.split() on the value.

@tombaker
Collaborator

@philbarker

oh, I don't think that helps. I don't need the square brackets to tell me it is a picklist if the Value Type does that, and I don't want the square brackets if I am processing the Value Constraint string in python because without them I can just use str.split() on the value.

It's fine with me to separate multiple items in a Value Constraints cell with just whitespace. I guess that would preclude putting multiple regexes into a cell (because a regex might have spaces), and the strings of a literal picklist would also break if there were spaces, but perhaps those are small prices to pay for the simpler approach.
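The trade-off can be seen in a few lines of Python. The sample cells are made up for illustration. `str.split()` works for the simple picklist but breaks a literal that contains a space, while the standard library's `shlex.split()` is shown as one possible alternative that respects the double quotes; it is not something proposed in this thread.

```python
import shlex

simple_cell = '"animal" "vegetable" "mineral"'
spaced_cell = '"red wine" "white wine"'  # hypothetical literals with spaces


def split_plain(cell: str):
    """Whitespace split, then drop the surrounding double quotes."""
    return [item.strip('"') for item in cell.split()]


def split_quoted(cell: str):
    """Quote-aware split: a quoted string stays one item."""
    return shlex.split(cell)


print(split_plain(simple_cell))   # ['animal', 'vegetable', 'mineral']
print(split_plain(spaced_cell))   # ['red', 'wine', 'white', 'wine']
print(split_quoted(spaced_cell))  # ['red wine', 'white wine']
```

The quote-aware split is not a fix for regexes, since `shlex` treats backslashes as escape characters, so a cell holding a regular expression would still need its own handling.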

@kcoyle
Copy link
Collaborator Author

kcoyle commented Jan 25, 2021

Taken up in dcmi/dctap#5, which links to here for discussion

@kcoyle kcoyle closed this as completed Jan 25, 2021