REQUIREMENT: May specify Open/Closed #33

Closed
kcoyle opened this issue May 25, 2019 · 13 comments
Labels
Basic (requirements that are definitely in scope), needs discussion (extra attention is needed), POSTPONED, requirements (as derived from use cases)

Comments

@kcoyle
Collaborator

kcoyle commented May 25, 2019

#17 #19
"Must be able to include information about what to do with metadata received or encountered that is not included in the profile itself."

@kcoyle added the "requirements" label (as derived from use cases) on May 25, 2019
@kcoyle
Collaborator Author

kcoyle commented May 25, 2019

There may be more than one type of "open" - accept any metadata properties, or accept any properties from a limited set of vocabularies.
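
A rough sketch of how these kinds of "open" might differ in practice. The mode names (`closed`, `open_any`, `open_vocab`) and the function `property_allowed` are made up for illustration and are not part of any proposed template.

```python
# Hypothetical sketch: three ways a profile might treat properties it does not list.
DCT = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"


def property_allowed(prop, profile_props, mode, open_vocabs=()):
    """Return True if `prop` (a full IRI string) is acceptable under `mode`."""
    if prop in profile_props:
        return True  # listed in the profile: always acceptable
    if mode == "open_any":
        return True  # accept any metadata property
    if mode == "open_vocab":
        # accept unlisted properties only from the named vocabularies
        return any(prop.startswith(ns) for ns in open_vocabs)
    return False  # "closed": nothing unlisted is accepted


profile = {DCT + "title", DCT + "creator"}
print(property_allowed(FOAF + "name", profile, "open_any"))           # True
print(property_allowed(FOAF + "name", profile, "open_vocab", [DCT]))  # False
print(property_allowed(DCT + "date", profile, "open_vocab", [DCT]))   # True
```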

@kcoyle added the "Basic" (requirements that are definitely in scope) and "needs discussion" (extra attention is needed) labels on Jun 13, 2019
@kcoyle added this to Basic requirements in Basic AP Vocabulary on Jun 13, 2019
@kcoyle
Collaborator Author

kcoyle commented Sep 17, 2020

This relates to the entire profile and therefore does not fit on any of the rows in the csv template. That means that we may need a manifest file of some type that gives information for the profile itself. That file could include the administrative information: who created it, when, etc.

@kcoyle
Collaborator Author

kcoyle commented Sep 29, 2020

From an email to the list from Tom Baker:

"In ShEx, "openness" is an attribute of a shape [1]. Maybe we could have an (optional) column like 'shapeOpen', with a value of True/False, Yes/No, or whatever."
[1] https://shex.io/shex-primer/#closed-shapes

To contemplate when we get here: should we allow open/closed to be only on shapes or on the entire profile (if we figure out a mechanism for the latter)?

@tombaker
Collaborator

@kcoyle I suggest we support open/closed only on shapes. Reasons:

  • It is unclear to me what "closed" would mean for an entire profile. In the absence of cardinality on shapes, for example, would it mean that there must be at least one node neighborhood that matches every shape in the profile? Our model creates a bag of shape descriptions but does not actually have a construct for the bag itself (ie, the Application Profile as a whole).
  • Putting open/closed on all shapes in a profile would close down the allowable content of the data pretty firmly. I cannot think of cases where this would not, in effect, close down the data content allowed by the profile as a whole. Maybe for scenarios where the cardinality of node neighborhoods that match specific shapes must be controlled? In order to proceed, I think we'd need to have some clear use cases.
  • Adding Open/Closed for shapes, on the other hand, seems like low-hanging fruit. What would we call such a column? How about shapeClosed? We could suggest
    • No (the default) / Yes - or N / Y
    • False (the default) / True
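
A minimal sketch of how a reader of the CSV might interpret such a column, assuming the column is called `shapeClosed`, that an empty cell means "open", and that the other column names (`shapeID`, `propertyID`) are as shown. All of these are assumptions for illustration, not settled parts of the template.

```python
import csv
import io

# Assumed spellings of "yes"; anything else, including an empty cell, means open.
TRUE_VALUES = {"yes", "y", "true"}


def read_closed_flags(csv_text):
    """Map each shapeID to True (closed) or False (open, the assumed default)."""
    closed = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        shape = row["shapeID"]
        flag = (row.get("shapeClosed") or "").strip().lower()
        # A shape counts as closed if any of its rows says so.
        closed[shape] = closed.get(shape, False) or flag in TRUE_VALUES
    return closed


sample = """shapeID,propertyID,shapeClosed
:book,dct:title,Yes
:book,dct:creator,
:author,foaf:name,
"""
print(read_closed_flags(sample))  # {':book': True, ':author': False}
```

Treating a shape as closed when any one of its rows is flagged is a design choice made here only to keep the sketch short; a real template would need to say where the flag belongs.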

@tombaker
Collaborator

... or my original proposal for shapeOpen, but since we might want to say that the default interpretation of a profile is "open", shapeClosed might make more sense because we would not encourage people to use this element at all unless they want to close their shapes.

@philbarker
Collaborator

@tombaker but what then of what you might call the "top level" or "default" shape? What I mean is that if the simplest profile can be just a list of properties (no declared shapes at all), how would you say whether that was a closed or open list? I'm guessing you would stipulate that anyone who wanted to say that it was either open or closed would have to assign the list a shape.

It might be useful to decide on what the default is.

@tombaker
Collaborator

@philbarker An interesting question. To be translated into ShEx, a "shapeless" list of properties would need to be turned into a shape for which "closed" (or "open") could be specified. In this sense, a shapeless list of properties could be seen as having an implied (anonymous) shape.

I have been assuming that the default should be "open", as it is in ShEx, but am curious to hear the case for "closed". In the absence of a default, we would in effect be saying that it could be either open or closed unless deliberately specified - in effect, saying that the minimal profile would need to have at least two columns.
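
To illustrate the idea of an implied shape, here is a small sketch in which rows without a shape identifier are collected under a single anonymous shape. The label `default` and the column names `shapeID` and `propertyID` are assumptions made for this example only.

```python
import csv
import io

DEFAULT_SHAPE = "default"  # hypothetical label for the implied, anonymous shape


def group_by_shape(csv_text):
    """Group propertyID values by shape; rows with no shapeID join the implied shape."""
    shapes = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        shape = (row.get("shapeID") or "").strip() or DEFAULT_SHAPE
        shapes.setdefault(shape, []).append(row["propertyID"])
    return shapes


# A minimal one-column profile: just a list of properties.
minimal = "propertyID\ndct:title\ndct:creator\n"
print(group_by_shape(minimal))  # {'default': ['dct:title', 'dct:creator']}
```

With an agreed default of "open", this one-column profile would need no further columns; the implied shape would simply be treated as open.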

@kcoyle
Collaborator Author

kcoyle commented Oct 12, 2020

@tombaker "Closed" for a profile means that properties or shapes that are not included in the profile are invalid for matching on the profile. It would be the same as marking all shapes as "closed". This seems to me to be the likely intention, rather than having some shapes closed and some open. I say likely because people would generally think of their metadata (aka the entire profile) as open or closed. I have no objection to putting open and closed on a shape, but in either case we have to think about what that means:

  • If a shape is "closed" does that mean that only the properties in the shape are valid for that shape?
  • Could "open" also refer to value shapes, allowing the object of a property to be a node not in the profile?
  • Could "open" refer to value constraints, e.g. taking the object value from a domain not listed in the constraints? Or not in another way conformant to a constraint?
  • Does "closed" mean that other shapes/properties/values are ignored, or do they throw an error during validation?

It seems to me that the nature of an application profile is to create a defined metadata environment, which would be "closed" in the RDF sense. An open metadata set would be somewhat contrary to the intention of a profile. What does seem especially relevant for those ingesting metadata created by others would be the instruction to ignore any shapes/properties/values that are not included in the profile, thus creating a metadata set that conforms to the profile. So I am arguing for "closed" as the default, and am not sure that our first template version will need "open", but we should poll the community for that option.

@philbarker
Collaborator

I would assume the default default (i.e. if the spec says nothing) would be "open", because that is the de facto situation with no profile. However, I can see the argument that the intention in creating an AP is to close down some options, hence the question. (Aside: I recall discussions arising from OAI-PMH mandating Dublin Core metadata but nothing being mandatory in Dublin Core.)

My 2c regarding @kcoyle's Qs

  • If a shape is "closed" does that mean that only the properties in the shape are valid for that shape?

That is what I assume it means

  • Could "open" also refer to value shapes, allowing the object of a property to be a node not in the profile?

If the valueShape for a property is an open shape that lists no mandatory properties then that value could be any non-literal.

  • Could "open" refer to value constraints, e.g. taking the object value from a domain not listed in the constraints? Or not in another way conformant to a constraint?

I don't think so. If so then there would be no point in having the constraint.
BTW, I've tried examples like this which have a similar effect:

| property | Mand | Repeat | valueType | constraint | note |
|---|---|---|---|---|---|
| rdf:type | y | n | URI | sdo:Book | must be schema.org/Book |
| rdf:type | n | y | URI | | can be anything in addition to schema.org/Book |

  • Does "closed" mean that other shapes/properties/values are ignored, or do they throw an error during validation?

That would be an implementation issue. In a spec I would word this as such data "may be ignored" or "may trigger an error or warning". If I were creating the data I would want a validator warning to tell me I was using terms that may be ignored. If I were receiving data and had decided to keep all data as it was sent I wouldn't want an error/warning. If I were writing a validator I would make these warnings configurable through a "strictness level" setting or similar.
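
A sketch of what such a "strictness level" might look like. The `Strictness` enum and the function `report_unlisted` are hypothetical names, and the three levels are just one possible way to split "ignore", "warn" and "error".

```python
from enum import Enum


class Strictness(Enum):
    IGNORE = 0  # stay silent about unlisted properties
    WARN = 1    # report them as warnings
    ERROR = 2   # report them as errors


def report_unlisted(data_props, profile_props, strictness):
    """Return (level, property) pairs for properties not listed in the profile."""
    if strictness is Strictness.IGNORE:
        return []
    level = "warning" if strictness is Strictness.WARN else "error"
    return [(level, p) for p in data_props if p not in profile_props]


profile = {"dct:title", "dct:creator"}
data = ["dct:title", "foaf:name"]
print(report_unlisted(data, profile, Strictness.WARN))  # [('warning', 'foaf:name')]
```

A data creator might run this with `Strictness.WARN`, while a receiver who keeps everything as sent might choose `Strictness.IGNORE`.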

@kcoyle
Collaborator Author

kcoyle commented Oct 12, 2020

I did my usual shallow->middling dive into the documents, and here's what I believe is true:

The "open" in RDF schema or OWL corresponds only vaguely to what we are discussing here for profiles. The "open world assumption" is a somewhat different beast, as per the OWL documentation:

"If some fact is not present in a database, it is usually considered false (the so-called closed-world assumption) whereas in the case of an OWL 2 document it may simply be missing (but possibly true), following the open-world assumption."

Instead, our sense of "open/closed" is directly related to validation. SHACL has this description of "closed":

"If $closed is true then there is a validation result for each triple that has a value node as its subject and a predicate that is not explicitly enumerated as a value of sh:path in any of the property shapes declared via sh:property at the current shape."

Therefore, in SHACL, closed is explicitly about a subject/predicate pair. I'm less clear about the ShEx use of closed because of how that documentation is worded:

"In a ShEx schema, a shape may be defined to match only RDF data nodes that have outgoing triples matching the given set of triple constraints and no other outgoing triples. A shape declaration can be qualified to mean "this set of outgoing triples and no others" by using the keyword CLOSED."

What I am unclear about here is "outgoing triples", although the document also says:

"A node in the subject position has an outgoing arc and a node in the object position has an incoming arc."

So I believe this also refers to triples with a specific property but I can't tell if the "outgoing arc" can include the object node's value. @ericprud ?

@ericprud
Collaborator

The concept of closed is intended to be the same for both languages. In ShEx it applies to "outgoing arcs" of the node being validated, meaning all triples with that node as a subject and any predicate or object. So if the predicate of an outgoing arc isn't mentioned in a closed shape, it's flagged as a violation.
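
As a rough illustration of the "outgoing arcs" check described above, the following sketch treats triples as plain tuples and a closed shape as a set of mentioned predicates. It is a simplification written for this thread, not how any ShEx implementation is actually structured; the function name and sample data are invented.

```python
def closed_violations(node, triples, shape_predicates):
    """Return outgoing triples of `node` whose predicate the closed shape does not mention."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if s == node and p not in shape_predicates
    ]


triples = [
    ("ex:book1", "dct:title", "Moby Dick"),
    ("ex:book1", "dct:creator", "ex:melville"),
    ("ex:book1", "foaf:name", "stray value"),
    ("ex:melville", "foaf:name", "Herman Melville"),
]
book_shape = {"dct:title", "dct:creator"}

# Only arcs leaving ex:book1 are examined; ex:melville's triples are not its outgoing arcs.
print(closed_violations("ex:book1", triples, book_shape))
# [('ex:book1', 'foaf:name', 'stray value')]
```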

The only predicates SHACL recognizes when testing closed-ness are those in the top-level triple constraints, which makes closed-ness more complicated. For instance, if a closed schema required a foaf:name or a schema:name, and the data had a foaf:name, that triple would be flagged as a violation. The work-around is to list any properties buried in expressions in a property called sh:ignoredProperties.
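
The SHACL behaviour and its work-around can be mimicked in a few lines. This is a toy model with invented names (`shacl_closed_violations`, `top_level_paths`), not a SHACL engine: it only reproduces the rule that a predicate is allowed if it is a top-level path or listed as ignored.

```python
def shacl_closed_violations(node, triples, top_level_paths, ignored_properties=()):
    """Flag outgoing triples whose predicate is neither a top-level path nor ignored."""
    allowed = set(top_level_paths) | set(ignored_properties)
    return [(s, p, o) for (s, p, o) in triples if s == node and p not in allowed]


triples = [("ex:alice", "foaf:name", "Alice")]

# The name alternatives sit inside a nested expression, so no top-level path lists them.
print(shacl_closed_violations("ex:alice", triples, top_level_paths=[]))
# [('ex:alice', 'foaf:name', 'Alice')]

# Work-around: list the nested properties as ignored.
print(shacl_closed_violations("ex:alice", triples, [], ["foaf:name", "schema:name"]))
# []
```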

@tombaker
Collaborator

@kcoyle @philbarker @ericprud
Karen, you wrote: 'What does seem especially relevant for those ingesting metadata created by others would be the instruction to ignore any shapes/properties/values that are not included in the profile, thus creating a metadata set that conforms to the profile. So I am arguing for "closed" as the default...'

If you are suggesting that "closed" be the default because ingesters of metadata should be able to ignore triple patterns not included in a profile (though I am unsure if you are arguing this), then I'd point out:

  • If a shape is closed, then outgoing arcs not matching the shape are not ignored but flagged as violations. The ShEx Primer describes the "open" interpretation as 'meaning that data triples not specified in the schema are simply ignored.'
  • Saying that a shape is open does not mean that everything in the shape is open, because specifying an outgoing triple in terms of its property in effect "closes" that property with respect to allowable values.
  • Saying that a shape is open or closed only really affects the set of allowable property-value pairs (outgoing triples) associated with a shape. If the shape is closed, the presence of any outgoing triples that do not match that set triggers a violation. If open, any outgoing triples not specified in the set are allowed - except (in ShEx) for triples that use one of the specified properties in a way that does not match the pattern described in the shape (because use of those properties is "closed"). This default interpretation can be overridden with the keyword EXTRA.
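
The three points above can be condensed into a small sketch. It models a shape as a mapping from predicate to a set of allowed object values and ignores cardinality entirely, so it is only an approximation of ShEx; the function `validate` and its parameters `closed` and `extra` are names invented here.

```python
def validate(node, triples, constraints, closed=False, extra=()):
    """Return a list of (triple, reason) violations for `node`.

    `constraints` maps a predicate to the set of allowed object values.
    """
    violations = []
    for s, p, o in triples:
        if s != node:
            continue  # only outgoing arcs of the focus node matter
        if p in constraints:
            # A listed property is "closed" as to its values unless named in EXTRA.
            if o not in constraints[p] and p not in extra:
                violations.append(((s, p, o), "value does not match"))
        elif closed:
            violations.append(((s, p, o), "predicate not in closed shape"))
        # Open shape: triples with unlisted predicates are simply ignored.
    return violations


triples = [
    ("inst:Issue1", "ex:state", "ex:unassigned"),
    ("inst:Issue1", "ex:state", "ex:unknown"),
    ("inst:Issue1", "rdfs:label", "First issue"),
]
constraints = {"ex:state": {"ex:unassigned", "ex:assigned"}}

print(len(validate("inst:Issue1", triples, constraints)))                      # 1
print(len(validate("inst:Issue1", triples, constraints, closed=True)))         # 2
print(len(validate("inst:Issue1", triples, constraints, extra=["ex:state"])))  # 0
```

In the open case only the non-matching `ex:state` value is reported; closing the shape also reports the unlisted `rdfs:label` triple; naming `ex:state` in `extra` lets the non-matching value through.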

Re: "A node in the subject position has an outgoing arc and a node in the object position has an incoming arc." (from the Primer), you wrote: "So I believe this also refers to triples with a specific property but I can't tell if the "outgoing arc" can include the object node's value."

The example given in the Primer includes a schema:

my:IssueShape {
  ex:state [ex:unassigned ex:assigned];
 ^ex:reportedIssue @my:UserShape
}

my:UserShape {
  foaf:name LITERAL;
  foaf:mbox IRI+
}

and matching data:

inst:Issue1 a ex:Issue ;
    ex:state        ex:unassigned .
inst:User1 a foaf:Person ;
    foaf:name       "Bob Smith" ;
    ex:reportedIssue inst:Issue1 ;
    foaf:mbox       <mailto:bob@example.org> .

A shape can always describe the object node's value associated with a given property (but can also leave it unspecified). If you meant to ask whether the "INCOMING arc" can include the object node's value, then the line ^ex:reportedIssue @my:UserShape could be read as meaning:

  • the subject node is something that matches the shape my:UserShape
  • the predicate is ex:reportedIssue
  • the object is something that matches the shape my:IssueShape.
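
Read this way, an inverse triple constraint amounts to looking up the incoming arcs of the issue node. A tiny sketch over the Primer data, with triples as plain tuples and a helper name (`incoming_arcs`) invented for the purpose:

```python
def incoming_arcs(node, predicate, triples):
    """Return the subjects of triples that point at `node` via `predicate`."""
    return [s for (s, p, o) in triples if p == predicate and o == node]


triples = [
    ("inst:Issue1", "ex:state", "ex:unassigned"),
    ("inst:User1", "foaf:name", "Bob Smith"),
    ("inst:User1", "ex:reportedIssue", "inst:Issue1"),
]

# The subjects found here are the nodes that would then be checked against my:UserShape.
print(incoming_arcs("inst:Issue1", "ex:reportedIssue", triples))  # ['inst:User1']
```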

@ericprud Closing my:UserShape would make the data invalid (because the ex:reportedIssue triple is not covered), would it not?

As I see it, inverse triple constraints are a good example of what an expressive constraint language like ShEx can cover, but would be very awkward to express in a simple CSV format.

@kcoyle
Collaborator Author

kcoyle commented Jan 25, 2021

This discussion is continued at dcmi/dctap#8

@kcoyle closed this as completed on Jan 25, 2021
Basic AP Vocabulary automation moved this from Basic requirements to Done Jan 25, 2021

4 participants