Pure-ODD content model elements appear to be strictly weaker than regular languages -- intended? #1596

cmsmcq · 2017-02-28T16:00:35Z

The elements used for content models in pure ODDs include, if I understand chapter 22 correctly:

Elements denoting atomic units of the content model (anyElement, dataRef, elementRef, textNode, valItem).
Elements denoting sequences, choices, or all-groups of atomic units (alternate, classRef, sequence, valList).
Elements designed for textual operations (macroRef).

For sequence and alternate, the content models require at least one child, and the Schematron rules require at least two.

That means that <sequence/> and <alternate/> cannot be used, as currently defined, to denote sequences of length 0 and sets of cardinality 0, respectively. Relax NG also requires sequences and alternations (group and choice elements) to contain children, but it provides the empty and notAllowed elements to denote empty sequences and empty sets.

Are there other ways to denote the empty sequence and the empty set in pure ODDs?

If not, then pure ODDs appear to have lost one of the nice features of Relax NG, that content models are closed under the standard set operations (intersection, union, complementation, set difference), and that any regular language over elements, typed data, and text nodes can be defined by a content model.

Is the absence of closure in pure ODD intentional?

lb42 · 2017-02-28T16:27:46Z

An empty content model is represented by an empty <content> element in pure ODD.

cmsmcq · 2017-02-28T16:55:55Z

Thank you. Other things being equal, I like being able to say explicitly 'this is empty', and not having an explicit notation for empty sequences does make it less convenient to perform operations on content models, but one can't have everything.

Is there a way to denote the empty set, or does one need to define an element whose type is xs:error or use some similar workaround?

Hmm. The existence of such a workaround seems to mean that pure ODD does have the closure property I claimed it was missing, so the initial entry in this issue is wrong. It's not that the language cannot express the empty sequence or the empty set, only that it makes references to them inconvenient.

I would propose to add empty elements to denote these concepts explicitly; their absence is not an error, but their presence would be an improvement.

lb42 · 2017-02-28T17:20:11Z

We did consider at one point introducing an element <empty/> for this purpose. But it seemed needlessly verbose to require <content><empty/></content> when <content/> did the same job more economically. By all means put in a ticket requesting the introduction of an element <empty>: would it make sense to restrict this to the content model of <sequence> or <alternate> to avoid having two ways of doing the same thing?

cmsmcq · 2017-02-28T17:38:04Z

would it make sense to restrict this to the content model of <sequence> or <alternate> to avoid having two ways of doing the same thing?

On this point I think RNG's XML syntax is reasonably well defined; it allows both <element name="E"/> and <element name="E"><empty/></element>, so that emptiness can be either implicit or explicit. If one were to require that there be only one way of doing it, then I think an explicit <empty/> element would be better than an implicit emptiness. (But with regular languages, there is almost always more than one way to say the same thing; I don't think that's avoidable. For every language described by expression e the same language is described by (e|e) and (e|∅) and (e, ε) and an infinite number of other expressions.)
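
The identities mentioned above, that e, (e|e), (e|∅) and (e, ε) all describe the same language, can be checked mechanically. What follows is only a minimal Python sketch: the tuple encoding and the names `language`, `"alt"`, `"seq"`, `"ref"`, `"empty"` and `"emptySet"` are assumptions made for illustration, not part of ODD or of any TEI tool, and the sketch covers star-free expressions only so that every language is finite.

```python
def language(expr):
    """Return the set of element-name sequences (as tuples) matched by a
    star-free expression in a hypothetical tuple encoding."""
    kind = expr[0]
    if kind == "emptySet":      # the empty language: matches nothing at all
        return set()
    if kind == "empty":         # the empty sequence: matches only ()
        return {()}
    if kind == "ref":           # a single element, e.g. ("ref", "foo")
        return {(expr[1],)}
    if kind == "alt":           # union of the alternatives
        return set().union(*(language(e) for e in expr[1:]))
    if kind == "seq":           # concatenation, left to right
        result = {()}
        for e in expr[1:]:
            result = {a + b for a in result for b in language(e)}
        return result
    raise ValueError(f"unknown expression kind: {kind}")


# e = a, (b | c)
e = ("seq", ("ref", "a"), ("alt", ("ref", "b"), ("ref", "c")))

same = [
    e,
    ("alt", e, e),               # (e | e)
    ("alt", e, ("emptySet",)),   # (e | ∅)
    ("seq", e, ("empty",)),      # (e, ε)
]
assert all(language(x) == language(e) for x in same)
```

In this encoding an alternation with no children yields the empty set and a sequence with no children yields the empty sequence, which is exactly the pair of notions the ticket asks to be able to write down explicitly.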

By all means put in a ticket requesting the introduction of an element <empty>

Well, not just <empty/> but also <emptySet/> (or <nil/> or <nihil/> or <null/> or <void/> or <nonviable/> or <nugatory/> or whatever one wants to call it -- lots of bikeshedding opportunities here).

But isn't this ticket already a ticket requesting the introduction of those elements?

lb42 · 2017-03-08T15:14:49Z

Let's resist the bikeshedding (I had to look this up, but now I know what it means I definitely don't want to do it). I propose to add an element <empty>, member of model.contentPart. I propose further that <content/> is considered synonymous with <content><empty/></content>. All in favour?

lb42 · 2017-03-22T12:16:28Z

Adding a new element to P5 is a complicated process, involving at least the following

providing a new <elementSpec>
adding a reference to that spec and some discussion of it in the appropriate chapter (in this case TD)
providing at least one example of its use
(in this case) modifying the stylesheets, specifically those generating schemas from ODD, to take notice of its presence in a document

I will happily do the first two of these, if Council approves, alone or with any Council member interested; I'd definitely appreciate some help with the third and the fourth is (to coin a phrase) beyond my pay grade.

martindholmes · 2017-03-27T15:29:38Z

I still don't understand how allowing both this:

<content/>

and this:

<content><empty/></content>

is an improvement over the current situation; multiple ways of doing the same thing are sometimes inevitable but not a good thing in themselves.

lb42 · 2017-03-27T15:41:43Z

MSM claimed earlier on this ticket that the availability of the second form was "more convenient": I think he had in mind the processing of content models under modification, where (say) one component of a complex structure becomes empty as a result of deletions from a class.

hcayless · 2017-03-27T16:19:39Z

I think if we added <empty/>, we would have to deprecate and then disallow empty <content>. Allowing the two to coexist would lead only to confusion. Is adding this new element worth the pain?

cmsmcq · 2017-03-27T18:02:49Z

On Mar 27, 2017, at 10:19 AM, Hugh A. Cayless wrote:

I think if we added <empty/>, we would have to deprecate and then disallow empty <content>. Allowing the two to coexist would lead only to confusion. Is adding this new element worth the pain?

YMMV, but empty elements and elements containing an explicit empty element whose basic semantics is “yes, I know this is empty, that’s the way I want it” do co-exist in both RNG and XSD, which I think are the most widely used schema languages for XML. So as a user, and as a writer of stylesheets which work with content models, I would be perfectly happy to have both <content/> and <content><empty/></content>. Personally I prefer the latter, because I tend to find explicitness more easily readable, after a few weeks or years away from something, than the alternative.

As I have already suggested, trying to ensure that there is only one way to write a content model for a particular language is a losing effort. One may conceivably achieve it for this one case, but not for any more complex languages. (I point to the rules for facet-based restriction in XSD as an impressive example of how much effort a spec can put into a failed attempt to stop people from writing things that the authors of the spec think "don't make sense". It's not a path I recommend to those I wish well.)

The crucial difference between the proposal and the status quo is that adding 'empty' and 'emptySet' allows expressions like

<choice><empty/><elementRef key="foo"/></choice>

<choice><emptySet/><elementRef key="foo"/></choice>

as the first step in simplifying expressions like the following (where CARTHAGE is an element to be deleted):

<choice> <elementRef key="CARTHAGE" minOccurs="0"/> <elementRef key="foo"/> </choice>

...

<choice> <elementRef key="CARTHAGE"/> <elementRef key="foo"/> </choice>

A further simplification is of course possible in both cases and would yield

<elementRef key="foo" minOccurs="0"/>

and

<elementRef key="foo"/>

As for whether it’s worth doing or not, again, your mileage may vary. One of the reasons given by some people (including some involved with TEI) for preferring RNG to XSD was that RNG content models are closed under set operations (union, difference, negation, intersection) while XSD and DTD content models are not (owing to the determinism rules they took over from SGML), and that RNG is more obviously governed (or informed) by the theory of formal languages (here: regular languages and regular expressions) than is XSD. An observer who took that claim at face value might be surprised at the work the TEI has put in on replacing RNG with a new language for content models, which turns out NOT to be closed under set operations and NOT to include simple representations of basic concepts of formal language theory like the empty language and the empty string.

As has been observed, the change currently proposed appears to be more of a cosmetic change and simplification than a fundamental improvement in expressive power. It is worth doing only if the TEI cares about simplicity and aesthetics.
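
The two-step rewriting described in this comment (first replace the deleted element by the empty set, then simplify) can be sketched in Python. This is a minimal illustration under stated assumptions, not the behaviour of the TEI Stylesheets: the classes `Empty`, `EmptySet`, `Ref`, `Opt`, `Choice` and `Seq` and the functions `delete` and `simplify` are all hypothetical names, and `Opt` stands in for minOccurs="0".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Empty:            # the empty sequence (epsilon)
    pass


@dataclass(frozen=True)
class EmptySet:         # the empty language: matches nothing
    pass


@dataclass(frozen=True)
class Ref:              # reference to an element by name
    key: str


@dataclass(frozen=True)
class Opt:              # optional item, i.e. minOccurs="0"
    item: object


@dataclass(frozen=True)
class Choice:
    items: tuple


@dataclass(frozen=True)
class Seq:
    items: tuple


def delete(model, key):
    """Step 1: replace every reference to `key` by the empty set."""
    if isinstance(model, Ref):
        return EmptySet() if model.key == key else model
    if isinstance(model, Opt):
        return Opt(delete(model.item, key))
    if isinstance(model, (Choice, Seq)):
        return type(model)(tuple(delete(m, key) for m in model.items))
    return model


def simplify(model):
    """Step 2: apply the usual identities for the empty set and empty sequence."""
    if isinstance(model, Opt):
        inner = simplify(model.item)
        if isinstance(inner, (Empty, EmptySet)):
            return Empty()                      # an optional nothing is just the empty sequence
        return inner if isinstance(inner, Opt) else Opt(inner)
    if isinstance(model, Choice):
        items = [simplify(m) for m in model.items]
        items = [m for m in items if not isinstance(m, EmptySet)]   # e | emptySet = e
        has_empty = any(isinstance(m, Empty) for m in items)
        rest = []
        for m in items:                         # drop empties and duplicates, keep order
            if not isinstance(m, Empty) and m not in rest:
                rest.append(m)
        if not rest:
            return Empty() if has_empty else EmptySet()
        core = rest[0] if len(rest) == 1 else Choice(tuple(rest))
        if has_empty and not isinstance(core, Opt):
            return Opt(core)                    # empty | e = optional e
        return core
    if isinstance(model, Seq):
        items = [simplify(m) for m in model.items]
        if any(isinstance(m, EmptySet) for m in items):
            return EmptySet()                   # e, emptySet = emptySet
        items = [m for m in items if not isinstance(m, Empty)]      # e, empty = e
        if not items:
            return Empty()
        return items[0] if len(items) == 1 else Seq(tuple(items))
    return model


optional_case = Choice((Opt(Ref("CARTHAGE")), Ref("foo")))
required_case = Choice((Ref("CARTHAGE"), Ref("foo")))
assert simplify(delete(optional_case, "CARTHAGE")) == Opt(Ref("foo"))
assert simplify(delete(required_case, "CARTHAGE")) == Ref("foo")
```

Because deletion only ever substitutes one node for another, it needs no special cases for "the last remaining child"; all of the case analysis lives in `simplify`, which is the convenience the explicit notations are meant to buy.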

hcayless · 2017-03-27T20:05:58Z

Michael, I'm trying to get a feel for cost v. benefit here. It sounds like the addition of empty (and also emptySet?) would make some of the content-model rewriting operations you've been talking about elsewhere easier to implement. Is that true? The argument against is that we have to do extra work to make the Stylesheets handle (e.g.) both <content/> and <content><empty/></content> as equivalents. It's not necessarily all that hard, but it is a bunch of work beyond just adding the element(s).

cmsmcq · 2017-03-27T21:45:31Z

On Mar 27, 2017, at 2:06 PM, Hugh A. Cayless wrote:

Michael, I'm trying to get a feel for cost v. benefit here. It sounds like the addition of empty (and also emptySet?) would make some of the content-model rewriting operations you've been talking about elsewhere easier to implement. Is that true?

I think so.

The argument against is that we have to do extra work to make the Stylesheets handle (e.g.) both <content/> and <content><empty/></content> as equivalents. It's not necessarily all that hard, but it is a bunch of work beyond just adding the element(s).

Yes, agreed.

jamescummings · 2017-04-04T17:03:03Z

I think <empty/> makes sense and agree that then using <content/> should not be catered for.

lb42 · 2017-05-03T15:24:59Z

I have now added a spec for <empty> and the following brief discussion to TD

<p>In the simplest case, an element may have no content. This may be indicated by supplying an empty <gi>content</gi> element. 
 It may however be considered preferable to indicate this explicitly by supplying a <gi>content</gi> element containing the <gi>empty</gi> element.

As noted above, we may want to deprecate the use of <content/>, in which case the above text will need revision; however, that would not be advisable until the Stylesheets have been adjusted to implement processing of <empty/>. I am raising a new Stylesheets ticket on that subject.

emylonas · 2018-02-26T12:07:42Z

deprecate <content/> (@sydb)
add test suite for <empty/> in stylesheets (@martindholmes)

sydb · 2018-04-24T23:07:33Z

Step 1, the actual deprecation (to 2019-08-25) completed at 84a80e4. Still need to address the prose and add some remarks to the tagdoc.
Built and tested locally w/o problem. However, folks may wish to wordsmith the message provided by the Schematron assertion (in content.xml).

sydb · 2018-04-24T23:41:47Z

Prose (in TD) updated and remarks (to <content>’s tagdoc) added at 6c27e74.

Over to you, @martindholmes, for adding a test to test suite.

cmsmcq · 2018-04-25T15:21:37Z

For the record, I repeat once more my view that deprecating empty content models is unnecessary, unhelpful, and harmful.

It appears to be grounded in the view that it's better to have a mechanism for specifying content models in which there is only one way to define a given language. Such a mechanism is not realistically feasible, and the idea that it would be better if it were feasible is at best dubious. Some arguing for it have feared that having two ways to say the same thing will be confusing, but no one has provided any evidence that users of Relax NG (including creators of TEI ODD documents during the years the ODDs used RNG to specify content models) have been seriously confused by having more than one way to write empty-sequence content models, or more than one way to write x?, x+, or x*.

When in the future someone asks how this particular ad-hoc rule came to infest the grammar of pure ODDs, and someone else answers "Oh, it was to resolve a bug raised by Michael," I would like the record to show that Michael had nothing to do with the deprecation of empty content models.

martindholmes · 2018-04-25T15:29:17Z

@sydb and @emylonas : should I add this to Test2 or to Test? (I'd prefer the former, of course, but those tests don't currently run during the build process.)

sydb · 2018-04-25T15:43:09Z

@cmsmcq — I don’t think anyone is imagining that requiring <content><empty/></content> (or <content><rng:empty/></content>) rather than just <content/> to indicate an empty content model resolves the issues raised in the rest of this ticket. It’s just that it is a complaint that came up in the discussion which we could handle quickly.

And, for the record, I lean towards the “one way is better” logic that Martin expressed when he raised this concern, but it is not why I am in favor. I am in favor because using <empty/> (or <rng:empty>) is more explicit and more obvious.

But it sounds like you disagree … is there an advantage to using <content/> that we’re not seeing?

@martindholmes: I think Test2/ is sufficient, myself, but that’s in part to help pressure us (you) to make Test2/ the real testing paradigm. :-)

martindholmes · 2018-04-25T15:48:16Z

@sydb We can go with Test2 for now, then, but we should make sure we don't end up in a situation where I'm the only one who can easily work on Test2. I've done my best to make it easy to work with, but the longer it's just me, the less likely it is that it'll be generally comprehensible. Do you fancy adding this test yourself to the new build_odd.xml file (assuming that's where it belongs)?

cmsmcq · 2018-04-25T15:58:01Z

@sydb No, I think the advantage lies in simple rules with as few special cases as one can manage.

I agree with your view that explicit elements are more explicit, and I personally find them clearer; I disagree with the apparent view that the definition of pure ODDs should include ad-hoc rules to enforce your stylistic preferences, or mine.

hcayless · 2018-04-25T16:25:21Z

@cmsmcq Is any actual harm done if the way TEI indicates an empty content model is with <content><empty/></content> rather than <content/>? Granted, there might be other ways you could write a functionally empty content model, and that's fine. We're only talking about disallowing (via deprecation) one of those. I don't see the problem. What am I missing here?

martindholmes · 2018-04-25T17:38:10Z

@sydb exactly what am I testing here? <content/> is already deprecated, and the deprecation message is working; it'll be invalid after 2019-08-25. Are we testing that <content><empty/></content> is processed correctly?

lb42 · 2018-04-25T18:00:01Z

I must have been asleep when deprecation of empty <content> was agreed. My opinion may be coloured by the fact that all my ODD tutorials will now need to be revised, but I still think that was an unnecessary and mistaken move. Should we now expect Council to move towards deprecation of numbered divs and other instances where the TEI has long offered more than one way to do it?

martindholmes · 2018-04-25T18:07:17Z

@lb42 I hope so. :-)

lb42 · 2018-04-25T18:12:34Z

@martindholmes that does not surprise me either :-(

cmsmcq · 2018-04-25T19:02:29Z

@hcayless I am sorry to see that my efforts at stating my views clearly have been unsuccessful.

There is no harm done that I can see in allowing the use of an explicit signal for emptiness in the form of <content><empty/></content>; on the contrary, I think it's an improvement and filed this bug report in the hopes of persuading people to add such an explicit signal (and also one for the empty set).

Whether any "actual harm" is done (you didn't ask this explicitly, but I gather it's what you meant to ask) by disallowing <content/> as a way of expressing the same thing will depend on what one understands "actual harm" to be. The harms I see are to the simplicity and clarity of the design; I think those are actual enough.

[Overlong discussion of the harms and analogies with design mistakes of other specs deleted here.]

martindholmes · 2018-04-25T19:53:18Z

I think there may be an actual additional advantage to ODD processing in this decision.

Imagine we have an element whose content model is not intended to be empty. During an ODD-chaining process, all of the elements permitted inside it are eventually deleted -- basically unintentionally as a by-product of other decisions, but leaving an empty content model. The ODD processor in this case would most likely end up outputting <content/> or <content></content>. Under the new restriction, this would be invalid, so the nature of the problem would be clear immediately. Without the restriction, the problem is harder to diagnose.

This obviously depends on the design of ODD processing, though. A clever ODD processor might detect an empty content model and helpfully supply <empty/>.
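
The "clever ODD processor" behaviour mentioned here could look roughly like the following. This is a minimal sketch using Python's standard xml.etree.ElementTree, not a description of what the TEI Stylesheets do; the function name `supply_empty` is hypothetical, and the sketch assumes the ODD uses the TEI namespace http://www.tei-c.org/ns/1.0.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def supply_empty(root):
    """Give every childless <content> an explicit <empty/> child.

    Returns the number of <content> elements that were changed, so a caller
    can warn that some content model became empty during processing.
    """
    changed = 0
    # Collect the matches first: the tree must not be modified while iterating.
    for content in list(root.iter(f"{{{TEI_NS}}}content")):
        if len(content) == 0:           # no child elements at all
            ET.SubElement(content, f"{{{TEI_NS}}}empty")
            changed += 1
    return changed


spec = ET.fromstring(
    '<elementSpec xmlns="http://www.tei-c.org/ns/1.0" ident="example">'
    "<content/>"
    "</elementSpec>"
)
count = supply_empty(spec)
```

Returning the count rather than rewriting silently keeps the diagnostic value discussed above: a non-zero result tells the user that a content model ended up empty, whether or not that was intended.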

lb42 · 2018-04-26T09:59:49Z

"An element whose content model is not intended to be empty" would presumably express this using @minOccurs and @maxOccurs at some point, and so in the case I think you are hypothesizing the resulting schema would be invalid. There is a well known problem when a content model requires "one or more" members of some class, but all members of the class have been deleted during ODD processing. I don't see how this issue helps with that though.

raffazizzi · 2018-09-07T06:32:06Z

Tokyo F2F: council agrees that reducing complexity in ODD processing is a priority over language expressiveness. Therefore we decided to keep <sequence/> and <alternate/> unable to denote sequences of length 0 and sets of cardinality 0.

cmsmcq · 2018-09-07T18:59:35Z

Thank you for considering this change. I accept the result, although I am disappointed that the council did not see things my way and believe the given rationale to be self-contradictory.

The absence of notations for the empty sequence and the empty set makes the processing of content models more complex, not less complex; some of the bugs I have reported elsewhere in the behavior of the current ODD processors (such as those connected with the deletion of elements from content models) would have been easier to fix had the suggested change been made, and I suspect that the absence of those notations may have been part of the reason the bugs got into the stylesheets in the first place. If the Council seeks to simplify the processing of ODDs, the way to do so is to remove special cases from the language, not to preserve them.

Thank you again for your time.

hcayless assigned lb42 Mar 8, 2017

lb42 added the Status: Needs Discussion label Mar 22, 2017

lb42 mentioned this issue May 3, 2017

<empty/> element not processed TEIC/Stylesheets#263

Open

emylonas assigned sydb and martindholmes and unassigned lb42 Feb 26, 2018

sydb added a commit that referenced this issue Apr 24, 2018

Address #1596:

84a80e4

Deprecate use of <content> w/o any content. (Use a child <empty> to indicate the element being defined should be empty.) Deprecated until 2019-08-25

sydb added a commit that referenced this issue Apr 24, 2018

Address #1596:

6c27e74

Update prose in TD and add remarks in tagdoc for <content>.

raffazizzi closed this as completed Sep 7, 2018

martinascholger added this to the Guidelines 3.5.0 milestone Jan 20, 2019

sydb added a commit that referenced this issue Mar 21, 2019

Address #1596:

18a7d64

Remove vestigal remarks about previous practice.

sydb added a commit that referenced this issue Aug 27, 2019

Address by-product of #1596:

5472a2c

Change deprecation of <content/> to making it invalid as deprecation period is now over.

martindholmes mentioned this issue Aug 26, 2021

Bad example for <empty/> element #2177

Open

hcayless pushed a commit that referenced this issue Aug 11, 2022

Address by-product of #1596:

d2ebe56

Change deprecation of <content/> to making it invalid as deprecation period is now over.
