Pure-ODD content model elements appear to be strictly weaker than regular languages -- intended? #1596

Closed
cmsmcq opened this issue Feb 28, 2017 · 32 comments

Comments

@cmsmcq

cmsmcq commented Feb 28, 2017

The elements used for content models in pure ODDs include, if I understand chapter 22 correctly:

  • Elements denoting atomic units of the content model (anyElement, dataRef, elementRef, textNode, valItem).
  • Elements denoting sequences, choices, or all-groups of atomic units (alternate, classRef, sequence, valList)
  • Elements designed for textual operations (macroRef).

For sequence and alternate, the content models require at least one child, and the Schematron rules require at least two.

That means that <sequence/> and <alternate/> cannot be used, as currently defined, to denote sequences of length 0 and sets of cardinality 0, respectively. Relax NG also requires sequences and alternations (group and choice elements) to contain children, but it provides the empty and notAllowed elements to denote empty sequences and empty sets.
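For reference, a minimal Relax NG sketch of the two patterns mentioned here. The element names (doc, lb, ghost) are hypothetical and chosen only for illustration; `<empty/>` denotes the empty sequence and `<notAllowed/>` the empty set.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <ref name="doc"/>
  </start>
  <define name="doc">
    <element name="doc">
      <zeroOrMore>
        <choice>
          <ref name="lb"/>
          <ref name="ghost"/>
        </choice>
      </zeroOrMore>
    </element>
  </define>
  <!-- Empty sequence: lb is an element with no content -->
  <define name="lb">
    <element name="lb">
      <empty/>
    </element>
  </define>
  <!-- Empty set: nothing matches, so ghost can never validly occur -->
  <define name="ghost">
    <element name="ghost">
      <notAllowed/>
    </element>
  </define>
</grammar>
```

Because ghost can match nothing, the choice inside doc accepts exactly the same documents as a bare reference to lb.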

Are there other ways to denote the empty sequence and the empty set in pure ODDs?

If not, then pure ODDs appear to have lost one of the nice features of Relax NG, that content models are closed under the standard set operations (intersection, union, complementation, set difference), and that any regular language over elements, typed data, and text nodes can be defined by a content model.

Is the absence of closure in pure ODD intentional?

@lb42
Member

lb42 commented Feb 28, 2017

An empty content model is represented by an empty content element in pure ODD.

@cmsmcq
Author

cmsmcq commented Feb 28, 2017

Thank you. Other things being equal, I like being able to say explicitly 'this is empty', and not having an explicit notation for empty sequences does make it less convenient to perform operations on content models, but one can't have everything.

Is there a way to denote the empty set, or does one need to define an element whose type is xs:error or use some similar workaround?

Hmm. The existence of such a workaround seems to mean that pure ODD does have the closure property I claimed it was missing, so the initial entry in this issue is wrong. It's not that the language cannot express the empty sequence or the empty set, only that it makes references to them inconvenient.

I would propose to add empty elements to denote these concepts explicitly; their absence is not an error, but their presence would be an improvement.

@lb42
Member

lb42 commented Feb 28, 2017

We did consider at one point introducing an element <empty/> for this purpose. But it seemed needlessly verbose to require <content><empty/></content> when <content/> did the same job more economically. By all means put in a ticket requesting the introduction of an element <empty>: would it make sense to restrict this to the content model of <sequence> or <alternate> to avoid having two ways of doing the same thing?

@cmsmcq
Author

cmsmcq commented Feb 28, 2017

would it make sense to restrict this to the content model of <sequence> or <alternate> to avoid having two ways of doing the same thing?

On this point I think RNG's XML syntax is reasonably well defined; it allows both <element name="E"/> and <element name="E"><empty/></element>, so that emptiness can be either implicit or explicit. If one were to require that there be only one way of doing it, then I think an explicit <empty/> element would be better than an implicit emptiness. (But with regular languages, there is almost always more than one way to say the same thing; I don't think that's avoidable. For every language described by expression e the same language is described by (e|e) and (e|∅) and (e, ε) and an infinite number of other expressions.)
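The three equivalences named in the parenthesis can be written as identities on languages. The notation L(e) for the language denoted by an expression e is introduced here only for illustration; ε is the empty sequence and ∅ the empty set.

```latex
\begin{align*}
L(e \mid e)         &= L(e) \cup L(e)              = L(e) \\
L(e \mid \emptyset) &= L(e) \cup \emptyset         = L(e) \\
L(e, \varepsilon)   &= L(e) \cdot \{\varepsilon\}  = L(e)
\end{align*}
```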

By all means put in a ticket requesting the introduction of an element <empty>

Well, not just <empty/> but also <emptySet/> (or <nil/> or <nihil/> or <null/> or <void/> or <nonviable/> or <nugatory/> or whatever one wants to call it -- lots of bikeshedding opportunities here).

But isn't this ticket already a ticket requesting the introduction of those elements?

@lb42
Member

lb42 commented Mar 8, 2017

Let's resist the bikeshedding (I had to look this up, but now I know what it means I definitely don't want to do it). I propose to add an element <empty>, member of model.contentPart. I propose further that <content/> is considered synonymous with <content><empty/></content>. All in favour?
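A sketch of what the proposal would look like in an ODD. The element name marker and its namespace are hypothetical, and the fragment assumes that <empty> is accepted as a child of <content>, as proposed above.

```xml
<!-- Hypothetical element "marker", defined only for illustration -->
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             ident="marker"
             ns="http://example.org/ns/illustration"
             mode="add">
  <desc>A hypothetical empty marker element.</desc>
  <!-- Explicit form under the proposal -->
  <content>
    <empty/>
  </content>
  <!-- Implicit form, proposed to be treated as synonymous: <content/> -->
</elementSpec>
```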

@lb42
Member

lb42 commented Mar 22, 2017

Adding a new element to P5 is a complicated process, involving at least the following

  • providing a new <elementSpec>
  • adding a reference to that spec and some discussion of it in the appropriate chapter (in this case TD)
  • providing at least one example of its use
  • (in this case) modifying the stylesheets, specifically those generating schemas from ODD, to take notice of its presence in a document

I will happily do the first two of these, if Council approves, alone or with any Council member interested; I'd definitely appreciate some help with the third and the fourth is (to coin a phrase) beyond my pay grade.

@martindholmes
Contributor

I still don't understand how allowing both this:

<content/>

and this:

<content><empty/></content>

is an improvement over the current situation; multiple ways of doing the same thing are sometimes inevitable but not a good thing in themselves.

@lb42
Member

lb42 commented Mar 27, 2017

MSM claimed earlier on this ticket that the availability of the second form was "more convenient": I think he had in mind the processing of content models under modification, where (say) one component of a complex structure becomes empty as a result of deletions from a class.

@hcayless
Member

I think if we added <empty/>, we would have to deprecate and then disallow empty <content>. Allowing the two to coexist would lead only to confusion. Is adding this new element worth the pain?

@cmsmcq
Author

cmsmcq commented Mar 27, 2017 via email

@hcayless
Member

hcayless commented Mar 27, 2017 via email

@cmsmcq
Author

cmsmcq commented Mar 27, 2017 via email

@jamescummings
Member

I think <empty/> makes sense and agree that then using <content/> should not be catered for.

@lb42
Member

lb42 commented May 3, 2017

I have now added a spec for <empty> and the following brief discussion to TD

<p>In the simplest case, an element may have no content. This may be indicated by supplying an empty <gi>content</gi> element. 
 It may however be considered preferable to indicate this explicitly by supplying a <gi>content</gi> element containing the <gi>empty</gi> element. 

As noted above, we may want to deprecate the use of <content/>, in which case the above text will need revision; however, that would not be advisable until the Stylesheets have been adjusted to implement processing of <empty/>. Am raising a new Stylesheets ticket on that subject.

@emylonas
Contributor

emylonas commented Feb 26, 2018

deprecate <content/> (@sydb)
add test suite for <empty/> in stylesheets (@martindholmes)

sydb added a commit that referenced this issue Apr 24, 2018
Deprecate use of <content> w/o any content. (Use a child <empty> to indicate the element being defined should be empty.) Deprecated until 2019-08-25
@sydb
Member

sydb commented Apr 24, 2018

Step 1, the actual deprecation (to 2019-08-25) completed at 84a80e4. Still need to address the prose and add some remarks to the tagdoc.
Built and tested locally w/o problem. However, folks may wish to wordsmith the message provided by the Schematron assertion (in content.xml).

sydb added a commit that referenced this issue Apr 24, 2018
Update prose in TD and add remarks in tagdoc for <content>.
@sydb
Member

sydb commented Apr 24, 2018

Prose (in TD) updated and remarks (to <content>’s tagdoc) added at 6c27e74.

Over to you, @martindholmes, for adding a test to test suite.

@cmsmcq
Author

cmsmcq commented Apr 25, 2018

For the record, I repeat once more my view that deprecating empty content models is unnecessary, unhelpful, and harmful.

It appears to be grounded in the view that it's better to have a mechanism for specifying content models in which there is only one way to define a given language. Such a mechanism is not realistically feasible, and the idea that it would be better if it were feasible is at best dubious. Some arguing for it have feared that having two ways to say the same thing will be confusing, but no one has provided any evidence that users of Relax NG (including creators of TEI ODD documents during the years the ODDs used RNG to specify content models) have been seriously confused by having more than one way to write empty-sequence content models, or more than one way to write x?, x+, or x*.

When in the future someone asks how this particular ad-hoc rule came to infest the grammar of pure ODDs, and someone else answers "Oh, it was to resolve a bug raised by Michael," I would like the record to show that Michael had nothing to do with the deprecation of empty content models.

@martindholmes
Contributor

@sydb and @emylonas : should I add this to Test2 or to Test? (I'd prefer the former, of course, but those tests don't currently run during the build process.)

@sydb
Member

sydb commented Apr 25, 2018

@cmsmcq — I don’t think anyone is imagining that requiring <content><empty/></content> (or <content><rng:empty/></content>) rather than just <content/> to indicate an empty content model resolves the issues raised in the rest of this ticket. It’s just that it is a complaint that came up in the discussion which we could handle quickly.

And, for the record, I lean towards the “one way is better” logic that Martin expressed when he raised this concern, but it is not why I am in favor. I am in favor because using <empty/> (or <rng:empty>) is more explicit and more obvious.

But it sounds like you disagree … is there an advantage to using <content/> that we’re not seeing?

@martindholmes: I think Test2/ is sufficient, myself, but that’s in part to help pressure us (you) to make Test2/ the real testing paradigm. :-)

@martindholmes
Contributor

@sydb We can go with Test2 for now, then, but we should make sure we don't end up in a situation where I'm the only one who can easily work on Test2. I've done my best to make it easy to work with, but the longer it's just me, the less likely it is that it'll be generally comprehensible. Do you fancy adding this test yourself to the new build_odd.xml file (assuming that's where it belongs)?

@cmsmcq
Author

cmsmcq commented Apr 25, 2018

@sydb No, I think the advantage lies in simple rules with as few special cases as one can manage.

I agree with your view that explicit elements are more explicit, and I personally find them clearer; I disagree with the apparent view that the definition of pure ODDs should include ad-hoc rules to enforce your stylistic preferences, or mine.

@hcayless
Member

@cmsmcq Is any actual harm done if the way TEI indicates an empty content model is with <content><empty/></content> rather than <content/>? Granted, there might be other ways you could write a functionally empty content model, and that's fine. We're only talking about disallowing (via deprecation) one of those. I don't see the problem. What am I missing here?

@martindholmes
Contributor

@sydb exactly what am I testing here? <content/> is already deprecated, and the deprecation message is working; it'll be invalid after 2019-08-25. Are we testing that <content><empty/></content> is processed correctly?

@lb42
Member

lb42 commented Apr 25, 2018

I must have been asleep when deprecation of empty <content> was agreed. My opinion may be coloured by the fact that all my ODD tutorials will now need to be revised, but I still think that was an unnecessary and mistaken move. Should we now expect Council to move towards deprecation of numbered divs and other instances where the TEI has long offered more than one way to do it?

@martindholmes
Contributor

@lb42 I hope so. :-)

@lb42
Member

lb42 commented Apr 25, 2018

@martindholmes that does not surprise me either :-(

@cmsmcq
Author

cmsmcq commented Apr 25, 2018

@hcayless I am sorry to see that my efforts at stating my views clearly have been unsuccessful.

There is no harm done that I can see in allowing the use of an explicit signal for emptiness in the form of <content><empty/></content>; on the contrary, I think it's an improvement and filed this bug report in the hopes of persuading people to add such an explicit signal (and also one for the empty set).

Whether any "actual harm" is done (you didn't ask this explicitly, but I gather it's what you meant to ask) by disallowing <content/> as a way of expressing the same thing will depend on what one understands "actual harm" to be. The harms I see are to the simplicity and clarity of the design; I think those are actual enough.

[Overlong discussion of the harms and analogies with design mistakes of other specs deleted here.]

@martindholmes
Contributor

I think there may be an actual additional advantage to ODD processing in this decision.

Imagine we have an element whose content model is not intended to be empty. During an ODD-chaining process, all of the elements permitted inside it are eventually deleted -- basically unintentionally as a by-product of other decisions, but leaving an empty content model. The ODD processor in this case would most likely end up outputting <content/> or <content></content>. Under the new restriction, this would be invalid, so the nature of the problem would be clear immediately. Without the restriction, the problem is harder to diagnose.

This obviously depends on the design of ODD processing, though. A clever ODD processor might detect an empty content model and helpfully supply <empty/>.
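The scenario can be pictured with three content-model fragments. The element keys a and b are hypothetical, and what a given ODD processor actually emits after the deletions is an assumption here, not a description of the current Stylesheets.

```xml
<!-- Source ODD: a hypothetical element may contain any mix of a and b -->
<content>
  <alternate minOccurs="0" maxOccurs="unbounded">
    <elementRef key="a"/>
    <elementRef key="b"/>
  </alternate>
</content>

<!-- After a customization deletes both a and b, a processor might emit: -->
<content/>

<!-- An intentionally empty element, by contrast, would be written: -->
<content>
  <empty/>
</content>
```

If only the third form is valid, the second form signals that the emptiness was a by-product of processing rather than a design decision.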

@lb42
Member

lb42 commented Apr 26, 2018

"An element whose content model is not intended to be empty" would presumably express this using @minOccurs and @maxOccurs at some point, and so in the case I think you are hypothesizing the resulting schema would be invalid. There is a well known problem when a content model requires "one or more" members of some class, but all members of the class have been deleted during ODD processing. I don't see how this issue helps with that though.

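A sketch of the "one or more members of a class" case described in the comment above. The class name model.illustration is hypothetical, and the Relax NG rendering assumes the processor represents a class with no remaining members as <notAllowed/>; a processor could equally fail or emit something else.

```xml
<!-- Pure ODD: one or more members of a hypothetical class -->
<content>
  <classRef key="model.illustration" minOccurs="1" maxOccurs="unbounded"/>
</content>

<!-- One possible Relax NG rendering once every member has been deleted -->
<define name="model.illustration"
        xmlns="http://relaxng.org/ns/structure/1.0">
  <notAllowed/>
</define>

<oneOrMore xmlns="http://relaxng.org/ns/structure/1.0">
  <ref name="model.illustration"/>
</oneOrMore>
```

Under that assumption, the content model requires at least one occurrence of a pattern that matches nothing, so no instance of the containing element can satisfy it.
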
@raffazizzi
Contributor

raffazizzi commented Sep 7, 2018

Tokyo F2F: council agrees that reducing complexity in ODD processing is a priority over language expressiveness. Therefore we decided to keep <sequence/> and <alternate/> unable to denote sequences of length 0 and sets of cardinality 0.

@cmsmcq
Author

cmsmcq commented Sep 7, 2018

Thank you for considering this change. I accept the result, although I am disappointed that the council did not see things my way and believe the given rationale to be self-contradictory.

The absence of notations for the empty sequence and the empty set makes the processing of content models more complex, not less complex; some of the bugs I have reported elsewhere in the behavior of the current ODD processors (such as those connected with the deletion of elements from content models) would have been easier to fix had the suggested change been made, and I suspect that the absence of those notations may have been part of the reason the bugs got into the stylesheets in the first place. If the Council seeks to simplify the processing of ODDs, the way to do so is to remove special cases from the language, not to preserve them.

Thank you again for your time.

@martinascholger martinascholger added this to the Guidelines 3.5.0 milestone Jan 20, 2019
sydb added a commit that referenced this issue Mar 21, 2019
Remove vestigal remarks about previous practice.
sydb added a commit that referenced this issue Aug 27, 2019
Change deprecation of <content/> to making it invalid as deprecation period is now over.
hcayless pushed a commit that referenced this issue Aug 11, 2022
Change deprecation of <content/> to making it invalid as deprecation period is now over.