Pure-ODD content model elements appear to be strictly weaker than regular languages -- intended? #1596
Comments
An empty content model is represented by an empty content element in pure ODD |
Thank you. Other things being equal, I like being able to say explicitly 'this is empty', and not having an explicit notation for empty sequences does make it less convenient to perform operations on content models, but one can't have everything. Is there a way to denote the empty set, or does one need to define an element whose type is Hmm. The existence of such a workaround seems to mean that pure ODD does have the closure property I claimed it was missing, so the initial entry in this issue is wrong. It's not that the language cannot express the empty sequence or the empty set, only that it makes references to them inconvenient. I would propose to add empty elements to denote these concepts explicitly; their absence is not an error, but their presence would be an improvement. |
We did consider at one point introducing an element <empty/> for this purpose. But it seemed needlessly verbose to require <content><empty/></content> when <content/> did the same job more economically. By all means put in a ticket requesting the introduction of an element <empty>: would it make sense to restrict this to the content model of <sequence> or <alternate> to avoid having two ways of doing the same thing? |
On this point I think RNG's XML syntax is reasonably well defined; it allows both
Well, not just But isn't this ticket already a ticket requesting the introduction of those elements? |
Let's resist the bikeshedding (I had to look this up, but now I know what it means I definitely don't want to do it). I propose to add an element <empty>, member of model.contentPart. I propose further that <content/> is considered synonymous with <content><empty/></content>. All in favour? |
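The proposed equivalence could be sketched as a small normalization step. This is only an illustration, not how the TEI Stylesheets work: the function names are hypothetical, the element names are used without the TEI namespace as in the snippets in this thread, and <empty> is the element being proposed here rather than an existing one.

```python
import xml.etree.ElementTree as ET


def normalize_content(content):
    """Give a childless <content> an explicit <empty/> child (modifies in place)."""
    if content.tag == "content" and len(content) == 0:
        # <empty> is the element proposed in this ticket (an assumption here).
        ET.SubElement(content, "empty")
    return content


def serialize(xml_text):
    """Parse a content model, normalize it, and return it as a string."""
    return ET.tostring(normalize_content(ET.fromstring(xml_text)), encoding="unicode")


# Both spellings end up in the same normalized form.
short_form = serialize("<content/>")
long_form = serialize("<content><empty/></content>")
# A non-empty content model is left alone.
untouched = serialize('<content><elementRef key="foo"/></content>')
```

With a step like this run first, later processing would only ever see the explicit form.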
Adding a new element to P5 is a complicated process, involving at least the following
I will happily do the first two of these, if Council approves, alone or with any Council member interested; I'd definitely appreciate some help with the third and the fourth is (to coin a phrase) beyond my pay grade. |
I still don't understand how allowing both this:
and this:
is an improvement over the current situation; multiple ways of doing the same thing are sometimes inevitable but not a good thing in themselves. |
MSM claimed earlier on this ticket that the availability of the second form was "more convenient" : I think he had in mind the processing of content models under modification, where (say) one component of a complex structure becomes empty as a result of deletions from a class |
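I think if we added <empty/>, we would have to deprecate and then disallow empty <content>. Allowing the two to coexist would lead only to confusion. Is adding this new element worth the pain? |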
On Mar 27, 2017, at 10:19 AM, Hugh A. Cayless ***@***.***> wrote:
> I think if we added <empty/>, we would have to deprecate and then disallow empty <content>. Allowing the two to coexist would lead only to confusion. Is adding this new element worth the pain?
YMMV, but empty elements and elements containing an explicit empty element whose basic semantics is “yes, I know this is empty, that’s the way I want it” do co-exist in both RNG and XSD, which I think are the most widely used schema languages for XML.
So as a user, and as a writer of stylesheets which work with content models, I would be perfectly happy to have both
<content/>
and
<content><empty/></content>
Personally I prefer the latter, because I tend to find explicitness more easily readable, after a few weeks or years away from something, than the alternative.
As I have already suggested, trying to ensure that there is only one way to write a content model for a particular language is a losing effort. One may conceivably achieve it for this one case, but not for any more complex languages. (I point to the rules for facet-based restriction in XSD as an impressive example of how much effort a spec can put into a failed attempt to stop people from writing things that the authors of the spec think "don't make sense". It's not a path I recommend to those I wish well.)
................................................................
The crucial difference between the proposal and the status quo is that adding 'empty' and 'emptySet' allows expressions like
<choice><empty/><elementRef key="foo"/></choice>
<choice><emptySet/><elementRef key="foo"/></choice>
as the first step in simplifying expressions like the following (where CARTHAGE is an element to be deleted):
<choice>
  <elementRef key="CARTHAGE" minOccurs="0"/>
  <elementRef key="foo"/>
</choice>
...
<choice>
  <elementRef key="CARTHAGE"/>
  <elementRef key="foo"/>
</choice>
A further simplification is of course possible in both cases and would yield
<elementRef key="foo" minOccurs="0"/>
and
<elementRef key="foo"/>
................................................................
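The two-step rewriting described above (replace the deleted element with emptySet, then simplify) might look roughly like the following. It is a minimal sketch under stated assumptions: content models are modelled as nested tuples rather than XML, "opt" stands in for minOccurs="0", and every function name is hypothetical. It is not taken from any existing ODD processor.

```python
EMPTY = ("empty",)         # the empty sequence: matches only "nothing here"
EMPTY_SET = ("emptySet",)  # the empty language: matches no content at all


def ref(key):
    return ("ref", key)


def opt(expr):
    return ("opt", expr)


def choice(*items):
    return ("choice",) + items


def seq(*items):
    return ("seq",) + items


def delete(expr, key):
    """Step 1: replace every reference to `key` with EMPTY_SET."""
    kind = expr[0]
    if kind == "ref":
        return EMPTY_SET if expr[1] == key else expr
    if kind in ("opt", "choice", "seq"):
        return (kind,) + tuple(delete(e, key) for e in expr[1:])
    return expr  # EMPTY and EMPTY_SET are unaffected


def simplify(expr):
    """Step 2: apply the usual identities for empty and emptySet, bottom up."""
    kind = expr[0]
    if kind in ("ref", "empty", "emptySet"):
        return expr
    parts = [simplify(e) for e in expr[1:]]
    if kind == "opt":
        (inner,) = parts
        if inner in (EMPTY, EMPTY_SET):
            return EMPTY  # an optional nothing matches only the empty sequence
        return inner if inner[0] == "opt" else ("opt", inner)
    if kind == "choice":
        alts = [p for p in parts if p != EMPTY_SET]  # emptySet is the identity of choice
        nullable = EMPTY in alts
        alts = [p for p in alts if p != EMPTY]
        if not alts:
            return EMPTY if nullable else EMPTY_SET
        body = alts[0] if len(alts) == 1 else ("choice", *alts)
        return simplify(("opt", body)) if nullable else body
    if kind == "seq":
        if EMPTY_SET in parts:
            return EMPTY_SET  # a sequence with an impossible member is impossible
        items = [p for p in parts if p != EMPTY]  # empty is the identity of sequence
        if not items:
            return EMPTY
        return items[0] if len(items) == 1 else ("seq", *items)
    raise ValueError(f"unknown expression kind: {kind!r}")


# The two CARTHAGE examples from the text above.
optional_case = simplify(delete(choice(opt(ref("CARTHAGE")), ref("foo")), "CARTHAGE"))
required_case = simplify(delete(choice(ref("CARTHAGE"), ref("foo")), "CARTHAGE"))
```

Under these assumptions, optional_case comes out as opt(ref("foo")) and required_case as ref("foo"), matching the two simplified forms given above. Without a notation for the intermediate values, the deletion and the simplification have to be fused into a single special-cased step.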
As for whether it’s worth doing or not — again, your mileage may vary.
One of the reasons given by some people (including some involved with TEI) for preferring RNG to XSD was that RNG content models are closed under set operations (union, difference, negation, intersection) while XSD and DTD content models are not (owing to the determinism rules they took over from SGML), and that RNG is more obviously governed (or informed) by the theory of formal languages (here: regular languages and regular expressions) than is XSD. An observer who took that claim at face value might be surprised at the work the TEI has put in on replacing RNG with a new language for content models, which turns out NOT to be closed under set operations and NOT to include simple representations of basic concepts of formal language theory like the empty language and the empty string.
As has been observed, the change currently proposed appears to be more of a cosmetic change and simplification than a fundamental improvement in expressive power. It is worth doing only if the TEI cares about simplicity and aesthetics.
|
Michael, I'm trying to get a feel for cost v. benefit here. It sounds like
the addition of empty (and also emptySet?) would make some of the
content-model rewriting operations you've been talking about elsewhere
easier to implement. Is that true? The argument against is that we have to
do extra work to make the Stylesheets handle (e.g.) both <content/> and
<content><empty/></content> as equivalents. It's not necessarily all that
hard, but it is a bunch of work beyond just adding the element(s).
|
On Mar 27, 2017, at 2:06 PM, Hugh A. Cayless ***@***.***> wrote:
> Michael, I'm trying to get a feel for cost v. benefit here. It sounds like
> the addition of empty (and also emptySet?) would make some of the
> content-model rewriting operations you've been talking about elsewhere
> easier to implement. Is that true?
I think so.
> The argument against is that we have to
> do extra work to make the Stylesheets handle (e.g.) both <content/> and
> <content><empty/></content> as equivalents. It's not necessarily all that
> hard, but it is a bunch of work beyond just adding the element(s).
Yes, agreed.
********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
cmsmcq@blackmesatech.com
http://www.blackmesatech.com
********************************************
|
I think |
I have now added a spec for
As noted above, we may want to deprecate the use of |
deprecate |
Step 1, the actual deprecation (to 2019-08-25) completed at 84a80e4. Still need to address the prose and add some remarks to the tagdoc. |
Prose (in TD) updated and remarks (to Over to you, @martindholmes, for adding a test to test suite. |
For the record, I repeat once more my view that deprecating empty content models is unnecessary, unhelpful, and harmful. It appears to be grounded in the view that it's better to have a mechanism for specifying content models in which there is only one way to define a given language. Such a mechanism is not realistically feasible, and the idea that it would be better if it were feasible is at best dubious. Some arguing for it have feared that having two ways to say the same thing will be confusing, but no one has provided any evidence that users of Relax NG (including creators of TEI ODD documents during the years the ODDs used RNG to specify content models) have been seriously confused by having more than one way to write empty-sequence content models, or more than one way to write x?, x+, or x*. When in the future someone asks how this particular ad-hoc rule came to infest the grammar of pure ODDs, and someone else answers "Oh, it was to resolve a bug raised by Michael," I would like the record to show that Michael had nothing to do with the deprecation of empty content models. |
@cmsmcq: I don’t think anyone is imagining that requiring And, for the record, I lean towards the “one way is better” logic that Martin expressed when he raised this concern, but it is not why I am in favor. I am in favor because using But it sounds like you disagree … is there an advantage to using @martindholmes: I think Test2/ is sufficient, myself, but that’s in part to help pressure us (you) to make Test2/ the real testing paradigm. :-) |
@sydb We can go with Test2 for now, then, but we should make sure we don't end up in a situation where I'm the only one who can easily work on Test2. I've done my best to make it easy to work with, but the longer it's just me, the less likely it is that it'll be generally comprehensible. Do you fancy adding this test yourself to the new build_odd.xml file (assuming that's where it belongs)? |
@sydb No, I think the advantage lies in simple rules with as few special cases as one can manage. I agree with your view that explicit elements are more explicit, and I personally find them clearer; I disagree with the apparent view that the definition of pure ODDs should include ad-hoc rules to enforce your stylistic preferences, or mine. |
@cmsmcq Is any actual harm done if the way TEI indicates an empty content model is with |
@sydb exactly what am I testing here? |
I must have been asleep when deprecation of empty <content> was agreed. My opinion may be coloured by the fact that all my ODD tutorials will now need to be revised, but I still think that was an unnecessary and mistaken move. Should we now expect Council to move towards deprecation of numbered divs and other instances where the TEI has long offered more than one way to do it? |
@lb42 I hope so. :-) |
@martindholmes that does not surprise me either :-( |
@hcayless I am sorry to see that my efforts at stating my views clearly have been unsuccessful. There is no harm done that I can see in allowing the use of an explicit signal for emptiness in the form of Whether any "actual harm" is done (you didn't ask this explicitly, but I gather it's what you meant to ask) by disallowing [Overlong discussion of the harms and analogies with design mistakes of other specs deleted here.] |
I think there may be an actual additional advantage to ODD processing in this decision. Imagine we have an element whose content model is not intended to be empty. During an ODD-chaining process, all of the elements permitted inside it are eventually deleted -- basically unintentionally as a by-product of other decisions, but leaving an empty content model. The ODD processor in this case would most likely end up outputting This obviously depends on the design of ODD processing, though. A clever ODD processor might detect an empty content model and helpfully supply |
"An element whose content model is not intended to be empty" would presumably express this using @minOccurs and @maxOccurs at some point, and so in the case I think you are hypothesizing the resulting schema would be invalid. There is a well known problem when a content model requires "one or more" members of some class, but all members of the class have been deleted during ODD processing. I don't see how this issue helps with that though. |
Tokyo F2F: council agrees that reducing complexity in ODD processing is a priority over language expressiveness. Therefore we decided to keep |
Thank you for considering this change. I accept the result, although I am disappointed that the council did not see things my way and believe the given rationale to be self-contradictory. The absence of notations for the empty sequence and the empty set makes the processing of content models more complex, not less complex; some of the bugs I have reported elsewhere in the behavior of the current ODD processors (such as those connected with the deletion of elements from content models) would have been easier to fix had the suggested change been made, and I suspect that the absence of those notations may have been part of the reason the bugs got into the stylesheets in the first place. If the Council seeks to simplify the processing of ODDs, the way to do so is to remove special cases from the language, not to preserve them. Thank you again for your time. |
Change deprecation of <content/> to making it invalid as deprecation period is now over.
The elements used for content models in pure ODDs include, if I understand chapter 22 correctly:
For sequence and alternate, the content models require at least one child, and the Schematron rules require at least two.
That means that <sequence/> and <alternate/> cannot be used, as currently defined, to denote sequences of length 0 and sets of cardinality 0, respectively. Relax NG also requires sequences and alternations (group and choice elements) to contain children, but it provides the empty and notAllowed elements to denote empty sequences and empty sets.
Are there other ways to denote the empty sequence and the empty set in pure ODDs?
If not, then pure ODDs appear to have lost one of the nice features of Relax NG, that content models are closed under the standard set operations (intersection, union, complementation, set difference), and that any regular language over elements, typed data, and text nodes can be defined by a content model.
Is the absence of closure in pure ODD intentional?
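To illustrate why explicit notations matter for closure, here is a rough sketch of a helper that builds the alternation or the sequence of an arbitrary list of element names, including a list of length 0 or 1. The helper names are hypothetical, elements are used without a namespace, and <empty> and <emptySet> are assumed notations for the empty sequence and the empty set, not elements of pure ODD as described above.

```python
import xml.etree.ElementTree as ET


def combine(wrapper, identity, keys):
    """Build a wrapper element over elementRefs, degrading gracefully for 0 or 1 keys."""
    if not keys:
        # Nothing to combine: fall back to the identity of the operation.
        return ET.Element(identity)
    refs = [ET.Element("elementRef", {"key": key}) for key in keys]
    if len(refs) == 1:
        # A single member needs no wrapper (the rules above want at least two children).
        return refs[0]
    element = ET.Element(wrapper)
    element.extend(refs)
    return element


def alternation_of(keys):
    # The union of zero alternatives is the empty set (assumed <emptySet/>).
    return combine("alternate", "emptySet", keys)


def sequence_of(keys):
    # The concatenation of zero items is the empty sequence (assumed <empty/>).
    return combine("sequence", "empty", keys)


no_members = alternation_of([])
one_member = alternation_of(["p"])
two_members = alternation_of(["p", "q"])
no_items = sequence_of([])
```

Without the two identity elements, the zero-length case has no representation, so every caller of such a helper would need its own special case.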