Create an interleave element #2154

Open
joeytakeda opened this issue May 22, 2021 · 17 comments · May be fixed by #2538

Comments

@joeytakeda
Contributor

joeytakeda commented May 22, 2021

Currently @preserveOrder on <sequence> is an optional attribute and the spec states: "if true, indicates that the order in which component elements of a sequence appear in a document must correspond to the order in which they are given in the content model." However, by default <sequence> preserves order unless @preserveOrder is false, whereas the current description makes it seem that the opposite is true (i.e. that order doesn't matter by default). Moreover, setting it to false hasn't worked since at least 2017, per TEIC/Stylesheets#241.

Since @preserveOrder is implicitly true, I think that it should gain a default value of true so that it states explicitly what's already happening; I would also suggest that the description be changed to reflect that "true" is default and also outline what happens when its value is "false".

@martindholmes
Contributor

martindholmes commented Jun 18, 2021

@sydb and I believe that:

  • The word sequence MEANS ordered, so we should dispense with @preserveOrder completely;
  • Instead, we should introduce a new <bag> element which would do the same job: in other words, specify that all child items must be present in the number(s) assigned through their @minOccurs and @maxOccurs attributes, but the order of them is not significant.

Advantages:

  • The current processing of <sequence> can be left alone, and its semantics will match its name.
  • Processing of <bag> can be implemented separately and will be easier to maintain.

Disadvantages:

  • There will be some overlap between <bag> and <alternate>, depending on how @minOccurs and @maxOccurs are configured.

@sydb
Member

sydb commented Jun 18, 2021

Note that we chose the name <bag> based on the definitions used in the 2nd paragraph of 18.7 Collections as Complex Feature Values. (Yes, value collections are quite different from content models, but the underlying idea that a bag is an unordered collection with duplicates seems to match what we are doing, here.)

Thus:

    <alternate minOccurs="1" maxOccurs="1">
      <bag minOccurs="1" maxOccurs="1">
        <elementRef key="sic" minOccurs="1" maxOccurs="1"/>
        <elementRef key="corr" minOccurs="1" maxOccurs="1"/>
      </bag>
      <bag minOccurs="1" maxOccurs="1">
        <elementRef key="abbr" minOccurs="1" maxOccurs="1"/>
        <elementRef key="expan" minOccurs="1" maxOccurs="1"/>
      </bag>
      <bag minOccurs="1" maxOccurs="1">
        <elementRef key="orig" minOccurs="1" maxOccurs="1"/>
        <elementRef key="reg" minOccurs="1" maxOccurs="1"/>
      </bag>
    </alternate>

would be a perfectly reasonable customized content model for <choice>.
The following content model requires that there be 2 <three> elements.

    <bag>
      <elementRef key="one"/>
      <elementRef key="two"/>
      <elementRef key="five"/>
      <elementRef key="three"/>
      <elementRef key="three"/>
    </bag>

and thus is precisely equivalent to

    <bag>
      <elementRef key="one" minOccurs="1" maxOccurs="1"/>
      <elementRef key="two" minOccurs="1" maxOccurs="1"/>
      <elementRef key="five" minOccurs="1" maxOccurs="1"/>
      <elementRef key="three" minOccurs="2" maxOccurs="2"/>
    </bag>

What remains to be seen is how well this maps to the RELAX NG <interleave> (aka, the & connector). My instinct is that it maps perfectly well when all the children of <bag> are <elementRef>s, but may become problematic when there are child <classRef>s or <macroRef>s.
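
A minimal sketch of that mapping for the all-<elementRef> example above, in RELAX NG XML syntax. The wrapper element name `container` and the empty content of each child are assumptions made only to keep the schema self-contained. The two references to `three` are folded into one `<group>`, because RELAX NG does not allow the same element name in two different branches of a single `<interleave>`.

```xml
<!-- Sketch only: "container" and the <empty/> content models are hypothetical. -->
<element name="container" xmlns="http://relaxng.org/ns/structure/1.0">
  <interleave>
    <element name="one"><empty/></element>
    <element name="two"><empty/></element>
    <element name="five"><empty/></element>
    <!-- Exactly two <three> elements; other children may still fall between them. -->
    <group>
      <element name="three"><empty/></element>
      <element name="three"><empty/></element>
    </group>
  </interleave>
</element>
```

Under this schema, `<three/><one/><three/><five/><two/>` is a valid content sequence for `container`, while a sequence with only one `<three/>` is not.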

@ebeshero
Member

ebeshero commented Jun 18, 2021

Hmmm. My Relax NG antennae are a little skittish here. Can’t <alternate> handle this successfully if we just get rid of @preserveOrder? I’m wondering if a new element is really necessary. @sydb @martindholmes

@martindholmes
Contributor

@ebeshero <alternate> has a really confusing name, because it's both a noun and a verb; the noun better describes its function (either/or). I would love to replace <alternate> with <choice>, or <either>.

@sydb
Member

sydb commented Jun 18, 2021

Right. Which is to say that <alternate> means “only one of my children” and <bag> means “one of each of my children”. So a <bag> with only one child would mean exactly the same as an <alternate> with the same child. (But at least in the case of <alternate> that would be invalid: it must have 2 or more children.)

So the semantics are different enough that a fully fledged schema language should have both capabilities. And we have tried to provide them with @preserveOrder, but as @joeytakeda has pointed out, we didn’t quite get it right, and as @martindholmes and I point out, the naming is screwy, anyway.

We do not actually use the “one of each of my children” semantics in the Guidelines at all. (In large part because XML DTDs don’t have this concept. (SGML DTDs did.)) But we should provide the capability to ODD writers. (We thought it was necessary when we added @preserveOrder.)
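
In RELAX NG terms, the two semantics described here correspond to `<choice>` and `<interleave>`. The following is a minimal sketch of the contrast. The element names `demo`, `onlyOne`, and `oneOfEach` are hypothetical, and `sic` and `corr` are given empty content only for brevity.

```xml
<!-- Sketch only: wrapper names and <empty/> content models are assumptions. -->
<element name="demo" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- "only one of my children": either <sic/> or <corr/>, never both -->
  <element name="onlyOne">
    <choice>
      <element name="sic"><empty/></element>
      <element name="corr"><empty/></element>
    </choice>
  </element>
  <!-- "one of each of my children": both <sic/> and <corr/>, in either order -->
  <element name="oneOfEach">
    <interleave>
      <element name="sic"><empty/></element>
      <element name="corr"><empty/></element>
    </interleave>
  </element>
</element>
```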

@ebeshero
Member

Thanks for explaining the distinction! I was thinking of <bag> as allowing zero or more of a set of child elements in any order, and imagining that could be done with <alternate>.

I like the flexibility and constraint of this <bag> idea, if the default is, as you say, one of each of these. I think an ODD customization might also want to make some members of the bag optional by setting @minOccurs to zero.

@lb42
Member

lb42 commented Jun 20, 2021

I am cautiously positive about the proposal to introduce <bag> and remove @preserveOrder. A wrinkle that needs to be resolved however is whether or not the bagginess goes all the way down, i.e. is

<bag>
<elementRef key="one" maxOccurs="1"/>
<elementRef key="two" maxOccurs="2"/>
</bag>

satisfied by both of the following, or by only the first?

<two/><two/><one/>
<two/><one/><two/>

@martindholmes
Contributor

@lb42 I would say both are OK. My convoluted logic is that if you specify minOccurs = "0" on something, then you can have the absence of that element anywhere in the container. :-) So if you have two of something, either can be anywhere. @sydb, @joeytakeda?

@lb42
Member

lb42 commented Jun 20, 2021

Arguing from the case where absence is permissible anywhere doesn't really persuade me. Absence is not the same as presence: for one thing, two absences are not distinguishable from one!
However, looking up the definitions of bags and sets at https://cs.appstate.edu/~dap/classes/1100/sect2_2.html I learn that "Two bags A and B are equal if the number of occurrences of each element in A is the same as in B."
[a, b, c, c] = [b, c, a, c] = [c, c, b, a]
So that supports your interpretation. I don't like it, but I'll have to accept it!

@sydb
Member

sydb commented Jun 22, 2021

Thank you for that research, @lb42. We don’t have to accept that definition, of course. We could decide that a new “one of each” element that has a child with @maxOccurs > 1 requires they (the elements that satisfy those occurrences) be adjacent. But we shouldn’t call it (the new “one of each” element) a <bag>, then.

However, the obvious representation of our new toy in RELAX NG is <interleave>. Thus I imagine the RNC that corresponds to

<bag>
  <elementRef key="one" minOccurs="1" maxOccurs="1"/>
  <elementRef key="two" minOccurs="1" maxOccurs="2"/>
</bag>

would be ( one & ( two, two? ) ), which permits both <two/><two/><one/> and <two/><one/><two/>.
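
For readers more used to the XML syntax, here is a sketch of the same pattern as a small standalone schema. The wrapper element `container` and the empty content of `one` and `two` are assumptions for illustration.

```xml
<!-- Sketch only: equivalent of the compact pattern ( one & ( two, two? ) ). -->
<element name="container" xmlns="http://relaxng.org/ns/structure/1.0">
  <interleave>
    <element name="one"><empty/></element>
    <!-- One required <two/> followed by one optional <two/>, kept in a single
         branch so that the name "two" appears on only one side of the interleave. -->
    <group>
      <element name="two"><empty/></element>
      <optional>
        <element name="two"><empty/></element>
      </optional>
    </group>
  </interleave>
</element>
```

Because interleaving lets the members of the group be separated by `<one/>`, both `<two/><two/><one/>` and `<two/><one/><two/>` are accepted, as are `<one/><two/>` and `<two/><one/>`. Three `<two/>` elements, or no `<one/>`, are rejected.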

@lb42
Member

lb42 commented Jun 22, 2021

This reminds me that another possible name for this new element might be <interleave> of course. Which is what was originally proposed back in 2015 or thereabouts. (a search through the Council list archives for the word "interleave" is quite instructive)

@hcayless
Member

Commenting to bump this back to people's attention, and also because:

I've been looking into this a bit, and I'm not sure "bag" is a safe name for what's going on. RelaxNG <interleave> seems more like a set (no order, no duplicate children). You can't have duplicate element defs/refs in an RNG interleave (though you can use, e.g., <zeroOrMore> patterns, so that a content model using it would permit multiple children with the same element name).

"Bag" (to my mind) implies "one or more of each child pattern in any order," which is not what RNG interleave does. It means (I think) "one of each child pattern, in any order, and patterns may not overlap". "Set" might be ok, though perhaps it risks causing confusion. Or, as @lb42 suggests, just use "interleave".

One other wrinkle that occurs to me: checking the validity of the content of this thing, whatever we call it, will entail expanding references, since you have to ensure there's no overlap.
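
A sketch of why references have to be expanded before the check can be made. The pattern name `model.hypothetical` is invented for illustration and stands in for any class or macro reference. The two branches of the `<interleave>` look distinct, but the first expands to a choice that also contains `three`. This is the overlap that section 7.4 of the RELAX NG specification forbids, so a conforming validator should reject this schema.

```xml
<!-- Sketch only: an intentionally INVALID schema; all names are hypothetical. -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="container">
      <interleave>
        <!-- Branch 1: a class-like reference whose members are not visible here -->
        <ref name="model.hypothetical"/>
        <!-- Branch 2: a direct reference to "three" -->
        <ref name="three"/>
      </interleave>
    </element>
  </start>
  <!-- Expanding the reference reveals that branch 1 can also match <three/>. -->
  <define name="model.hypothetical">
    <choice>
      <ref name="one"/>
      <ref name="three"/>
    </choice>
  </define>
  <define name="one"><element name="one"><empty/></element></define>
  <define name="three"><element name="three"><empty/></element></define>
</grammar>
```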

@ebeshero
Member

ebeshero commented Mar 5, 2023

After some discussion on the TEI list (introduced by Daniel Schopper here: https://listserv.brown.edu/cgi-bin/wa?A2=ind2303&L=TEI-L&P=741 and continuing) we're liking the idea of introducing <interleave>.

@hcayless
Member

Since this isn't likely to be completed in time for the next release, as a band aid I'm going to fix the description on @preserveOrder to read:

if false, indicates that component elements of a sequence may occur in any order.

hcayless added a commit that referenced this issue Mar 21, 2023
@ebeshero
Member

ebeshero commented May 8, 2023

Council decides on 2023-05-08 F2f that we should:

  • Rename this issue to say we should create interleave element (as per Create an interleave element #2154)
  • Add a Stylesheets issue for converting tei:interleave to rng:interleave
  • Deprecate <tei:sequence preserveOrder="false">
    Allow any of msContents/physDesc/history/additional in any order, any number of times, but then create a Schematron warning suggesting that each of these be used only once.
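
A hedged sketch of what the Schematron warning in the last point might look like. The rule context (`tei:msDesc`), the `role` value, and the message wording are assumptions for illustration, not part of the Council decision.

```xml
<!-- Sketch only: context, role value, and messages are hypothetical. -->
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  <sch:pattern>
    <sch:rule context="tei:msDesc">
      <!-- Each report fires when the named child occurs more than once. -->
      <sch:report test="count(tei:msContents) &gt; 1" role="warning">
        msContents should normally be used only once.
      </sch:report>
      <sch:report test="count(tei:physDesc) &gt; 1" role="warning">
        physDesc should normally be used only once.
      </sch:report>
      <sch:report test="count(tei:history) &gt; 1" role="warning">
        history should normally be used only once.
      </sch:report>
      <sch:report test="count(tei:additional) &gt; 1" role="warning">
        additional should normally be used only once.
      </sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```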

@ebeshero ebeshero changed the title Description for sequence/@preserveOrder is misleading Create an interleave element: Description for sequence/@preserveOrder is misleading May 8, 2023
@ebeshero ebeshero changed the title Create an interleave element: Description for sequence/@preserveOrder is misleading Create an interleave element May 8, 2023
@trishaoconnor
Contributor

Created Stylesheets issue for converting tei:interleave to rng:interleave: https://github.com/TEIC/Stylesheets/issues/609.

joeytakeda added a commit that referenced this issue Mar 17, 2024
* New elementSpec and updated TD
* Still (possibly) need to address hcayless' point about validity of the construct (i.e. no overlap)
@joeytakeda
Contributor Author

To summarize work so far:

I haven't yet addressed @hcayless's point about validation though. My inclination is that this is a job for the ODD processor (though we could add some Schematron now to catch simple cases), but we should probably take a look at the RELAX NG spec for some guidance as to what we can catch early on.
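
As a rough sketch of the kind of Schematron that could catch the simplest case, the rule below flags two sibling <elementRef> children of one <interleave> that name the same element. It assumes the new element is `tei:interleave`, and the `role` value and message are illustrative. It does not detect overlap introduced through <classRef>, <macroRef>, or nested content models, which would still need reference expansion by the ODD processor.

```xml
<!-- Sketch only: catches duplicate direct <elementRef> keys, nothing deeper. -->
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  <sch:pattern>
    <sch:rule context="tei:interleave/tei:elementRef">
      <!-- True when an earlier sibling elementRef has the same @key. -->
      <sch:report test="@key = preceding-sibling::tei:elementRef/@key" role="warning">
        The element "<sch:value-of select="@key"/>" is referenced more than once
        in this interleave; the resulting RELAX NG pattern is likely to be invalid.
      </sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```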
