Why are the classes of clean and unclean modifications not complementary? #1588

Closed
cmsmcq opened this issue Feb 16, 2017 · 27 comments

@cmsmcq

cmsmcq commented Feb 16, 2017

The class of clean modifications of TEI is defined in 23.3.1 Kinds of Modification as those which define document languages which are subsets of the language defined by the unmodified TEI schema:

We use the term clean modification to describe a modification which regards as valid a subset of the documents considered valid by the same combination of TEI modules unmodified.

In normal usage, the class of unclean modifications ought, I believe, to be the complement of the class of clean modifications and to include all modifications whose document language is not a subset of the TEI. That may be what is intended in 23.3.1, though it's hard to tell. The definition of unclean modifications is currently made difficult to understand by what appears to be a contradiction in the definition between the term "disjoint" and the apparent gloss "neither being properly contained by the other". If the term "disjoint" is taken as correct, then unclean modifications are those which define document languages disjoint from the unmodified TEI. If the gloss is taken as correct, then unclean modifications are those in which neither the modified language nor the unmodified language is a superset of the other. In neither case is 'unclean' a complement of 'clean', because on both interpretations the set of modifications for which L(modification) is a proper superset of L(tei) is neither clean nor unclean.
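
Stated in set notation, as a sketch: the symbols $L_M$ and $L_T$ are shorthand introduced here (they do not appear in the Guidelines) for the document languages of the modified schema and of the unmodified TEI schema, and $L_T$ is assumed to be non-empty.

```latex
\begin{align*}
\text{clean:} \quad & L_M \subseteq L_T \\
\text{unclean, reading ``disjoint'':} \quad & L_M \cap L_T = \emptyset \\
\text{unclean, reading the gloss:} \quad & L_M \not\subseteq L_T \ \text{and}\ L_T \not\subseteq L_M \\
\text{covered by neither reading:} \quad & L_T \subsetneq L_M
\end{align*}
```

In the last case $L_M \not\subseteq L_T$, so the modification is not clean; the intersection is $L_T$ itself, so the languages are not disjoint; and $L_T \subseteq L_M$, so the gloss does not apply either. (On the "disjoint" reading, partially overlapping languages are left unclassified as well.)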

(The definition of 'unclean' to exclude modifications for which L(mod) is a proper superset of L(tei) may perhaps be a relic of the definition of 'clean' in TEI P3. In P3, 'clean' modifications are all those for which the modified and unmodified schemas define languages which are in a subset/superset relation to each other, regardless of directionality: P3 defines both clean restrictions and clean extensions, and defines unclean modifications as those which are neither clean restrictions nor clean extensions. It looks as if at some point the text was changed to restrict 'cleanliness' to subsets of the TEI, defining the class of 'clean extensions' out of existence, without taking account of the need to modify the characterization of 'unclean' modifications as well, to preserve the natural relation of 'clean' and 'unclean'. But that's merely speculation and not relevant to the substantive issue here.)

Why are the terms 'clean' and 'unclean' not complements of each other?

Proposal: they should be complements.

The exposition could go something like this.

For every modification, there is (a) a 'modified schema' defined by the modification, and (b) a 'corresponding unmodified schema' which includes the same modules (and where appropriate the same elements in each module).

Clean modifications are those which obey the rule that every document valid against the modified schema is valid against the corresponding unmodified schema. Unclean modifications are those modifications which are not clean.

If it is desired to expound on the various ways in which a modification can be unclean, the text can go on to say that unclean modifications may define document languages which are supersets of the unmodified TEI, or languages which overlap with unmodified TEI without being a superset, or languages which are disjoint from unmodified TEI.
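
The classification sketched above can be illustrated with a toy model in which a document language is just a finite set of document identifiers. Real document languages are generally infinite, so this is only an illustration of the set relations, not a way of testing actual schemas; the names `Relation` and `classify` are invented for the example.

```python
from enum import Enum
from typing import AbstractSet


class Relation(Enum):
    CLEAN = "clean: subset of unmodified TEI"
    SUPERSET = "unclean: proper superset of unmodified TEI"
    OVERLAP = "unclean: overlaps unmodified TEI without being a superset"
    DISJOINT = "unclean: disjoint from unmodified TEI"


def classify(modified: AbstractSet[str], unmodified: AbstractSet[str]) -> Relation:
    """Classify a modification by comparing two toy document languages.

    Assumption: each language is modelled as a finite set of document
    identifiers, and the unmodified language is non-empty.
    """
    # Clean: every document valid against the modified schema is also
    # valid against the corresponding unmodified schema.
    if modified <= unmodified:
        return Relation.CLEAN
    # Everything below is unclean; the three branches are the three
    # ways of being unclean listed in the proposal.
    if modified > unmodified:
        return Relation.SUPERSET
    if modified & unmodified:
        return Relation.OVERLAP
    return Relation.DISJOINT


tei = frozenset({"doc-a", "doc-b", "doc-c"})
for name, language in {
    "restriction": frozenset({"doc-a"}),
    "extension": tei | {"doc-x"},
    "mixed": frozenset({"doc-a", "doc-x"}),
    "unrelated": frozenset({"doc-x"}),
}.items():
    print(name, "->", classify(language, tei).value)
```

Because the first test decides clean versus unclean and the remaining branches only subdivide the unclean case, the two classes are complements by construction.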

@lb42
Member

lb42 commented Feb 17, 2017

I'm not sure that this is necessary, elegant though it appears. Unless I misunderstand it, the concept of "unmodified schema" is unnecessary, since we have TEI-All.

@cmsmcq
Author

cmsmcq commented Feb 17, 2017

I'm not proposing spec prose, just sketching a line of argument or exposition. The concept of an unmodified schema is already appealed to in the current text; it's just not defined or given a name or otherwise made easy to talk about.

@lb42
Member

lb42 commented Mar 8, 2017

Sorry, I disagree. For any TEI subset the corresponding "unmodified" schema would be TEI All. For one which is an extension (i.e. which adds non-TEI components) this is also true. Or do we think that the schema from which non-TEI components are drawn is also concerned? In other words, if I pull in SVG is the unmodified schema now the union of TEI All and SVG? Ouch.
I propose looking for references to "unmodified schema" and changing them to reference TEI All.

@cmsmcq
Author

cmsmcq commented Mar 8, 2017

If you think the current text doesn't talk about a corresponding unmodified schema, then I think you need to read the text more carefully.

If you think it's worth anyone's while looking for references to "unmodified schema", however, you cannot possibly think that.

So when you say "I disagree", you mean what, exactly?

@lb42
Member

lb42 commented Mar 8, 2017

I'm disagreeing with the assertion that "The concept of an unmodified schema is ... not defined or given a name". It is. The name is TEI All. But you're right to say that this name is missing from the discussion of the concept in ch. 23: that needs to be fixed.

@lb42
Member

lb42 commented Mar 22, 2017

Does the revised wording proposed in #1587 also satisfy this issue? If so, this issue could be closed.

@cmsmcq
Author

cmsmcq commented Mar 28, 2017

The changes proposed for #1587 do seem to define clean and unclean modifications as complementary classes; that appears on the face of it to resolve this issue. It's not clear to me that all uses of "unclean" in chapter 23 are consistent with those changes, but I have not checked them in detail.

@lb42 lb42 added the Status: Go label Apr 2, 2017
@lb42
Member

lb42 commented Apr 2, 2017

On the assumption that #1587 is accepted, and with the proviso that usages of "clean"/"unclean" need to be checked, I am marking this one green.

@jamescummings
Member

I would agree that the uses of clean/unclean need to be checked, and that tei_all should be mentioned as the equivalent of an 'unmodified schema' and links given to it (and to other exemplar customisations, which should be mentioned in the guidelines more).

@raffazizzi
Contributor

Not to throw more fuel on the fire, but what about switching from clean/unclean (it sounds biblical and somewhat judgmental) to conformant/non-conformant once we clearly agree on and define conformance?

@lb42
Member

lb42 commented Apr 5, 2017

"once we clearly agree on and define conformance" -- so your proposal is to defer this to the Greek Kalends?

@raffazizzi
Contributor

raffazizzi commented Apr 5, 2017

I hope sooner than that, since it's clearly a key concept. Adopting a different terminology for what appears to be the same concept seems like a work-around that will bite us back in the long term.

@lb42
Member

lb42 commented Apr 5, 2017

I was just being sarky about the notion that we had "clearly agreed" on how to define conformance... but you are quite right, of course. And I have no problem with retiring the term clean/unclean as long as we continue to support a notion of "conformance" which permits extension.

@lb42
Member

lb42 commented Apr 7, 2017

Clean/unclean definitions modified as per #1587, so the concepts are now defined as complementary. However, some usages contradict these definitions. An extension which adds new elements or attributes is defined in one place as ipso facto "unclean", in another as being "clean" provided that the extensions are in another namespace. A third term may be needed.

@peterstadler peterstadler added this to the Guidelines 3.7.0 milestone Sep 16, 2019
@lb42
Member

lb42 commented Feb 6, 2020

Working my way through this slowly, at last. First question though:

"Adding a new attribute to a class however can be a clean modification only if the new attribute is labelled as belonging to some namespace other than the TEI."

Attributes defined by the TEI are not in any namespace, of course, and the null namespace is up for colonisation by anyone (e.g. xml:). The reason for the quoted prohibition is presumably so that a TEI-aware application can tell that it's safe to ignore an attribute it doesn't know about, and for consistency with the rule about adding new non-TEI elements. But it is of course unenforceable. Do we all still believe it?

@lb42
Member

lb42 commented Feb 6, 2020

Secondly (maybe a new ticket is needed): I propose removing the section on renaming of TEI element names. It's unclear how this is supposed to work -- should they be in a new namespace or not? -- and I don't believe we currently support this in any tools. It's in the same category as @equiv -- something we provide in tagdocs but don't fully specify how to use.

@sydb
Member

sydb commented Feb 6, 2020

I presume (w/o re-reading entire ticket or thinking this through very carefully) that what was intended was “Adding a new attribute can be a clean modification only if the new attribute belongs to some namespace other than a TEI namespace or the null namespace.” or some such.

And why would such a rule be any less enforceable than any of the other rules about conformance?
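
As a rough illustration of how such a rule could be checked mechanically, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It lists the attributes in a document that belong to a namespace other than the TEI namespace or the null namespace. The helper names, the sample document, and the `http://example.org/ns/my` namespace are invented for the example, and treating `http://www.tei-c.org/ns/1.0` as the only TEI namespace is an assumption.

```python
import xml.etree.ElementTree as ET

# Assumption: only the main TEI namespace is treated as "a TEI namespace".
TEI_NAMESPACES = {"http://www.tei-c.org/ns/1.0"}


def namespace_of(qname: str) -> str:
    """Return the namespace URI of an ElementTree name, or "" for none."""
    if qname.startswith("{"):
        return qname[1:].split("}", 1)[0]
    return ""


def is_in_foreign_namespace(qname: str) -> bool:
    """True if the name is in neither a TEI namespace nor the null namespace."""
    ns = namespace_of(qname)
    return ns != "" and ns not in TEI_NAMESPACES


def foreign_attributes(xml_text: str) -> list[str]:
    """List every attribute name that lives in a foreign namespace."""
    root = ET.fromstring(xml_text)
    return sorted(
        name
        for element in root.iter()
        for name in element.attrib
        if is_in_foreign_namespace(name)
    )


SAMPLE = (
    '<p xmlns="http://www.tei-c.org/ns/1.0" '
    'xmlns:my="http://example.org/ns/my" '
    'rend="italic" my:status="draft">text</p>'
)
print(foreign_attributes(SAMPLE))
```

Unprefixed attributes such as `rend` are in no namespace, so only `my:status` is reported. Attributes in the `xml:` namespace (for example `xml:id`) would be reported too, which a real check would presumably want to special-case.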

lb42 added a commit that referenced this issue Feb 6, 2020
@lb42
Member

lb42 commented Feb 6, 2020

clean/unclean terminology retained, introduced TEI-All, removed all reference to renaming at 7f79765.

@lb42 lb42 closed this as completed Feb 6, 2020
@jamescummings
Member

Re: Renaming TEI Elements.

I don't understand what you mean by us not supporting the renaming of elements (and attributes). Both Old Roma and Roma Beta do so. I certainly used it for the tei_corset ODD for the data capture phase of the Stationers' Register Online project. Cf. https://github.com/jamescummings/conluvies/blob/master/tei_corset/tei_corset.odd#L731-L737 where I use altIdent to rename 'div' to 'd' and the 'type' attribute to 't'. (The keying company was charging per KB of output regardless of schema.)

Whether this makes that schema non-conformant or not, dirty or clean, I have no real opinion about. I think that given I was documenting this in an ODD using the tagdocs suggested mechanism, that Roma happily renamed them for me, and that I was using equiv to point to an XSL to revert document instances of said renaming back to original TEI, means that if I was being dirty, I was certainly doing it in the way the TEI would like me to be. ;-) I didn't feel unclean, I felt pragmatic. Whether that means it should be mentioned as a form of clean or unclean modification in this chapter is a different issue. But we do use it, and I think support it.
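
To make the round trip concrete, here is a minimal sketch of reverting the two renamings mentioned above ('d' back to 'div', 't' back to 'type'). It is not the XSL referred to via equiv, whose contents are not shown in this thread; it is a Python stand-in using the standard `xml.etree.ElementTree`. The function names are invented, and applying the attribute mapping to every element (rather than only to particular ones) is an assumption.

```python
import xml.etree.ElementTree as ET

# Only the two renamings mentioned in the comment; the real ODD may define more.
ELEMENT_NAMES = {"d": "div"}
ATTRIBUTE_NAMES = {"t": "type"}


def split_qname(qname: str) -> tuple[str, str]:
    """Split an ElementTree name into (namespace URI, local name)."""
    if qname.startswith("{"):
        ns, local = qname[1:].split("}", 1)
        return ns, local
    return "", qname


def join_qname(ns: str, local: str) -> str:
    """Rebuild an ElementTree name from its namespace URI and local name."""
    return f"{{{ns}}}{local}" if ns else local


def revert_renaming(root: ET.Element) -> ET.Element:
    """Restore the original names in place, leaving namespaces untouched."""
    for element in root.iter():
        ns, local = split_qname(element.tag)
        element.tag = join_qname(ns, ELEMENT_NAMES.get(local, local))
        # Copy the keys first, because the attribute dict is modified below.
        for name in list(element.attrib):
            attr_ns, attr_local = split_qname(name)
            if not attr_ns and attr_local in ATTRIBUTE_NAMES:
                element.attrib[ATTRIBUTE_NAMES[attr_local]] = element.attrib.pop(name)
    return root


compact = ET.fromstring('<d t="chapter"><p>text</p><d t="section"/></d>')
print(ET.tostring(revert_renaming(compact), encoding="unicode"))
```

The sketch deliberately keeps whatever namespace the input uses, since the question of which namespace renamed elements belong in is exactly what is unsettled here.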

@lb42
Member

lb42 commented Feb 7, 2020

Thanks James. I was thinking more of the systematic renaming of elements, e.g. to some language other than English, which we used to do in P3, but your use case is entirely persuasive. I will ponder further, and have re-opened this ticket accordingly. Would you be willing to see your ODD, or a simplified version of it, added to the list of exemplars?

@lb42 lb42 reopened this Feb 7, 2020
@sydb
Member

sydb commented Feb 7, 2020

@jamescummings beat me to it, but I agree: we rename several elements using the <altIdent> mechanism.

And FWIW, I prefer (by far) the old P3 meanings of “clean” and “unclean”. I don’t see that it is useful to have another word that means basically “proper subset”.

@lb42
Member

lb42 commented Feb 7, 2020

Happy(ish) to make changes to current wording @sydb, but what exactly are you proposing?

@lb42
Member

lb42 commented Feb 7, 2020

When processing tei_corset, Roma produces rather misleading HTML doc output (e.g. renamed elements in the examples are renamed, but deleted attributes are retained). But the schema seems OK (modulo use of namespaces).

@sydb
Member

sydb commented Feb 7, 2020

I’m not actually proposing a quick change to current wording, but rather a reasoned discussion over next few months about how “clean” and “unclean” should be defined and used.

Here, BTW, is a sample ODD that defines the markup language my daughter used for her thesis research data collection. Not even close to TEI conformant, and many major elements are renamed. But it allowed her to collect exactly the information she needed. (I wrote the ODD for her; she knows next to nothing about TEI, but is reasonably adept at XML. And I believe she even taught assistants to enter data using this schema.)

@lb42
Member

lb42 commented Feb 7, 2020

Thanks Syd: I note that your ODD, unlike James', produces a schema which claims to be in the TEI namespace, though it clearly isn't. Using ODD to generate non-TEI schemas is just fine with me, the more the merrier. But telling fibs probably is not something to be encouraged :-)

@lb42
Member

lb42 commented Feb 7, 2020

Restored the section on renaming, with some minor rewriting. I suggest that this ticket can be closed for now.

@lb42 lb42 closed this as completed Feb 7, 2020
@sydb
Member

sydb commented Feb 7, 2020

@lb42: I presume you mean that it shouldn’t be in the TEI namespace, because clearly it is.

But for the most part, I disagree. No fibs told here, at least not in the data portions, and only a few in the metadata portions. The metadata is, of course, completely un-TEI-like, and I agree the container (<ijbMeta>) should probably be in a different namespace. But many of the elements are just renamed TEI elements (<encounters>, <encounter>, <transcription>, <from> are just <teiCorpus>, <TEI>, <body>, and <nationality>, although I admit that that last one is a bit of a semantic stretch), and most are just straight up TEI elements (<age>, <country>, <date>, <desc>, <district>, <emph>, <equipment>, <foreign>, <gap>, <incident>, <kinesic>, <langKnowledge>, <langKnown>, <measure>, <measureGrp>, <media>, <mentioned>, <note>, <pause>, <person>, <region>, <residence>, <said>, <settlement>, <shift>, <soCalled>, <text>, <time>, <u>, <unclear>, and <vocal>).

All that said, I just noticed right now something that had not occurred to me back when I wrote the schema — the <encounters> element is not entirely analogous to <teiCorpus>, because it does not have metadata for the entire collection. Sigh.

hcayless pushed a commit that referenced this issue Jun 26, 2022