improvements to SG-oth: order of sub-sections, XML declaration, encoding #2076

Closed
sydb opened this issue Dec 12, 2020 · 6 comments

Comments

@sydb
Member

sydb commented Dec 12, 2020

The section “Other Components of an XML Document” ("SG-oth", currently v.7) could use some updating.

  1. What got me looking at this section was @martindholmes’s observation that it would be better if the 2nd example in the “Processing Instructions” (SG-pi) subsection showed an encoding of “UTF-8” rather than “iso-8859-1”. This is because lots of people will glance at or copy the example without carefully reading the important discussion that immediately follows. Of course, if the example is changed, said immediately following discussion has to be re-written to accommodate the change.
  2. In that same paragraph, the clause “the 16-bit characters of Unicode have been mapped to the 8-bit character set known as ISO 8859-1” is mildly ambiguous, and both interpretations are wrong. It could mean either that all Unicode characters are represented with 16 bits, or that only the subset of Unicode characters that can be (or are) represented in 16 bits has been mapped. But Unicode has over 143,000 characters, so they cannot all be represented in a mere 16 bits. And even the subset of Unicode that can be represented in 16 bits (i.e., UCS-2) cannot be mapped to the 8-bit ISO 8859-1 encoding. Perhaps “The encoding "ISO-8859-1" could be used to indicate that only a subset of the 128 characters defined by part 1 of the ISO 8859 standard are used in the document, and they are mapped to 8-bit numbers according to that standard.” or some such.
  3. Note that the value should be “ISO-8859-1”, not “iso-8859-1”.
  4. The assertion that the “XML declaration is purely documentary” seems a bit strong. Combined with the warning that getting it wrong can mess things up, it may leave the reader thinking it is best to leave the declaration out. But the opposite is the case: it is best to include it. Although it is not required, and in the modern ecosystem of processors may not be all that useful, the XML spec says it SHOULD be present. Furthermore, a declaration is required for external entities stored in anything other than UTF-8 or UTF-16 when “external character encoding information (such as MIME headers)” is not available. (This is so that a processor can detect the encoding, because it knows the first two characters after the byte order mark, if any, will be “<?”.) Since any TEI file may find itself XIncluded or entity-referenced into another to be part of a <teiCorpus>, every TEI file should be thought of as an external entity for this purpose.
  5. Seems to me the “Namespaces” subsection (SG-name) should occur before “Processing Instructions” (SG-pi), if not before “Character References” (SG-er).
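
Points 2 and 4 above can be illustrated with a short Python sketch. It is not taken from the Guidelines or from any TEI tool; `sniff_encoding` and `fits_latin1` are hypothetical helper names, and the detection logic is a simplification of the autodetection idea described in the XML spec that only handles the UTF-8 and UTF-16 families plus ASCII-compatible encodings named in a declaration. A real parser does considerably more.

```python
import codecs
import re

# Byte order marks we recognise (UTF-32 and EBCDIC are deliberately ignored).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]
# Without a BOM, the first two characters of a declaration are "<?",
# so their byte pattern reveals the encoding family.
_NO_BOM = [
    (b"\x00<\x00?", "utf-16-be"),
    (b"<\x00?\x00", "utf-16-le"),
    (b"<?", "utf-8"),  # some ASCII-compatible encoding; refined below
]

_DECL = re.compile(
    r"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_encoding(data: bytes) -> str:
    """Guess the encoding of an XML entity from its first bytes (a sketch)."""
    family = "utf-8"  # assumed when there is neither a BOM nor a declaration
    for bom, name in _BOMS:
        if data.startswith(bom):
            data = data[len(bom):]
            family = name
            break
    else:
        for prefix, name in _NO_BOM:
            if data.startswith(prefix):
                family = name
                break
    # The declaration itself is ASCII-only, so a lossy decode of the first
    # bytes is enough to read the encoding name it announces.
    head = data[:200].decode(family, errors="replace")
    match = _DECL.match(head)
    if match:
        return match.group(1)
    return "UTF-16" if family.startswith("utf-16") else "UTF-8"


def fits_latin1(text: str) -> bool:
    """True if every character of text can be stored in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False
```

With these helpers, a file saved as ISO-8859-1 is only identified correctly when it carries its declaration; without one the sketch falls back to UTF-8, which is the situation point 4 warns about. `fits_latin1` shows the limit behind point 2: a string containing, say, the euro sign cannot be written in ISO-8859-1 at all, because that encoding only has room for code points up to U+00FF.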
@sydb
Member Author

sydb commented Dec 12, 2020

This chapter of the Guidelines, arguably the most popular and influential, is primarily the work of @lb42. Thus assigning to him, as I think he has earned right of first refusal for updates to this chapter.

@lb42
Member

lb42 commented Dec 14, 2020

Since the section on the XML declaration says it is not a PI, it really shouldn't be in the section that discusses PIs. I now have a revised version. Do you want me to create a branch and pull request, or can I just check it in as in the good ole days?

@sydb
Member Author

sydb commented Dec 14, 2020

Agreed (not best served by being in PI section). When you ask “do you want a PR vs just check in”, the answer probably depends on which “you” you are asking. I think just checking in a prose change like that is fine. But @peterstadler, @martindholmes, and probably several others seem to want everything in a PR these days.

lb42 added a commit that referenced this issue Dec 14, 2020
@lb42
Member

lb42 commented Dec 14, 2020

OK, I checked it in. Leaving issue open for you to check you like the rewording.

@ebeshero
Member

ebeshero commented Oct 1, 2021

VF2F subgroup: we think there are some more revisions to work on here. Assigning @sydb to make revisions and @JanelleJenstad to proofread and fix that which he produces.

@sydb
Member Author

sydb commented Sep 2, 2024

While I am not 100% on this, it looks like all issues have been addressed.

@sydb sydb closed this as completed Sep 2, 2024