Apply suggestions from code review
some comments outstanding, will fix this manually

Co-Authored-By: Syd Bauman <sydb@users.noreply.github.com>
duncdrum and sydb committed Jan 30, 2020
1 parent ca2f7e6 commit f0d3748
Showing 14 changed files with 632 additions and 659 deletions.
50 changes: 33 additions & 17 deletions P5/Source/Guidelines/en/WD-NonStandardCharacters.xml
@@ -76,9 +76,10 @@ to the Unicode Standard. </item>
<p>Since there are now over 130,000 characters in Unicode,
chances are good that what you need is already there, but it might
not be easy to find, since it might have a different name in
Unicode. Editors working with East Asian writing systems, should consult
Unicode. Editors working with East Asian writing systems should consult
the <ref target="https://unicode.org/charts/unihan.html">Unihan Database</ref>.
Look again, this time at other sites, for example <ptr target="http://www.eki.ee/letter/"/> (no CJK) or <ptr target="https://www.chise.org"/> (CJK only), which also provide searches based on scripts and languages. Take care, however, that all the
Look again, this time at other sites, preferably ones which also provide searches based on scripts and languages, for example <ptr target="https://www.chise.org"/> (for CJK characters) or <ptr target="http://www.eki.ee/letter/"/> (for non-CJK characters).
Take care, however, that all the
properties of what seems to be a relevant character are consistent
with those of the character you are looking for. For example, if
your character is definitely a digit, but the properties of the
@@ -176,7 +177,7 @@ for use by such applications in a standard way.</p>
<p> The list of attributes (properties) for characters is modelled on
those in the Unicode Character Database, which distinguishes
<term>normative</term> and <term>informative</term> character
properties. The Unicode Consortium also maintains a separate set of character properties specific to East Asian characters in the <ref target="http://www.unicode.org/charts/unihan.html">Unihan database</ref> which TEI fully supports. Lastly, non-Unicode, properties may also be supplied.
properties. The Unicode Consortium also maintains a separate set of character properties specific to East Asian characters in the <ref target="http://www.unicode.org/charts/unihan.html">Unihan database</ref> which TEI fully supports. Lastly, non-Unicode properties may also be supplied.
Since the list of properties will vary with different versions of the
Unicode Standard, there may not be an exact correspondence between
them and the list of properties defined in these Guidelines.</p>
@@ -291,7 +292,7 @@ from the private use area as in this example:
</egXML>
</p>
<p>A more precise documentation of the properties of any character or
glyph may be supplied using one of the three: <gi>localProp</gi>, <gi>unicodeProp</gi>, or <gi>unihanProp</gi> elements described in the next section.</p>
glyph may be supplied using one of the three <soCalled>property</soCalled> elements: <gi>localProp</gi>, <gi>unicodeProp</gi>, or <gi>unihanProp</gi>; these are described in the next section.</p>
<div type="div3" xml:id="ucsprops"><head>Character Properties</head>
<p>The Unicode Standard documents <soCalled>ideal</soCalled>
characters, defined by reference to a number of
@@ -308,18 +309,33 @@ modifications, great care should be taken not to override standard
informative properties for characters which already exist in the Unicode
Standard, as documented in <ref target="#CH-eg-02">Freytag (2006)</ref>.</p>
<!-- TODO phase 6 insert comment about validation of values -->
<p>The <gi>unicodeProp</gi>, <gi>unihanProp</gi>, and <gi>localProp</gi> elements allow TEI encoders to record information about a character or glyph. Where the information concerned relates to
a property which has already been identified in the Unicode Standard, use of the appropriate Unicode property name with <gi>unicodeProp</gi> is strongly encouraged. The use of available Unihan property names with <gi>unihanProp</gi>, is similarly encouraged. With these elements, validation rules for property names <!-- and values --> according to Unicode conventions are incorporated into TEI validation rules. Where neither of these standards suffices use <gi>localProp</gi>.</p>
<p>The <gi>unicodeProp</gi>, <gi>unihanProp</gi>, and
<gi>localProp</gi> elements allow a TEI encoder to record information
about a character or glyph:
<specList>
<specDesc key="unicodeProp" atts="name value"/>
<specDesc key="unihanProp" atts="name value"/>
<specDesc key="localProp" atts="name value"/>
</specList>
</p>
<p>Where the information concerned relates to a property which has
already been identified in the Unicode Standard, use of the
appropriate Unicode property name with <gi>unicodeProp</gi> is
strongly encouraged. The use of available Unihan property names with
<gi>unihanProp</gi> is similarly encouraged. Validation rules for
property names <!-- and values --> according to Unicode conventions
are incorporated into the TEI schemas. Where neither of these
standards suffices, use <gi>localProp</gi>.</p>
<!-- Phase 3-5 TODO add @version in here and override possible values for localProp -->
<p>The three elements for recording Unicode or locally defined properties belong to the <gi>att.gaijiProp</gi> class. This class defines two required attributes for recording key-value pairs of character properties:
<!-- TODO phase 3: add version -->
<specList>
<specDesc key="att.gaijiProp" atts="name value"/>
</specList>
For each property, the encoder must supply both a
<att>name</att> and a <att>value</att>. In cases of boolean properties TEI requires an explicit true or false <gi>value</gi> attribute:
<att>name</att> and a <att>value</att>. In cases of boolean properties TEI requires an explicit <val>true</val> or <val>false</val> <att>value</att> attribute:
<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#NONE">
<unicodeProp name="Ideographic" value="False"/>
<unicodeProp name="Ideographic" value="false"/>
</egXML>
</p>
<p>For convenience, we list here some of the normative character
@@ -465,15 +481,15 @@ Character Database: Canonical Combining Class Values</ref>); these were taken fr
the text direction: it has the value <code>Y</code>
(character is mirrored) or <code>N</code> (character is not mirrored).</item>
</list></p>
<p>The Unicode Standard also defines a set of informative (but non-normative) properties for Unicode characters. If encoders want to provide such properties, they may be included using the suggested Unicode name. If a Unicode name exists for a given character this should always be used, encoders may also supply locally defined names. To tag a unicode name, use <gi>unicodeProp</gi>, or <gi>unihanProp</gi> for Unihan properties. For names specified elsewhere or specified locally use <gi>localProp</gi>.</p>
<p>The Unicode Standard also defines a set of informative (but non-normative) properties for Unicode characters. If encoders wish to provide such properties, they should be included using the Unicode name. If a Unicode name exists for a given character, this should always be used; however, encoders may also supply locally defined names. To tag a Unicode name, use <tag>unicodeProp name="Name"</tag> (or <tag>unihanProp name="Name"</tag>). For names specified elsewhere or specified locally, use <gi>localProp</gi>.</p>
</div>
</div>
<div type="div2" xml:id="D25-30">
<head>Annotating Characters</head>
<p>Annotation of a character becomes necessary when it is desired
to distinguish it on the basis of certain aspects (typically, its
graphical appearance) only. In a manuscript, for example, where
distinctly different forms of the letter "r" can be recognized, it
distinctly different forms of the letter <mentioned>r</mentioned> can be recognized, it
might be useful to distinguish them for analytic purposes, quite
distinct from the need to provide an accurate representation of the
page. A digital facsimile, particularly one linked to a
@@ -502,7 +518,7 @@ the letter we wish to distinguish: <egXML xmlns="http://www.tei-c.org/ns/Example
</glyph>
</charDecl> </egXML>
With these definitions in place, occurrences of these two special
"r"s in the text can be annotated using the element <gi>g</gi>:
<mentioned>r</mentioned>s in the text can be represented using the element <gi>g</gi>:
<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#NONE">
<p>Wo<g ref="#r1">r</g>ds in this
manusc<g ref="#r2">r</g>ipt are sometimes
@@ -516,14 +532,14 @@ the letter we wish to distinguish: <egXML xmlns="http://www.tei-c.org/ns/Example
<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#NONE"><p> ... <g ref="#Filig">Fi</g>lthy riches...</p>
<!-- in the charDecl -->
<glyph xml:id="Filig">
<localProp name="name" value="LATIN UPPER F AND LATIN LOWER I LIGATURE"/>
<localProp name="Name" value="LATIN UPPER F AND LATIN LOWER I LIGATURE"/>
<figure><graphic url="Filig.png"/></figure>
</glyph>
</egXML>
<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#NONE"><p> ... <abbr><g ref="#per">per</g></abbr> ardua</p>
<!-- in the charDecl -->
<glyph xml:id="per">
<localProp name="name" value="LATIN ABBREVIATION PER"/>
<localProp name="Name" value="LATIN ABBREVIATION PER"/>
<figure><graphic url="per.png"/></figure>
</glyph>

@@ -534,7 +550,7 @@ the letter we wish to distinguish: <egXML xmlns="http://www.tei-c.org/ns/Example
such as indexing).</p>
<p>With this
markup in place, it will be possible to write programs to analyze
the distribution of the different letters "r" as well as produce
the distribution of the different letters <mentioned>r</mentioned> as well as produce
more <soCalled>faithful</soCalled> renderings of the original. It
will also be possible to produce normalized versions by simply ignoring
the annotation pointed to by the element <gi>g</gi>. <!-- To make
@@ -649,7 +665,7 @@ representation is to use the <gi>g</gi> element defined by
the module defined in this chapter: <egXML xmlns="http://www.tei-c.org/ns/Examples" xml:lang="und" source="#NONE"><g ref="#ydotacute"/></egXML>. This makes it possible for the encoder to
provide useful documentation for the particular character or glyph so referenced:
<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#NONE"><char xml:id="ydotacute">
<localProp name="name" value="LATIN SMALL LETTER Y WITH DOT ABOVE AND
<localProp name="Name" value="LATIN SMALL LETTER Y WITH DOT ABOVE AND
ACUTE"/>
<localProp name="entity" value="ydotacute"/>
<mapping type="composed">&amp;#x0079;&amp;#x0307;&amp;#x0301;</mapping>
@@ -669,7 +685,7 @@ provide useful documentation for the particular character or glyph so referenced
<mapping type="standard">偽</mapping>
</glyph>
</egXML>
The composition rules and further examples appear in <ref target="https://www.unicode.org/versions/Unicode11.0.0/ch18.pdf#G28626">Chapter 18.2: Ideographic Description Characters</ref> of the Unicode Standard. Editors should be aware that different sequences can accurately describe the same character. In the example the character "人" (U+4EBA) could have been substituted with "亻" (U+4EBB). Local preferences about how sequences are constructed should be documented in the <ptr target="#HD5"/>. Additionally, a number of online services, such as <ref target="https://chise.org"> CHISE</ref>, offer quering and retrieving characters via IDS, which facilitates a greater degree of stablilty across different applications.</p>
The composition rules and further examples appear in <ref target="https://www.unicode.org/versions/Unicode11.0.0/ch18.pdf#G28626">Chapter 18.2: Ideographic Description Characters</ref> of the Unicode Standard. Editors should be aware that different sequences can accurately describe the same character. In the example the character "人" (U+4EBA) could have been substituted with "亻" (U+4EBB). Local preferences about how sequences are constructed should be documented in the <gi>encodingDesc</gi> of the corresponding TEI header (see <ptr target="#HD5"/>). Additionally, a number of online services, such as <ref target="https://chise.org">CHISE</ref>, offer querying and retrieving characters via IDS, which facilitates a greater degree of stability across different applications.</p>
<p>Under certain circumstances, Chinese Han characters can be written
within a circle. Rather than considering this as simply an aspect of the rendering, an encoder may wish to treat such circled characters as entirely distinct derived characters. For a given character
(say that represented by the numeric-character reference <code>&amp;#x4EBA;</code>)
@@ -678,7 +694,7 @@ the circled variant might conveniently be represented as
definition such as the following:
<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#NONE"><char xml:id="U4EBA-circled">
<unicodeProp name="Decomposition_Mapping" value="circle"/>
<localProp name="name" value="CIRCLED IDEOGRAPH 36"/>
<localProp name="Name" value="CIRCLED IDEOGRAPH 36"/>
<localProp name="daikanwa" value="36"/>
<mapping type="standard">
&amp;#x4EBA;
128 changes: 68 additions & 60 deletions P5/Source/Specs/att.gaijiProp.xml
@@ -7,64 +7,72 @@ $Date$
$Id$
-->
<?xml-model href="http://jenkins.tei-c.org/job/TEIP5-dev/lastSuccessfulBuild/artifact/P5/release/xml/tei/odd/p5.nvdl" type="application/xml" schematypens="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0"?>
<classSpec xmlns="http://www.tei-c.org/ns/1.0" module="gaiji" type="atts" ident="att.gaijiProp">
<desc versionDate="2019-06-29" xml:lang="en">provides the <att>name</att> and <att>value</att> attributes, to be used in detailed descriptions of non-standard character or glyph data.
</desc>
<attList org="group">
<attDef ident="name" usage="req">
<desc versionDate="2019-06-29" xml:lang="en">contains the name of a name-value pair of character or glyph properties</desc>
<datatype maxOccurs="1"><dataRef key="teidata.xmlName"/></datatype>
</attDef>
<attDef ident="value" usage="req">
<desc versionDate="2019-06-29" xml:lang="en">contains the value of a name-value pair of character or glyph properties</desc>
<datatype><dataRef key="teidata.text"/></datatype>
</attDef>
<attDef ident="version" usage="opt">
<!-- Due to bug this does not have the list of valid unicode version numbers here, see
https://github.com/TEIC/TEI/pull/1901#issuecomment-510460274 -->
<desc versionDate="2019-07-11" xml:lang="en">specifies the version number of an external Standard in which this property name is defined.</desc>
<desc versionDate="2019-07-11" xml:lang="de">gibt die Versionsnummer eines externen Standards an, in dem dieser Eigenschaftsname definiert ist.</desc>
<datatype>
<dataRef key="teidata.enumerated"/>
</datatype>
<valList type="semi">
<valItem ident="1.0.1"/>
<valItem ident="1.1"/>
<valItem ident="2.0"/>
<valItem ident="2.1"/>
<valItem ident="3.0"/>
<valItem ident="3.1"/>
<valItem ident="3.2"/>
<valItem ident="4.0"/>
<valItem ident="4.1"/>
<valItem ident="5.0"/>
<valItem ident="5.1"/>
<valItem ident="5.2"/>
<valItem ident="6.0"/>
<valItem ident="6.1"/>
<valItem ident="6.2"/>
<valItem ident="6.3"/>
<valItem ident="7.0"/>
<valItem ident="8.0"/>
<valItem ident="9.0"/>
<valItem ident="10.0"/>
<valItem ident="11.0"/>
<valItem ident="12.0"/>
<valItem ident="12.1"/>
<valItem ident="unassigned"/>
</valList>
</attDef>
</attList>
<exemplum versionDate="2019-07-01" xml:lang="en">
<p>In this example a definition for the unicode property name and its value are provided.</p>
<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#UND">
<unicodeProp name="Decomposition_Mapping" value="circle"/>
</egXML>
</exemplum>
<remarks versionDate="2019-06-29" xml:lang="en">
<p>TODO</p>
</remarks>
<listRef>
<ptr target="#WD"/>
</listRef>
<classSpec ident="att.gaijiProp" module="gaiji" type="atts" xmlns="http://www.tei-c.org/ns/1.0">
<desc versionDate="2020-01-28" xml:lang="en">provides attributes for defining the properties of
non-standard characters or glyphs. </desc>
<desc versionDate="2020-01-28" xml:lang="de">liefert Attribute zur Definition der Eigenschaften
von nicht standardisierten Zeichen und Glyphen.</desc>
<attList org="group">
<attDef ident="name" usage="req">
<desc versionDate="2020-01-28" xml:lang="en">provides the name of the character or glyph
property being defined.</desc>
<datatype maxOccurs="1">
<dataRef key="teidata.xmlName"/>
</datatype>
</attDef>
<attDef ident="value" usage="req">
<desc versionDate="2020-01-28" xml:lang="en">provides the value of the character or
glyph property being defined.</desc>
<datatype>
<dataRef key="teidata.text"/>
</datatype>
</attDef>
<attDef ident="version" usage="opt">
<desc versionDate="2020-01-28" xml:lang="en">specifies the version number of the Unicode
Standard in which this property name is defined.</desc>
<desc versionDate="2019-07-11" xml:lang="de">gibt die Versionsnummer eines externen
Standards an, in dem dieser Eigenschaftsname definiert ist.</desc>
<datatype>
<dataRef key="teidata.enumerated"/>
</datatype>
<valList type="semi">
<valItem ident="1.0.1"/>
<valItem ident="1.1"/>
<valItem ident="2.0"/>
<valItem ident="2.1"/>
<valItem ident="3.0"/>
<valItem ident="3.1"/>
<valItem ident="3.2"/>
<valItem ident="4.0"/>
<valItem ident="4.1"/>
<valItem ident="5.0"/>
<valItem ident="5.1"/>
<valItem ident="5.2"/>
<valItem ident="6.0"/>
<valItem ident="6.1"/>
<valItem ident="6.2"/>
<valItem ident="6.3"/>
<valItem ident="7.0"/>
<valItem ident="8.0"/>
<valItem ident="9.0"/>
<valItem ident="10.0"/>
<valItem ident="11.0"/>
<valItem ident="12.0"/>
<valItem ident="12.1"/>
<valItem ident="unassigned"/>
</valList>
</attDef>
</attList>
<exemplum versionDate="2019-07-01" xml:lang="en">
<p>In this example a definition for the Unicode property <name>Decomposition Mapping</name>
is provided.</p>
<egXML source="#UND" xmlns="http://www.tei-c.org/ns/Examples"> <unicodeProp
name="Decomposition_Mapping" value="circle"/> </egXML>
</exemplum>
<remarks versionDate="2019-06-29" xml:lang="en">
<p>Properties that are boolean (name-only) in the source standard need an explicit xs:boolean value in the <att>value</att> attribute.</p>
</remarks>
<listRef>
<ptr target="#WD"/>
</listRef>
</classSpec>
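The revised att.gaijiProp class can be illustrated with a minimal sketch that combines all three property elements in one declaration. The xml:id and the property values below are hypothetical and chosen only for illustration; it is also an assumption that the Unihan property name shown is among those accepted by unihanProp in the schema in use.

```xml
<!-- Hypothetical declaration: the xml:id and all values are illustrative only -->
<char xmlns="http://www.tei-c.org/ns/1.0" xml:id="U4EBA-example">
  <!-- Boolean Unicode property: an explicit true or false value is required;
       version (optional) names the Unicode version defining the property -->
  <unicodeProp name="Ideographic" value="true" version="12.1"/>
  <!-- Unihan property; the name kTotalStrokes is assumed to be accepted here -->
  <unihanProp name="kTotalStrokes" value="2"/>
  <!-- Property defined locally or by a non-Unicode source -->
  <localProp name="daikanwa" value="36"/>
  <mapping type="standard">人</mapping>
</char>
```

Each property element carries both required attributes, name and value. The sketch does not settle whether version is meaningful on localProp, so it is shown only on unicodeProp.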
2 changes: 1 addition & 1 deletion P5/Source/Specs/char.xml
@@ -40,7 +40,7 @@ otherwise available in the document character set-->.</desc>
<exemplum versionDate="2019-07-01" xml:lang="und">
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<char xml:id="circledU4EBA">
<localProp name="name" value="CIRCLED IDEOGRAPH 4EBA"/>
<localProp name="Name" value="CIRCLED IDEOGRAPH 4EBA"/>
<localProp name="daikanwa" value="36"/>
<unicodeProp name="Decomposition_Mapping" value="circle"/>
<mapping type="standard">人</mapping>