Commit

Add requirement for character encoding in truncation
Addresses w3c#124
Addresses w3c/i18n-actions#62

- Add a requirement with explanation such that byte length
  truncation needs to specify a character encoding
  (and that legacy encodings should be avoided)
- Add links to glossary terms in this section in some places
- Small tweaks to other text
aphillips committed Dec 14, 2023
1 parent 2a0691f commit ee3539a
Showing 1 changed file with 8 additions and 2 deletions.
10 changes: 8 additions & 2 deletions index.html
@@ -2969,16 +2969,22 @@ <h3>Text truncation in UTF-8</h3>
 </div>

 <div class="req" id="char_trunc_grapheme_boundary">
-<p class="advisement">Specifications that limit the length of a string SHOULD require truncation on grapheme boundaries, as truncation in the midst of a combining or joining sequence can alter the meaning of the string.</p>
+<p class="advisement">Specifications that limit the length of a string SHOULD require truncation on grapheme boundaries, as truncation in the midst of a <a>grapheme</a> or <a>combining character sequence</a> can alter the meaning of the string.</p>
 </div>

 <div class="req" id="char_trunc_indicator">
 <p class="advisement">If a specification specifies a length limit, it SHOULD specify that any string that is truncated include an indicator, such as ellipses, that the string has been altered.</p>
 </div>

 <div class="req" id="char_trunc_min_size">
-<p class="advisement">When specifying a length limitation in code units (such as bytes), specifications SHOULD set the maximum length in a way that accommodates users whose language requires multibyte code unit sequences.</p>
+<p class="advisement">When specifying a length limitation in code units (such as bytes), specifications SHOULD set the limit in a way that accommodates users whose language requires multibyte code unit sequences.</p>
 </div>
+
+<div class="req" id="char_trunc_character_encoding">
+<p class="advisement">If a specification specifies a length limit in code units (such as bytes), it MUST specify the <a>character encoding</a> used in measuring the limit; such a limit SHOULD NOT specify a <a>legacy character encoding</a>.</p>
+</div>
+
+<p>If a specification permits or requires truncation of a field, the <a>character encoding</a> is important in knowing what the limit means. If the limit is in bytes and <a>legacy character encodings</a> are permitted, note that conversion of Unicode data to a non-Unicode encoding can also result in data loss (since most <a>legacy character encodings</a> encode only a subset of Unicode).</p>
 </section>

 <section id="strcat" class="subtopic">
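Illustration of the `char_trunc_character_encoding` requirement added in this commit: a minimal Python sketch, not taken from the specification, of why a byte limit has no fixed meaning until an encoding is named. The helper name `truncate_to_bytes` and the sample string are hypothetical. The sketch assumes a stateless encoding and stops only at code point boundaries; honoring grapheme boundaries as the spec recommends would additionally need a Unicode text segmentation library.

```python
def truncate_to_bytes(text: str, limit: int, encoding: str) -> str:
    """Return the longest prefix of `text` whose encoded size fits in `limit` bytes.

    Hypothetical helper: it never splits a code point, but it can still
    split a grapheme cluster or combining character sequence.
    """
    kept = []
    used = 0
    for ch in text:
        # Measure each code point in the encoding that defines the limit.
        size = len(ch.encode(encoding))
        if used + size > limit:
            break
        kept.append(ch)
        used += size
    return "".join(kept)


sample = "日本語のテキスト"

# The same 10-byte limit keeps a different amount of text per encoding.
print(truncate_to_bytes(sample, 10, "utf-8"))      # 3 bytes per character here
print(truncate_to_bytes(sample, 10, "utf-16-le"))  # 2 bytes per character here

# A legacy encoding may be unable to represent the text at all.
try:
    sample.encode("iso-8859-1")
except UnicodeEncodeError:
    print("ISO-8859-1 cannot encode this string")
```

With a 10-byte limit the UTF-8 call keeps three characters and the UTF-16LE call keeps five, which is the ambiguity the new MUST requirement removes. The failed ISO-8859-1 conversion shows the data-loss concern behind the SHOULD NOT for legacy encodings.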
