Commit

No unpaired surrogates in CESU-8.
SimonSapin committed May 27, 2015
1 parent bc2ed84 commit 51abeef
Showing 2 changed files with 9 additions and 7 deletions.
9 changes: 5 additions & 4 deletions index.html
@@ -602,7 +602,7 @@ <h1 class="p-name no-ref" id="title">The WTF-8 encoding</h1>
<dt>Change history:
<dd><span><a href="https://github.com/SimonSapin/wtf-8/commits/gh-pages">On GitHub</a></span>
<dt>Last updated:
-<dd><span>27 May 2015</span>
+<dd><span>28 May 2015</span>
</dl>
</div>

@@ -827,9 +827,9 @@ <h3 class="heading settled" data-level="2.1" id="cesu-8"><span class="secno">2.1
Therefore, CESU-8 is not a superset of <a data-link-type="dfn" href="#utf_8">UTF-8</a>.</p>


-<p>It is unclear whether <a data-link-type="dfn" href="#unpaired-surrogate-byte-sequence">unpaired surrogate byte sequences</a>
-are supposed to be <a data-link-type="dfn" href="#well_formed">well-formed</a> in CESU-8.
-If so, they are encoded the same as in <a data-link-type="dfn" href="#wtf_8">WTF-8</a>.</p>
+<p>CESU-8 is also a mapping on <a data-link-type="dfn" href="#utf_16">UTF-16</a> code units.
+Therefore <a data-link-type="dfn" href="#unpaired-surrogate-byte-sequence">unpaired surrogate byte sequences</a> are <a data-link-type="dfn" href="#ill_formed">ill-formed</a> in CESU-8,
+whereas supporting them is the entire point of <a data-link-type="dfn" href="#wtf_8">WTF-8</a>.</p>



@@ -1944,6 +1944,7 @@ <h2 class="heading settled" data-level="8" id="acknowledgments"><span class="sec
Dylan Petonke,
Henri Sivonen,
James Graham,
+Kevin Ballard,
Mathias Bynens,
Sam Tobin-Hochstadt,
Tab Atkins.</p>
7 changes: 4 additions & 3 deletions index.src.html
@@ -122,9 +122,9 @@ <h3 id=cesu-8>
whereas <a>WTF-8</a>, like <a>UTF-8</a>, encodes them as sequences of four bytes.
Therefore, CESU-8 is not a superset of <a>UTF-8</a>.

-It is unclear whether <a>unpaired surrogate byte sequences</a>
-are supposed to be <a>well-formed</a> in CESU-8.
-If so, they are encoded the same as in <a>WTF-8</a>.
+CESU-8 is also a mapping on <a>UTF-16</a> code units.
+Therefore <a>unpaired surrogate byte sequences</a> are <a>ill-formed</a> in CESU-8,
+whereas supporting them is the entire point of <a>WTF-8</a>.

<!--
CESU-8 was probably not designed,
@@ -869,6 +869,7 @@ <h2 id=acknowledgments>
Dylan Petonke,
Henri Sivonen,
James Graham,
+Kevin Ballard,
Mathias Bynens,
Sam Tobin-Hochstadt,
Tab Atkins.
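
Reviewer's illustration (not part of the commit): the reworded CESU-8 paragraph can be checked with a small sketch. It assumes the reading this commit adopts, namely that CESU-8 is defined only on well-formed UTF-16, so an unpaired surrogate is rejected, while WTF-8 encodes it as three bytes. The helper names `utf16_units`, `encode_units`, and `_utf8` are hypothetical and come from neither specification; Python's `surrogatepass` error handler is used only as a convenient way to emit the generalized three-byte form for a surrogate code point.

```python
def utf16_units(s: str) -> list[int]:
    """Split a Python str into 16-bit code units (hypothetical helper)."""
    units = []
    for ch in s:
        cp = ord(ch)
        if cp >= 0x10000:
            cp -= 0x10000
            units += [0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)]
        else:
            units.append(cp)  # BMP code point, possibly a lone surrogate
    return units


def _is_lead(u: int) -> bool:
    return 0xD800 <= u <= 0xDBFF


def _is_trail(u: int) -> bool:
    return 0xDC00 <= u <= 0xDFFF


def _utf8(cp: int) -> bytes:
    # "surrogatepass" lets a surrogate code point through as three bytes.
    return chr(cp).encode("utf-8", "surrogatepass")


def encode_units(units: list[int], form: str) -> bytes:
    """Encode 16-bit code units as "wtf-8" or "cesu-8" (sketch only)."""
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        if _is_lead(u) and i + 1 < len(units) and _is_trail(units[i + 1]):
            if form == "wtf-8":
                # Like UTF-8: one four-byte sequence for the code point.
                cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
                out += _utf8(cp)
            else:
                # CESU-8: each surrogate encoded separately, six bytes total.
                out += _utf8(u) + _utf8(units[i + 1])
            i += 2
            continue
        if (_is_lead(u) or _is_trail(u)) and form == "cesu-8":
            # Assumption from this commit: unpaired surrogates are ill-formed.
            raise ValueError(f"unpaired surrogate U+{u:04X} in CESU-8 input")
        out += _utf8(u)
        i += 1
    return bytes(out)
```

With this sketch, U+1F4A9 becomes the four bytes `F0 9F 92 A9` in WTF-8 and the six bytes `ED A0 BD ED B2 A9` in CESU-8, while the lone code unit `0xD800` becomes `ED A0 80` in WTF-8 and raises `ValueError` for CESU-8.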
