In section 4.2 the decoding algorithm says to unconditionally consume the code unit after a lead surrogate unit. This results in improper decoding when a sequence of two lead surrogate units is followed by a trail surrogate unit: three code points are emitted instead of two. The note at the end of the section suggests that only two code points should be emitted, and in fact the two implementations I checked (wtf-8.js and rust-wtf8) emit two code points in this situation.
The algorithm should consume the next code unit if and only if it is a trail surrogate unit.
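As a rough illustration of the proposed rule, here is a minimal Python sketch that consumes the following code unit only when it is a trail surrogate. The function name `decode_potentially_ill_formed_utf16` is hypothetical and the code is not taken from the spec, wtf-8.js, or rust-wtf8; it only assumes the standard UTF-16 surrogate ranges.

```python
def decode_potentially_ill_formed_utf16(units):
    """Turn a sequence of 16-bit code units into a list of code points.

    Hypothetical helper for illustration only. Unpaired surrogates are
    passed through as surrogate code points instead of raising an error.
    """
    code_points = []
    i = 0
    n = len(units)
    while i < n:
        unit = units[i]
        i += 1
        is_lead = 0xD800 <= unit <= 0xDBFF
        # Look ahead, and consume the next unit only if it is a trail surrogate.
        if is_lead and i < n and 0xDC00 <= units[i] <= 0xDFFF:
            trail = units[i]
            i += 1
            code_points.append(
                0x10000 + ((unit - 0xD800) << 10) + (trail - 0xDC00)
            )
        else:
            # Non-surrogate unit, lone trail, or lead not followed by a trail.
            code_points.append(unit)
    return code_points
```

With this lookahead, a lead surrogate followed by another lead surrogate is emitted on its own, and the second lead is still free to pair with a trail surrogate that follows it.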
For example, [0xD83D, 0xD83D, 0xDCA9] would have incorrectly decoded to
[U+D83D, U+D83D, U+DCA9] rather than [U+D83D, U+1F4A9].
Thanks @Dylan16807 for the #5 bug report.