Utf8Decoder should be compatible with TextDecoder.decode #31370
Labels: area-core-library, library-convert
The browser's TextDecoder.prototype.decode treats surrogates (U+D800 through U+DFFF) differently to Utf8Decoder. This makes it difficult to use TextDecoder to accelerate conversion.
Acceleration is highly desirable - it improves one binary protobuf benchmark by 8x.
The main difference is that Utf8Decoder converts an encoded surrogate into the corresponding code point, but TextDecoder considers a surrogate to be an error and, depending on the fatal option, either throws an error or decodes the surrogate to U+FFFD REPLACEMENT CHARACTER.
It is not possible to get acceptable performance for allowMalformed: true by first trying with {fatal: true}, catching the Error, and re-decoding with the slow code. Throwing the error is ~1000x more expensive.
Everything would be simpler if Utf8Decoder was completely aligned with TextDecoder.decode.
I have also verified that for other malformed inputs, TextDecoder and Utf8Decoder disagree on the number of U+FFFD replacements generated.
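To make the TextDecoder side of this concrete, here is a minimal TypeScript sketch of the two behaviours described above, run against the bytes ED A0 80 (the three-byte UTF-8-style encoding of the surrogate U+D800). The helper names decodeLenient and decodeStrict are hypothetical and chosen only for illustration. The sketch shows TextDecoder alone and makes no claim about what Utf8Decoder emits for the same bytes.

```typescript
// Bytes ED A0 80: the three-byte encoding of the lone surrogate U+D800.
const encodedSurrogate = new Uint8Array([0xed, 0xa0, 0x80]);

// Hypothetical helper: non-fatal decoding, where malformed input is
// replaced with U+FFFD instead of throwing.
function decodeLenient(bytes: Uint8Array): string {
  return new TextDecoder("utf-8", { fatal: false }).decode(bytes);
}

// Hypothetical helper: fatal decoding. Returns null when TextDecoder
// rejects the input. This try/catch is the pattern the issue reports
// as far too slow to use as a fast path with a fallback.
function decodeStrict(bytes: Uint8Array): string | null {
  try {
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch (e) {
    return null;
  }
}

const lenient = decodeLenient(encodedSurrogate);
const strict = decodeStrict(encodedSurrogate);

// Collect the UTF-16 code units of the lenient result as hex strings.
const units: string[] = [];
for (let i = 0; i < lenient.length; i++) {
  units.push(lenient.charCodeAt(i).toString(16));
}

// Under the WHATWG Encoding Standard, each of the three bytes is treated
// as its own error, so the lenient result contains only U+FFFD characters
// and never the surrogate itself.
console.log(units.join(" "));
console.log(strict === null ? "fatal decoder rejected the input" : strict);
```

The count of U+FFFD characters produced for a given malformed sequence is exactly the kind of detail on which the two decoders can disagree, so any comparison should check the replacement count as well as the presence of replacements.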