
Utf8Decoder should be compatible with TextDecoder.decode #31370

Closed
rakudrama opened this issue Nov 14, 2017 · 2 comments
Labels
area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. library-convert


@rakudrama
Member

rakudrama commented Nov 14, 2017

The browser's TextDecoder.prototype.decode treats surrogates (U+D800 through U+DFFF) differently from Utf8Decoder.
This makes it difficult to use TextDecoder to accelerate conversion.
Acceleration is highly desirable: it improves one binary protobuf benchmark by 8x.

The main difference is that Utf8Decoder converts an encoded surrogate into a code point, whereas TextDecoder considers a surrogate to be an error and, depending on the fatal option, either throws an error or decodes the surrogate to U+FFFD REPLACEMENT CHARACTER.
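A minimal sketch of the surrogate difference, using Python's built-in UTF-8 codec as a stand-in (an assumption for illustration only; this is neither Dart's Utf8Decoder nor the browser's TextDecoder, and the helper names are hypothetical). The bytes `ED A0 80` are the generalized-UTF-8 encoding of the lone surrogate U+D800. Python's `surrogatepass` error handler decodes them to the code point, much as the issue describes for Utf8Decoder; `strict` raises, like TextDecoder with `{fatal: true}`; `replace` substitutes U+FFFD, like TextDecoder's default mode.

```python
# Stand-in only: Python's UTF-8 codec error handlers mimic the two behaviors
# described in the issue. This is not Dart's or the browser's code.

# Generalized-UTF-8 bytes for the lone surrogate U+D800 (invalid in strict UTF-8).
ENCODED_SURROGATE = bytes([0xED, 0xA0, 0x80])


def decode_keeping_surrogates(data: bytes) -> str:
    """Hypothetical helper: decode encoded surrogates to surrogate code
    points, as the issue describes for Utf8Decoder."""
    return data.decode("utf-8", errors="surrogatepass")


def decode_rejecting_surrogates(data: bytes, fatal: bool = False) -> str:
    """Hypothetical helper: treat encoded surrogates as malformed, as
    TextDecoder does; raise when fatal is True, otherwise substitute U+FFFD."""
    return data.decode("utf-8", errors="strict" if fatal else "replace")


if __name__ == "__main__":
    print(ascii(decode_keeping_surrogates(ENCODED_SURROGATE)))    # '\ud800'
    print(ascii(decode_rejecting_surrogates(ENCODED_SURROGATE)))  # only '\ufffd' characters
    try:
        decode_rejecting_surrogates(ENCODED_SURROGATE, fatal=True)
    except UnicodeDecodeError as err:
        print("fatal mode raised:", err.reason)
```

Well-formed input decodes identically under both helpers; they only diverge on bytes that encode surrogates (or are otherwise malformed).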

For allowMalformed: true, acceptable performance cannot be obtained by first decoding with {fatal: true}, catching the error, and then re-decoding with the slow code: throwing the error is ~1000x more expensive.
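A minimal sketch of the try-fast-then-fall-back approach that this comment rules out, with Python's codecs standing in for both decoders (an assumption for illustration; the function names and the `hypothetical_allow_malformed` error-handler name are invented). The strict fast path plays the role of TextDecoder with `{fatal: true}`; the slow path keeps encoded surrogates and replaces any other malformed byte with U+FFFD. The structure shows where the cost comes from: every malformed input pays for a raised exception and then for a second decode of the whole buffer.

```python
import codecs
from typing import Tuple

# Stand-in only: hypothetical names, with Python's codecs playing the roles of
# TextDecoder (fast path) and Utf8Decoder with allowMalformed (slow path).


def _keep_surrogates_else_replace(err: UnicodeDecodeError) -> Tuple[str, int]:
    """Error handler: decode an encoded surrogate to its code point;
    otherwise replace the single offending byte with U+FFFD."""
    try:
        return codecs.lookup_error("surrogatepass")(err)
    except UnicodeDecodeError:
        return "\ufffd", err.start + 1


codecs.register_error("hypothetical_allow_malformed", _keep_surrogates_else_replace)


def fast_decode(data: bytes) -> str:
    """Accelerated path: strict decoding raises on any malformed input."""
    return data.decode("utf-8", errors="strict")


def slow_decode(data: bytes) -> str:
    """Slow path with allowMalformed-style semantics."""
    return data.decode("utf-8", errors="hypothetical_allow_malformed")


def decode_with_fallback(data: bytes) -> str:
    """Try the fast path first; on malformed input, pay for the exception
    and then decode the entire input a second time on the slow path."""
    try:
        return fast_decode(data)
    except UnicodeDecodeError:
        return slow_decode(data)


if __name__ == "__main__":
    print(ascii(decode_with_fallback(b"plain ASCII")))          # fast path only
    print(ascii(decode_with_fallback(b"a\xed\xa0\x80b\xffc")))  # 'a\ud800b\ufffdc'
```

Well-formed input never leaves the fast path, so the fallback only hurts when malformed or surrogate-bearing input is common.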

Everything would be simpler if Utf8Decoder were completely aligned with TextDecoder.decode.

I have also verified that for other malformed inputs, TextDecoder and Utf8Decoder disagree on the number of U+FFFD replacements generated.
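To illustrate how two decoders can disagree on the number of U+FFFD characters, here is a minimal Python sketch comparing two replacement policies. Python's built-in `replace` handler emits one U+FFFD per maximal ill-formed subpart (the Unicode-recommended practice that, as we understand it, the WHATWG decoder behind TextDecoder also follows); `hypothetical_per_byte` is an invented handler that emits one U+FFFD per malformed byte. Neither is claimed to reproduce Utf8Decoder's actual behavior at the time of this issue.

```python
import codecs
from typing import Tuple


def _replace_each_byte(err: UnicodeDecodeError) -> Tuple[str, int]:
    """Hypothetical policy: one U+FFFD per malformed byte, ignoring how far
    the decoder's own error span (err.end) extends."""
    return "\ufffd", err.start + 1


codecs.register_error("hypothetical_per_byte", _replace_each_byte)


def count_replacements(data: bytes, errors: str) -> int:
    """Number of U+FFFD characters produced when decoding data as UTF-8."""
    return data.decode("utf-8", errors=errors).count("\ufffd")


if __name__ == "__main__":
    # Truncated three-byte sequence (E2 82 is a valid prefix of U+20AC) then 'a'.
    sample = b"\xe2\x82a"
    print(count_replacements(sample, "replace"))                # 1
    print(count_replacements(sample, "hypothetical_per_byte"))  # 2
```

Both outputs are "valid" lossy decodings, which is why byte-for-byte compatibility with TextDecoder requires matching its exact substitution rule, not just its treatment of surrogates.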

@lrhn
Member

lrhn commented Sep 30, 2020

@askeksa-google did this.

@lrhn lrhn closed this as completed Sep 30, 2020
@askeksa-google

In principle, yes. We currently have a workaround for some browser bugs.

We could open an issue to track reporting the bugs, waiting for them to be fixed, and then removing the workaround.


4 participants