[dart2js/ddc] Use TextDecoder directly for more cases of UTF-8 decoding.
With the changes in #41100, the handling of UTF-8 encoded surrogates in Dart now matches that of JS. Thus, the pre-pass that scans for the presence of surrogates before handing the data to TextDecoder is no longer needed. Removing this gives a significant speedup. On my laptop, in Chrome, on the Utf8Decode benchmark, it saves around 1ns per input byte out of previously roughly 2.5ns (ASCII) to 5ns (Russian).

In principle, this also enables TextDecoder for allowMalformed: true, since the number of replacement characters produced by Dart now matches the WHATWG standard. This does result in failures in some browsers, where these do not adhere to the standard. For instance, Chrome outputs one replacement character per undecoded input byte when an unfinished sequence is interrupted by end-of-input, where the standard specifies only one replacement character.

To work around the browser deviations, the output from TextDecoder is scanned for replacement characters, and if any are found, the decoding falls back to the Dart implementation. This workaround can be removed if the bugs are fixed in the browsers.

Since TextDecoder has a large startup overhead, we also fall back to the Dart implementation for short strings.

Change-Id: I9e95a95ce726ce0d9e9a3b46df8ee2512ab05f0a
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/144294
Commit-Queue: Aske Simon Christensen <askesc@google.com>
Reviewed-by: Stephen Adams <sra@google.com>
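
The fallback strategy described in the message can be sketched roughly as follows. This is an illustration written in TypeScript against the standard TextDecoder API, not the code from this change: the name decodeUtf8, the fallback parameter (standing in for the Dart implementation), and the threshold value are all assumptions, as is the choice to defer to the fallback when strict decoding fails.

```typescript
// Stand-in type for the pure Dart decoder (hypothetical; injected by the caller).
type FallbackDecoder = (bytes: Uint8Array, allowMalformed: boolean) => string;

// Hypothetical cutoff: the commit message does not state the actual value.
const SHORT_INPUT_THRESHOLD = 16;

// U+FFFD, the replacement character emitted for malformed input.
const REPLACEMENT_CHAR = "\uFFFD";

function decodeUtf8(
  bytes: Uint8Array,
  allowMalformed: boolean,
  fallback: FallbackDecoder
): string {
  // TextDecoder has a large startup overhead, so short inputs skip it.
  if (bytes.length < SHORT_INPUT_THRESHOLD) {
    return fallback(bytes, allowMalformed);
  }

  if (!allowMalformed) {
    try {
      // With fatal: true, TextDecoder throws a TypeError on malformed input.
      return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    } catch (e) {
      // Assumption: let the fallback report the error in its own way.
      return fallback(bytes, allowMalformed);
    }
  }

  const text = new TextDecoder("utf-8").decode(bytes);
  // Browsers may disagree on how many replacement characters to emit,
  // so any replacement character in the output triggers the fallback.
  // (Input that validly encodes U+FFFD also lands here; that costs
  // time but not correctness.)
  if (text.indexOf(REPLACEMENT_CHAR) !== -1) {
    return fallback(bytes, allowMalformed);
  }
  return text;
}
```

Well-formed input of sufficient length never reaches the fallback, which is the fast path the benchmark numbers refer to. Only short or suspicious input pays for the slower decoder.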
Showing 4 changed files with 140 additions and 243 deletions.