[Merged by Bors] - Optimize `String.prototype.normalize` #2848

jedel1043 · 2023-04-20T02:47:25Z

We currently use `unicode_normalization` to handle the `String.prototype.normalize` method. However, the crate doesn't support UTF-16 as a first-class string type, so we had to resort to a workaround: convert the valid parts of a string to UTF-8, normalize each one, encode the results back to UTF-16, and concatenate everything with the unpaired surrogates in between. All of this is suboptimal for performance, which is why I switched to `icu_normalizer`, which does support UTF-16 input, to replace our current implementation.

Additionally, this allows users to override the default normalization data when the `intl` feature is enabled, by providing the required data through the `BoaProvider` data provider.
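
For readers unfamiliar with the workaround being replaced, here is a minimal sketch of the general idea, using only the standard library. It is not the code from this PR: the function name `normalize_lossless` is hypothetical, and the `normalize` closure is a stand-in for whatever UTF-8 normalizer is used (for example, one backed by `unicode_normalization`).

```rust
use std::char::decode_utf16;

/// Applies `normalize` to every well-formed run of a UTF-16 string and
/// copies unpaired surrogates through unchanged.
///
/// `normalize` is a placeholder (an assumption of this sketch) for a real
/// normalization routine that only works on valid Unicode text.
fn normalize_lossless<F>(input: &[u16], normalize: F) -> Vec<u16>
where
    F: Fn(&str) -> String,
{
    let mut out = Vec::with_capacity(input.len());
    // Buffer holding the current run of valid code points, re-encoded as UTF-8.
    let mut run = String::new();

    for unit in decode_utf16(input.iter().copied()) {
        match unit {
            // Valid code point: extend the current run.
            Ok(c) => run.push(c),
            // Unpaired surrogate: flush the normalized run, then emit the
            // surrogate as-is so no information is lost.
            Err(e) => {
                if !run.is_empty() {
                    out.extend(normalize(&run).encode_utf16());
                    run.clear();
                }
                out.push(e.unpaired_surrogate());
            }
        }
    }

    // Flush whatever remains after the last unpaired surrogate.
    if !run.is_empty() {
        out.extend(normalize(&run).encode_utf16());
    }
    out
}
```

With `|s| s.to_uppercase()` as a stand-in closure, the input `[0x0061, 0xD800, 0x0062]` produces `[0x0041, 0xD800, 0x0042]`: both valid runs are transformed and the lone surrogate is preserved. Each run costs a UTF-16 to UTF-8 round trip plus an intermediate allocation, which is the overhead a normalizer that accepts UTF-16 input directly can avoid.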

github-actions · 2023-04-20T02:57:28Z

Test262 conformance changes

| Test result | main count | PR count | difference |
|---|---|---|---|
| Total | 94,591 | 94,591 | 0 |
| Passed | 73,161 | 73,161 | 0 |
| Ignored | 17,530 | 17,530 | 0 |
| Failed | 3,900 | 3,900 | 0 |
| Panics | 0 | 0 | 0 |
| Conformance | 77.34% | 77.34% | 0.00% |

codecov · 2023-04-20T03:09:19Z

Codecov Report

Merging #2848 (f5615bc) into main (f97ad0d) will decrease coverage by 0.01%.
The diff coverage is 17.54%.

@@            Coverage Diff             @@
##             main    #2848      +/-   ##
==========================================
- Coverage   50.92%   50.92%   -0.01%     
==========================================
  Files         419      419              
  Lines       41780    41799      +19     
==========================================
+ Hits        21278    21286       +8     
- Misses      20502    20513      +11

| Impacted Files | Coverage Δ | |
|---|---|---|
| boa_engine/src/builtins/string/mod.rs | `58.02% <0.00%> (ø)` | |
| boa_engine/src/context/mod.rs | `45.45% <ø> (ø)` | |
| boa_icu_provider/src/lib.rs | `75.00% <25.00%> (-25.00%)` | ⬇️ |
| boa_engine/src/context/icu.rs | `34.17% <47.36%> (+4.01%)` | ⬆️ |

raskad

Having the minimal data generated like this seems like a very nice solution. Looks very nice!

raskad · 2023-04-22T22:58:33Z

@jedel1043 I did not look into it much, do you think we could do the same for the UnicodeProperties ID_Start and ID_Continue that we need in boa_parser and currently generate tables ourselves in boa_unicode?

jedel1043 · 2023-04-23T01:00:47Z

> @jedel1043 I did not look into it much, do you think we could do the same for the UnicodeProperties ID_Start and ID_Continue that we need in boa_parser and currently generate tables ourselves in boa_unicode?

Yep! There's the icu_properties crate that offers precisely that functionality. I'll make a PR this weekend :)

HalidOdat

Nice work! Looks good to me! :)

jedel1043 · 2023-04-23T08:23:36Z

bors r+

bors · 2023-04-23T08:40:02Z

Pull request successfully merged into main.

Build succeeded:

As mentioned in #2848 (comment), this uses our new default ICU4X data to replace `char::is_start` and `char::is_continue` from the `boa_unicode` crate with the [`icu_properties`](https://crates.io/crates/icu_properties) crate. Note that this doesn't deprecate `boa_unicode` yet, since that'll require some discussion about how to proceed with a now unused sub-crate.

jedel1043 added the dependencies and builtins labels Apr 20, 2023

jedel1043 added this to the v0.17.0 milestone Apr 20, 2023

jedel1043 requested review from Razican, jasonwilliams, HalidOdat, RageKnify, raskad and nekevss April 20, 2023 02:47

jedel1043 added 2 commits April 20, 2023 20:31

Optimize String.prototype.normalize

40e684e

Only enable "icu_normalizer/std" when intl is enabled

f5615bc

jedel1043 force-pushed the fast-normalizers branch from 7d739b8 to f5615bc April 21, 2023 02:33

raskad approved these changes Apr 22, 2023

HalidOdat approved these changes Apr 23, 2023

bors bot changed the title ~~Optimize String.prototype.normalize~~ [Merged by Bors] - Optimize String.prototype.normalize Apr 23, 2023

bors bot closed this Apr 23, 2023

bors bot deleted the fast-normalizers branch April 23, 2023 08:40

jedel1043 mentioned this pull request Apr 23, 2023

[Merged by Bors] - Implement is_identifier_(start/part) using icu_properties #2865

Closed
