Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - Implement is_identifier_(start/part) using icu_properties #2865

Closed
wants to merge 4 commits into from

Conversation

jedel1043
Copy link
Member

As mentioned in #2848 (comment), this uses our new default ICU4X data to replace char::is_start and char::is_continue from the boa_unicode crate with the icu_properties crate.

Note that this doesn't deprecate boa_unicode yet, since that'll require some discussion about how to proceed with a now unused sub-crate.

@jedel1043 jedel1043 added dependencies Pull requests that update a dependency file parser Issues surrounding the parser discussion Issues needing more discussion labels Apr 23, 2023
@jedel1043 jedel1043 added this to the v0.17.0 milestone Apr 23, 2023
@github-actions
Copy link

Test262 conformance changes

Test result main count PR count difference
Total 94,591 94,591 0
Passed 73,210 73,210 0
Ignored 17,530 17,530 0
Failed 3,851 3,851 0
Panics 0 0 0
Conformance 77.40% 77.40% 0.00%

@codecov
Copy link

codecov bot commented Apr 23, 2023

Codecov Report

Merging #2865 (4835089) into main (739bd5a) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #2865      +/-   ##
==========================================
+ Coverage   50.92%   50.95%   +0.03%     
==========================================
  Files         419      419              
  Lines       41802    41867      +65     
==========================================
+ Hits        21288    21334      +46     
- Misses      20514    20533      +19     
Impacted Files Coverage Δ
boa_parser/src/lexer/identifier.rs 96.55% <100.00%> (+0.63%) ⬆️

... and 13 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Copy link
Member

@Razican Razican left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Regarding boa_unicodde, we could maybe use cargo-tally (https://github.com/dtolnay/cargo-tally).

With this we could check if someone else uses it and deprecate our crate.

boa_parser/src/lexer/identifier.rs Show resolved Hide resolved
@jedel1043
Copy link
Member Author

Looks good to me! Regarding boa_unicodde, we could maybe use cargo-tally (https://github.com/dtolnay/cargo-tally).

With this we could check if someone else uses it and deprecate our crate.

Apparently there are no dependents of boa_unicode apart from Boa and boa_engine (https://crates.io/crates/boa_unicode/reverse_dependencies), so I think it should be safe to deprecate it.

@Razican
Copy link
Member

Razican commented Apr 23, 2023

Looks good to me! Regarding boa_unicodde, we could maybe use cargo-tally (https://github.com/dtolnay/cargo-tally).
With this we could check if someone else uses it and deprecate our crate.

Apparently there are no dependents of boa_unicode apart from Boa and boa_engine (https://crates.io/crates/boa_unicode/reverse_dependencies), so I think it should be safe to deprecate it.

Let's do it! :)

Copy link
Member

@HalidOdat HalidOdat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@jedel1043
Copy link
Member Author

bors r+

@jedel1043
Copy link
Member Author

Then, after this gets merged, we can open a new branch to bump icu_unicode to 0.16.1 with a description signaling its deprecated state, and delete the crate from our main branch.

bors bot pushed a commit that referenced this pull request Apr 24, 2023
As mentioned in #2848 (comment), this uses our new default ICU4X data to replace `char::is_start` and `char::is_continue` from the `boa_unicode` crate with the [`icu_properties`](https://crates.io/crates/icu_properties) crate.

Note that this doesn't deprecate `boa_unicode` yet, since that'll require some discussion about how to proceed with a now unused sub-crate.
@bors
Copy link

bors bot commented Apr 24, 2023

Pull request successfully merged into main.

Build succeeded:

@bors bors bot changed the title Implement is_identifier_(start/part) using icu_properties [Merged by Bors] - Implement is_identifier_(start/part) using icu_properties Apr 24, 2023
@bors bors bot closed this Apr 24, 2023
@bors bors bot deleted the icu-unicode-sets branch April 24, 2023 20:34
@raskad
Copy link
Member

raskad commented Apr 25, 2023

For the sake of documentation; I noticed that after this change our benchmarks for the parser looked better so I ran them and compared cebec9d to c330005.
This change seems to have improved the performance of the parser by a good amount:

Benchmark Changes
Symbols (Parser) [-13.850% -12.097% -10.548%]
For loop (Parser) [-5.3454% -4.3440% -3.4117%]
Fibonacci (Parser) [-13.609% -10.961% -8.5178%]
Object Creation (Parser) [-8.1511% -6.9890% -5.7952%]
Static Object Property Access (Parser) [-10.387% -9.2301% -8.0496%]
Dynamic Object Property Access (Parser) [-7.9534% -6.8847% -5.7769%]
RegExp Literal Creation (Parser) [-7.4906% -6.3700% -5.4132%]
RegExp Creation (Parser) [-10.154% -9.1298% -7.9548%]
RegExp Literal (Parser) [-7.7712% -6.7077% -5.5830%]
RegExp (Parser) [-12.477% -11.538% -10.613%]
Array access (Parser) [-15.038% -13.548% -11.900%]
Array creation (Parser) [-17.589% -15.650% -13.799%]
Array pop (Parser) [-8.6277% -7.0937% -5.5033%]
String concatenation (Parser) [-12.413% -10.984% -9.6802%]
String comparison (Parser) [-8.1121% -6.9242% -5.6435%]
String copy (Parser) [-8.6093% -7.1169% -5.6620%]
Number Object Access (Parser) [-10.555% -9.5270% -8.4492%]
Boolean Object Access (Parser) [-17.107% -16.002% -14.850%]
String Object Access (Parser) [-16.571% -15.397% -14.316%]
Arithmetic operations (Parser) [+1.5678% +3.3078% +5.0827%]
Clean js (Parser) [-7.3581% -6.1867% -4.6942%]
Mini js (Parser) [-6.1803% -5.2065% -4.2830%]

cc @jedel1043

@jedel1043
Copy link
Member Author

Nice! And it makes much sense, since icu_properties is hyperoptimized for char queries, whereas our old implementation was just a succession of queries using several tables and property matches that made the categorization work a lot slower.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Pull requests that update a dependency file discussion Issues needing more discussion parser Issues surrounding the parser
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants