perf(lexer): low hanging fruits #151
Nerd sniping @YoniFeng, since you've already looked at the lexer a bit 😉 |
Performance issues are bugs for oxc. 👀 |
Performance is a feature. |
What low hanging fruit do you see? We can make a list of things to look at / try.
Overall, I don't see many actionable experiments here:
What other ideas do you think we can try? |
Identifier scanning:
Keywords and hashing:
Internalizing minified identifiers:
Things I don't like so far:
|
I just looked at the code of V8's keyword matching to better understand what this is talking about:
I think we should be able to do something similar to get this working; for example, use phf with a faster hasher, and then just copy the generated table into our code? |
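A minimal sketch of what a copied-in generated table could look like (pure std, no phf dependency; the four-keyword subset, the 8-slot table, and the hash function are hand-picked assumptions for illustration, not oxc code or phf output):

```rust
// Toy perfect hash over a hypothetical 4-keyword subset.
// Index = (first byte + length) % 8, chosen by hand so the four
// entries land in distinct slots; a real table would be generated
// (e.g. by phf with a tuned hasher) over all JS keywords.
const TABLE: [&str; 8] = ["", "for", "", "if", "while", "", "", "let"];

fn is_keyword(s: &str) -> bool {
    let bytes = s.as_bytes();
    if bytes.is_empty() {
        return false;
    }
    let idx = (bytes[0] as usize + bytes.len()) % 8;
    // The final string comparison rejects non-keywords that
    // happen to hash into an occupied slot.
    TABLE[idx] == s
}

fn main() {
    assert!(is_keyword("if"));
    assert!(is_keyword("while"));
    assert!(!is_keyword("foo")); // hashes to "for"'s slot; comparison rejects it
    println!("ok");
}
```

The key property is that lookup is one hash, one table load, and one comparison, regardless of how many keywords the table holds.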
I was quite interested in your discussions in #140, and I wrote a quick prototype of quick-lint-js's approach of matching keywords in Rust: https://github.com/YangchenYe323/keyword-match-experiment The idea is simple:
The benchmark result is quite promising: I ran this a couple of times on an AWS Ubuntu instance. The performance of the hashing approach is stable at 32-34ns, while the matching approach's run time varies from 33ns to as high as 100ns. Probably that's because the matching approach is more sensitive to the actual length of the string? That part is beyond me. @Boshen It'd be great to get some insight from you. I could look into using this in oxc, too. |
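For reference, the matching approach being benchmarked is essentially a big `match` over string slices, letting LLVM generate the dispatch (a minimal sketch with only a handful of keywords, not the full JS keyword list):

```rust
// Matching approach: write the keywords as match arms and let the
// compiler decide how to dispatch (length checks, byte compares, etc.).
// Only a handful of keywords shown; the real list has ~50 entries.
fn is_keyword(s: &str) -> bool {
    matches!(
        s,
        "break" | "case" | "const" | "continue" | "else" | "for"
            | "function" | "if" | "let" | "return" | "var" | "while"
    )
}

fn main() {
    assert!(is_keyword("function"));
    assert!(!is_keyword("functio"));
    println!("ok");
}
```

Because codegen is left entirely to LLVM, run time can vary with the input string's length and which arms it has to rule out, which would be consistent with the fluctuation observed above.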
Looks great! A few things I noticed:

```rust
if clen < keyword_list::MIN_JS_KEYWORD_LENGTH || clen > keyword_list::MAX_JS_KEYWORD_LENGTH {
    return false;
}
```

In practice, with the hashing approach it's probably not worth the branch misprediction?

```rust
if len != candidate.len() {
    return false;
}
```

This is repeated in

In quick-lint-js, this check is after the string comparison. So, I think the SIMD comparison has to be dropped.

edit: this might not be relevant for the keyword lookup
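As a standalone sketch, the length guard under discussion looks like this (the bounds reflect real JS keyword lengths, but the constant names and function are illustrative assumptions, not the experiment's code):

```rust
const MIN_JS_KEYWORD_LENGTH: usize = 2; // e.g. "if", "do", "in"
const MAX_JS_KEYWORD_LENGTH: usize = 10; // e.g. "instanceof"

// Early exit: identifiers outside this length range can never be
// keywords, so hashing and comparison are skipped entirely. The
// trade-off discussed above: this branch mispredicts on real code,
// where most identifiers are not keywords anyway.
fn could_be_keyword(s: &str) -> bool {
    (MIN_JS_KEYWORD_LENGTH..=MAX_JS_KEYWORD_LENGTH).contains(&s.len())
}

fn main() {
    assert!(could_be_keyword("return"));
    assert!(!could_be_keyword("x"));
    println!("ok");
}
```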
|
re: the match approach fluctuating in benchmarks Probably due to string length having a large effect, like you said. |
@YoniFeng Thanks for tagging me!
This would cause
This is correct.
SIMD (or SWAR) provides a perf benefit on its own, but SIMD unlocks another optimization: branchless comparisons. SIMD+branchless improved performance significantly (at least 2x). (Going branchless for the whole algorithm, not just the string comparison, is even better.)

(Branchless string comparisons are not implemented in quick-lint-js right now. I didn't want to microoptimize quick-lint-js' current perfect hash table because I am going to throw it away eventually anyway. 😉 I mostly made the perfect hash table for a video, which I'll publish as soon as I make a YouTube thumbnail...)

However, it seems that @YangchenYe323's experiment returns a boolean. I assume calling code is going to branch anyway, so making the string comparison branchless might not be worth it. (In quick-lint-js' case, keyword lookup returns an integer, so things are more complicated.)
How would you use cmov in the string comparison while avoiding out of bounds slice access?
What I did for most of my benchmarking/profiling/optimizing is take word-looking things from jQuery and feed that into my lookup function: https://github.com/strager/perfect-hash-tables/blob/3938fb8f7ddb65d0a83c12bc4dcfeabaecc7c4fc/mixed.txt This data set is about 20% keywords, 80% variable names (or words in comments). I didn't measure the ratio if you exclude comments. |
Note that I own the copyright to quick-lint-js' code, and quick-lint-js is released with a GPL-3.0-or-newer license. If your project is effectively a copy-paste of quick-lint-js' code, please respect the software license. @YangchenYe323 I think the following code is UB in Rust. You cannot grow a slice. I think Miri can catch this. |
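To illustrate the concern: extending a slice beyond its allocation (e.g. via `slice::from_raw_parts` with a length larger than the original) is UB even if the extra bytes are never inspected, and Miri flags it. A safe alternative, sketched here with assumed names and a 16-byte buffer, copies the candidate into a zero-padded stack buffer first:

```rust
// UB version (do NOT do this): "growing" a 3-byte slice to 16 bytes
//     unsafe { std::slice::from_raw_parts(s.as_ptr(), 16) }
// reads past the allocation. Safe alternative: copy into a
// zero-padded stack buffer so all 16 bytes are legally readable
// and comparable with wide loads.
fn padded(s: &str) -> [u8; 16] {
    let mut buf = [0u8; 16];
    let n = s.len().min(16);
    buf[..n].copy_from_slice(&s.as_bytes()[..n]);
    buf
}

fn main() {
    let p = padded("let");
    assert_eq!(&p[..3], b"let");
    assert_eq!(p[3], 0); // padding is zeroed, so equal-length compares stay exact
    println!("ok");
}
```

The copy is cheap (a few bytes to the stack) and keeps the fast comparison path entirely in safe Rust.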
I wasn't keeping track; I had your implementation in memory (AND with a mask to avoid out-of-bounds reads).
Uhh - throw it away for what?
Wow https://github.com/strager/perfect-hash-tables seems fairly extensive/thorough.
I didn't have the string comparison in mind, only the outermost layer. |
Thanks for the reminder. I've added the same license to my repo.
Yes, it is UB, thanks for pointing this out! But we'll need to drop that part anyway, since we won't go for SIMD. |
I suppose we couldn't use a static mask either, because we don't pad the input so that 16 bytes are always readable. We could try masking against the actual length, though. |
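Masking against the actual length could look like this SWAR-style sketch (assumptions: keywords of at most 8 bytes, and a caller that guarantees 8 readable input bytes, as a lexer buffer usually does; a 16-byte SIMD version would follow the same shape):

```rust
// Load up to 8 bytes into a u64 (little-endian), zero-padded.
fn load8_padded(s: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    let n = s.len().min(8);
    buf[..n].copy_from_slice(&s[..n]);
    u64::from_le_bytes(buf)
}

// Compare the start of `input` against `keyword` with one wide load.
// In a lexer the buffer extends past the identifier, so an 8-byte
// read stays in bounds; the length-derived mask zeroes the trailing
// bytes that belong to the next token.
fn starts_with_keyword(input: &[u8], keyword: &[u8]) -> bool {
    assert!(input.len() >= 8 && !keyword.is_empty() && keyword.len() <= 8);
    let mut first = [0u8; 8];
    first.copy_from_slice(&input[..8]);
    let word = u64::from_le_bytes(first);
    let mask = if keyword.len() == 8 {
        u64::MAX
    } else {
        (1u64 << (keyword.len() * 8)) - 1
    };
    (word & mask) == load8_padded(keyword)
}

fn main() {
    assert!(starts_with_keyword(b"let x=1;", b"let"));
    assert!(!starts_with_keyword(b"lot x=1;", b"let"));
    println!("ok");
}
```

Note this only checks the prefix: a real lexer would also confirm that the byte after the match can't continue an identifier (so `letter` isn't tokenized as `let`).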
Let's respect @strager's work and quick-lint-js's license.
|
Let me check with @strager, I asked him in private. |
@YangchenYe323 can you draft a PR with an implementation using
We can't use the implementation from quick-lint-js's code given its license restrictions. And @strager will eventually teach us a clean-room implementation soon ;-) |
Sure, I'm on it. The tricky thing is that the baseline matching approach is hard to beat :). I ran the benchmark using @strager's jQuery input on (a) Rust HashMap, (b) matching, (c) perfect hash: perfect hash is about 4x faster than a Rust HashMap (roughly consistent with the stats mentioned in #140), but only about 25% faster than just matching. Still need to see whether we can get some speedup when this is migrated into oxc's lexer structure. |
I did some more digging into the LLVM IR of the
The big switch in
I suppose this might explain why #140 doesn't give the desired performance boost. |
When I looked at the assembly, I was surprised to "find" it was doing comparison after comparison without first branching on the length (I know this optimization exists in .NET, and I would blindly assume LLVM has it as well), so I attributed it to "some reason I don't know about". Nice to see you found this. I probably just wasn't thorough enough when looking at the ASM... (too lazy to go look again). |
Video on my perfect hash table implementation: https://www.youtube.com/watch?v=DMQ_HcNSOAI Feel free to adopt ideas from this video. |
After trying out a bunch of micro-optimizations and looking at the profiler at a larger scale within the linter, I conclude that we don't need to look at the lexer anymore (for a long time). The only remaining micro-optimization is the phf keyword lookup method, which we'll leave as an exercise after the video, unless #171 gets an amazing result. |
From the V8 scanner blog I still see low hanging fruits; let's play with these and make our lexer even faster!