Some optimizations #393
Enabling PGO could give another >5% speedup. I used this tool for the optimization: https://github.com/Kobzol/cargo-pgo
$ cargo pgo run -- sample_files/slow_before.rs sample_files/slow_after.rs
$ cargo pgo optimize run -- sample_files/slow_before.rs sample_files/slow_after.rs
Before (this time I directly invoke the binary rather than through
After:
|
Substitute
Original memory usage:
Current memory usage:
|
Switching from mimalloc to snmalloc brings a negligible speedup and slightly lower memory usage. Since the time difference is so small, hyperfine is used again.
Before this change:
After this change:
If you think such an improvement is worthwhile, I will commit and push it. |
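For readers unfamiliar with how an allocator swap like the one above is wired up in Rust, here is a minimal sketch. It is not the code from this PR: it uses the standard library's `System` allocator as a stand-in so that it needs no external crates, and the names `CountingAlloc` and `ALLOC` are hypothetical. The wrapper also tracks live and peak heap bytes, which is one rough way to compare memory usage between allocators (assuming the replacement type implements `GlobalAlloc`).

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper: forwards every request to `inner` and
/// records how many bytes are currently live and the peak so far.
pub struct CountingAlloc<A> {
    inner: A,
    live: AtomicUsize,
    peak: AtomicUsize,
}

impl<A> CountingAlloc<A> {
    pub const fn new(inner: A) -> Self {
        Self {
            inner,
            live: AtomicUsize::new(0),
            peak: AtomicUsize::new(0),
        }
    }

    /// Bytes currently allocated through this allocator.
    pub fn live_bytes(&self) -> usize {
        self.live.load(Ordering::Relaxed)
    }

    /// Highest value `live_bytes` has reached so far.
    pub fn peak_bytes(&self) -> usize {
        self.peak.load(Ordering::Relaxed)
    }
}

unsafe impl<A: GlobalAlloc> GlobalAlloc for CountingAlloc<A> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { self.inner.alloc(layout) };
        if !ptr.is_null() {
            // Count only successful allocations.
            let now = self.live.fetch_add(layout.size(), Ordering::Relaxed) + layout.size();
            self.peak.fetch_max(now, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { self.inner.dealloc(ptr, layout) };
        self.live.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

// Swapping allocators means changing the type on this one line.
// `System` is a stand-in here, not the allocator used in the PR.
#[global_allocator]
static ALLOC: CountingAlloc<System> = CountingAlloc::new(System);
```

Reading `ALLOC.peak_bytes()` at the end of `main` gives a heap-only peak figure. It does not account for allocator metadata or memory the allocator keeps reserved, so it will generally differ from the resident-set size reported by an external tool.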
Change a Benchmark results (without PGO and snmalloc):
I don't know why the number of instructions rose slightly. |
cool! nice work |
Remove
|
I have to focus on other work, so the optimization ends here. In conclusion (without PGO and snmalloc):
|
Wow, really great changes! It's incredible to see a ~25% speedup in code I've already tried to make fast :) Thanks for mentioning snmalloc, I will take a look at it too. I've had a few problems with mimalloc (see #297) so I'm interested in looking at other malloc implementations. |
First, enable thin-LTO. This brings a ~5% speedup.
Before:
After:
The instruction counts are relatively stable.
I also measured them using hyperfine.
Before:
After: