-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize Crypto::Bcrypt performance #3880
Optimize Crypto::Bcrypt performance #3880
Conversation
0x01c36ae4, 0xd6ebe1f9, 0x90d4f869, 0xa65cdea0, 0x3f09252d, 0xc208e69f, | ||
0xb74e6132, 0xce77e25b, 0x578fdfe3, 0x3ac372e6, | ||
], | ||
private S = StaticArray[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not UInt32.static_array(...) instead?
Won't it spare us all the _u32?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I wasn't aware of this initializer. I wish Int[1, 2, 3]
would return a StaticArray
instead of an Array
...
093eaf7
to
8e36580
Compare
replace f with, gives also 5% performance: @[AlwaysInline]
private def f(x)
d = x.to_u8
c = (x >> 8).to_u8
b = (x >> 16).to_u8
a = (x >> 24).to_u8
((@s.to_unsafe[a] + @s.to_unsafe[256 + b]) ^ @s.to_unsafe[512 + c]) + @s.to_unsafe[768 + d]
end |
8e36580
to
8eef6af
Compare
Optimizes the Blowfish and Bcrypt algorithms by using stack allocated objects (StaticArray) instead of a HEAP allocation (Array), and raw pointer accesses. The implementation is still not on par with alternative C implementations (e.g. as used by the Ruby extension).
8eef6af
to
535c50f
Compare
@kostya I didn't notice an improvement (above benchmark, LLVM 3.9.1). I got a 6% improvement, by memoizing pointers to It's now down to 2.81s vs 2.46s (ruby). |
Excellent as usual, Julien! 🙇 Thank you so much, this is an amazing contribution 💯 |
The current
Crypto::BCrypt
implementation is roughly ~68% slower than the Ruby extension.Changing the HEAP allocated Array for the stack allocated StaticArray and flattening
P
immediately reduces the difference to ~32% slower. Using raw pointers closes the gap further by going down to ~19% slower. Raw pointers are unsafe, but the algorithm has control over the pointer's range, and it never goes out of bounds. It's still not on par with the Ruby implementation (that uses C) but it's getting closer.Running this benchmark, generating 10 hashes at cost 12 got a
71%41% performance increase, down from 4.15s to 2.94s (vs 2.46s for Ruby).refs #1309