Don't use integer division for cong #50427
Instead, do a bitwise AND with a mask and resample when the result is out of range.
return *seed % max;
uint64_t mask = ~(uint64_t)0;
--max;
mask >>= __builtin_clzll(max|1);
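The approach described above (mask off the low bits, resample when the value is too large) can be sketched as a complete function. This is a minimal sketch, not the PR's code: the loop is reconstructed from the fragments shown in this conversation, the LCG constants come from the julia_internal.h snippet, and the names `lcg_step` and `bounded_bitmask` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: one step of the 64-bit LCG, using the constants
   from the julia_internal.h snippet. Unsigned overflow wraps mod 2^64. */
static uint64_t lcg_step(uint64_t *seed)
{
    *seed = 69069 * (*seed) + 362437;
    return *seed;
}

/* Hypothetical name. Bitmask with rejection: returns a value in [0, max),
   uniform as long as the masked bits of the generator are uniform. */
static uint64_t bounded_bitmask(uint64_t *seed, uint64_t max)
{
    assert(max > 0);
    uint64_t mask = ~(uint64_t)0;
    --max;                              /* max is now the largest allowed result */
    mask >>= __builtin_clzll(max | 1);  /* all-ones mask covering max; "| 1" avoids clz(0) */
    uint64_t x;
    do {
        x = lcg_step(seed) & mask;      /* keep only the low bits */
    } while (x > max);                  /* out of range: draw again instead of using % */
    return x;
}
```

Calling `bounded_bitmask(&seed, 6)` returns values in 0..5. One caveat worth checking: with a power-of-two-modulus LCG like this one, the low k bits repeat with period at most 2^k, so masking the low bits inherits that weakness.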
Ideally we would also pass in the mask, but I expect this to be better even so.
True, we could try to keep passing it in, like the unbias_cong argument we had before, but is it worth it? Computing the mask seems cheap enough.
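To put "cheap enough" in perspective: the mask depends only on max, and computing it takes one count-leading-zeros and one shift. A minimal sketch using the same expression as the diff (GCC/Clang `__builtin_clzll`); the helper name `cong_mask` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper name. Returns the all-ones mask the bitmask method
   needs for results in [0, max): one count-leading-zeros plus one shift. */
static uint64_t cong_mask(uint64_t max)
{
    assert(max > 0);
    uint64_t hi = max - 1;   /* largest allowed result */
    /* __builtin_clzll(0) is undefined, so OR in 1; max == 1 then gets mask 1. */
    return ~(uint64_t)0 >> __builtin_clzll(hi | 1);
}
```

For example, a bound of 6 gives hi = 5 (binary 101) and a mask of 7 (binary 111), so 6 of the 8 possible masked values are accepted. A caller drawing many values with the same bound could compute this once and pass it in, which is the trade-off raised in this thread.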
src/julia_internal.h
mask >>= __builtin_clzll(max|1);
uint64_t x;
do {
    while ((*seed = 69069 * (*seed) + 362437) > unbias)
This unbias term may now bias your results slightly and should be removed. (It comes from the algorithm used previously, "Division with Rejection (Unbiased)", or equivalently "Debiased Modulo (Twice)".)
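For comparison, the threshold being discussed belongs to a reject-then-modulo scheme. A minimal 32-bit sketch that transcribes the unbias_cong formula from the Julia code in this PR into C; `unbias_threshold`, `next32` (which takes the high 32 bits of the 64-bit LCG state) and `bounded_modulo` are hypothetical names, and this is not the exact code that was removed.

```c
#include <assert.h>
#include <stdint.h>

/* C transcription of the Julia unbias_cong formula: the largest accepted
   32-bit draw. Requires max > 0. */
static uint32_t unbias_threshold(uint32_t max)
{
    assert(max > 0);
    return UINT32_MAX - ((UINT32_MAX % max) + 1);
}

/* Hypothetical 32-bit source: same LCG constants, high half of the state. */
static uint32_t next32(uint64_t *seed)
{
    *seed = 69069 * (*seed) + 362437;
    return (uint32_t)(*seed >> 32);
}

/* Reject draws above the threshold, then reduce with %.
   Returns a value in [0, max). */
static uint32_t bounded_modulo(uint64_t *seed, uint32_t max)
{
    uint32_t unbias = unbias_threshold(max);
    uint32_t x;
    do {
        x = next32(seed);
    } while (x > unbias);
    return x % max;   /* the integer division this PR removes */
}
```

The accepted range [0, unbias] holds UINT32_MAX - (UINT32_MAX % max) values, which is a multiple of max, so the final % is uniform. That guarantee is specific to the final %; it does not carry over once the result is produced by masking instead.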
Do we want to keep the API of rand_ptls and just throw away that argument, or should I change it and potentially break things?
Just change the API.
function unbias_cong(max::UInt32)
    return typemax(UInt32) - ((typemax(UInt32) % max) + UInt32(1))
end
cong(max::UInt32) = ccall(:jl_rand_ptls, UInt32, (UInt32,), max) + UInt32(1)
Could we implement it in Julia?
We could, though I wonder whether we can avoid the PTLS entirely.
Do we have any benchmarks for this? Otherwise, LGTM.
On my machine, the rand_ptls call went from 70 ns to 50 ns.
I haven't profiled it extensively, so it might be possible to make this even faster; I didn't dig too deeply.
This is based on the bitmask implementation described in https://www.pcg-random.org/posts/bounded-rands.html. There are potentially further benefits to switching to a 32-bit-only implementation similar to the one in the conclusion of the post, especially because some users of cong only need 32 bits, or probably even 16.
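The linked post compares several bounded-range methods; one 32-bit-only option it covers is Lemire's multiply-and-reject method, which replaces the mask with a 32x32->64 multiply and only divides on a rare slow path. A minimal sketch, assuming a hypothetical `next32` that returns the high 32 bits of the existing 64-bit LCG state (this PR defines no such function) and a hypothetical name `bounded_lemire32`; it is not necessarily the exact code in the post's conclusion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit source: the same LCG step, returning the high half
   of the 64-bit state (the high bits of a power-of-two LCG are its
   better-quality bits). */
static uint32_t next32(uint64_t *seed)
{
    *seed = 69069 * (*seed) + 362437;
    return (uint32_t)(*seed >> 32);
}

/* Lemire's multiply-and-reject: returns a value in [0, range).
   Requires range > 0. */
static uint32_t bounded_lemire32(uint64_t *seed, uint32_t range)
{
    assert(range > 0);
    uint64_t m = (uint64_t)next32(seed) * range;
    uint32_t low = (uint32_t)m;
    if (low < range) {
        /* t = 2^32 mod range: low halves below t must be rejected. */
        uint32_t t = (uint32_t)(0u - range) % range;
        while (low < t) {
            m = (uint64_t)next32(seed) * range;
            low = (uint32_t)m;
        }
    }
    return (uint32_t)(m >> 32);   /* the high half is the result */
}
```

The % only runs when the low half of the product falls below range, which for small ranges is rare, so the common path costs one multiply.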