Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't use integer division for cong #50427

Merged
merged 4 commits into from
Jul 24, 2023
Merged

Conversation

gbaraldi
Copy link
Member

@gbaraldi gbaraldi commented Jul 5, 2023

Based on the bitmask implementation described in https://www.pcg-random.org/posts/bounded-rands.html. The are potentially more benefits here to switching to a 32 bit only implementation similar to the one in the conclusion of the post, specially because some users of cong only need 32 bits or probably even 16 bits

instead do a bitwise and and resample
return *seed % max;
uint64_t mask = ~(uint64_t)0;
--max;
mask >>= __builtin_clzll(max|1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally we would be passing in the mask also but I expect this to still be better

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, we could attempt to keep this as the unbias_cong argument that it was before, but is it worth it? This seems cheap enough.

mask >>= __builtin_clzll(max|1);
uint64_t x;
do {
while ((*seed = 69069 * (*seed) + 362437) > unbias)
Copy link
Sponsor Member

@vtjnash vtjnash Jul 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this unbias term is now possibly biasing your results slightly and should be removed
(it is from the algorithm "Division with Rejection (Unbiased)" or equivalently "Debiased Modulo (Twice)" previously)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to keep the api of rand_ptls and just throwaway that argument, or do I potentially just break stuff?

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just change the API

function unbias_cong(max::UInt32)
return typemax(UInt32) - ((typemax(UInt32) % max) + UInt32(1))
end
cong(max::UInt32) = ccall(:jl_rand_ptls, UInt32, (UInt32,), max) + UInt32(1)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we implement it in Julia?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can as well, though I wonder if we can potentially avoid the PTLS entirely.

@oscardssmith
Copy link
Member

Do we have any benchmarks for this? Otherwise, LGTM.

@gbaraldi
Copy link
Member Author

On my computer the rand_ptls call went from 70 to 50 ns.

@gbaraldi
Copy link
Member Author

I haven't done extensive profiling though, it might be possible to make this even faster, but I didn't look too deeply.

@vtjnash vtjnash added the status:merge me PR is reviewed. Merge when all tests are passing label Jul 24, 2023
@gbaraldi gbaraldi merged commit 6f6439e into JuliaLang:master Jul 24, 2023
7 checks passed
@oscardssmith oscardssmith removed the status:merge me PR is reviewed. Merge when all tests are passing label Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants