Improve isprime performance for 32 bit integers #9
Sorry, I haven't really followed the discussion on the other threads (I don't actually know all that much about prime number algorithms), but is the only downside of the alternative method that it needs a slightly larger lookup table? If so, that's definitely worth it: 512 bytes is basically nothing (a processor these days typically has 64K L1 cache), and special function implementations often use way larger tables, e.g. Also, add a reference to the paper in the comments (this should be included in the eventual documentation as well).
for m in (3,5,7,11,13,17,19,23)
    n % m == 0 && return false
end
n < 841 && return n > 1
s = trailing_zeros(n-1)
d = (n-1) >>> s
for a in witnesses(n)
Copied comment:
I think you can give a type hint
for a in witnesses(n)::Tuple{Vararg{Int}}
making the witnesses stack allocated.
@simonbyrne I also think the larger lookup table is worth it. But the first time the improvement strategy was discussed there was someone that wasn't so happy about it. So I'll change it to the other version and will address your comments about the reference.
might also see some speedup by using
i = (((n >> 16) $ n) * 0x45d9f3b) & 0xffffffff
i = ((i >> 16) $ i) & 15 + 1
(2,(2249,483,194,199,15,369,499,945,419,735,33,471,946,615,497,702)[i])
# Forišek and Jančina, "Fast Primality Testing for Integers That Fit into a Machine Word", 2015
Is this FJ32_256? That seems to be the only one described in the paper (though it's a little different?).
I just took the hash and bases from the FJ32_256 there, since we already had a Miller-Rabin test implemented.
That was the only thing we really needed, since everything else was already here.
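To make the combination concrete, here is a minimal sketch of how a hashed witness lookup can plug into an existing Miller-Rabin loop. The names `BASES16`, `witnesses_sketch`, and `isprime_sketch` are hypothetical, the handling of 2 and of the small primes is an assumption (the diff fragment does not show it), and `xor` stands in for the older `$` operator. The hash constant and the 16 bases are copied from the diff above and have not been independently verified here.

```julia
# Base table copied from the diff above (not independently verified).
const BASES16 = (2249, 483, 194, 199, 15, 369, 499, 945,
                 419, 735, 33, 471, 946, 615, 497, 702)

# Hash n to one of the 16 bases; base 2 is always tested as well.
function witnesses_sketch(n::UInt32)
    i = xor(n >> 16, n) * 0x45d9f3b        # UInt32 arithmetic wraps modulo 2^32
    i = xor(i >> 16, i) & 0x0000000f       # keep the low 4 bits: 0..15
    return (2, BASES16[Int(i) + 1])
end

function isprime_sketch(n::UInt32)
    n == 2 && return true
    iseven(n) && return false
    # Trial division by small primes (assumed preamble, not shown in the diff).
    for m in (3, 5, 7, 11, 13, 17, 19, 23)
        n % m == 0 && return n == m
    end
    n < 841 && return n > 1                # 841 == 29^2
    N = UInt64(n)                          # widen so x * x cannot overflow
    s = trailing_zeros(N - 1)
    d = (N - 1) >>> s                      # N - 1 == d * 2^s with d odd
    for a in witnesses_sketch(n)
        x = powermod(UInt64(a), d, N)
        (x == 1 || x == N - 1) && continue
        composite = true
        for _ in 1:(s - 1)
            x = (x * x) % N
            if x == N - 1
                composite = false
                break
            end
        end
        composite && return false
    end
    return true
end
```

For example, `isprime_sketch(UInt32(1009))` returns `true` and `isprime_sketch(UInt32(841))` returns `false`. Whether two bases are enough for every 32-bit input depends entirely on the table and hash being exactly those intended by the paper, which this sketch takes on trust.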
I see, thanks! Would be good to note that this is the algorithm used :) Is there any value in the 64-bit version (for 64-bit types)? The paper seems to suggest they weren't sure.
I'll mention the algorithm used. The 64-bit version needs a way larger look-up table, and it is probably better to try one of the approaches discussed in JuliaLang/julia#11594.
Looks good.
So is this good to merge?
Great, thanks!
Brought from JuliaLang/julia#16349.
As I commented over there, there is another possibility (shown below) instead of the current PR. It has been argued that it uses more memory (which is true, see JuliaLang/julia#16349 (comment)), but it still uses much less memory than the current implementation and is faster than this PR.