faster expm1 #37440
Conversation
This PR looks like it breaks a lot of tests.
Yeah. I need this one to be a bit more accurate still. I think it's currently in the 1.1 ulp range, and it needs to be in the 0.8 ulp range.
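As a rough illustration (not the PR's actual test harness), the worst-case ulp error can be estimated by comparing against a BigFloat reference:

```julia
# Illustrative sketch: estimate the worst-case error of a Float64 function
# in ulps by comparing against a high-precision BigFloat reference.
function max_ulp_error(f, xs)
    worst = 0.0
    for x in xs
        y = f(x)                                  # value under test
        yref = f(big(x))                          # BigFloat reference
        err = abs(Float64((big(y) - yref) / eps(y)))
        worst = max(worst, err)
    end
    return worst
end

max_ulp_error(expm1, 10 .* rand(10^5) .- 5)       # sample the range [-5, 5]
```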
I just pushed a commit inspired by @simonbyrne's idea of using […]
If I'm reading this correctly, the range reduction here is only to the interval [-ln(2)/2, ln(2)/2] — assuming #36761 gets merged, couldn't you reuse the range-reduction machinery onto the smaller interval [-ln(2)/512, ln(2)/512] so that your polynomial is optimized over a smaller range? I thought at one point when playing around with the new […]
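For reference, the reduction being described looks roughly like this (a sketch with illustrative names and constants, not the PR's code; a real implementation splits log(2)/N into hi and lo parts so the subtraction is exact):

```julia
# Sketch of an exp-style range reduction with N = 256 table entries:
# x = n*log(2)/N + r with |r| <= log(2)/(2N) = log(2)/512, so that
# exp(x) = 2^k * T[j+1] * exp(r), where n = N*k + j and T[j+1] ≈ 2^(j/N).
const N = 256
const LOG2_OVER_N = log(2) / N

function reduce_exp(x::Float64)
    n = round(Int, x / LOG2_OVER_N)           # nearest multiple of log(2)/256
    r = muladd(-Float64(n), LOG2_OVER_N, x)   # reduced argument, |r| <= log(2)/512
    k, j = fldmod(n, N)                       # binary exponent k, table index j
    return k, j, r
end
```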
@jmert the problem with the obvious way of doing that is that the table will be 2x bigger. This isn't a critical flaw, and I should probably try it. The reason you need a bigger table is that my […]
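To spell out the table-size point (my illustration, not the PR's code): after a table-based reduction, expm1 finishes with a subtraction of 1 that can cancel most of the leading bits, so the rounding error in a single-Float64 table entry gets magnified; storing each entry as a hi/lo pair avoids this but doubles the table:

```julia
# Illustrative only: a double-word table entry for 2^(1/256).
# The lo part captures the rounding error left over after hi is rounded
# to Float64, keeping the stored entry accurate to well under 0.5 ulp.
t_hi = Float64(exp(big(1) / 256))         # rounded table value
t_lo = Float64(exp(big(1) / 256) - t_hi)  # residual rounding error of t_hi
```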
Ah OK — that makes sense.
@StefanKarpinski how important is […]
I've just updated this for the most recent master (specifically fixing the merge with the betterExp code).
Also, don't let this merge until I refactor to move the high-precision evalpoly to math.jl.
You could make it into a draft PR perhaps? |
This PR is now ready for review. This new version isn't quite as fast as the original, but it is more accurate and faster than the current implementation. Many points to @jmert for getting me to think about how to make the table for […]
Hopefully the tests now pass.
Bump on this. It would be great to have it in 1.7.
Sorry for being late to look at this, but I am slightly concerned by the use of […]
The sense I got was that, in general, processors with 32-bit floats almost always have 64-bit floats too. Is there a particular CPU you are thinking of where this isn't the case? If so, any chance you can run benchmarks on it?
There is no such system AFAIK. All general-purpose CPUs for decades (whether 32-bit or 64-bit) have had hardware 64-bit floating-point arithmetic. (This is a common misunderstanding: "32-bit" refers only to the native pointer size.)
I guess GPUs can be considered exceptions? And this should be easily compilable to a GPU just using CUDA.jl, I presume. |
GPUs are an exception, but if we want good GPU performance, we need specialized GPU methods (probably involving intrinsics). As such, I don't think the existence of GPUs changes what we should do in […]
Yeah, that makes sense.
Given this, can we merge? |
Faster and slightly more accurate (and all Julia).
4x faster for Float32, 2x faster for Float64. This has the same problem as #37426 in that it needs `exthorner` to achieve the necessary precision.
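For context, a compensated (extended-precision) Horner scheme can look like the following; this is a hedged sketch, and the `exthorner` referenced here and in #37426 may differ in detail:

```julia
# Sketch of a compensated Horner evaluation returning a (hi, lo) double-word
# result. Each step captures the rounding error of the multiply exactly via
# fma, and of the add via fast two-sum (valid when the coefficient dominates
# the running product, as is typical for these polynomial kernels).
function exthorner_sketch(x, p::Tuple)
    hi, lo = p[end], zero(x)
    for i in length(p)-1:-1:1
        prod = hi * x
        err  = fma(hi, x, -prod)                    # exact error of hi*x
        hi   = p[i] + prod
        lo   = fma(lo, x, prod - (hi - p[i]) + err) # fold both errors into lo
    end
    return hi, lo
end
```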