Add pure julia exp function #19831
@@ -0,0 +1,134 @@
# Based on FreeBSD lib/msun/src/e_exp.c
# which is made available under the following licence

## Copyright (C) 2004 by Sun Microsystems, Inc. All rights reserved. Permission
## to use, copy, modify, and distribute this software is freely granted,
## provided that this notice is preserved.
# Method
# 1. Argument reduction: Reduce x to an r so that |r| <= 0.5*ln(2). Given x,
#    find r and integer k such that
#        x = k*ln(2) + r,  |r| <= 0.5*ln(2).
#    Here r is represented as r = hi - lo for better accuracy.
#
# 2. Approximate exp(r) by a special rational function on [0, 0.5*ln(2)]:
#        R(r^2) = r*(exp(r)+1)/(exp(r)-1) = 2 + r*r/6 - r^4/360 + ...
#
#    A special Remez algorithm on [0, 0.5*ln(2)] is used to generate a
#    polynomial to approximate R.
#
#    The computation of exp(r) thus becomes
#                       2*r
#        exp(r) = 1 + ----------
#                      R(r) - r
#
#                           r*c(r)
#               = 1 + r + ----------- (for better accuracy)
#                          2 - c(r)
#    where
#        c(r) = r - (P1*r^2 + P2*r^4 + ... + P5*r^10 + ...).
#
# 3. Scale back: exp(x) = 2^k * exp(r)
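The three steps above can be sketched in Python. This is an illustrative double-precision port, not the actual Julia implementation: it reuses the P1..P5 kernel coefficients from the patch, but uses a plain `k*ln(2)` reduction and `math.ldexp` in place of the hi/lo split and bit-level scaling.

```python
import math

# minimax coefficients for the kernel polynomial in z = r^2
# (the same P1..P5 values used by the Float64 kernel in the patch)
P = [1.66666666666666019037e-1, -2.77777777770155933842e-3,
     6.61375632143793436117e-5, -1.65339022054652515390e-6,
     4.13813679705723846039e-8]

def exp_sketch(x):
    # 1. argument reduction: x = k*ln(2) + r with |r| <= 0.5*ln(2)
    k = round(x / math.log(2))
    r = x - k * math.log(2)
    # 2. rational approximation on the reduced argument
    z = r * r
    kern = 0.0
    for p in reversed(P):          # Horner's rule in z
        kern = kern * z + p
    c = r - z * kern               # c(r) = r - (P1*r^2 + ... + P5*r^10)
    er = 1.0 + r + (r * c) / (2.0 - c)
    # 3. scale back: exp(x) = 2^k * exp(r)
    return math.ldexp(er, k)
```

Even without the hi/lo reduction trick, this stays within a few ulps of `math.exp` for moderate arguments; the extra care in the real implementation is what pushes the worst case under 1 ulp.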
# log(2)
const LN2 = 6.931471805599453094172321214581765680755001343602552541206800094933936219696955e-01

# log2(e)
const LOG2E = 1.442695040888963407359924681001892137426646

# log(2) split into upper (LN2U) and lower (LN2L) parts
LN2U(::Type{Float64}) = 6.93147180369123816490e-1
LN2U(::Type{Float32}) = 6.9313812256f-1

LN2L(::Type{Float64}) = 1.90821492927058770002e-10
LN2L(::Type{Float32}) = 9.0580006145f-6
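The point of the split is that the significand of LN2U ends in a run of zero bits, so products like `k*LN2U` (and hence `x - k*LN2U`) are exact, while LN2L carries the remaining bits of log(2). A quick Python check of both properties for the Float64 constants (assuming IEEE-754 binary64):

```python
import math
import struct

LN2U = 6.93147180369123816490e-1   # leading part of log(2)
LN2L = 1.90821492927058770002e-10  # trailing part: log(2) - LN2U

# the two parts recombine to log(2) essentially exactly
recombine_err = abs((LN2U + LN2L) - math.log(2))

# the low bits of LN2U's significand are zero, which is what makes
# small-integer multiples k*LN2U exactly representable
low_bits = struct.unpack('<Q', struct.pack('<d', LN2U))[0] & 0xFFFFF
```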
# max and min arguments for exponential functions
MAXEXP(::Type{Float64}) = 7.09782712893383996732e2 # log 2^1023*(2-2^-52)
MAXEXP(::Type{Float32}) = 88.72283905206835f0      # log 2^127 *(2-2^-23)

# one less than the min exponent, since we can squeeze a bit more from the exp function
MINEXP(::Type{Float64}) = -7.451332191019412076235e2 # log 2^-1075
MINEXP(::Type{Float32}) = -103.97207708f0            # log 2^-150
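These cutoffs bracket the finite range of the result: above MAXEXP the true value exceeds the largest finite float, and below MINEXP it rounds to zero. The same behaviour can be observed with Python's libm-backed `math.exp` (assuming IEEE-754 doubles):

```python
import math
import sys

MAXEXP64 = 7.09782712893383996732e2    # log of the largest finite Float64
MINEXP64 = -7.451332191019412076235e2  # log 2^-1075: below this, exp rounds to 0

# just inside the range: a large but finite double
below = math.exp(709.78)

# just outside: the true value exceeds the largest finite double,
# which CPython reports as an OverflowError
try:
    math.exp(710.0)
    overflowed = False
except OverflowError:
    overflowed = True

# far below MINEXP the result underflows all the way to zero
under = math.exp(-746.0)
```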
@inline exp_kernel(x::Float64) = @horner(x, 1.66666666666666019037e-1,
    -2.77777777770155933842e-3, 6.61375632143793436117e-5,
    -1.65339022054652515390e-6, 4.13813679705723846039e-8)

@inline exp_kernel(x::Float32) = @horner(x, 1.6666625440f-1, -2.7667332906f-3)
[Review comment]
(This is such a low degree that I wonder if we'd be better off using a minimax polynomial rather than a minimax rational function, at least for the single-precision case — the optimal polynomial would require a higher degree, but it would avoid the division. I don't think we need to address that possibility in this PR, however, since this is no worse than what we are doing now.)

[Reply]
This is a good question. I have already played with many variations and my conclusion is that the msun version is the best of all the things I have tested, considering both accuracy (critical goal of being < 1 ulp) and performance. Here's the problem with the minimax polynomial approach for the exp function: on my test suite for Float32, comparing the rational function against the polynomial on both FMA and non-FMA systems over the range

    vcat(-10:0.0002:10, -1000:0.001:1000, -120:0.0023:1000, -1000:0.02:2000)

it's not possible to be under 1 ulp for non-FMA systems using this polynomial (below). What about speed? For reference, this is the best minimax polynomial I can find for the Float32:

    @inline exp_kernel{T<:SmallFloat}(x::T) = @horner_oftype(x, 1.0, 1.0, 0.5,
        0.1666666567325592041015625,
        4.1666455566883087158203125e-2,
        8.333526551723480224609375e-3,
        1.39357591979205608367919921875e-3,
        1.97799992747604846954345703125e-4)

Note it's not as simple as adding more degrees to this polynomial to improve the accuracy. For Float64:

    @inline exp_kernel{T<:LargeFloat}(x::T) = @horner_oftype(x, 1.0, 1.0, 0.5,
        0.16666666666666685170383743752609007060527801513672,
        4.1666666666666692109277647659837384708225727081299e-2,
        8.3333333333159547579027659480743750464171171188354e-3,
        1.38888888888693412537733706813014578074216842651367e-3,
        1.9841269898657093212653024227876130680670030415058e-4,
        2.4801587357008890921336585755341275216778740286827e-5,
        2.7557232875898009206386968239499424271343741565943e-6,
        2.7557245320026768203034231441428403286408865824342e-7,
        2.51126540120060271373185023340013355408473216812126e-8,
        2.0923712382298872819985862227861600493028504388349e-9)

I haven't tested this in a while but the same conclusion holds.
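The Float64 polynomial quoted in the reply can be sanity-checked in Python: evaluated with Horner's rule in double precision, it reproduces exp on the reduced range |x| <= 0.5*log(2) to near machine precision. (This says nothing about the Float32 ulp results discussed above, which hinge on single-precision rounding and FMA.)

```python
import math

# Float64 minimax polynomial coefficients quoted in the review discussion
COEFFS = [1.0, 1.0, 0.5,
          0.16666666666666685170383743752609007060527801513672,
          4.1666666666666692109277647659837384708225727081299e-2,
          8.3333333333159547579027659480743750464171171188354e-3,
          1.38888888888693412537733706813014578074216842651367e-3,
          1.9841269898657093212653024227876130680670030415058e-4,
          2.4801587357008890921336585755341275216778740286827e-5,
          2.7557232875898009206386968239499424271343741565943e-6,
          2.7557245320026768203034231441428403286408865824342e-7,
          2.51126540120060271373185023340013355408473216812126e-8,
          2.0923712382298872819985862227861600493028504388349e-9]

def poly_exp(x):
    # Horner evaluation, highest-degree coefficient first
    acc = 0.0
    for c in reversed(COEFFS):
        acc = acc * x + c
    return acc

# worst absolute error over the reduced range |x| <= 0.5*log(2)
half_ln2 = 0.5 * math.log(2)
err = max(abs(poly_exp(i / 1000 * half_ln2) - math.exp(i / 1000 * half_ln2))
          for i in range(-1000, 1001))
```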
# for values smaller than this threshold just use a Taylor expansion
exp_small_thres(::Type{Float64}) = 2.0^-28
exp_small_thres(::Type{Float32}) = 2.0f0^-13
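The threshold works because for |x| < 2^-28 the first neglected Taylor term, x^2/2, is already below half an ulp near 1.0, so 1 + x is a correctly rounded result. A Python illustration for the Float64 case:

```python
import math

thresh = 2.0 ** -28       # exp_small_thres for Float64
x = 2.0 ** -30            # comfortably below the threshold

# the first neglected Taylor term is far below the 2^-52 spacing
# of doubles near 1.0
next_term = x * x / 2     # about 2^-61

# so 1 + x agrees with exp(x) to the last bit (or within one ulp,
# depending on the platform libm)
diff = abs(math.exp(x) - (1.0 + x))
```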
""" | ||
exp(x) | ||
|
||
Compute the natural base exponential of `x`, in other words ``e^x``. | ||
""" | ||
function exp{T<:Union{Float32,Float64}}(x::T) | ||
xa = reinterpret(Unsigned, x) & ~sign_mask(T) | ||
xsb = signbit(x) | ||
|
||
    # filter out non-finite arguments
    if xa > reinterpret(Unsigned, MAXEXP(T))
        if xa >= exponent_mask(T)
            xa & significand_mask(T) != 0 && return T(NaN)
            return xsb ? T(0.0) : T(Inf) # exp(+-Inf)
        end
        x > MAXEXP(T) && return T(Inf)
        x < MINEXP(T) && return T(0.0)
    end
    # This implementation gives 2.7182818284590455 for exp(1.0) when T ==
    # Float64, which is well within the allowable error; however,
    # 2.718281828459045 is closer to the true value so we prefer that answer,
    # given that 1.0 is such an important argument value.
    if x == T(1.0) && T == Float64
        return 2.718281828459045235360
    end

[Review comment]
Maybe

[Reply]
Okay.

[Review comment]
What's the difference between
    if xa > reinterpret(Unsigned, T(0.5)*T(LN2)) # |x| > 0.5 log(2)
        # argument reduction
        if xa < reinterpret(Unsigned, T(1.5)*T(LN2)) # |x| < 1.5 log(2)
            if xsb
                k = -1
                hi = x + LN2U(T)
                lo = -LN2L(T)
            else
                k = 1
                hi = x - LN2U(T)
                lo = LN2L(T)
            end
        else
            n = round(T(LOG2E)*x)
            k = unsafe_trunc(Int, n)
            hi = muladd(n, -LN2U(T), x)
            lo = n*LN2L(T)
        end
        r = hi - lo

[Review comment]
It seems bad that this will behave differently on 32-bit and 64-bit machines?

[Reply]
No, it doesn't matter in this case.

[Review comment]
Better to use

[Reply]
I made the change, but I don't fully understand why we should prefer
        # compute approximation on reduced argument
        z = r*r
        p = r - z*exp_kernel(z)
        y = T(1.0) - ((lo - (r*p)/(T(2.0) - p)) - hi)

        # scale back
        if k > -significand_bits(T)
            # multiply by 2.0 first to prevent overflow, which helps extend the range
            k == exponent_max(T) && return y*T(2.0)*T(2.0)^(exponent_max(T) - 1)
            twopk = reinterpret(T, rem(exponent_bias(T) + k, fpinttype(T)) << significand_bits(T))
            return y*twopk
        else
            # add significand_bits(T) + 1 to lift the range outside the subnormals
            twopk = reinterpret(T, rem(exponent_bias(T) + significand_bits(T) + 1 + k, fpinttype(T)) << significand_bits(T))
            return y*twopk*T(2.0)^(-significand_bits(T) - 1)
        end
    elseif xa < reinterpret(Unsigned, exp_small_thres(T)) # |x| < exp_small_thres
        # Taylor approximation for small x
        return T(1.0) + x
    else
        # primary range with k = 0, so compute approximation directly
        z = x*x
        p = x - z*exp_kernel(z)
        return T(1.0) - ((x*p)/(p - T(2.0)) - x)
    end
end
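The scale-back step builds 2^k by writing the biased exponent straight into the exponent field of the float, rather than calling a power function. A Python sketch of the same bit trick for Float64 (the subnormal branch, which first pre-scales by 2^(significand_bits+1), is omitted here):

```python
import struct

SIGNIFICAND_BITS = 52   # Float64
EXPONENT_BIAS = 1023

def twopk(k):
    # place (bias + k) in the exponent field; the significand stays zero,
    # giving exactly 2.0^k for k in the normal exponent range
    bits = (EXPONENT_BIAS + k) << SIGNIFICAND_BITS
    return struct.unpack('<d', struct.pack('<Q', bits))[0]
```

Multiplying y by `twopk(k)` then performs the 2^k * exp(r) scaling; the Julia code treats k near the exponent limits separately so this construction never overflows the exponent field.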
[Review comment]
No need to import any of these since we don't extend them