Half-integer order for besselk #25
I'm not sure whether there is another function I'm missing, but that one allocates because it is just misspelled (missing an … One thing I would be concerned about is the number of branches. For the … Though there should be a way to eliminate those branches for constants like that; see comment JuliaMath/SpecialFunctions.jl#178 (comment) and related notes.
Okay, well, that's embarrassing about the … With regard to the branches, I'm not sure how they could really be removed, and if there's something in that issue explaining how, I don't think I understand it. Even with manual methods, how would you eventually avoid checking whether … In any case, I know you're working on …
I believe it should be something like the following (I probably have the details slightly wrong). The key is that you use the …
Haha, yeah, I was about to comment the same thing... You can completely avoid these branches:

```julia
function besslk_halfint(nu, x)
    nu = abs(nu)                     # K_{-nu} = K_nu
    k0 = sqrt(pi/(2*x))*exp(-x)      # K_{1/2}(x)
    k1 = k0*(one(x) + inv(x))        # K_{3/2}(x)
    k2 = k1
    x2 = 2 / x
    arr = range(start=1.5, stop=nu, step=1)
    for n in arr
        # upward recurrence: K_{n+1} = K_{n-1} + (2n/x) * K_n
        a = x2 * n
        k2 = muladd(a, k1, k0)
        k0 = k1
        k1 = k2
    end
    return k0                        # K_nu(x)
end
```
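For reference, these are the identities the loop above relies on. They are standard facts about the modified Bessel function of the second kind rather than something stated in this thread: K is even in its order, obeys an upward three-term recurrence, and is elementary at order 1/2. Because K_{-1/2} = K_{1/2}, seeding a recurrence with the same value for orders -1/2 and 1/2 is exact, and one step of the recurrence then produces K_{3/2}.

```math
\begin{aligned}
K_{-\nu}(x) &= K_{\nu}(x), \\
K_{\nu+1}(x) &= K_{\nu-1}(x) + \frac{2\nu}{x}\,K_{\nu}(x), \\
K_{1/2}(x) &= \sqrt{\frac{\pi}{2x}}\,e^{-x}, \\
K_{3/2}(x) &= K_{-1/2}(x) + \frac{1}{x}\,K_{1/2}(x) = \sqrt{\frac{\pi}{2x}}\,e^{-x}\left(1 + \frac{1}{x}\right).
\end{aligned}
```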
Oh, and your version is even right!
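One quick way to check that the recurrence version is right is to compare it against the closed forms for the first few half-integer orders. This is only a sketch: the `closed_forms` table and the test points are my own choices, and I've written the range as `1.5:1.0:nu`, which should be equivalent to the thread's `range(start=1.5, stop=nu, step=1)` while also working on Julia versions without the keyword-only `range`.

```julia
# Closed forms for small half-integer orders (standard results):
# K_{n+1/2}(x) = sqrt(pi/(2x)) * exp(-x) * (polynomial in 1/x)
closed_forms = Dict(
    0.5 => x -> sqrt(pi / (2x)) * exp(-x),
    1.5 => x -> sqrt(pi / (2x)) * exp(-x) * (1 + 1/x),
    2.5 => x -> sqrt(pi / (2x)) * exp(-x) * (1 + 3/x + 3/x^2),
    3.5 => x -> sqrt(pi / (2x)) * exp(-x) * (1 + 6/x + 15/x^2 + 15/x^3),
)

# Same upward recurrence as the version in the thread.
function besslk_halfint(nu, x)
    nu = abs(nu)
    k0 = sqrt(pi / (2 * x)) * exp(-x)    # K_{1/2}
    k1 = k0 * (one(x) + inv(x))          # K_{3/2}
    x2 = 2 / x
    for n in 1.5:1.0:nu
        k2 = muladd(x2 * n, k1, k0)      # K_{n+1} = K_{n-1} + (2n/x) K_n
        k0 = k1
        k1 = k2
    end
    return k0
end

for (nu, f) in closed_forms, x in (0.1, 1.0, 7.5, 30.0)
    @assert isapprox(besslk_halfint(nu, x), f(x); rtol=1e-13)
    @assert besslk_halfint(-nu, x) == besslk_halfint(nu, x)   # K_{-nu} = K_nu
end
println("recurrence matches closed forms")
```

All terms in the forward recurrence for K are positive, so there is no cancellation, which is why a tight relative tolerance is reasonable here.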
Oh wow, look at that! That's very clever to make …
And you will probably want to combine Oscar's version with mine to avoid the excess divisions. You'll also want to make sqrt(pi/2) a constant; then you can do this with a single division. I've found the branches hard to measure in microbenchmarks until you piece together the full function. Oftentimes when using … As for the constant propagation mentioned in the other thread, I haven't looked into that, but I think @oscardssmith would be better able to answer that...
Branches are expensive. The for loop costs one branch per iteration, so this version is strictly better. What about constant propagation?
This comment is what we are referring to: JuliaMath/SpecialFunctions.jl#178 (comment)
Amazing, thank you both. I'll tinker with this and try to come back with a refined version.
Okay, so here's a weird one: when I use those fancier range iterators, it just destroys performance for me on …

But I think I have a modification that works very well for me and still addresses your helpful points about performance:

With benchmark timings:

Using the range also has the same performance properties. With the simple loop in …

Hard to imagine doing much better than that. But I've said that before and been very wrong, as evidenced by a lot of things in this package... so, thoughts? I should say, I have an Intel i5-11600K CPU, which does have AVX-512, and I've always been afraid that it's actually been killing performance for me, i.e. that the compiler is trying to use it but doing a bad job or something. So maybe this is a me problem that I should sort out independently of this issue.
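I can also say that I got a similar benchmark … Also, you'll probably want to hard-code those constants and compute them in extended precision so you get correctly rounded results:

```julia
# sqrt(pi/2), rounded to the nearest Float64
SQRT_PID2(::Type{Float64}) = 1.2533141373155003

function besk_halfint2(v::T, x) where T
    v = abs(v)
    invx = inv(x)                                  # the only division
    # Seed with K_{-1/2} = K_{1/2} = sqrt(pi/(2x)) * exp(-x); the first
    # recurrence step then yields K_{3/2} = K_{1/2} * (1 + 1/x).
    b0 = b1 = SQRT_PID2(T)*sqrt(invx)*exp(-x)
    twodx = 2*invx
    _v = T(1/2)
    while _v < v
        # K_{_v+1} = K_{_v-1} + (2*_v/x) * K_{_v}
        b0, b1 = b1, muladd(b1, twodx*_v, b0)
        _v += one(T)
    end
    b1
end
```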
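To follow the extended-precision suggestion without typing the digits by hand, one option is to evaluate sqrt(π/2) once in `BigFloat` (256 bits by default) and round to Float64 a single time. The names `SQRT_PID2_F64` and `ulps` are made up for this sketch. It also prints how far the naive Float64 expression `sqrt(pi / 2)`, which rounds twice, lands from the extended-precision reference. I'm not asserting whether the naive value actually differs in the last bit; the script just reports it.

```julia
# Evaluate in BigFloat (default precision: 256 bits), then round once to Float64.
const SQRT_PID2_F64 = Float64(sqrt(big(pi) / 2))

# Naive Float64 evaluation: pi/2 is rounded, then sqrt rounds again.
naive = sqrt(pi / 2)

# Distance of a Float64 candidate from the extended-precision reference, in ulps.
ref = sqrt(big(pi) / 2)
ulps(y::Float64) = Float64(abs(big(y) - ref) / eps(y))

println("from BigFloat: ", SQRT_PID2_F64, "  error = ", ulps(SQRT_PID2_F64), " ulp")
println("naive Float64: ", naive, "  error = ", ulps(naive), " ulp")
```

A correctly rounded constant is always within half an ulp of the true value, so comparing against the hard-coded literal in the thread is a cheap sanity check.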
What do you think about something like this to handle half-integer orders of besselk? It uses the fact that the asymptotic expansion terminates and is exact for half-integer orders. You can't pass AD through this w.r.t. v, of course, and in my experience that was the hardest derivative to get, which is why I ended up coding up the very expensive Temme routine, by far the slowest routine for us. But if you wanted something in the meantime, this has to be pretty competitively fast except for outrageously large v.
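The code from the original post isn't preserved in this copy of the thread, so the following is only my reading of the idea, and the function name `besselk_halfint_series` is hypothetical. For ν = n + 1/2 the large-argument coefficient a_{n+1}(ν) contains the factor 4ν² - (2n+1)² = 0, so the expansion stops after n + 1 terms and is exact. Each term comes from the previous one with a single multiply, and the loop runs |ν| - 1/2 times, which is why cost grows only for very large ν.

```julia
# Hypothetical sketch: K_nu(x) for half-integer nu via the large-x asymptotic
# expansion, which terminates after n + 1 terms when |nu| = n + 1/2:
#   K_nu(x) = sqrt(pi/(2x)) * exp(-x) * sum_{k=0}^{n} a_k(nu) / x^k
#   a_k(nu) = prod_{j=1}^{k} (4nu^2 - (2j-1)^2) / (k! * 8^k)
function besselk_halfint_series(nu::Float64, x::Float64)
    nu = abs(nu)                       # K_{-nu} = K_nu
    n = nu - 0.5
    isinteger(n) || throw(DomainError(nu, "order must be a half-integer"))
    mu = 4 * nu^2
    term = 1.0                         # a_0 / x^0
    s = term
    for k in 1:Int(n)
        # a_k / x^k = (a_{k-1} / x^{k-1}) * (mu - (2k-1)^2) / (8k * x)
        term *= (mu - (2k - 1)^2) / (8k * x)
        s += term
    end
    return sqrt(pi / (2x)) * exp(-x) * s
end

println(besselk_halfint_series(2.5, 1.0))   # K_{5/2}(1) = 7 * sqrt(pi/2) / e
```

Because the number of terms is tied to the order itself, this form has no derivative with respect to ν, matching the AD caveat in the post.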