Use Levin transform in Complex airy functions (again) #95
Codecov Report

Patch coverage:

```
@@            Coverage Diff             @@
##           master      #95      +/-  ##
==========================================
+ Coverage   96.09%   96.63%   +0.53%
==========================================
  Files          20       23       +3
  Lines        2228     2377     +149
==========================================
+ Hits         2141     2297     +156
+ Misses         87       80       -7
```

☔ View full report in Codecov by Sentry.
I still need a little bit of work on this, but I've added the Levin transform. I made the initial mistake of trying to combine the two series within the transform, which led to errors. The transform must instead be performed individually on the two series, and the results combined after the transformation. It was a bit subtle at first why this is the case, but it is necessary. Essentially, the alternative is to treat them as individual terms within a single series (i.e., not combine them), but that requires essentially double the number of terms. For the Levin transform it is more optimal to compute two separate series than one twice as long.

The advantages are now manifold: 1. these are much more accurate than the previous version; 2. they are much faster; 3. they completely separate the Airy function computation from the Bessel function routines (so we can split it cleanly into its own module).
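As a side note, the acceleration step itself can be illustrated in isolation. Below is a minimal sketch (in Python with made-up names, purely for illustration; it is not the package's `levin_transform`) of a Levin t-type transform applied to one alternating series, using the series terms themselves as remainder estimates:

```python
from math import comb, log

def levin_t(terms):
    """Levin t-type transform: accelerate sum(terms) from partial sums s_j
    using the terms themselves as remainder estimates (omega_j = terms[j])."""
    n = len(terms) - 1
    s, partial = 0.0, []
    for a in terms:
        s += a
        partial.append(s)
    num = den = 0.0
    for j in range(n + 1):
        # alternating binomial weights with the standard (j+1)^(n-1) damping
        w = (-1) ** j * comb(n, j) * (j + 1) ** (n - 1)
        num += w * partial[j] / terms[j]
        den += w / terms[j]
    return num / den

# Alternating harmonic series, which converges to log(2) very slowly:
terms = [(-1) ** (k + 1) / k for k in range(1, 11)]
print(abs(sum(terms) - log(2)))      # crude 10-term partial sum: error ~5e-2
print(abs(levin_t(terms) - log(2)))  # accelerated: error many orders smaller
```

This is the same idea as in the PR: the transform is a ratio of two weighted sums, one over `s_j / ω_j` and one over `1 / ω_j`, which is why the implementation packs both into each element of the sequence.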
There are a couple of things to figure out as well. I still need to work on a few remaining pieces, and I need to figure out what to do with large negative arguments: this will still suffer from the same issues on the real line. I don't really like throwing an error personally. I would prefer returning NaN (like SciPy) or extensively documenting the expected errors and return values with loss of precision (as the Boost Math library does). That seems more natural than throwing an error for some inputs but returning a value for -Inf. Throwing also hurts the purity of these functions and the ability to statically compile them with StaticCompiler (my primary interest for some of these routines)...
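To make the NaN-returning convention concrete, here is a tiny sketch (Python for illustration; the function name `airy_like` and the cutoff `-50` are assumptions, not the package's actual behavior). The point is that the function always returns a value of the same type, so no control flow depends on an exception being raised:

```python
import cmath
import math

def airy_like(z: complex) -> complex:
    """Hypothetical sketch: outside the supported domain, return a quiet NaN
    (SciPy-style) instead of raising, keeping the return type uniform."""
    if z.real < -50:  # assumed cutoff, purely for illustration
        return complex(math.nan, math.nan)
    return cmath.exp(-z)  # stand-in for the real computation

r = airy_like(complex(-100, 0))
print(math.isnan(r.real))  # the NaN quietly signals the domain issue
```

This pattern keeps the function type-stable and free of exception machinery, which is what matters for static compilation.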
Still haven't figured out a good way to combine the above. But I noticed that a large part of the runtime is actually in generating the partial sums, while the Levin transform itself is comparatively cheap. In that case it may be more optimal to focus on optimizing the sequence generation. The issue, of course, is having a complex divide (which I've confined to a single fastmath call).

In any case, it is opportune to write the sequence generation as:

```julia
@generated function airyaiprimex2_levin(x::Complex{T}, ::Val{N}) where {T <: Union{Float32, Float64}, N}
    :(
        begin
            xsqr = sqrt(x)
            xsqrx = xsqr * x

            t = -GAMMA_ONE_SIXTH(T) * GAMMA_FIVE_SIXTHS(T) / 4
            t2 = 1 / t
            a = @fastmath inv(4 * xsqrx)
            a2 = 4 * xsqrx
            s = zero(typeof(x))
            l = @ntuple $N i -> begin
                s += t
                t *= -a * (3 * (i - 7//6) * (i + 1//6) / i)
                t2 *= -a2 * (i / (3 * (i - 7//6) * (i + 1//6)))
                Vec{4, T}((reim(s * t2)..., reim(t2)...))
            end
            return levin_transform(l) * sqrt(xsqr) / T(π)^(3//2)
        end
    )
end
```

So we now have two accumulators that keep track of the series terms and the inverse series terms. Here are the benchmarks:

```julia
julia> @benchmark Bessels.airyaiprimex2_levin(z, Val(16)) setup=(z=rand() + rand()*im)
BenchmarkTools.Trial: 10000 samples with 932 evaluations.
 Range (min … max):  109.218 ns … 164.520 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     109.844 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   110.603 ns ±   3.260 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅█▅█▁    ▄▃▂                                                  ▂
  ██████▆▆▆▆████▆▆▆▇▇▇▃▃▁▃▅▆▅▆▆▄▆▃▅▆▅▃▄▃▅▄▅▅▆▄▃▄▅▆▅▄▅▅▆▅▆▅▅▆▄▆▆ █
  109 ns        Histogram: log(frequency) by time        127 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark Bessels.airyaiprimex_levin(z, Val(16)) setup=(z=rand() + rand()*im)
BenchmarkTools.Trial: 10000 samples with 839 evaluations.
 Range (min … max):  145.262 ns … 209.377 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     149.484 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   148.999 ns ±   4.852 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄▇▆ ▃▂    █▂ ▃▂                                               ▁
  ███▆▆▇▅▅██████▇▇▆▆██▅▆▆▇▆▄▄▅▅▄▅▄▅▃▅▅▃▃▅▄▄▄▄▅▄▂▅▅▆▅▄▆▆▄▅▄▅▃▄▄▅ █
  145 ns        Histogram: log(frequency) by time        172 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```

That reduces the time significantly, and we also get to avoid any fastmath flags in the loop. Looking at the machine code, though, it is still not optimal unfortunately. It should now be possible to vectorize the sequence generation as well.

I worry a little that this is losing the meaning of the original formulas, making them a little more difficult to read. But........ 🤷♂️ it also looks like it will be possible to speed this up significantly, so it might be worth it. I just need to look carefully at what's going on....
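The two-accumulator trick above can be checked in isolation. Because each update ratio applied to `t2` is the exact reciprocal of the ratio applied to `t` (the rational factor is inverted symbolically and `a2` is the pre-inverted counterpart of `a`), `t2` tracks `1/t` using only multiplications, so no division is needed inside the loop. A standalone check with exact rationals (Python for illustration; the names `ratio`/`inv_ratio` and the starting values are made up, but the rational factor mirrors the Airy series ratio from the code above):

```python
from fractions import Fraction

def ratio(i, a):
    # term ratio, mirroring: -a * (3 * (i - 7//6) * (i + 1//6) / i)
    return -a * (3 * (i - Fraction(7, 6)) * (i + Fraction(1, 6)) / i)

def inv_ratio(i, a2):
    # its exact reciprocal, mirroring: -a2 * (i / (3 * (i - 7//6) * (i + 1//6)))
    return -a2 * (i / (3 * (i - Fraction(7, 6)) * (i + Fraction(1, 6))))

a = Fraction(1, 8)   # stands in for inv(4 * x^(3/2)) at some fixed x
a2 = 1 / a           # the pre-inverted counterpart (4 * x^(3/2))
t, t2 = Fraction(3, 2), Fraction(2, 3)  # arbitrary t with t2 = 1/t
for i in range(1, 9):
    t *= ratio(i, a)
    t2 *= inv_ratio(i, a2)
    assert t * t2 == 1  # the reciprocal is maintained exactly at every step
print("t2 tracks 1/t using only multiplications")
```

In floating point the invariant `t * t2 ≈ 1` holds only to rounding error, but that is exactly the trade being made: one upfront reciprocal instead of a complex divide per iteration.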
Taking a look at the source of this PR as I play with the branch. Just as a sanity check: do you mean for...
No, I just used that as an initial reference. I've moved it and added tests for those functions just to make sure they are working. I haven't made the optimizations, but they are still reasonably fast right now. Since this PR is focused on Airy functions, I'm going to finish those off and merge; then I'll work more on the rest.
Alright, this finally completes the vision of having everything that doesn't depend on each other in separate modules. If someone in the future wants to split these off into separate packages, that would be fine with me I think, but for now this is good. Supersedes #84.

For example, the AiryFunctions package now depends only on the Math submodule, which contains the math constants, the Levin transform, and reexports of what we need from SIMDMath.

I'm sure there are some things to sort out, which I will do depending on how we want to precompile some of the methods. Should precompile statements happen at the top module level or within each submodule that is then reexported? Hoping to take full advantage of v1.9 coming soon. Still have several things to sort out in the Airy functions, though.

Unsure what's going on with the invalidation CI. I think it's because of the recent movements with SnoopCompile and PrecompileTools?
A little concerned that the testing time for the Airy function module has increased from 3s to 7s, though locally it still takes around 3s. I'm wondering if through CI there is a high cost of compilation that isn't being cached appropriately. I ran a few benchmarks locally and can't reproduce it, so I think that is OK.
Alright, I think this is ready to merge now, to make way for #96.
Supersedes #94 which includes the rebase.