RFC: good names for utility functions for efficient FFTs #981
Conversation
FFTs are most efficient if the length is a product of powers of small primes. nextpow2 allows you to compute the next power of two, and is a good choice for 1-dimensional FFTs. However, for 3-dimensional FFTs, using nextpow2 could result in an array that contains almost 8-fold more elements, and in this case the efficiency of the FFT algorithm does not usually compensate for the extra elements.
These utilities allow you to define something like "nextpowb", where b is a list of possible bases. Example:
# compute all integers that are 2^l*3^m*5^n, up to first that is >= 1025
p = powers([2,3,5],1025)
s = nextsize(220, p)
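As a rough sketch of how such utilities might work (my own modern-Julia reconstruction based on the example above, not the committed code; the names powers and nextsize are taken from that example):

```julia
# Sketch: all integers of the form prod(bases .^ exps), including the first
# value at or above n.  Generating everything below minimum(bases)*n is
# enough, because the chain of powers of the smallest base guarantees a
# product in [n, minimum(bases)*n).
function powers(bases::Vector{Int}, n::Integer)
    limit = minimum(bases) * n
    vals = [1]
    for b in bases
        out = Int[]
        for v in vals
            p = v
            while p < limit       # extend v by powers of b
                push!(out, p)
                p *= b
            end
        end
        vals = out
    end
    sort!(vals)
end

# smallest precomputed size >= x, via binary search on the sorted table
nextsize(x::Integer, p::Vector{Int}) = p[searchsortedfirst(p, x)]
```

Precomputing p once and reusing it across many transforms is the intended usage pattern, e.g. `nextsize(220, powers([2,3,5], 1025))` gives 225.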
---

Oops, I was off on my running time for "powers": on my machine it's actually much better than I said above.
---

Realized I could do much better. The new version completes 10^6 with [2,3,5] in less than 1 ms, more than 4-fold faster than the previous version.
---

Tim, if the functions don't have much use outside of fft applications, I'd suggest prefixing the names with fft_, that is, fft_powers and fft_nextsize. The names powers and nextsize are too generic and prone to collisions IMO.
---

Agreed on nextsize, and maybe powers too. But powers is basically a "number theory" function (find all numbers less than a threshold which can be expressed as a product of particular primes), so I wonder whether it should be tagged with fft. @StefanKarpinski, you're an ex-algebraist: any thoughts about what this should be called?
---

I realized I never really showed why this is important: big savings.
---

I'm unconvinced that generating a bunch of products of factors up to a certain size is of general enough use to expose it. Wanting the next highest product, however, seems generally useful and worth exposing. Then the algorithm of generating a bunch of products and taking the lowest one becomes an implementation detail (and can be changed in the future if we figure out a better way to do it). How about having nextprod?
---
That time savings is very significant. We should definitely automatically do this so that our FFTs are just magically fast :-) |
---

Know of a good algorithm to do it on-the-fly? My current plan would be to cache results for [2,3,5,7] up to 10^11 and then do a lookup. That table is about 10,000 elements long, and the fourth root (which I suspect reflects the fact that I built it using 4 primes) is close enough to linear that it would be pretty quick to calculate a starting index for a binary search. But if there's a fast algorithm, I'm all for doing it on the fly (it's what I wanted in the first place).
---

No, I don't know of a fast algorithm. However, if you only expose returning the next biggest product, then you don't need to allocate or sort an array: you can use constant memory and just remember the best value seen so far.
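The constant-memory idea can be sketched concretely for the fixed base set [2,3,5] (hypothetical code, not from the thread): enumerate the exponent combinations in nested loops, prune any partial product that already reaches the best candidate, and remember only the smallest product at or above x.

```julia
# Constant-memory search for the smallest 2^i * 3^j * 5^k >= x.
# No list is built or sorted; `best` is the only state carried along.
function nextprod_scan(x::Integer)
    best = typemax(Int)
    p2 = 1
    while p2 < best                  # powers of 2; beyond best can't help
        p23 = p2
        while p23 < best             # times powers of 3
            p = p23
            while p < x              # times powers of 5 until feasible
                p *= 5
            end
            p < best && (best = p)   # candidate: smallest seen so far
            p23 *= 3
        end
        p2 *= 2
    end
    best
end
```

Each candidate is minimal in its (i, j) chain, so tracking the running minimum over all chains yields the exact answer, e.g. `nextprod_scan(37)` gives 40.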
---

Not quite sure what you mean. Let's say we start with 37, and want to express the next power of 2, 3, and 5. This much is easy. Hmm, this is beginning to smell like an integer programming problem. If you don't mind bringing in such heavy machinery, one could probably get good (not necessarily perfect, but good) answers that way.
---
I was assuming that you'd generate a list of products of 2, 3 and 5 every time and then sort it and take the highest above a certain value. In that case, if you were just iterating through all the various products (in unsorted order), you could just keep track of the smallest one that was greater than your starting value (in this case 37). However, that approach may be too slow. This may, in fact, be an ILP problem. We don't have an ILP solver, however, so I'm not sure how much that helps. |
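For concreteness, the integer program being alluded to can be written down directly (my own formulation; the thread never states it explicitly). Taking logarithms turns the product into a linear form, with one nonnegative integer exponent per factor:

```latex
\min_{m_1,\dots,m_k \,\in\, \mathbb{Z}_{\ge 0}} \; \sum_{i=1}^{k} m_i \log b_i
\qquad \text{subject to} \qquad \sum_{i=1}^{k} m_i \log b_i \;\ge\; \log x
```

An optimal solution gives the next product as \(\prod_i b_i^{m_i}\). Note that the objective and constraint share the same linear form, and that all the precision hazards discussed further down in the thread live in the floating-point log coefficients.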
---

I suggest that we prefix these functions with
---

Stefan, now I'm with you: that's basically the algorithm I was using (truncated exhaustive search). It turns out it takes approximately the same time to calculate the whole list of values as to calculate just one of them. Nevertheless I agree with you that this is a much nicer interface, and at least for my own use cases it will be fast enough. This is what I'll commit and merge.
---

I should have explained further. Suppose you have k factors (e.g., 4 for [2,3,5,7]) and want to calculate nextprod for x. This is roughly an O((log x)^m) problem, where m is something like k-1. Doing the whole list is approximately O((log x)^k). However, when you do just one of them, that factor-of-log x savings comes at the cost of an extra call to log, and the time it takes to call log is comparable to an extra factor of log x for most reasonable x. A consequence: we may not want to automatically do this for all FFTs, because for FFTs on the scale of 1000 or so the time to execute nextprod is on the order of the FFT itself. But for bigger FFTs the overhead cost becomes negligible. However, making this part of general FFTs is nontrivial to tune properly (if you want to pad and then "unpad" you have to do 3 FFTs, see http://math.stackexchange.com/questions/77118/non-power-of-2-ffts, and deciding when that's worth it could be a bit tricky). I think for now the best use of this is to allow people to choose their own sizes in advance of taking transforms, while giving much more freedom than power-of-2.
---

Er, we do have an ILP solver actually :) See this:

load("sparse.jl", "glpk.jl", "linprog.jl")
# objective/constraint coefficients: log of the target and of each factor
fft_get_f(x) = [log(x), -log(2), -log(3), -log(5)]
# solver options: errors only, presolve on
fft_power_opts = GLPIntoptParam()
fft_power_opts["msg_lev"] = GLP_MSG_ERR
fft_power_opts["presolve"] = GLP_ON
function fft_power(x)
    f = fft_get_f(x)
    # first variable is pinned to 1; the exponents are nonnegative integers
    lb = [1, zeros(3)]
    ub = [1, Inf*ones(3)]
    (obj, sol, flag, ps_flag) = mixintprog(-f, f', [0.], [], [], lb, ub, nothing, fft_power_opts, nothing)
    if !(ps_flag == 0 && flag == 0)
        # never saw this while testing, but you never know
        error("failed") # resort to other algorithm?
    end
    shift(sol)  # drop the pinned leading variable, leaving the exponents
    return sol, prod([2, 3, 5] .^ sol)
end

Timings seem good, too (but I don't have a comparison with the other algorithm):
---

I was thinking we did have one... thanks Carlo! But, for this case it's a lot slower:

julia> @time fft_power(124323)
julia> @time nextprod([2,3,5],124323)

(Note there's even an extra 0 in front, i.e., this is a factor of about 40.) Of course, if the number of factors grows long enough, surely this would win. For example:

julia> @time nextprod([2,3,5,7,11,13,19],124323)

Is it worth putting in a check and calling the IP solver in some cases? There would be issues because these modules are in extras/, and nextprod is currently in base.
---
Thinking it over: since the IP solver will sometimes return an approximate (rather than exact) result, to me it seems this is a case for having two different functions. Leave nextprod as is, and then if anyone wants the more general case they can (given Carlo's excellent example!) easily write nextprod_approx. |
---
I did some more testing and tweaking, and here's what I found:
- In general, I noticed that in some cases (but not all) it seems to solve a strict inequality.
- I wanted to test whether the provided solution was exact, but I found a bug. (All the code is here, including that for benchmarks and testing.)
---

Argh, I had a < where it should have been <=. Carlo, thanks for catching these! I tried running your test (a very good idea, now that we have two solutions for the problem) and got this:

julia> load("fft_test.jl")

Is there something else I need to load? Or am I not up-to-date on glp stuff? FYI you can now say "require" to load packages just once (I merged the stat stuff).
---
Ah, sorry, I just committed the fix for that bug. The glpk package is still in need of some deep testing. |
---

Well, we're testing both of our work. This is useful. The ILP solver is very impressive. I get perfect agreement between the two up to 59,049. For 59,050, I get this:

julia> x = nextprod(ps, 59050)
julia> y = fft_powers_approx(ps, 59050)

Note that the value returned from fft_powers_approx is smaller than the input value. Might it be a roundoff error problem? My initial implementation of nextpow(a, x) suffered from that.
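The suspected roundoff mechanism is easy to exhibit in miniature (my illustration, not the thread's actual test). The boundary case 59,049 = 3^10 is exactly where it bites: in log space, the constraint at an exact power is a tie decided by a few ulps of rounding noise.

```julia
# In the LP formulation the constraint "product >= x" becomes
# sum(m .* log.(b)) >= log(x).  At an exact tie such as 3^10 == 59049,
# both sides are ~10.986 and differ only by floating-point rounding,
# so the solver may accept or reject the tie depending on that noise.
lhs = 10 * log(3.0)       # the solver's view of log(3^10)
rhs = log(59049.0)        # log of the target
delta = lhs - rhs         # tiny, but its sign decides feasibility
# recovering the integer from the log-space solution also needs careful
# rounding, not truncation:
recovered = round(Int, exp(lhs))
```

This is why an exact integer answer can come back "smaller than the input value": the solver saw a constraint satisfied to within rounding that the exact arithmetic rejects.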
---
A roundoff error seems likely. I'll investigate. |
---

Because I was worried about yet more errors, I went through and worked out a couple of examples by hand. That was instructive. So here's a new algorithm: I'm raising the bar! This is getting out of control...

julia> @time nextprod(PRIMES[1:3],10^8+1)
julia> @time nextprod(PRIMES[1:4],10^8+1)
julia> @time nextprod(PRIMES[1:20],10^8+1)

prevprod has not been updated yet.
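The committed algorithm can't be reproduced from the thread alone, but an exact nextprod is short enough to sketch as branch and bound (a hypothetical reconstruction, not necessarily what was merged): recurse over the factor list, and prune any partial product that already reaches the best value found so far.

```julia
# Exact branch-and-bound nextprod (hypothetical sketch).  Assumes every
# factor is >= 2.  At each step, either multiply in another copy of
# factors[i] (stay at index i) or retire it (advance to i + 1); a partial
# product that already reaches `best` cannot improve and is pruned.
function nextprod_exact(factors::Vector{Int}, x::Integer)
    best = typemax(Int)
    function recurse(i, p)
        p >= best && return           # prune: can't beat current best
        if p >= x                     # feasible: p is a new best candidate
            best = p
            return
        end
        i > length(factors) && return # no factors left and still < x
        recurse(i, p * factors[i])    # branch: use factors[i] once more
        recurse(i + 1, p)             # branch: move on to the next factor
    end
    recurse(1, 1)
    best
end
```

Since every partial product grows by a factor of at least 2 per inclusion, the recursion depth is bounded by log2 of the answer, and the pruning keeps the search close to the O((log x)^(k-1)) behavior discussed above.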
---

Ahah, great! This completely rules out the need for a linear programming approach. (BTW the errors in LP are indeed due to numerical approximation; I can easily fix that up to some point, and then the algorithm proves to be exact in practice, but it gets harder and harder to keep the precision in LP as the target increases, due to the use of logarithms.)
---
I'll implement prevprod along the same lines, then merge this. I wouldn't have caught the bugs in this without your help. Many thanks to everyone who helped with this! |
Next and previous integer expressible as a product of specified factors

nextprod is useful for writing efficient FFT algorithms, to make sure your length is a product of small primes.
The utilities in this commit allow you to define something like "nextpowb", where b is a list of possible factors. Example:
Using [2,3,5], up to x=10,000 the ratio of nextsize(x,p)/x is never bigger than 1.1538, and has a median value of 1.0124. Consequently, in the worst-case scenario a 3d FFT will require only 1.5362-fold more elements, a very significant savings over using nextpow2. "powers" is also reasonably fast, requiring about 0.023 s up to x=10^5 (using [2,3,5]), about the largest size you could imagine being relevant for multidimensional FFTs. And you only need to call it once, even if you have many FFTs to compute.
In addition to any discussion re the intent, algorithms, etc., I would appreciate feedback about the choice of names! I'd kind of like to make this more discoverable, e.g., by starting a function name with "nextpow", but the interface is a little different (in particular, you likely want to pre-compute p once, rather than calling something like nextpowb(220, [2,3,5]) on each iteration).
Thoughts?