Skip to content

Conversation

@timholy
Copy link
Member

@timholy timholy commented Jun 27, 2012

FFTs are most efficient if the length is a product of powers of small primes. nextpow2 allows you to compute the next power-of-two, and is a good choice for 1 dimensional FFTs. However, for 3-dimensional FFTs, using nextpow2 could result in an array that contains almost 8-fold more elements, and in this case the efficiency of the FFT algorithm usually does not compensate for the extra elements.

The utilities in this commit allow you to define something like "nextpowb", where b is a list of possible factors. Example:

# compute all integers that are 2^l*3^m*5^n, up to the first that is >= 1025
p = powers([2,3,5],1025)
# find a size appropriate for a vector of 220 elements
s = nextsize(220, p)

Using [2,3,5], up to x=10,000 the ratio of nextsize(x,p)/x is never bigger than 1.1538, and has a median value of 1.0124. Consequently, in the worst-case scenario a 3d FFT will require only 1.5362-fold more elements, a very significant savings over using nextpow2. "powers" is also reasonably fast, requiring about 0.023 s up to x=10^5 (using [2,3,5]), about the largest size you could imagine being relevant for multidimensional FFTs. And you only need to call it once, even if you have many FFTs to compute.

In addition to any discussion re the intent, algorithms, etc, I would appreciate feedback about the choice of names! I'd kind of like to make this more discoverable, e.g., by starting a function name with "nextpow", but the interface is a little different (in particular, you likely want to pre-compute p, rather than nextpowb(220, [2,3,5]) on each iteration).

Thoughts?

FFTs are most efficient if the length is a product of powers of small primes. nextpow2 allows you to compute the next power-of-two, and is a good choice for 1 dimensional FFTs. However, for 3-dimensional FFTs, using nextpow2 could result in an array that contains almost 8-fold more elements, and in this case the efficiency of the FFT algorithm does not usually compensate for the extra elements.

These utilities allow you to define something like "nextpowb", where b is a list of possible bases.  Example:

    # compute all integers that are 2^l*3^m*5^n, up to first that is >= 1025
    p = powers([2,3,5],1025)
    s = nextsize(220, p)
@timholy
Copy link
Member Author

timholy commented Jun 27, 2012

Oops, I was off on my running time for "powers": on my machine it's actually

julia> @time p = powers([2,3,5],10^5);
elapsed time: 0.0013508796691894531 seconds

i.e., much better than I said above.

@timholy
Copy link
Member Author

timholy commented Jun 27, 2012

Realized I could do much better. The new version completes 10^6 with [2,3,5] in less than 1ms, more than 4-fold faster than the previous version.

@mbaz
Copy link
Contributor

mbaz commented Jun 27, 2012

Tim,

If the functions don't have much use outside of fft applications, I'd suggest prefixing the names with fft_, that is, fft_powers and fft_nextsize. The names powers and nextsize are too generic and prone to collisions IMO.

@timholy
Copy link
Member Author

timholy commented Jun 27, 2012

Agreed on nextsize, and maybe powers too. But powers is basically a "number theory" function (find all numbers less than a threshold which can be expressed as a product of particular primes), so I wonder whether it should be tagged with fft? @StefanKarpinski, you are an ex-algebraist, any thoughts about what this gets called?

@timholy
Copy link
Member Author

timholy commented Jun 27, 2012

I realized I never really showed why this is important:

julia> x = randn(9733);   # a prime number

julia> @time for i = 1:1000; xfft = fft(x); end
elapsed time: 3.8260140419006348 seconds

julia> x = randn(9800);   # a number from the "powers" table (with [2,3,5,7])

julia> @time for i = 1:1000; xfft = fft(x); end
elapsed time: 0.5076489448547363 seconds

Big savings.

@StefanKarpinski
Copy link
Member

I'm unconvinced that generating a bunch of products of factors up to a certain size is of general enough use to expose it. Wanting the next highest product, however, seems generally useful and worth exposing. Then the algorithm of generating a bunch of products, and taking the lowest one becomes an implementation detail (and can be changed in the future if we figure out a better way to do it). How about having nextprod([a,b,c...],n) and nextpow(a,n)? I think that those can go in combinatorics since they seem quite general. For completeness we may want prevprod and prevpow as well.

@StefanKarpinski
Copy link
Member

That time savings is very significant. We should definitely automatically do this so that our FFTs are just magically fast :-)

@timholy
Copy link
Member Author

timholy commented Jun 27, 2012

Know of a good algorithm to do it on-the-fly? My current plan would be to cache results for [2,3,5,7] up to 10^11 and then do a lookup. That table is about 10,000 elements long, and taking the fourth-root (which I suspect is because I made it using 4 primes) is close enough to linear that it would be pretty quick to calculate a starting index for a binary search.

But if there's a fast algorithm, I'm all for doing it on the fly (it's what I wanted in the first place).

@StefanKarpinski
Copy link
Member

No, I don't know of a fast algorithm. However, if you only expose returning the next biggest product, then you don't need to allocate or sort an array, you can use constant memory, and just remember the best value seen so far.

@timholy
Copy link
Member Author

timholy commented Jun 27, 2012

Not quite sure what you mean. Let's say we start with 37, and want to express the next power of 2, 3, and 5. This much is easy:
l2 = iceil(log(37)/log(2)) # 6, i.e., 64
l3 = iceil(log(37)/log(3)) # 4, i.e., 81
l5 = iceil(log(37)/log(5)) # 3, i.e., 125
However, the best choice is 40 = 5*2^3. Which value am I supposed to remember?

Hmm, this is beginning to smell like an integer programming problem. If you don't mind bringing in such heavy machinery, one could probably get good (not necessarily perfect, but good) answers that way.

@StefanKarpinski
Copy link
Member

I was assuming that you'd generate a list of products of 2, 3 and 5 every time and then sort it and take the highest above a certain value. In that case, if you were just iterating through all the various products (in unsorted order), you could just keep track of the smallest one that was greater than your starting value (in this case 37). However, that approach may be too slow. This may, in fact, be an ILP problem. We don't have an ILP solver, however, so I'm not sure how much that helps.

@ViralBShah
Copy link
Member

I suggest that we prefix these functions with fft_ for now until we figure out more general usage and better names. In fact, initially, these functions can go into the fft module once we have namespaces.

@timholy
Copy link
Member Author

timholy commented Jun 28, 2012

Stefan, now I'm with you: that's basically the algorithm I was using (truncated exhaustive search).

It turns out it's approximately the same time to calculate the whole list of values as it is to calculate just one of them. Nevertheless I agree with you that this is a much nicer interface, and at least for my own use cases this will be fast enough. This is what I'll commit and merge.

@timholy
Copy link
Member Author

timholy commented Jun 28, 2012

I should have explained further. Suppose you have k factors (e.g., 4 for [2,3,5,7]) and want to calculate nextprod for x. This is roughly an O( (logx)^m ) problem, where m is something like k-1. Doing the whole list is approximately O( (logx)^k ). However, when you do just one of them, that k-1 savings comes at the cost of an extra call to log, and the time it takes to call log is comparable to an extra factor of logx for most reasonable logx.

A consequence: we may not want to automatically do this for all FFTs, because for FFTs on the scale of 1000 or so the time to execute nextprod is on the order of the FFT itself. But for bigger FFTs the overhead cost becomes negligible. However, making this part of general FFTs is nontrivial to tune properly (if you want to pad and then "unpad" you have to do 3 FFTs, see http://math.stackexchange.com/questions/77118/non-power-of-2-ffts, and deciding when that's worth it could be a bit tricky). I think for now the best use of this is to allow people to choose their own sizes in advance of taking transforms, but giving much more freedom than power-of-2.

@carlobaldassi
Copy link
Member

Er, we do have a LIP solver actually :)

See this:

load("sparse.jl", "glpk.jl", "linprog.jl")
fft_get_f(x) = [log(x), -log(2), -log(3), -log(5)]
fft_power_opts = GLPIntoptParam()
fft_power_opts["msg_lev"] = GLP_MSG_ERR
fft_power_opts["presolve"] = GLP_ON
function fft_power(x)
    f = fft_get_f(x)
    lb = [1, zeros(3)]
    ub = [1,  Inf*ones(3)]
    (obj, sol, flag, ps_flag) = mixintprog(-f, f', [0.], [], [], lb, ub, nothing, fft_power_opts, nothing)
    if !(ps_flag == 0 && flag == 0)
        # never saw this while testing, but you never know
        error("failed") # resort to other algorithm?
    end
    shift(sol)
    return sol, prod([2, 3, 5] .^ sol) 
end

Test

julia> load("fft_power.jl")

julia> fft_power(37)
([3.0, 0.0, 1.0],40.0)

Timings seem good, too (but I don't have a comparison with the other algorithm):

julia> @time fft_power(124323)
elapsed time: 0.004158973693847656 seconds
([9.0, 5.0, 0.0],124416.0)

@timholy
Copy link
Member Author

timholy commented Jun 28, 2012

I was thinking we did have one...thanks Carlo!

But, for this case it's a lot slower:

julia> @time fft_power(124323)
elapsed time: 0.0071408748626708984 seconds
([9.0, 5.0, 0.0],124416.0)

julia> @time nextprod([2,3,5],124323)
elapsed time: 0.00017714500427246094 seconds
124416

(Note there's even an extra 0 in front, i.e., this is a factor of about 40.)

Of course, if the number of factors grows long enough, surely this would win. For example:

julia> @time nextprod([2,3,5,7,11,13,19],124323)
elapsed time: 0.041419029235839844 seconds
124416

Is it worth putting in a check and calling the IP solver in some cases? There would be issues because these modules are in extras/, and nextprod is currently in base.

@timholy
Copy link
Member Author

timholy commented Jun 28, 2012

Thinking it over: since the IP solver will sometimes return an approximate (rather than exact) result, to me it seems this is a case for having two different functions. Leave nextprod as is, and then if anyone wants the more general case they can (given Carlo's excellent example!) easily write nextprod_approx.

@carlobaldassi
Copy link
Member

I did some more testing and tweaking, and here's what I found:

  • the code for nextprod seems to be buggy: for example
julia> nextprod([2,3,5], 1) # argh
9223372036854775807

julia> nextprod([2,3,5], 2) # ok
2

julia> nextprod([2,3,5], 3) # wrong (strict inequality)
4

julia> nextprod([2,3,5], 4) # ok
4

julia> nextprod([2,3,5], 5) # wrong (strict inequality)
6

In general, I noticed that in some cases (but not all) it seems to solve a strict inequality.

  • I optimized the ILP code, and did some benchmarks. It seems that it starts to become faster than nextprod when the base size is 6 or more:
benchmark ps=[2, 3, 5] x=154935
  fft_powers_approx: 0.00037003588676452634 ± 3.073425170443325e-6
  nextprod         : 8.647608757019043e-5 ± 1.9569790447187893e-5
benchmark ps=[2, 3, 5, 7] x=154935
  fft_powers_approx: 0.0010482058525085448 ± 4.256198634599601e-6
  nextprod         : 0.0003054943084716797 ± 2.80192879604035e-5
benchmark ps=[2, 3, 5, 7, 11] x=154935
  fft_powers_approx: 0.0023717706203460693 ± 6.264722342297315e-6
  nextprod         : 0.0012734575271606445 ± 5.2676185964271294e-5
benchmark ps=[2, 3, 5, 7, 11, 13] x=154935
  fft_powers_approx: 0.004652169704437256 ± 8.003228776083257e-6
  nextprod         : 0.005933720350265503 ± 0.00011086647029907535
benchmark ps=[2, 3, 5, 7, 11, 13, 17] x=154935
  fft_powers_approx: 0.007533236265182495 ± 8.094710278355936e-6
  nextprod         : 0.032388877153396604 ± 0.00023269273845363258

I wanted to test whether the provided solution was exact, but I found the bug in nextprod instead (fft_powers was giving better results).

(All the code is here, including that for benchmarks and testing)

@timholy
Copy link
Member Author

timholy commented Jun 28, 2012

Argh, I had a < where it should have been <=. Carlo, thanks for catching these!

I tried running your test (a very good idea, now that we have two solutions for the problem) and got this:

julia> load("fft_test.jl")
in _jl_glpk__check_cols_ids: col_ids not defined
in _jl_glpk__check_cols_ids at /home/tim/src/julia/usr/bin/../lib/julia/extras/glpk.jl:511
in glp_set_mat_row at /home/tim/src/julia/usr/bin/../lib/julia/extras/glpk.jl:769
in fft_powers_approx at fft_powers.jl:29
in include at boot.jl:197
at fft_test.jl:3
in load at util.jl:255

Is there something else I need to load? Or am I not up-to-date on glp stuff?

FYI you can now say "require" to load packages just once (I merged the stat stuff).

@carlobaldassi
Copy link
Member

Ah, sorry, I just committed the fix for that bug. The glpk package is still in need of some deep testing.
BTW it's great to finally have "require"!

@timholy
Copy link
Member Author

timholy commented Jun 28, 2012

Well, we're testing both of our work. This is useful.

The ILP solver is very impressive. I get perfect agreement between the two up to 59,049. For 59,050, I get this:

julia> x = nextprod(ps, 59050)
59535

julia> y = fft_powers_approx(ps, 59050)
59049

Note that the value returned from fft_powers_approx is smaller than the input value. Might it be a roundoff error problem? My initial implementation of "nextpow(a, x)" suffered from that.

@carlobaldassi
Copy link
Member

A roundoff error seems likely. I'll investigate.

@timholy
Copy link
Member Author

timholy commented Jun 28, 2012

Because I was worried about yet more errors, I went through and worked out a couple of examples by hand. That was instructive. So here's a new algorithm---I'm raising the bar! This is getting out of control...

julia> @time nextprod(PRIMES[1:3],10^8+1)
elapsed time: 6.413459777832031e-5 seconds
100663296

julia> @time nextprod(PRIMES[1:4],10^8+1)
elapsed time: 0.00011897087097167969 seconds
100018800

julia> @time nextprod(PRIMES[1:20],10^8+1)
elapsed time: 0.025655031204223633 seconds
100000575

prevprod has not been updated yet.

@carlobaldassi
Copy link
Member

ahah great! This rules out completely the need for a linear programming approach. (BTW the errors in LP are indeed due to numerical approximation; I can easily fix that up to some point - and then the algorithm proves to be exact in practice - but it gets harder and harder to keep the precision in LP as the target increases, due to the use of logarithms, and 10^8+1 is completely out of reach in the current implementation, besides being quite slower than your last version.)

@timholy
Copy link
Member Author

timholy commented Jun 29, 2012

I'll implement prevprod along the same lines, then merge this.

I wouldn't have caught the bugs in this without your help. Many thanks to everyone who helped with this!

timholy added a commit that referenced this pull request Jun 29, 2012
Next and previous integer expressible as a product of specified factors

nextprod is useful for writing efficient FFT algorithms, to make sure your length is a product of small primes.
@timholy timholy merged commit 8e9fd6b into JuliaLang:master Jun 29, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants