WIP: new DFT api #6193

Closed: wants to merge 12 commits into base: master

Conversation
@stevengj (Member) commented Mar 18, 2014

This is not quite ready to merge yet, but I wanted to bring it to your attention to see what you think. It is a rewritten DFT API that provides:

  • p = plan_fft(x) (and similar) now returns a subtype of Base.DFT.Plan. It acts like a linear operator: you can apply it to an existing array with p * x, and apply it to a preallocated output array with A_mul_B!(y, p, x). You can apply the inverse plan with p \ x or inv(p) * x. (See the usage sketch after this list.)
  • It is easy to add new FFT algorithm implementations for specific numeric and array types...
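For concreteness, here is a minimal usage sketch of this interface (hedged: it assumes the pre-0.5 names, with A_mul_B! as the in-place multiplication function):

x = complex(rand(1024), rand(1024))  # a Complex128 vector
p = plan_fft(x)                      # a subtype of Base.DFT.Plan

y = p * x          # apply the plan like a linear operator
A_mul_B!(y, p, x)  # the same transform, written into the preallocated y
z = p \ y          # inverse transform; inv(p) * y is equivalent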

Partly since we have discussed moving FFTW to a module at some point for licensing reasons, and partly to support numeric types like BigFloat, the patch also includes a pure-Julia FFT implementation, with the following features.

  • One-dimensional transforms of arbitrary AbstractVectors of arbitrary Complex types (including BigFloat; see the sketch after this list), with fast algorithms for all sizes (including prime sizes). Multidimensional transforms of arbitrary StridedArray subtypes.
  • Uses code generation (a baby version of the FFTW code generator) in Julia to generate kernels for use as base cases of the transforms; algorithms in general are FFTW1-like.
  • Performance is reasonable. For moderate power-of-two sizes, it is within a factor of 2 or so of FFTW, with the difference mainly due to the lack of SIMD. For non-power-of-two sizes, the performance is several times worse, but I suspect there may be a fixable compiler problem there (a good stress test for the compiler). (Update: non-power-of-two sizes seem fine with the latest Julia.) (Further update: now they are slow again.)
  • import FFTW is enough to automatically switch over to the FFTW algorithms for supported types.
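As a hedged illustration of the generic code path (the exact element types supported are whatever the patch implements), a transform of Complex{BigFloat} data:

xb = complex(map(big, rand(8)), map(big, rand(8)))  # a Complex{BigFloat} vector
fft(xb)  # dispatches to the pure-Julia algorithm; FFTW has no BigFloat support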

Major to-do items:

  • Documentation
  • More test cases (a self-test algorithm by Funda Ergün has been implemented).

Minor to-do items (these will eventually be needed to move the FFTW code out to an external module, but they should not block merging):

  • Native Julia rfft support.
  • Native Julia in-place transforms. (The 1d FFT algorithm here is out-of-place, but it could act "in-place" via a buffer, as in the sketch after this list. The multi-dim FFTs only need a buffer of length maximum(size(x)).)
  • Performance improvements in non-power-of-two transforms. These underwent a major performance regression at some point (possibly due to changes in the inlining heuristics); the right fix is ultimately to change the generator to emit kernels in terms of real rather than complex arithmetic (or just to use FFTW's generator).
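A sketch of the buffering idea from the second item above (fft_to! is a hypothetical out-of-place kernel, not a function in this patch):

function fft_inplace!(x::AbstractVector, buf::AbstractVector)
    copy!(buf, x)    # stash the input in the scratch buffer
    fft_to!(x, buf)  # hypothetical: out-of-place transform from buf into x
    return x
end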

cc: @JeffBezanson, @timholy

@tknopp (Contributor) commented Mar 18, 2014

I know that we have discussed it in #1805, but it would still be nice if the plan also allowed p \ x to perform the inverse FFT. Would it be a large overhead to pair the forward and backward FFTW plans in order to allow this?

@timholy (Member) commented Mar 18, 2014

This is quite amazing. Having native code is particularly amazing. For multidimensional transforms, what would be needed to have this easily interact with SharedArrays, so we can parallelize fft computations?

One other tiny question: the word "dimensions" is used somewhat inconsistently in the standard library. Sometimes it refers to the size of an array (in the same way that you might talk about the dimensions of a room in your house), and sometimes it refers to the indices of particular coordinates (i.e., the 2nd dimension). For what you typically mean by dims, the reduction code uses the term region, which I'm not sure is any better. Any thoughts on a consistent terminology?

@tknopp (Contributor) commented Mar 18, 2014

What about calling it sizes, since size() gives us the "dimensions" of ordinary arrays? I also like shape from NumPy, but since we already inherit size from MATLAB, we should not mix these up.

When it comes to selecting one particular "dimension", one could use either dir (for direction) or dim.

@timholy (Member) commented Mar 18, 2014

I like that suggestion; but since the same issue just came up on julia-users, and I'll feel guilty if this PR gets hijacked by a terminology bikeshed, let's move this discussion elsewhere:

https://groups.google.com/forum/?fromgroups=#!topic/julia-users/VA5rtWlOdhk

@tknopp (Contributor) commented Mar 18, 2014

Yes indeed. It is really cool to get a native Julia implementation of the FFT. Even better that it is done by someone who already has some experience with FFTs... ;-)

@stevengj (Member) commented Mar 18, 2014

@tknopp, I see two options. One would be to store both the forward and backward plans in p. This would double the plan-creation time in FFTW, which wouldn't be desirable. The other would be to implement p \ x via conj(p * conj(x))/length(x) or similar, which would work but would add overhead.
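The second option is the standard identity ifft(x) == conj(fft(conj(x)))/length(x); as a one-line sketch (backward is a hypothetical helper name):

backward(p, x) = conj(p * conj(x)) / length(x)  # the extra conj passes are the overhead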

@tknopp (Contributor) commented Mar 18, 2014

Or, solution 3: have an extra plan type that supports both directions, and perhaps a keyword argument for plan_fft that gives this plan on request (or an extra function).

@stevengj (Member) commented Mar 18, 2014

Having another plan type seems messy. Maybe the best solution is to have p \ x or inv(p) compute the inverse plan lazily the first time it is needed, caching it in p.
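A minimal sketch of that lazy-caching idea (all names here are hypothetical, and type is the pre-0.6 keyword for a mutable struct):

import Base: inv, \

type LazyPlan
    fwd   # the forward plan
    inv   # cached inverse plan; nothing until first needed
    LazyPlan(fwd) = new(fwd, nothing)
end

function inv(p::LazyPlan)
    if p.inv === nothing
        p.inv = plan_inverse(p.fwd)  # plan_inverse is hypothetical
    end
    return p.inv
end

\(p::LazyPlan, x) = inv(p) * x  # first use pays the planning cost once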

@tknopp (Contributor) commented Mar 18, 2014

Well, that would be another option. It could be generalized so that one can specify (as a keyword argument) which of the two plans should be precomputed at initialization.

@timholy (Member) commented Mar 18, 2014

OK, now I have an honest-to-goodness engineering question for you. I'm aware that what I'm about to ask makes it harder to cope with my other question about SharedArrays.

I notice that with FFTW, transforming along dimension 2 is substantially slower than transforming along dimension 1 (all times after suitable warmup):

A = rand(1024,1024);
p1 = plan_fft(A, 1);
p2 = plan_fft(A, 2);

julia> @time p1(A);
elapsed time: 0.019999711 seconds (16778600 bytes allocated)

julia> @time p2(A);
elapsed time: 0.070864814 seconds (16778600 bytes allocated)

Presumably this is because of cache behavior. In Images, I've taken @lindahua's advice to heart and implemented operations on dimensions higher than 1 in terms of interwoven operations along dimension 1. For example, with imfilter_gaussian one gets

julia> copy!(Acopy, A); @time Images._imfilter_gaussian!(Acopy, [3,0]);
elapsed time: 0.028588639 seconds (27768 bytes allocated)

julia> copy!(Acopy, A); @time Images._imfilter_gaussian!(Acopy, [0,3]);
elapsed time: 0.018678493 seconds (52264 bytes allocated)

Slightly faster along dimension 2 than 1! An even more dramatic example is restrict: in Grid I implemented this using BLAS's blazingly-fast axpy! (which must be using SIMD and other fancy tricks), but with more experience I began to wonder whether the need for multiple passes through the data might outweigh its advantages. So in Images I recently tested a new pure-Julia version. Results:

julia> @time Grid.restrict(A, 1, 0.5);
elapsed time: 0.009557864 seconds (4469584 bytes allocated)

julia> @time Grid.restrict(A, 2, 0.5);
elapsed time: 0.040201774 seconds (4461440 bytes allocated)

julia> @time Images.restrict(A, (1,));
elapsed time: 0.008084057 seconds (4206936 bytes allocated)

julia> @time Images.restrict(A, (2,));
elapsed time: 0.010128509 seconds (4206936 bytes allocated)

Notice that the 4-fold difference with Grid's version is about the same as for FFTW. I'm not sure the FFT can be written so easily in terms of interwoven operations, but given the crucial importance of this algorithm in modern computing I thought it might be worth asking. No better time than when you're in the middle of re-implementing it from scratch.

@stevengj (Member) commented Mar 18, 2014

FFTW does implement a variety of strategies for "interweaving" the transforms along the discontiguous direction (see our Proc. IEEE paper). You should notice a much smaller performance difference if you use the FFTW.PATIENT flag when creating the plan. But probably you should file an FFTW issue if you want to discuss this further.
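Hedged sketch, following the call style of the benchmark above and the documented plan_fft(A [, dims [, flags]]) signature of the era:

A = rand(1024, 1024);
p2 = plan_fft(A, 2, FFTW.PATIENT);  # slower to plan, but a better-optimized plan
@time p2(A);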

@simonbyrne (Contributor) commented Mar 18, 2014

This is hugely impressive, very nice.

@simonster (Member) commented Mar 18, 2014

This looks pretty awesome. One thing I've wondered about: would it be reasonable to use an LRU cache for FFTW plan objects created by fft and friends, or even to keep the plans that have been created but not yet garbage collected in a WeakKeyDict and use them when possible? It seems that, even with an existing identical plan that hasn't been destroyed yet, there is still significant overhead for creating a new plan (using ESTIMATE), and this overhead is typically greater than the time to perform the FFT. I've written a decent amount of code that could be a single function but needs to be a type in order to preserve the plan between invocations and thus avoid incurring the overhead of plan creation.
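A minimal sketch of such a cache (illustrative only; a real version would bound its size, LRU-style, and the key choice needs care, as noted in the reply below):

const PLAN_CACHE = Dict{Any,Any}()

function cached_plan_fft(x, region)
    key = (typeof(x), size(x), region)  # one plausible, imperfect key
    haskey(PLAN_CACHE, key) && return PLAN_CACHE[key]
    PLAN_CACHE[key] = plan_fft(x, region)
end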

@simonster referenced this pull request Mar 18, 2014: added spectrogram #23 (merged)

@stevengj (Member) commented Mar 18, 2014

@simonster, that's a good idea, but I'd prefer to do that in a generic way that is not restricted to FFTW (i.e. a cache of Plans, not just FFTWPlans). Some care would be required in figuring out what to key this cache on, though.

However, I'd prefer to put that functionality, assuming it can be implemented cleanly, into a separate PR following this one, since there's a lot going on in this PR already. Ideally, this sort of cache would be completely transparent and wouldn't affect the public API.

@stevengj (Member) commented Mar 18, 2014

@timholy, in principle it is straightforward to get shared-memory parallelism in an FFT, since many of the loops can be executed in parallel, though getting good cache performance is hard. You can already use FFTW's shared-memory parallelism on any StridedArray. Shared-memory parallelism for the native Julia FFT is probably something for a later patch, though.

@stevengj (Member) commented Mar 18, 2014

@tknopp, the latest version of the patch now supports p \ x and inv(p). The inverse plan is computed once, the first time it is needed, and is cached in the plan thereafter.

@tknopp (Contributor) commented Mar 18, 2014

Great, thanks!

@jakebolewski (Member) commented Mar 18, 2014

@stevengj this is really nice. I love the new Plan API; it will make it much easier to support the same API for GPU-accelerated FFTs in CLFFT.jl and, I suspect, CUDA's FFT library.

@stevengj (Member) commented Apr 17, 2014

Just rebased, and it looks like the performance problems with non-power-of-two sizes have disappeared: if I disable SIMD in FFTW (via the FFTW.NO_SIMD flag), the performance of the pure-Julia version is within a factor of two of FFTW for moderate non-power-of-two composite sizes.

I'm guessing the recent inlining improvements get the credit here, but I would have to bisect to be sure.

@jtravs (Contributor) commented Apr 17, 2014

This is really exciting! Have you tried whether @simd helps close the last gap in performance? (I've had mixed results with it so far).

@stevengj (Member) commented Apr 17, 2014

@jtravs, it looks like @simd makes only a tiny difference (in double precision), which is disappointing since I was hoping it would help with Complex arithmetic.

@stevengj (Member) commented Apr 18, 2014

I'm seeing some weirdness in the FFTW linkage now that I've upgraded deps to FFTW 3.3.4; it seems to be getting confused by the FFTW 3.3.3 installed by Homebrew in /usr/local/lib.

In particular, I use

convert(VersionNumber, split(bytestring(cglobal((:fftw_version,Base.DFT.FFTW.libfftw), Uint8)), "-")[2])

to get the FFTW version number. When this command runs during the Julia build in order to initialize the const version in the FFTW module, it returns 3.3.4 as expected. But if I execute the same command at runtime from the REPL, I get 3.3.3. So, Julia is linking different versions of the library at different points in its build?

@staticfloat, could this be similar to the PCRE woes in #1331 and #3838? How can I fix it so that Julia uses its own version of FFTW consistently, regardless of what is installed in /usr/local/lib?

@ScottPJones (Contributor) commented May 31, 2015

@stevengj The fifth item, about performance, should be checked off though (like the struck-out comment)? I do hope this gets into 0.4; I like the design, and the FFT support it already has might be enough for us (here in Belgium) even without the GPL-licensed FFTW. 👍

@stevengj (Member) commented May 31, 2015

@ScottPJones, the performance isn't crucial for merging as long as we still have FFTW for all of the common types, which is why I thought it didn't need to be checked off for 0.4. But it would be nice to figure out why the speed keeps oscillating over time.

@simonster (Member) commented May 31, 2015

@stevengj Since #11486 was merged, I think you'll also have to switch the argument order of ntuple in this PR to avoid deprecation warnings.
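For reference, #11486 moved the function argument of ntuple to the first position:

ntuple(3, i -> i^2)  # old argument order, deprecated by #11486
ntuple(i -> i^2, 3)  # new argument order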

@ScottPJones (Contributor) commented May 31, 2015

Finally back! @stevengj Sorry, I didn't make myself clear: I meant that the performance issue mentioned in the top comment looked like it had been taken care of, and that somebody had struck out the text about it but had simply forgotten to check off the box. I didn't mean at all that this PR should be held up for any reason other than remaining items like deprecation warnings. It looks like a very worthwhile improvement to the code: I like that it uses a more flexible framework, and for people who can't use FFTW, the decent performance it has is still infinitely better than no performance at all.

@stevengj (Member) commented Jun 1, 2015

Okay, changed the ntuple usage.

@jtravs (Contributor) commented Jun 7, 2015

I'd just like to register a vote here for merging this soon, even if performance and other issues need to be resolved later. I'm currently having to merge this pull request into a local Julia repository, and it is a bit of a pain, to be honest.

@simonster (Member) commented Jun 7, 2015

Yes, +1 for merging

@jtravs (Contributor) commented Jun 7, 2015

As an additional comment, is there any chance of getting a nicer syntax for preallocated output than A_mul_B!(y, p, x)? I really like the y = p * x syntax (and the style of this PR in general), but I really need fast, preallocated, real FFTs in my code, and A_mul_B!(y, p, x) isn't very clean or clear. I guess I could write a macro... I also realize that this is a general issue for in-place operations, not just the DFT.

@nalimilan (Contributor) commented Jun 7, 2015

@jtravs Yes, see #249.

@simonbyrne (Contributor) commented Jun 7, 2015

It should also be possible to make InplaceOps.jl work for this.

@stevengj referenced this pull request Jul 9, 2015: new DFT api #12087 (closed)

@stevengj (Member) commented Jul 20, 2015

Closing this, as the new DFT API has been merged separately; a separate patch with the pure-Julia FFTs is now needed.

@stevengj closed this Jul 20, 2015

@yuyichao (Member) commented Jul 20, 2015

I've rebased this branch; should I push it out?

@stevengj (Member) commented Jul 20, 2015

Please do, thanks @yuyichao.

@yuyichao (Member) commented Jul 20, 2015

The rebase is here. I simply rebased it and checked all the conflicts.

I didn't check the non-conflicting parts for compatibility with the current master, and I also didn't squash any of the non-empty commits.

Edit: the branch is on top of the current master now (after my new DFT API tweak for type inference.)

@hayd (Member) commented Sep 19, 2015

@yuyichao Bump! Is this going to be a PR (now that 0.4 is branched)? :)

@yuyichao (Member) commented Sep 19, 2015

I don't think I'm familiar enough with this to do the PR. I rebased the branch to make sure the part I want to get into 0.4 doesn't mess up the rest of this PR too much.

CC. @stevengj

@stevengj (Member) commented Sep 22, 2015

It's not a priority for me at the moment, but it is certainly something that could be rebased onto 0.5 if needed.
