Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inefficient pyfftw implementation: order of magnitude speedups possible! #27

Closed
bmorris3 opened this issue Nov 24, 2016 · 6 comments · Fixed by #28
Closed

Inefficient pyfftw implementation: order of magnitude speedups possible! #27

bmorris3 opened this issue Nov 24, 2016 · 6 comments · Fixed by #28

Comments

@bmorris3
Copy link
Owner

bmorris3 commented Nov 24, 2016

@jkentwallace @LaurentRDC – I've begun doing some profiling, and I'm looking to see where the bottlenecks are. In this test I'm reconstructing the USAF test target example file at one propagation distance:

screen shot 2016-11-24 at 10 35 35 am

49% of the runtime is spent within the pyfftw module's Fourier transform, which confirms that the secondary optimizations like doing arithmetic on masked arrays are probably secondary in importance to figuring out how to do FFTs more efficiently. Reading up on pyfftw, it looks like there's a very specific way to uses it that allows for the speedups that I expected.

In simple comparison with scipy's FFT, my current implementation with pyfftw is almost an order of magnitude slower for reconstructing a single hologram – I'm going to need to spend some time understanding this issue. If you'd like to see how much quicker shampoo could be going simply by switching to scipy's FFT temporarily, comment out this block of imports and replace it with only the scipy import line.

The next slowest step is the phase unwrapping, which is being done in C wrapped in Python with this algorithm: reference, source code.

@bmorris3
Copy link
Owner Author

Here are some more updates on profiling.

pyfftw is only faster than scipy given certain caveats, which are related to how the input array is stored in memory. I was using one of the pyfftw APIs that didn't make these caveats obvious, so the advertised speedups over numpy and scipy weren't happening.

In the plot below, I show the runtime for reconstructing various numbers of holograms using different functions within pyfftw, and scipy's FFT for comparison.

profiling

The default in shampoo thus far has been FFTW.interface, which you can see is the slowest of the options I tested, which is why I noted above that we could get speedups by switching immediately to scipy. However, you can go faster by using a different module within FFTW (builders). This module also has an option that allows you to use multithreading to speed up your reconstructions.

Question for @jkentwallace and @LaurentRDC: is there any reason not to use one of these multithreading options by default? If not, I'll work on migrating shampoo's fft methods to use the fast multithreaded pyfftw module.

@LaurentRDC
Copy link

Since FFTW is written in C, multithreading should yield a performance boost (unlike Python). I don't see why not :)

In the GUI, I'm reconstructing on a separate core right now, and it would be easy to extend that to many cores. Since each core has more than one thread, it would be a good idea to implement multithreading now.

@jkentwallace
Copy link

jkentwallace commented Nov 25, 2016 via email

@bmorris3
Copy link
Owner Author

I'm thinking I'll make pyfftw a mandatory dependency at this point, since we've quantified its performance now, and its the best option. I was concerned at first that the fftw library installation could be challenging on some platforms, but it seems like no one has had a problem with it yet, so I'm ready to lock in. Any objections?

@LaurentRDC
Copy link

Anyone installing Anaconda is equipped with pyFFTW, so there's at least one easy way to get it. Through conda there's no need for a C compiler, so I think making it a dependency is reasonable.

@jkentwallace
Copy link

Laurent and Brett, I've spent some time yesterday refreshing my memory on some discrete Fourier Transform calculations, and I've got a few results to show. This method allows you to reconstruct only that part of the fourier plane you're interested in. It also has benefits for reducing the effect of aliasing - which we really haven't talked about. The current bandaid we use is apodization, but this results in reduced resolution. I'll provide an example of this for Monday's meeting - and we can discuss...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging a pull request may close this issue.

3 participants