Inefficient pyfftw implementation: order of magnitude speedups possible! #27

bmorris3 · 2016-11-24T19:13:03Z

@jkentwallace @LaurentRDC – I've begun doing some profiling, and I'm looking to see where the bottlenecks are. In this test I'm reconstructing the USAF test target example file at one propagation distance:

49% of the runtime is spent within the pyfftw module's Fourier transform, which confirms that the secondary optimizations like doing arithmetic on masked arrays are probably secondary in importance to figuring out how to do FFTs more efficiently. Reading up on pyfftw, it looks like there's a very specific way to uses it that allows for the speedups that I expected.

In simple comparison with scipy's FFT, my current implementation with pyfftw is almost an order of magnitude slower for reconstructing a single hologram – I'm going to need to spend some time understanding this issue. If you'd like to see how much quicker shampoo could be going simply by switching to scipy's FFT temporarily, comment out this block of imports and replace it with only the scipy import line.

The next slowest step is the phase unwrapping, which is being done in C wrapped in Python with this algorithm: reference, source code.

The text was updated successfully, but these errors were encountered:

bmorris3 · 2016-11-24T21:10:33Z

Here are some more updates on profiling.

pyfftw is only faster than scipy given certain caveats, which are related to how the input array is stored in memory. I was using one of the pyfftw APIs that didn't make these caveats obvious, so the advertised speedups over numpy and scipy weren't happening.

In the plot below, I show the runtime for reconstructing various numbers of holograms using different functions within pyfftw, and scipy's FFT for comparison.

The default in shampoo thus far has been FFTW.interface, which you can see is the slowest of the options I tested, which is why I noted above that we could get speedups by switching immediately to scipy. However, you can go faster by using a different module within FFTW (builders). This module also has an option that allows you to use multithreading to speed up your reconstructions.

Question for @jkentwallace and @LaurentRDC: is there any reason not to use one of these multithreading options by default? If not, I'll work on migrating shampoo's fft methods to use the fast multithreaded pyfftw module.

LaurentRDC · 2016-11-24T21:52:52Z

Since FFTW is written in C, multithreading should yield a performance boost (unlike Python). I don't see why not :)

In the GUI, I'm reconstructing on a separate core right now, and it would be easy to extend that to many cores. Since each core has more than one thread, it would be a good idea to implement multithreading now.

jkentwallace · 2016-11-25T05:15:54Z

Gents, These are great suggestions. I would say full steam ahead. Also, I've mentioned this in passing before, but I think there is a discreet Fourier transform method that can also improve speed. I'll futz with it, and try to provide a good example. Just wanna say it feels like we're getting there guys - thanks to your hard work.

…

-Kent

On Thu, Nov 24, 2016 at 1:52 PM Laurent P. René de Cotret < ***@***.***> wrote: Since FFTW is written in C, multithreading should yield a performance boost (unlike Python). I don't see why not :) In the GUI, I'm reconstructing on a separate core right now, and it would be easy to extend that to many cores. Since each core has more than one thread, it would be a good idea to implement multithreading now. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#27 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AWCJ2e2I9oLCHr2Gu1h5ZRqudz5wmWqoks5rBgc0gaJpZM4K76jw> .

bmorris3 · 2016-11-25T19:08:12Z

I'm thinking I'll make pyfftw a mandatory dependency at this point, since we've quantified its performance now, and its the best option. I was concerned at first that the fftw library installation could be challenging on some platforms, but it seems like no one has had a problem with it yet, so I'm ready to lock in. Any objections?

LaurentRDC · 2016-11-25T21:50:41Z

Anyone installing Anaconda is equipped with pyFFTW, so there's at least one easy way to get it. Through conda there's no need for a C compiler, so I think making it a dependency is reasonable.

jkentwallace · 2016-11-26T17:49:27Z

Laurent and Brett, I've spent some time yesterday refreshing my memory on some discrete Fourier Transform calculations, and I've got a few results to show. This method allows you to reconstruct only that part of the fourier plane you're interested in. It also has benefits for reducing the effect of aliasing - which we really haven't talked about. The current bandaid we use is apodization, but this results in reduced resolution. I'll provide an example of this for Monday's meeting - and we can discuss...

bmorris3 added the efficiency label Nov 24, 2016

bmorris3 mentioned this issue Nov 24, 2016

Raise warning if pyfftw not installed #18

Open

bmorris3 mentioned this issue Nov 25, 2016

Switching to pyfftw only #28

Merged

bmorris3 closed this as completed in #28 Nov 30, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inefficient pyfftw implementation: order of magnitude speedups possible! #27

Inefficient pyfftw implementation: order of magnitude speedups possible! #27

bmorris3 commented Nov 24, 2016 •

edited

Loading

bmorris3 commented Nov 24, 2016

LaurentRDC commented Nov 24, 2016

jkentwallace commented Nov 25, 2016 via email

bmorris3 commented Nov 25, 2016

LaurentRDC commented Nov 25, 2016

jkentwallace commented Nov 26, 2016

Inefficient pyfftw implementation: order of magnitude speedups possible! #27

Inefficient pyfftw implementation: order of magnitude speedups possible! #27

Comments

bmorris3 commented Nov 24, 2016 • edited Loading

bmorris3 commented Nov 24, 2016

LaurentRDC commented Nov 24, 2016

jkentwallace commented Nov 25, 2016 via email

bmorris3 commented Nov 25, 2016

LaurentRDC commented Nov 25, 2016

jkentwallace commented Nov 26, 2016

bmorris3 commented Nov 24, 2016 •

edited

Loading