Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: compiling with C99 support (non-MSVC only) #300

Merged
merged 10 commits into from
Dec 3, 2017

Conversation

grlee77
Copy link
Contributor

@grlee77 grlee77 commented Mar 10, 2017

This PR improves performance in filtering complex-valued data by extending the C-level convolution routines to support C99 float complex and double complex. Typical speedup factors observed are in the range from 1.5-2.5.

A drawback is some additional boilerplate on the C/Cython side to support the complex types. I work primarily with complex-valued data, so it is worth it to me, but would be interested in hearing from others if it is desirable for PyWavelets as a whole.

I initially thought this would allow simplifying the python-side code by being able to remove all the special np.iscomplexobj checks (currently used to see if both real and imaginary components need to be processed independently). Unfortunately, due to lack of C99 complex in MSVC, this was not possible. Instead there is a new variable _have_c99_complex which can be imported from
pywt._c99_config that indicates whether C99 support is present in the current build. I have tested locally with both gcc/linux and clang/OS X.

There are many files modified, but conceptually it is fairly simple. I have outlined the basic approach below.

setup.py

defaults to enable C99 support only if os.name == 'posix'. The user can force C99 compilation by defining the environment variable USE_C99_COMPLEX (i.e. may be useful for a mingw build on Windows?).

When C99 is enabled, it sets the C flag HAVE_C99_CPLX as well as setting the Cython macro CYTHON_CCOMPLEX to 1. The Cython flag allows Cython to use C99 instead of rolling its own complex support.

setup.py creates the following two files related to C99 settings:

pywt/_extensions/config.pxi # defines HAVE_C99_CPLX as 1 or 0 for Cython
pywt/_c99_config.py # defines _have_c99_complex as 1 or 0 for Python

For the C code

Templates now use two separate defined values: TYPE and REAL_TYPE

REAL_TYPE is used for the filters and is either float or double

TYPE is for the data/coefficients and can be: float, double, float complex, double complex

when C99 is enabled, complex versions of the functions are generated as well
e.g.:

#ifdef HAVE_C99_COMPLEX
    #define TYPE float_complex
    #define REAL_TYPE float
    #include "convolution.template.h"
    #undef REAL_TYPE
    #undef TYPE

    #define TYPE double_complex
    #define REAL_TYPE double
    #include "convolution.template.h"
    #undef REAL_TYPE
    #undef TYPE
#endif

For the Cython code

in _pywt.pxd if HAVE_C99_COMPLEX is defined, a new fused type is defined as:

IF HAVE_C99_CPLX:
    ctypedef fused cdata_t:
        np.float32_t
        np.float64_t
        np.complex64_t
        np.complex128_t
    have_c99_complex = 1
ELSE:
    ctypedef data_t cdata_t
    have_c99_complex = 0

On the Cython side the data/output use the fused type cdata_t while the filters use the type data_t. When c99 is not enabled they are the same.

For the Python code

The Python code imports _have_c99_complex from pywt._c99_config to determine whether it needs to process the real and imaginary components separately. When C99 is enabled all np.iscomplexobj special cases can be bypassed.

Benchmarking Results

I ran the example from demo/batch_processing.py, but modify the data, cam, to be complex as:

cam = cam + 1j * cam.T
Timing on a 4-core macbook:
    with c99 complex disabled:
        Processing 32 images of shape (512, 512)

        Sequential Case
            Elapsed time: 1100.39 ms

        Multithreaded Case
            Number of concurrent workers: 8
            Elapsed time: 288.02 ms

    with c99 complex enabled:
        Processing 32 images of shape (512, 512)

        Sequential Case
            Elapsed time: 683.89 ms

        Multithreaded Case
            Number of concurrent workers: 8
            Elapsed time: 187.41 ms

        Relative speedup with concurrent = 3.6492053613365583

I see a similar factor 1.5-2 speedup for larger and smaller transform sizes.

grlee77 and others added 6 commits March 10, 2017 16:32
All discrete filters must still be real valued, but complex data arrays are
supported.

Complex support is not yet extended to the CWT routines (cwt.h/cwt.c).
This is no longer needed as the C routines support complex dtypes

Complex data type support was added to the SWT as well
for example: expected ‘__complex__ double *’ but argument is of type ‘struct __pyx_t_double_complex *’

This is because without the above definition, Cython uses its own complex types instead of those from C99
This is needed for MSVC compatiblity.

This requires using defines in C (DEF in Cython) to control whether the complex valued
routines are compiled.  On the Python side, _have_c99_complex is used to control
whether complex data can be passed directly to the Cython DWT and SWT routines or if
the real and imaginary parts must be processed separately.
@codecov-io
Copy link

codecov-io commented Mar 10, 2017

Codecov Report

Merging #300 into master will decrease coverage by 2.72%.
The diff coverage is 64.14%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #300      +/-   ##
==========================================
- Coverage   86.29%   83.56%   -2.73%     
==========================================
  Files          21       21              
  Lines        3100     3268     +168     
  Branches      538      562      +24     
==========================================
+ Hits         2675     2731      +56     
- Misses        372      475     +103     
- Partials       53       62       +9
Impacted Files Coverage Δ
pywt/_extensions/_cwt.pyx 99.02% <ø> (ø) ⬆️
pywt/_extensions/c/convolution.template.c 97.48% <ø> (ø) ⬆️
pywt/_extensions/c/wt.template.c 87.38% <100%> (ø) ⬆️
pywt/_extensions/_pywt.pyx 70.25% <100%> (-0.33%) ⬇️
pywt/_multidim.py 89.01% <33.33%> (-8.77%) ⬇️
pywt/_dwt.py 81.18% <33.33%> (-16.82%) ⬇️
pywt/_swt.py 86.91% <40%> (-9.97%) ⬇️
pywt/_extensions/_dwt.pyx 68% <59.29%> (-6.89%) ⬇️
pywt/_extensions/_swt.pyx 74.37% <71.92%> (-1.59%) ⬇️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 52b1808...18c7b81. Read the comment docs.

@rgommers
Copy link
Member

The added complexity is manageable; a speedup of 1.5-2.5x is worth it I think.

There are dwt and swt benchmarks for complex64 - did you check those, and can we extend the benchmarks to cover the speedups achieved here (complex128, and a couple of other functions)?

Detecting C99 support reliably is fairly tricky, and as you say this leaves out MinGW. I'm not sure if we should do this in the straightforward way you implemented it, or try to steal NumPy's implementation to try and cover more corner cases.

https://github.com/numpy/numpy/blob/master/numpy/core/setup.py#L188

There's checks like these, but they don't go off a lot (if at all):
https://github.com/numpy/numpy/blob/master/numpy/core/include/numpy/npy_common.h#L281

The main unknown here is platforms other than Linux / Windows / OS X.

@rgommers rgommers added the build label Mar 11, 2017
@grlee77
Copy link
Contributor Author

grlee77 commented Mar 16, 2017

I was not aware of how the C99 check was done in numpy. I will look at that in a bit more detail when I have a chance, but from first glance it seems that the configuration objects used there would require going back to the setup function from numpy.distutils.core rather than the one from setuptools?

@rgommers
Copy link
Member

rgommers commented Dec 3, 2017

it seems that the configuration objects used there would require going back to the setup function from numpy.distutils.core rather than the one from setuptools?

Yes I think so. Probably not worth it at this point, can always be done later if we get reports of detection issues.

Since this is all green and looks sensible, I'll hit the green button.

@rgommers rgommers merged commit 53cd467 into PyWavelets:master Dec 3, 2017
@rgommers
Copy link
Member

rgommers commented Dec 3, 2017

Thanks @grlee77!

@rgommers rgommers added this to the v1.0 milestone Dec 3, 2017
@grlee77
Copy link
Contributor Author

grlee77 commented Dec 4, 2017

Glad to see this go in!

I did look briefly at the numpy C99 code a while back, but couldn't commit the time to fully get it working here. I think this approach will work in most cases and we can revisit again later if needed.

@rgommers
Copy link
Member

rgommers commented Dec 4, 2017

Yep, agreed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants