dft/simd/sse2 is compiled even when --disable-sse2 was given to configure #35

UnitedMarsupials-zz · 2015-03-04T23:23:07Z

The build -- both in 3.3.3 and 3.3.4 -- attempts to compile the content of the sse2-directory even when configure was explicitly asked to disable sse2:

--enable-shared --enable-threads --disable-fortran --disable-openmp --enable-float --enable-sse --disable-sse2

When the -march argument is set to a CPU that has no SSE2 instructions (such as "athlon-xp"), some compilers -- such as clang -- fail:

cc -DHAVE_CONFIG_H -I. -I../../.. -I../../../kernel -I../../../dft -I../../../dft/simd -I../../../simd-support -msse -O2 -pipe -march=athlon-xp -fstack-protector -fno-strict-aliasing -MT n1fv_2.lo -MD -MP -MF .deps/n1fv_2.Tpo -c n1fv_2.c -fPIC -DPIC -o .libs/n1fv_2.o
fatal error: error in backend: Do not know how to split the result of this operator!
cc: error: clang frontend command failed with exit code 70 (use -v to see invocation)
FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
Target: i386-unknown-freebsd10.1

Looking into the configure script, I see the following line:

if test "$have_sse" = "yes"; then have_sse2=yes; fi

Huh?

UnitedMarsupials-zz · 2015-03-05T00:52:48Z

The following patch seems to help solve the problem:

https://bugs.freebsd.org/bugzilla/attachment.cgi?id=153812&action=diff

rdolbeau · 2015-04-13T07:01:38Z

The distinction between SSE and SSE2 by have_sse* is done at the configure level only. Everywhere else in the code, HAVE_SSE2 is enabled and the data size (single vs. double) is used to distinguish the two. Your patch makes --enable-sse a no-op, fully disabling SSE in single precision, which is probably not what you want.

I can reproduce the error. This is a bug in clang 3.4, perhaps https://llvm.org/bugs/show_bug.cgi?id=16748 or similar. Everything works fine with gcc.

UnitedMarsupials-zz · 2015-04-15T01:54:39Z

My patch may well have been incorrect, but the problem is real -- fftw's code considers SSE and SSE2 to be the same, which is incorrect. Now, it may very well be that there are no SSE-only optimizations in the code and hardware devoid of SSE2 instructions will have to use generic code.

But, if there is code that can be used on an SSE-only CPU (without SSE2), it should be possible to turn it on without also turning on SSE2.

> I can reproduce the error. This is a bug in clang 3.4

I would not blame clang for barfing -- it is asked to generate SSE2-instructions while a command-line option (-march) is telling it to target a CPU without SSE2 support (Athlon).

Moreover, the fact that gcc "works" despite being presented with such a conflict may be considered a bug...

rdolbeau · 2015-04-15T06:23:51Z

> My patch may well have been incorrect, but the problem is real -- fftw's
> code considers SSE and SSE2 to be the same, which is incorrect.

No it doesn't. SSE only has single-precision FP instructions, whereas SSE2
added double-precision FP.
If you only use one precision, you only use one of the two instruction sets.
You can see all the intrinsics clearly explained here:
<https://software.intel.com/sites/landingpage/IntrinsicsGuide/>
That's why both variants are in the same file, protected by preprocessor
macros using the symbol FFTW_SINGLE.

> But, if there is code that can be used on an SSE-only CPU (without
> SSE2), it should be possible to turn it on without also turning on SSE2.

There is SSE code, and it doesn't enable SSE2. It just compiles files that
are named like that.

> I would not blame clang for barfing

It doesn't 'barf', it throws an ICE (internal compiler error) and
recommends filing a bug report (on Debian).

If you're still convinced there is an issue, compile the code with gcc (or
a newer clang, hopefully the bug is fixed) and test.
It should work fine on pre-SSE2 CPUs (I haven't seen one of those in a
decade, so I can't test myself).

Cordially,

UnitedMarsupials-zz · 2015-04-15T09:30:00Z

> SSE only has single-precision FP instructions, whereas SSE2 added double-precision FP.

I am well aware that SSE and SSE2 are different things. Your code seems to consider them to be the same. Below is the line from your own configure:

if test "$have_sse" = "yes"; then have_sse2=yes; fi

> That's why both variants are in the same file, protected by preprocessor
> macros using the symbol FFTW_SINGLE.

I see. So the distinction is made at preprocessing time... Yes, I see FFTW_SINGLE being defined here (by configure) and yet clang (3.4.1) is dying anyway. Ok, I'll investigate further.

matteo-frigo · 2015-04-15T17:46:23Z

It appears that the confusion is caused by our attempts to reduce confusion :-(

Before we added AVX support, it used to be the case that --enable-sse meant "use sse instructions, single-precision only", and --enable-sse2 meant "use sse2 instructions, double-precision only". When we added --enable-avx, this flag necessarily meant "use avx instructions in either precision". For uniformity, we decided to unify treatment of sse and sse2, with the following rules:

- --enable-sse and --enable-sse2 are equivalent
- when specifying --enable-sse*, FFTW compiles SSE code in single precision and SSE2 code in double precision. In particular, it passes "-msse" to the compiler in single precision (and not -msse2). This preserves compatibility with the venerable athlon xp.
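
The second rule can be pictured as a small lookup from precision to compiler flag. The following is a minimal sketch, not FFTW's actual configure code: `simd_cflag` is a hypothetical helper, `-msse` for single precision is what the comment above states, and `-msse2` for double precision is an assumption inferred from "SSE2 code in double precision".

```sh
#!/bin/sh
# Hypothetical helper, not part of FFTW: map a build precision to the
# SIMD compiler flag described in the rules above.
# usage: simd_cflag <single|double>
simd_cflag() {
    case "$1" in
        single) echo "-msse" ;;    # stated in the thread: -msse, not -msse2
        double) echo "-msse2" ;;   # assumption: double-precision SIMD needs SSE2
        *) echo "unknown precision: $1" >&2; return 1 ;;
    esac
}

echo "single precision: $(simd_cflag single)"
echo "double precision: $(simd_cflag double)"
```

Under this reading, a single-precision build never asks the compiler for SSE2, regardless of which of the two --enable flags was given.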

We never thought about the contradictory specification --enable-sse --disable-sse2; it looks like the --enable side wins, causing the OP's issue.

The OP's issue can be solved simply by not passing --enable-sse to the double-precision build.

If the OP disagrees with this treatment, feel free to suggest an alternative, but please keep in mind that the current behavior is backward-compatible with the historical behavior (i.e., a double-precision build would have failed with --enable-sse in the past), and that avx is precision-agnostic, so a unified treatment is desirable.

rdolbeau · 2015-04-15T18:11:58Z

> --enable-sse and --enable-sse2 are equivalent
>
> when specifying --enable-sse*, FFTW compiles SSE code in single
> precision and SSE2 code in double precision. In particular, it passes
> "-msse" to the compiler in single precision (and not -msse2). This
> preserves compatibility with the venerable athlon xp.

Unless I misread configure.ac, almost equivalent:

--enable-sse will throw an error without --enable-single (since a CPU
with only SSE cannot use double-precision SIMD)

--enable-sse2 is accepted for both precisions, since all SSE2 systems also
have SSE

The help strings are consistent with that behavior (indicating 'SSE' and
'SSE/SSE2').

> We never thought about the contradictory specification --enable-sse
> --disable-sse2; it looks like the --enable side wins, causing the OP's
> issue.

I don't think it causes an issue. Without --enable-single, --enable-sse
will throw an error. With it, --enable-sse will set 'have_sse' to 'yes',
--disable-sse2 will set 'have_sse2' to 'no', then the quoted bit of
configure.ac by the OP will check for 'have_sse' and set 'have_sse2' back
to 'yes'. Since --enable-single is set, everything will be compiled in SSE
just fine. Whether you thought of it or not, it's doing the right thing :-)
"--disable-sse --enable-sse2" will also work for both SP and DP (it can be
argued this one should fail for SP, but it will produce a usable binary
anyway...)
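
The resolution order walked through above can be sketched as a small shell function. This is a simplified model of the behavior described in this thread, not the real configure.ac: `resolve_simd` and its argument order are hypothetical, the precision check follows the "unless I misread configure.ac" description, and only the `have_sse` / `have_sse2` line is taken verbatim from the configure script quoted by the OP.

```sh
#!/bin/sh
# Hypothetical model of the flag handling described in this thread.
# usage: resolve_simd <single: yes|no> <enable-sse: yes|no> <enable-sse2: yes|no>
resolve_simd() {
    single=$1
    have_sse=$2
    have_sse2=$3

    # As described above: --enable-sse is rejected unless the build is
    # single precision, because SSE alone has no double-precision SIMD.
    if test "$have_sse" = "yes" && test "$single" != "yes"; then
        echo "error: --enable-sse requires --enable-single" >&2
        return 1
    fi

    # Line quoted from configure: enabling SSE switches have_sse2 back on,
    # even if --disable-sse2 had set it to "no".
    if test "$have_sse" = "yes"; then have_sse2=yes; fi

    echo "have_sse2=$have_sse2"
}

resolve_simd yes yes no    # --enable-single --enable-sse --disable-sse2
resolve_simd yes no  yes   # --enable-single --disable-sse --enable-sse2
resolve_simd no  no  yes   # double precision with --enable-sse2
resolve_simd no  yes no || echo "rejected: double precision with --enable-sse"
```

In this model the OP's combination (single precision, --enable-sse, --disable-sse2) ends with `have_sse2=yes`, which is why the directory named sse2 is still built; what gets compiled there in single precision is the SSE variant.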

In all cases the suffix for the codelets is "_sse2", adding to the
confusion :-) I think all the confusion stems from Intel's 18 different
SIMD flags (w/o counting the AMD-only flags...)

Cordially,

Romain Dolbeau

rdolbeau closed this as completed Apr 13, 2015
