
dft/simd/sse2 is compiled even when --disable-sse2 was given to configure #35

Closed
UnitedMarsupials-zz opened this issue Mar 4, 2015 · 7 comments

Comments

@UnitedMarsupials-zz

The build (in both 3.3.3 and 3.3.4) attempts to compile the contents of the sse2 directory even when configure was explicitly asked to disable SSE2:

--enable-shared --enable-threads --disable-fortran --disable-openmp --enable-float --enable-sse --disable-sse2

When the -march argument is set to a CPU that has no SSE2 instructions (such as "athlon-xp"), some compilers, such as clang, fail:

cc -DHAVE_CONFIG_H -I. -I../../.. -I../../../kernel -I../../../dft -I../../../dft/simd -I../../../simd-support -msse -O2 -pipe -march=athlon-xp -fstack-protector -fno-strict-aliasing -MT n1fv_2.lo -MD -MP -MF .deps/n1fv_2.Tpo -c n1fv_2.c -fPIC -DPIC -o .libs/n1fv_2.o
fatal error: error in backend: Do not know how to split the result of this operator!
cc: error: clang frontend command failed with exit code 70 (use -v to see invocation)
FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
Target: i386-unknown-freebsd10.1

Looking into the configure script, I see the following line:

if test "$have_sse" = "yes"; then have_sse2=yes; fi

Huh?

@UnitedMarsupials-zz
Author

The following patch seems to help solve the problem:

https://bugs.freebsd.org/bugzilla/attachment.cgi?id=153812&action=diff

@rdolbeau
Contributor

The distinction between SSE and SSE2 by have_sse* is made at the configure level only. Everywhere else in the code, HAVE_SSE2 is enabled, and the data size (single vs. double) is used to distinguish the two. Your patch makes --enable-sse a no-op, fully disabling SSE in single precision, which is probably not what you want.

I can reproduce the error. This is a bug in clang 3.4, perhaps https://llvm.org/bugs/show_bug.cgi?id=16748 or similar. Everything works fine with gcc.

@UnitedMarsupials-zz
Author

My patch may well have been incorrect, but the problem is real: FFTW's code considers SSE and SSE2 to be the same, which is incorrect. Now, it may very well be that there are no SSE-only optimizations in the code, and hardware without SSE2 instructions will have to use the generic code.

But if there is code that can be used on an SSE-only CPU (without SSE2), it should be possible to turn it on without also turning on SSE2.

> I can reproduce the error. This is a bug in clang 3.4

I would not blame clang for barfing: it is asked to generate SSE2 instructions while a command-line option (-march) tells it to target a CPU without SSE2 support (Athlon).

Moreover, the fact that gcc "works" despite being presented with such a conflict may be considered a bug...

@rdolbeau
Contributor

> My patch may well have been incorrect, but the problem is real: FFTW's code considers SSE and SSE2 to be the same, which is incorrect.

No, it doesn't. SSE only has single-precision FP instructions, whereas SSE2 added double-precision FP.
If you only use one precision, you only use one of the two instruction sets.
All the intrinsics are clearly explained here: <https://software.intel.com/sites/landingpage/IntrinsicsGuide/>
That's why both variants are in the same file, protected by preprocessor macros using the symbol FFTW_SINGLE.

> But if there is code that can be used on an SSE-only CPU (without SSE2), it should be possible to turn it on without also turning on SSE2.

There is SSE code, and it doesn't enable SSE2. It is just compiled from files named "sse2".

> I would not blame clang for barfing

It doesn't 'barf'; it throws an ICE (internal compiler error) and recommends filing a bug report (on Debian).

If you're still convinced there is an issue, compile the code with gcc (or a newer clang; hopefully the bug is fixed) and test.
It should work fine on pre-SSE2 CPUs (I haven't seen one of those in a decade, so I can't test it myself).

Cordially,

@UnitedMarsupials-zz
Author

> SSE only has single-precision FP instructions, whereas SSE2 added double-precision FP.

I am very well aware that SSE and SSE2 are different things. Your code seems to consider them the same. Here is the line from your own configure script:

if test "$have_sse" = "yes"; then have_sse2=yes; fi

> That's why both variants are in the same file, protected by preprocessor macros using the symbol FFTW_SINGLE.

I see. So the distinction is made at preprocessing time... Yes, I see FFTW_SINGLE being defined here (by configure), and yet clang (3.4.1) dies anyway. OK, I'll investigate further.

@matteo-frigo
Member

It appears that the confusion is caused by our attempts to reduce confusion :-(

Before we added AVX support, it used to be the case that --enable-sse meant "use sse instructions, single-precision only", and --enable-sse2 meant "use sse2 instructions, double-precision only". When we added --enable-avx, this flag necessarily meant "use avx instructions in either precision". For uniformity, we decided to unify treatment of sse and sse2, with the following rules:

  • --enable-sse and --enable-sse2 are equivalent
  • when specifying --enable-sse*, FFTW compiles SSE code in single precision and SSE2 code in double precision. In particular, it passes "-msse" to the compiler in single precision (and not -msse2). This preserves compatibility with the venerable athlon xp.

We never thought about the contradictory specification --enable-sse --disable-sse2; it looks like the --enable side wins, causing the OP's issue.

The OP's issue can be solved simply by not passing --enable-sse to the double-precision build.

If the OP disagrees with this treatment, feel free to suggest an alternative, but please keep in mind that the current behavior is backward-compatible with the historical behavior (i.e., a double-precision build would have failed with --enable-sse in the past), and that AVX is precision-agnostic, so a unified treatment is desirable.
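The advice to keep --enable-sse out of a double-precision build can be expressed as a small shell helper that picks the SSE-related configure flag from a CPU feature list. This is a hypothetical sketch, not part of FFTW: the function name fftw_sse_flag and the idea of feeding it the "flags" line of /proc/cpuinfo are assumptions, and it relies only on the rules stated in this thread (--enable-sse2 is accepted in both precisions, --enable-sse only in single precision).

```sh
#!/bin/sh
# Hypothetical helper (not part of FFTW): choose the SSE-related configure
# flag for one precision from a space-separated CPU feature list, e.g. the
# "flags" line of /proc/cpuinfo on Linux (assumed input format).
#   $1: CPU feature list   $2: single | double
fftw_sse_flag() {
    features=" $1 "    # pad with spaces so the patterns match whole words
    precision=$2
    case "$features" in
        *" sse2 "*)
            # SSE2 CPUs can use the SIMD codelets in either precision.
            echo "--enable-sse2"
            return 0 ;;
        *" sse "*)
            # SSE-only CPUs (e.g. Athlon XP): SIMD in single precision only.
            if test "$precision" = "single"; then
                echo "--enable-sse"
                return 0
            fi ;;
    esac
    # No usable SIMD for this precision: pass nothing, generic code is built.
    echo ""
}

# An SSE-only CPU (illustrative feature list): SSE for the single build only.
single_flags="--enable-single $(fftw_sse_flag "fpu mmx sse 3dnow" single)"
double_flags="$(fftw_sse_flag "fpu mmx sse 3dnow" double)"
echo "single: $single_flags"
echo "double: $double_flags"
```

This only selects flags; it does not avoid the clang 3.4 crash reported above, which occurred in a single-precision build.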

@rdolbeau
Contributor

> • --enable-sse and --enable-sse2 are equivalent
> • when specifying --enable-sse*, FFTW compiles SSE code in single precision and SSE2 code in double precision. In particular, it passes "-msse" to the compiler in single precision (and not -msse2). This preserves compatibility with the venerable athlon xp.

Unless I misread configure.ac, they are almost equivalent:

  • --enable-sse will throw an error without --enable-single (since a CPU with only SSE cannot use double-precision SIMD)
  • --enable-sse2 is accepted for both precisions, since all SSE2 systems also have SSE

The help strings are consistent with that behavior (indicating 'SSE' and 'SSE/SSE2').

> We never thought about the contradictory specification --enable-sse --disable-sse2; it looks like the --enable side wins, causing the OP's issue.

I don't think it causes an issue. Without --enable-single, --enable-sse will throw an error. With it, --enable-sse sets 'have_sse' to 'yes', --disable-sse2 sets 'have_sse2' to 'no', and then the bit of configure.ac quoted by the OP checks 'have_sse' and sets 'have_sse2' back to 'yes'. Since --enable-single is set, everything is compiled with SSE just fine. Whether you thought of it or not, it's doing the right thing :-)

"--disable-sse --enable-sse2" will also work for both SP and DP (it can be argued this one should fail for SP, but it will produce a usable binary anyway...)
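The flag handling described above can be modeled as a small shell function. This is a minimal sketch built only from the behavior reported in this thread, not a copy of FFTW's configure.ac: the function name resolve_sse, its arguments, and its messages are hypothetical, and the -msse2 flag for double precision is inferred from the rules quoted above rather than checked against the script.

```sh
#!/bin/sh
# Hypothetical model of the SSE/SSE2 flag resolution described in this thread.
# NOT copied from FFTW's configure.ac; names and messages are illustrative.
#   $1: have_sse   (yes/no, from --enable-sse / --disable-sse)
#   $2: have_sse2  (yes/no, from --enable-sse2 / --disable-sse2)
#   $3: precision  (s for --enable-single/--enable-float, d otherwise)
resolve_sse() {
    have_sse=$1
    have_sse2=$2
    precision=$3

    # --enable-sse is rejected outside single precision: a CPU with only SSE
    # has no double-precision SIMD instructions.
    if test "$have_sse" = "yes" && test "$precision" != "s"; then
        echo "error: --enable-sse requires single precision" >&2
        return 1
    fi

    # The line quoted in the issue: SSE is folded into SSE2, so
    # --enable-sse overrides --disable-sse2.
    if test "$have_sse" = "yes"; then have_sse2=yes; fi

    if test "$have_sse2" = "yes"; then
        # The same *_sse2 codelets are compiled in both precisions; only the
        # compiler flag differs (assumed: -msse2 in double precision).
        if test "$precision" = "s"; then flag=-msse; else flag=-msse2; fi
        echo "sse2 codelets: yes, flag: $flag"
    else
        echo "sse2 codelets: no"
    fi
}

# The OP's configuration: --enable-float --enable-sse --disable-sse2
resolve_sse yes no s    # prints "sse2 codelets: yes, flag: -msse"
```

Under this model the OP's build gets the *_sse2 codelets compiled with -msse only, which is the behavior described above, while `resolve_sse yes no d` fails, matching the advice to leave --enable-sse out of a double-precision build.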

In all cases the suffix for the codelets is "_sse2", adding to the confusion :-) I think all the confusion stems from Intel's 18 different SIMD flags (not counting the AMD-only flags...)

Cordially,

Romain Dolbeau

3 participants