Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assert vs AssertThrow in the testsuite #598

Closed
bangerth opened this issue Feb 25, 2015 · 22 comments · Fixed by #688
Closed

Assert vs AssertThrow in the testsuite #598

bangerth opened this issue Feb 25, 2015 · 22 comments · Fixed by #688

Comments

@bangerth
Copy link
Member

@Rombur writes:
...................
I checked in the testsuite and we use Assert 2644 times and AssertThrow 63 times. I found several tests that do nothing in release in mode :( Should we just change all of the Assert to AssertThrow or go through all the tests and see what's appropriate ?
...................

It's a good question. I don't know what to do. We could:

  • Ignore the issue. Most of these tests probably still produce output that is indicative of whether the code worked. We run 7200 tests, if even 100 of those don't do anything in release mode, then that's still only 1%.
  • Make a wholesale change. We have a few tests that explicitly want exceptions instead of aborts from Assert but that I think they can only (and probably do) only run in debug mode.
  • Other options?
@tjhei
Copy link
Member

tjhei commented Feb 25, 2015

While I haven't seen/fixed any bug by running the release mode tests, I do think it would be useful to have better coverage in release mode. This is because I had problems with the Intel compiler generating wrong optimized code in the past (hey, that is why we are disabling vectorization...).

But yes, there are several tests that expect exceptions/asserts to trigger so we need to be careful there.

@Rombur
Copy link
Member

Rombur commented Feb 25, 2015

I checked in the testsuite and, for the Intel compiler, there are tests that fail only in release mode. So the Intel compiler is probably in worse shape that we think. I will go through the testsuite and change Assert to AssertThrow when I think it's appropriate.

@tjhei
Copy link
Member

tjhei commented Feb 25, 2015

That might also be due to different output/rounding. What tests do you refer to?

@Rombur
Copy link
Member

Rombur commented Feb 25, 2015

@tjhei
Copy link
Member

tjhei commented Feb 25, 2015

Urgh. That boils down to

      DEAL_II_OPENMP_SIMD_PRAGMA
      for (size_type i=begin; i<end; ++i)
        val[i] += v_val[i];

forgetting the last element. Has anybody looked into this further?

@tjhei
Copy link
Member

tjhei commented Feb 25, 2015

(otherwise we need to blacklist DEAL_II_HAVE_OPENMP_SIMD)

@Rombur
Copy link
Member

Rombur commented Feb 25, 2015

We could but that's kinda sad that we have to disable vectorization on the only compiler that supports Xeon Phi.

@bangerth
Copy link
Member Author

Is this reproducible on a smaller, self-contained testcase? It's a common enough operation, and a common enough compiler, that it worries me :-(

@tjhei
Copy link
Member

tjhei commented Feb 25, 2015

Vectorization is already disabled on intel (see check_03_compiler_bugs):

IF(CMAKE_CXX_COMPILER_ID MATCHES "Intel")
  ENABLE_IF_SUPPORTED(DEAL_II_CXX_FLAGS_RELEASE "-no-vec")
ENDIF()

So Xeon Phi compilations do not use vectorization. I should probably revisit this with Intel 15...

@tjhei
Copy link
Member

tjhei commented Feb 25, 2015

@bangerth I will take a look.

@Rombur
Copy link
Member

Rombur commented Feb 25, 2015

Yeah but that's the autovectorization. DEAL_II_HAVE_OPENMP_SIMD is the vectorization through OpenMP 4 that's another flag.

@tjhei
Copy link
Member

tjhei commented Feb 25, 2015

see #607.

@tjhei
Copy link
Member

tjhei commented Feb 25, 2015

So it is a combination of TBB and SIMD and only for long double. :-(

@Rombur
Copy link
Member

Rombur commented Feb 25, 2015

That's strange, long double should be 128 bits which means that only Haswell and Broadwell can use the SIMD instructions. Well if it only affects long double that's a good news.

@Rombur
Copy link
Member

Rombur commented Feb 25, 2015

Apparently vectorization is not supported for long double. I guess we can specialize the template so we don't use SIMD for long double.

@bangerth
Copy link
Member Author

I thought long double uses the crazy and ill-conceived 80-bit floating point format on Intel. Or did they fix this wart in the x86_64 ABI?

@Rombur
Copy link
Member

Rombur commented Feb 26, 2015

It is 80-bit plus padding to get 128-bit... But I found a post on Intel forum where they say that you cannot use vectorization for either long double or quad precision.

@bangerth
Copy link
Member Author

Sure would have been nice if they just disabled it in that case...

@tjhei
Copy link
Member

tjhei commented Feb 26, 2015

So what do we do, add this as a compiler check and disable SIMD conditionally?

@bangerth
Copy link
Member Author

I like Bruno's suggestion of using an explicit specialization for this one function.

@kronbichler
Copy link
Member

Just one more comment on long double and vectorization to extend on @Rombur : For 80-bit calculations, legacy x87 instructions are used instead of the SSE2 or AVX instructions that get used for double and single precision. Before SSE2/SSE, all computations were done with 80 bit internally but got truncated/rounded before saved to memory. Now the hardware for float and double has introduced vectorization but not for other floating point types. This is well documented in all manuals on SSE2/AVX/AVX512 etc.
I think it makes sense to only have them because vectorization requires several units. A (scalar) 128 bit multiplier certainly has hundreds of thousands of transistors and the field of applications that need true quad precision in hardware is most likely way to narrow to justify general purpose hardware (I don't know many cases where double-double precision is not enough; it is somewhat less accurate but avoids introducing expensive hardware). Let alone to have these guys replicated for vectorization. Hey, as an HPC person I am already happy that double precision runs at half the width of single precision on literally all relevant CPUs. It could be one tenth or less like on GPUs because 95% of processors rarely need double precision.

But back to the topic: thanks for finding the reason and I also favor @Rombur suggestion.

@tjhei
Copy link
Member

tjhei commented Feb 26, 2015

Okay, easy enough. I will prepare a patch.

tjhei added a commit to tjhei/dealii that referenced this issue Feb 28, 2015
Disable long double SIMD instructions on ICC. This is to work around a
bug that generates wrong code at least up to intel 15 (see tests/lac
/vector-vector, tests/lac/intel-15-bug, and the discussion at
dealii#598).
bangerth added a commit to bangerth/dealii that referenced this issue Apr 20, 2015
This also addresses dealii#598.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants