2D convergence test fails on Sandy Bridge #617

Open
vanzod opened this issue Nov 21, 2018 · 7 comments

@vanzod

vanzod commented Nov 21, 2018

With Meep 1.6.0, the 2D convergence test run by make test fails on a Sandy Bridge processor, for a reason I have not been able to identify. The exact same build passes all tests on Westmere, Haswell, Broadwell and Skylake. This is the full error message:

FAIL: 2D_convergence
====================

Using MPI version 3.1, 2 processes
Running holes square-lattice resolution convergence test.
Checking convergence for ey field...
(The correct frequency should be 0.179944.)
frequency for a=30 is 0.179632, 0.179749 (shifted), 0.17969 (mean)
Unshifted freq error is -0.281165/30/30
Shifted freq error is -0.175337/30/30
Frequency difference with a of 30 is -0.105828/30/30
frequency for a=25 is 0.17963, 0.179856 (shifted), 0.179743 (mean)
Unshifted freq error is -0.196313/25/25
Shifted freq error is -0.0551143/25/25
Frequency difference with a of 25 is -0.141199/25/25
frequency for a=20 is 0.179822, 0.180153 (shifted), 0.179988 (mean)
Unshifted freq error is -0.048724/20/20
Shifted freq error is 0.0837453/20/20
Frequency difference with a of 20 is -0.132469/20/20
frequency for a=15 is 0.180183, 0.180115 (shifted), 0.180149 (mean)
Unshifted freq error is 0.0538094/15/15
Shifted freq error is 0.0385003/15/15
Frequency difference with a of 15 is 0.0153091/15/15
frequency for a=10 is 0.180252, 0 (shifted), 0.0901258 (mean)
Unshifted freq error is 0.0307579/10/10
Shifted freq error is -17.9944/10/10
meep: Frequency doesn't converge properly with a.
meep: Frequency doesn't converge properly with a.

The code has been compiled with GCC 6.4.0 and OpenMPI 2.1.1. This is the set of configuration flags:

--with-pic 
--with-mpi 
--without-gcc-arch 
--with-blas=openblas 
--with-lapack=openblas 
--with-libctl=<libctl_root>/share/libctl
--enable-shared 
--enable-maintainer-mode

Removing --without-gcc-arch does not solve the issue either.

Here is the list of libraries against which Meep has been built:

Harminv 1.4.1
HDF5 1.10.1
libctl 4.1.3
GSL 2.4
Guile 2.2.2
libGDSII 0.1
MPB 1.6.2
Python 2.7.14
h5py 2.7.1

@oskooi
Collaborator

oskooi commented Nov 21, 2018

We recently released Meep 1.7 and libctl 4.1.4 (as well as MPB 1.7) which include several gcc-related improvements. Try using these latest tarballs and see whether the test still fails.

@vanzod
Author

vanzod commented Nov 27, 2018

@oskooi I have tested Meep 1.7.0 built against libctl 4.1.4 and MPB 1.7.0, and the same test still fails to converge on Sandy Bridge processors:

======================================
   meep 1.7.0: tests/test-suite.log
======================================

# TOTAL: 19
# PASS:  18
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: 2D_convergence
====================

Using MPI version 3.1, 2 processes
Running holes square-lattice resolution convergence test.
Checking convergence for ey field...
(The correct frequency should be 0.179944.)
frequency for a=30 is 0.179632, 0.179749 (shifted), 0.17969 (mean)
Unshifted freq error is -0.281165/30/30
Shifted freq error is -0.175337/30/30
Frequency difference with a of 30 is -0.105828/30/30
frequency for a=25 is 0.17963, 0.179856 (shifted), 0.179743 (mean)
Unshifted freq error is -0.196313/25/25
Shifted freq error is -0.0551143/25/25
Frequency difference with a of 25 is -0.141199/25/25
frequency for a=20 is 0.179822, 0.180153 (shifted), 0.179988 (mean)
Unshifted freq error is -0.048724/20/20
Shifted freq error is 0.0837453/20/20
Frequency difference with a of 20 is -0.132469/20/20
frequency for a=15 is 0.180183, 0.180115 (shifted), 0.180149 (mean)
Unshifted freq error is 0.0538094/15/15
Shifted freq error is 0.0385003/15/15
Frequency difference with a of 15 is 0.0153091/15/15
frequency for a=10 is 0.180252, 0 (shifted), 0.0901258 (mean)
Unshifted freq error is 0.0307579/10/10
Shifted freq error is -17.9944/10/10
meep: Frequency doesn't converge properly with a.
meep: Frequency doesn't converge properly with a.
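
A note for anyone reading the log above: the printed "freq error" values look like the difference from the reference frequency multiplied by a squared. That reading is inferred from the numbers, not taken from the Meep source. Under that assumption, the last shifted error of -17.9944 is simply what you get when the shifted frequency for a=10 comes back as 0, which is the line that stands out in this run. A minimal Python sketch that recomputes the scaled errors from the frequency lines (the helper name `scaled_errors` is made up for illustration):

```python
import re

# Reference value quoted in the log: "The correct frequency should be 0.179944."
CORRECT_FREQ = 0.179944

# Frequency lines copied from the failing run above.
LOG = """\
frequency for a=30 is 0.179632, 0.179749 (shifted), 0.17969 (mean)
frequency for a=25 is 0.17963, 0.179856 (shifted), 0.179743 (mean)
frequency for a=20 is 0.179822, 0.180153 (shifted), 0.179988 (mean)
frequency for a=15 is 0.180183, 0.180115 (shifted), 0.180149 (mean)
frequency for a=10 is 0.180252, 0 (shifted), 0.0901258 (mean)
"""

LINE = re.compile(r"frequency for a=(\d+) is ([\d.]+), ([\d.]+) \(shifted\)")


def scaled_errors(log, correct=CORRECT_FREQ):
    """Return (a, unshifted, shifted, unshifted_err, shifted_err) per log line.

    Assumption: the error printed by the test is (freq - correct) * a * a.
    """
    rows = []
    for match in LINE.finditer(log):
        a = int(match.group(1))
        unshifted = float(match.group(2))
        shifted = float(match.group(3))
        rows.append((a, unshifted, shifted,
                     (unshifted - correct) * a * a,
                     (shifted - correct) * a * a))
    return rows


if __name__ == "__main__":
    for a, unshifted, shifted, u_err, s_err in scaled_errors(LOG):
        # A frequency of exactly 0 suggests no mode was reported for that run.
        note = "  <-- shifted frequency is 0" if shifted == 0.0 else ""
        print(f"a={a:2d}  unshifted err={u_err:+.4f}  shifted err={s_err:+.4f}{note}")
```

The recomputed values only agree with the log to about three decimal places, because the frequencies in the log are printed with six significant digits. The a=10 shifted case is the exception: it matches -17.9944 because the shifted frequency contributes nothing.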

@stevengj
Collaborator

Can you try compiling with optimization turned off? For example try configuring with --enable-debug. I want to see if it is a compiler problem.

@ChristopherHogan
Contributor

You could also try setting OPENBLAS_NUM_THREADS=1.
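
One way to apply this to a single test run, without touching the shell configuration, is to set the variable only in the environment of the child process. A minimal Python sketch under stated assumptions: `OPENBLAS_NUM_THREADS` is the variable named above, `make test` is the command from the original report, and the helper names and the build directory argument are hypothetical.

```python
import os
import subprocess


def single_threaded_openblas_env(base_env=None):
    """Return a copy of the environment with OpenBLAS limited to one thread."""
    env = dict(os.environ if base_env is None else base_env)
    env["OPENBLAS_NUM_THREADS"] = "1"
    return env


def run_tests(command=("make", "test"), cwd="."):
    """Run the test command with single-threaded OpenBLAS; return its exit code.

    `cwd` is assumed to be the Meep build directory.
    """
    result = subprocess.run(list(command), cwd=cwd,
                            env=single_threaded_openblas_env())
    return result.returncode
```

Calling `run_tests(cwd=...)` from the build directory then runs the suite with the setting applied, and the caller's own environment is left unchanged.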

@vanzod
Author

vanzod commented Dec 3, 2018

@stevengj @ChristopherHogan I have tried both suggestions, separately and combined, but the 2D convergence test still fails as reported above.

@stevengj
Collaborator

stevengj commented Dec 5, 2018

Try configuring Harminv with --enable-debug. (My first guess here is that something screwy is happening with an aggressive compiler optimization, which would disappear if you turn off optimization with --enable-debug. The only question is, compilation of what? Not Meep, apparently, but it could be Harminv, which is called in this test.)

The other option would be to link to a different BLAS library, e.g. the reference BLAS, in case it's a bug in OpenBLAS. What operating system are you using?

@boegel

boegel commented Dec 5, 2018

Detailed system info is available in the test report gist linked at easybuilders/easybuild-easyconfigs#7129 (comment)
