Number of threads has changed! when running with cp2k.sdbg and more than 64 OMP threads #417

Open
dev-zero opened this issue Jan 8, 2021 · 5 comments

dev-zero (Contributor) commented Jan 8, 2021

Describe the bug

We get the following error message when running certain regression tests with a freshly built cp2k.sdbg:

 *******************************************************************************
 *   ___                                                                       *
 *  /   \                                                                      *
 * [ABORT]                                                                     *
 *  \___/                     Number of threads has changed!                   *
 *    |                                                                        *
 *  O/|                                                                        *
 * /| |                                                                        *
 * / \                                         dbcsr_iterator_operations.F:179 *
 *******************************************************************************

To Reproduce
Steps to reproduce the behavior:

  1. Build with the command make ARCH=local VERSION=sdbg, using the arch file generated by the toolchain.
  2. Run: cd tests/QS/regtest-ri-rpa-rse; ../../../exe/local/cp2k.sdbg Cubic_RPA_RSE_H2.inp
  3. Architecture/host/platform: openSUSE Leap 15.2, GCC 7.5.0, OpenMPI 3.1.6, system-provided libopenblas_openmp; everything else from the toolchain.
  4. See error

Setting OMP_NUM_THREADS=64 works around the issue.

dev-zero (Contributor, Author) commented Jan 8, 2021

This is also reproducible with standalone DBCSR, configured with cmake -DTEST_MPI_RANKS=1 -DTEST_OMP_THREADS=72 .. :

$ make test
Running tests...
Test project /data/tiziano/cp2k/exts/dbcsr/build
      Start  1: dbcsr_perf:inputs/test_H2O.perf
 1/17 Test  #1: dbcsr_perf:inputs/test_H2O.perf .......................   Passed   72.55 sec
      Start  2: dbcsr_perf:inputs/test_rect1_dense.perf
 2/17 Test  #2: dbcsr_perf:inputs/test_rect1_dense.perf ...............   Passed    2.56 sec
      Start  3: dbcsr_perf:inputs/test_rect1_sparse.perf
 3/17 Test  #3: dbcsr_perf:inputs/test_rect1_sparse.perf ..............   Passed   10.91 sec
      Start  4: dbcsr_perf:inputs/test_rect2_dense.perf
 4/17 Test  #4: dbcsr_perf:inputs/test_rect2_dense.perf ...............   Passed    2.49 sec
      Start  5: dbcsr_perf:inputs/test_rect2_sparse.perf
 5/17 Test  #5: dbcsr_perf:inputs/test_rect2_sparse.perf ..............   Passed   10.36 sec
      Start  6: dbcsr_perf:inputs/test_singleblock.perf
 6/17 Test  #6: dbcsr_perf:inputs/test_singleblock.perf ...............   Passed    0.85 sec
      Start  7: dbcsr_perf:inputs/test_square_dense.perf
 7/17 Test  #7: dbcsr_perf:inputs/test_square_dense.perf ..............   Passed    1.09 sec
      Start  8: dbcsr_perf:inputs/test_square_sparse.perf
 8/17 Test  #8: dbcsr_perf:inputs/test_square_sparse.perf .............   Passed    3.45 sec
      Start  9: dbcsr_perf:inputs/test_square_sparse_bigblocks.perf
 9/17 Test  #9: dbcsr_perf:inputs/test_square_sparse_bigblocks.perf ...   Passed    1.62 sec
      Start 10: dbcsr_unittest1
10/17 Test #10: dbcsr_unittest1 .......................................   Passed  1372.54 sec
      Start 11: dbcsr_unittest2
11/17 Test #11: dbcsr_unittest2 .......................................   Passed  236.76 sec
      Start 12: dbcsr_unittest3
12/17 Test #12: dbcsr_unittest3 .......................................   Passed  308.31 sec
      Start 13: dbcsr_unittest4
13/17 Test #13: dbcsr_unittest4 .......................................   Passed    0.89 sec
      Start 14: dbcsr_tensor_unittest
14/17 Test #14: dbcsr_tensor_unittest .................................***Failed    4.51 sec
      Start 15: dbcsr_tas_unittest
15/17 Test #15: dbcsr_tas_unittest ....................................   Passed    3.59 sec
      Start 16: dbcsr_test_csr_conversions
16/17 Test #16: dbcsr_test_csr_conversions ............................   Passed   10.47 sec
      Start 17: dbcsr_tensor_test
17/17 Test #17: dbcsr_tensor_test .....................................   Passed    0.73 sec

94% tests passed, 1 tests failed out of 17

Total Test time (real) = 2043.68 sec

The following tests FAILED:
	14 - dbcsr_tensor_unittest (Failed)
Errors while running CTest
make: *** [Makefile:124: test] Error 8

The relevant part of Testing/Temporary/LastTest.log for the failing test shows:

[...]
--------------------------------------------------------------------------------
TAS MATRIX MULTIPLICATION DONE
--------------------------------------------------------------------------------
 GLOBAL INFO OF (14|25)
   block dimensions:      4     5    11     3
   full dimensions:       25      32      83      28
   process grid dimensions:      1     1     1     1

 DISTRIBUTION OF (14|25)
              Number of non-zero blocks:                                      26
              Percentage of non-zero blocks:                                3.94
              Average number of blocks per CPU:                               26
              Maximum number of blocks per CPU:                               26
              Average number of matrix elements per CPU:                   64680
              Maximum number of matrix elements per CPU:                   64680

 *******************************************************************************
 *   ___                                                                       *
 *  /   \                                                                      *
 * [ABORT]                                                                     *
 *  \___/                     Number of threads has changed!                   *
 *    |                                                                        *
 *  O/|                                                                        *
 * /| |                                                                        *
 * / \                                         dbcsr_iterator_operations.F:179 *
 *******************************************************************************


 ===== Routine Calling Stack ===== 

            4 dbcsr_iterator_start
            3 dbcsr_filter_anytype
            2 dbcsr_t_contract
            1 dbcsr_t_total
[...]
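For readers unfamiliar with this abort, the following C sketch models the kind of consistency check that appears to trigger it. It assumes (this is not taken from the DBCSR sources) that the library records the OpenMP thread count at initialization, sizes per-thread data for it, and compares that against the current count when an iterator starts. lib_init and iterator_start are hypothetical stand-ins, and thread counts are passed in explicitly instead of being read from omp_get_max_threads() so the sketch runs without OpenMP.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical: thread count the per-thread data was sized for. */
static int g_init_threads = 0;

/* Record the thread count at library initialization.
 * A real OpenMP code would pass omp_get_max_threads() here. */
static void lib_init(int nthreads_at_init) {
    assert(nthreads_at_init > 0);
    g_init_threads = nthreads_at_init;
}

/* Check performed when an iterator starts. Returns 0 if the current
 * count matches the recorded one, -1 otherwise (the point where the
 * real code aborts with "Number of threads has changed!"). */
static int iterator_start(int nthreads_now) {
    if (nthreads_now != g_init_threads) {
        fprintf(stderr, "Number of threads has changed! (%d -> %d)\n",
                g_init_threads, nthreads_now);
        return -1;
    }
    return 0;
}
```

In this model, initializing with 72 threads and later starting an iterator while only 64 are active reports the mismatch.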

dev-zero changed the title from "Number of threads has changed! when running with cp2k.sdb and more than 64 OMP threads" to "Number of threads has changed! when running with cp2k.sdbg and more than 64 OMP threads" on Jan 13, 2021.
dev-zero (Contributor, Author) commented:

It seems that the number of OpenMP threads gets capped at 64 at some point.

oschuett (Member) commented:

Maybe this is caused by the NUM_THREADS=64 setting in install_openblas.sh?
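A minimal C model of this hypothesis, assuming that a BLAS library built with a compile-time thread limit (like NUM_THREADS=64 for OpenBLAS) lowers the process-wide thread setting to that limit when it initializes. BLAS_MAX_THREADS, blas_init, and count_changes are hypothetical names, and g_process_threads merely stands in for the OpenMP runtime's global thread setting. Under these assumptions, 72 requested threads produce a mismatch while 64 or fewer do not, which is consistent with the behavior reported in this issue.

```c
#include <assert.h>

/* Hypothetical compile-time cap, analogous to building OpenBLAS
 * with NUM_THREADS=64. */
#define BLAS_MAX_THREADS 64

/* Stand-in for the OpenMP runtime's process-wide thread setting. */
static int g_process_threads = 1;

/* Return the smaller of the requested count and the cap. */
static int clamp_threads(int requested, int max_threads) {
    assert(requested > 0 && max_threads > 0);
    return requested < max_threads ? requested : max_threads;
}

/* Hypothetical BLAS initialization that silently lowers the
 * global thread setting to its compile-time maximum. */
static void blas_init(void) {
    g_process_threads = clamp_threads(g_process_threads, BLAS_MAX_THREADS);
}

/* Simulate a run with OMP_NUM_THREADS=omp_num_threads: the application
 * library records the thread count first, then BLAS initializes.
 * Returns 1 if the recorded count no longer matches (the check fires). */
static int count_changes(int omp_num_threads) {
    g_process_threads = omp_num_threads;
    int recorded = g_process_threads; /* count seen at library init */
    blas_init();                      /* BLAS clamps the global setting */
    return recorded != g_process_threads;
}
```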

dev-zero (Contributor, Author) commented:

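Could be; that should be easy to verify by linking against other BLAS/LAPACK implementations (reference LAPACK, MKL, LibSci).
If it is indeed OpenBLAS, the question is what we should do about it. Note that the DBCSR-only test above used a system-provided OpenBLAS on openSUSE, not the one built by the toolchain.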

We could:

  • explicitly query OpenBLAS for its number of threads at initialization, when linking against OpenBLAS
  • check implicitly by calling a BLAS routine at initialization
  • re-initialize the first time this happens and then restrict future allocations to the lower thread count
  • leave it to the user to provide a stable OpenMP environment before calling into DBCSR
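The third option could look roughly like the following C sketch. It assumes per-thread scratch buffers sized for the thread count seen at initialization; on the first call that observes fewer threads, it frees and reallocates the buffers for the lower count and caps every later call at that count instead of aborting. thread_state, state_init, state_sync, and state_free are hypothetical names, not DBCSR routines, and the active thread count is passed in rather than queried from OpenMP.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-thread state, sized for a fixed thread count. */
typedef struct {
    int nthreads;     /* number of per-thread buffers allocated */
    size_t buf_len;   /* elements per buffer */
    double **buffers; /* buffers[i] is scratch space for thread i */
    int reinit_count; /* how often the state had to shrink */
} thread_state;

/* Release all buffers; safe on partially allocated state. */
static void state_free(thread_state *s) {
    if (s->buffers) {
        for (int i = 0; i < s->nthreads; ++i) free(s->buffers[i]);
        free(s->buffers);
    }
    s->buffers = NULL;
    s->nthreads = 0;
}

/* Allocate one zeroed buffer per thread; 0 on success, -1 on failure. */
static int state_alloc(thread_state *s, int nthreads) {
    s->buffers = calloc((size_t)nthreads, sizeof *s->buffers);
    if (!s->buffers) return -1;
    s->nthreads = nthreads;
    for (int i = 0; i < nthreads; ++i) {
        s->buffers[i] = calloc(s->buf_len, sizeof **s->buffers);
        if (!s->buffers[i]) { state_free(s); return -1; }
    }
    return 0;
}

static int state_init(thread_state *s, int nthreads, size_t buf_len) {
    s->buffers = NULL;
    s->nthreads = 0;
    s->buf_len = buf_len;
    s->reinit_count = 0;
    return state_alloc(s, nthreads);
}

/* Call at the start of each threaded region with the active thread
 * count. Shrinks once if fewer threads are seen, never grows again.
 * Returns the number of threads the region may use, or -1 on failure. */
static int state_sync(thread_state *s, int nthreads_now) {
    assert(nthreads_now > 0);
    if (nthreads_now < s->nthreads) {
        state_free(s);
        if (state_alloc(s, nthreads_now) != 0) return -1;
        s->reinit_count++;
    }
    return s->nthreads < nthreads_now ? s->nthreads : nthreads_now;
}
```

One design consequence: if the thread count later rises again, the sketch keeps using the lower number, which matches the idea of restricting future allocations rather than growing back.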

dev-zero (Contributor, Author) commented Jan 18, 2021

I could now reproduce this on a new (Apple silicon) MacBook Air, where the only workaround was to set OMP_NUM_THREADS=1. Correction: that case was actually related to building without FFTW3; see cp2k/cp2k#1315.
