test_square_sparse_rma.perf test in make test #238

Closed
vitesselin opened this issue Oct 15, 2019 · 5 comments

@vitesselin

Describe the bug
When I run 'make test' on the v2.0.0-rc7 tag with the CUDA build, the run gets stuck in test_square_sparse_rma.perf. How long should it take to finish, or is it simply stuck?

To Reproduce
Steps to reproduce the behavior:

  1. Built from the tag: 'v2.0.0-rc7'
  2. Run like this: 'make test'
  3. On the architecture/host/platform: 'Ubuntu 18.04.2 LTS' with an NVIDIA GTX 1080 Ti GPU
  4. Running tests...
    Test project /home/soga/dbcsr-v2.0.0-rc7/build
    Start 1: dbcsr_perf:inputs/test_H2O.perf
    1/18 Test #1: dbcsr_perf:inputs/test_H2O.perf ....................... Passed 22.88 sec
    Start 2: dbcsr_perf:inputs/test_rect1_dense.perf
    2/18 Test #2: dbcsr_perf:inputs/test_rect1_dense.perf ............... Passed 524.70 sec
    Start 3: dbcsr_perf:inputs/test_rect1_sparse.perf
    3/18 Test #3: dbcsr_perf:inputs/test_rect1_sparse.perf .............. Passed 388.77 sec
    Start 4: dbcsr_perf:inputs/test_rect2_dense.perf
    4/18 Test #4: dbcsr_perf:inputs/test_rect2_dense.perf ............... Passed 460.63 sec
    Start 5: dbcsr_perf:inputs/test_rect2_sparse.perf
    5/18 Test #5: dbcsr_perf:inputs/test_rect2_sparse.perf .............. Passed 441.30 sec
    Start 6: dbcsr_perf:inputs/test_square_dense.perf
    6/18 Test #6: dbcsr_perf:inputs/test_square_dense.perf .............. Passed 323.97 sec
    Start 7: dbcsr_perf:inputs/test_square_sparse.perf
    7/18 Test #7: dbcsr_perf:inputs/test_square_sparse.perf ............. Passed 274.20 sec
    Start 8: dbcsr_perf:inputs/test_square_sparse_bigblocks.perf
    8/18 Test #8: dbcsr_perf:inputs/test_square_sparse_bigblocks.perf ... Passed 281.11 sec
    Start 9: dbcsr_perf:inputs/test_square_sparse_rma.perf
    ... stuck here ...
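One way to tell a slow test from a hung one is to give it a fixed time budget. The Python sketch below is a minimal illustration, assuming it is run from the DBCSR build directory with 'ctest' on PATH ('-R' selects tests by regular expression and '--output-on-failure' prints a failing test's output). The helper name run_with_timeout and the 1800-second budget are hypothetical choices for illustration, not part of DBCSR or CTest.

```python
import shutil
import subprocess
import sys


# Hypothetical helper, not part of DBCSR or CTest.
def run_with_timeout(cmd, timeout_s):
    """Run cmd and report "passed", "failed", or "timed out".

    A slow test finishes within the budget; a hung one (for example,
    one blocked in an MPI RMA synchronization call) does not.
    Output is not captured, so it streams to the terminal as usual.
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child (ctest) before raising;
        # MPI ranks started by mpiexec may need to be cleaned up by hand.
        return "timed out"
    return "passed" if result.returncode == 0 else "failed"


# Assumption: executed from the DBCSR build directory.
RMA_TEST_CMD = ["ctest", "-R", "test_square_sparse_rma", "--output-on-failure"]

if __name__ == "__main__":
    if shutil.which("ctest"):
        # Assumption: 1800 s is generous; the slowest passing test in the
        # log above (test_rect1_dense.perf) took about 525 s.
        print(run_with_timeout(RMA_TEST_CMD, 1800))
    else:
        print("ctest not found on PATH; run this from a DBCSR build tree",
              file=sys.stderr)
```

CTest's own '--timeout <seconds>' option gives a similar per-test limit without a wrapper; the script form just makes the three outcomes explicit.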

Environment:

  • Operating system & version: 'Ubuntu 18.04.2 LTS'

  • Compiler vendor & version:

  • Build environment (make or cmake): cmake version 3.15.3

  • Configuration of DBCSR (either the cmake flags or the Makefile.inc): cmake -DUSE_CUDA=ON -DUSE_CUBLAS=ON -DWITH_GPU=P100

  • MPI implementation and version:
    Package: Open MPI buildd@lcy01-amd64-009 Distribution
    Open MPI: 2.1.1
    Open MPI repo revision: v2.1.0-100-ga2fdb5b
    Open MPI release date: May 10, 2017
    Open RTE: 2.1.1
    Open RTE repo revision: v2.1.0-100-ga2fdb5b
    Open RTE release date: May 10, 2017
    OPAL: 2.1.1
    OPAL repo revision: v2.1.0-100-ga2fdb5b
    OPAL release date: May 10, 2017
    MPI API: 3.1.0
    Ident string: 2.1.1
    Prefix: /usr
    Configured architecture: x86_64-pc-linux-gnu
    Configure host: lcy01-amd64-009
    Configured by: buildd
    Configured on: Mon Feb 5 19:59:59 UTC 2018
    Configure host: lcy01-amd64-009
    Built by: buildd
    Built on: Mon Feb 5 20:05:56 UTC 2018
    Built host: lcy01-amd64-009
    C bindings: yes
    C++ bindings: yes
    Fort mpif.h: yes (all)
    Fort use mpi: yes (full: ignore TKR)
    Fort use mpi size: deprecated-ompi-info-value
    Fort use mpi_f08: yes
    Fort mpi_f08 compliance: The mpi_f08 module is available, but due to
    limitations in the gfortran compiler, does not
    support the following: array subsections, direct
    passthru (where possible) to underlying Open MPI's
    C functionality
    Fort mpi_f08 subarrays: no
    Java bindings: yes
    Wrapper compiler rpath: disabled
    C compiler: gcc
    C compiler absolute: /usr/bin/gcc
    C compiler family name: GNU
    C compiler version: 7.3.0
    C++ compiler: g++
    C++ compiler absolute: /usr/bin/g++
    Fort compiler: gfortran
    Fort compiler abs: /usr/bin/gfortran

  • If CUDA is being used: CUDA version and GPU architecture: CUDA 10.1, GPU card 1080 Ti

  • BLAS/LAPACK implementation and version:

  • If applicable: Runtime information (how many nodes, type of nodes, ...): one node only.

Thanks,
Vitesse.

@alazzaro
Member

Unfortunately, the problem is OpenMPI, which is buggy with RMA (one-sided communication)...
Is there any chance you could use MPICH instead?

@alazzaro
Member

BTW, I see that you are using a single node. Could you try building without MPI, i.e.

cmake -DUSE_MPI=OFF -DUSE_CUDA=ON -DUSE_CUBLAS=ON -DWITH_GPU=P100

and check whether the test passes?

@vitesselin
Author

vitesselin commented Oct 17, 2019

@alazzaro
I ran some experiments and found the following:

  1. The test passes when I disable MPI in the cmake settings.
  2. The same hang occurs when I run "dbcsr_perf inputs/test_square_sparse_rma.perf" directly with the MPI build.
  3. MPI mode is not faster than non-MPI mode:
    Non-MPI mode--------------------------------------------------------------------
    Test project /home/soga/dbcsr-v2.0.0-rc7/build-nompi
    Start 1: dbcsr_perf:inputs/test_H2O.perf
    1/18 Test #1: dbcsr_perf:inputs/test_H2O.perf ....................... Passed 2.15 sec
    Start 2: dbcsr_perf:inputs/test_rect1_dense.perf
    2/18 Test #2: dbcsr_perf:inputs/test_rect1_dense.perf ............... Passed 1.07 sec
    Start 3: dbcsr_perf:inputs/test_rect1_sparse.perf
    3/18 Test #3: dbcsr_perf:inputs/test_rect1_sparse.perf .............. Passed 13.81 sec
    Start 4: dbcsr_perf:inputs/test_rect2_dense.perf
    4/18 Test #4: dbcsr_perf:inputs/test_rect2_dense.perf ............... Passed 1.41 sec
    Start 5: dbcsr_perf:inputs/test_rect2_sparse.perf
    5/18 Test #5: dbcsr_perf:inputs/test_rect2_sparse.perf .............. Passed 5.96 sec
    Start 6: dbcsr_perf:inputs/test_square_dense.perf
    6/18 Test #6: dbcsr_perf:inputs/test_square_dense.perf .............. Passed 0.67 sec
    Start 7: dbcsr_perf:inputs/test_square_sparse.perf
    7/18 Test #7: dbcsr_perf:inputs/test_square_sparse.perf ............. Passed 2.68 sec
    Start 8: dbcsr_perf:inputs/test_square_sparse_bigblocks.perf
    8/18 Test #8: dbcsr_perf:inputs/test_square_sparse_bigblocks.perf ... Passed 6.05 sec
    Start 9: dbcsr_perf:inputs/test_square_sparse_rma.perf
    9/18 Test #9: dbcsr_perf:inputs/test_square_sparse_rma.perf ......... Passed 2.73 sec
    MPI mode--------------------------------------------------------------------
    Test project /home/soga/dbcsr-v2.0.0-rc7/build
    Start 1: dbcsr_perf:inputs/test_H2O.perf
    1/18 Test #1: dbcsr_perf:inputs/test_H2O.perf ....................... Passed 3.50 sec
    Start 2: dbcsr_perf:inputs/test_rect1_dense.perf
    2/18 Test #2: dbcsr_perf:inputs/test_rect1_dense.perf ............... Passed 5.61 sec
    Start 3: dbcsr_perf:inputs/test_rect1_sparse.perf
    3/18 Test #3: dbcsr_perf:inputs/test_rect1_sparse.perf .............. Passed 8.58 sec
    Start 4: dbcsr_perf:inputs/test_rect2_dense.perf
    4/18 Test #4: dbcsr_perf:inputs/test_rect2_dense.perf ............... Passed 5.38 sec
    Start 5: dbcsr_perf:inputs/test_rect2_sparse.perf
    5/18 Test #5: dbcsr_perf:inputs/test_rect2_sparse.perf .............. Passed 6.39 sec
    Start 6: dbcsr_perf:inputs/test_square_dense.perf
    6/18 Test #6: dbcsr_perf:inputs/test_square_dense.perf .............. Passed 3.67 sec
    Start 7: dbcsr_perf:inputs/test_square_sparse.perf
    7/18 Test #7: dbcsr_perf:inputs/test_square_sparse.perf ............. Passed 4.81 sec
    Start 8: dbcsr_perf:inputs/test_square_sparse_bigblocks.perf
    8/18 Test #8: dbcsr_perf:inputs/test_square_sparse_bigblocks.perf ... Passed 8.55 sec
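To make the comparison in item 3 concrete, a small Python sketch can parse the ctest summary lines and total the times of the tests that finished in both builds. The log text is copied from the two runs above (test 9 only completed in the non-MPI build, so it is excluded); the regular expression assumes ctest's usual "N/M Test #N: name .... Passed X sec" layout, and the function names are illustrative only.

```python
import re

# ctest "Passed" lines from the two runs above (dot leaders shortened).
NO_MPI_LOG = """
1/18 Test #1: dbcsr_perf:inputs/test_H2O.perf ... Passed 2.15 sec
2/18 Test #2: dbcsr_perf:inputs/test_rect1_dense.perf ... Passed 1.07 sec
3/18 Test #3: dbcsr_perf:inputs/test_rect1_sparse.perf ... Passed 13.81 sec
4/18 Test #4: dbcsr_perf:inputs/test_rect2_dense.perf ... Passed 1.41 sec
5/18 Test #5: dbcsr_perf:inputs/test_rect2_sparse.perf ... Passed 5.96 sec
6/18 Test #6: dbcsr_perf:inputs/test_square_dense.perf ... Passed 0.67 sec
7/18 Test #7: dbcsr_perf:inputs/test_square_sparse.perf ... Passed 2.68 sec
8/18 Test #8: dbcsr_perf:inputs/test_square_sparse_bigblocks.perf ... Passed 6.05 sec
9/18 Test #9: dbcsr_perf:inputs/test_square_sparse_rma.perf ... Passed 2.73 sec
"""

MPI_LOG = """
1/18 Test #1: dbcsr_perf:inputs/test_H2O.perf ... Passed 3.50 sec
2/18 Test #2: dbcsr_perf:inputs/test_rect1_dense.perf ... Passed 5.61 sec
3/18 Test #3: dbcsr_perf:inputs/test_rect1_sparse.perf ... Passed 8.58 sec
4/18 Test #4: dbcsr_perf:inputs/test_rect2_dense.perf ... Passed 5.38 sec
5/18 Test #5: dbcsr_perf:inputs/test_rect2_sparse.perf ... Passed 6.39 sec
6/18 Test #6: dbcsr_perf:inputs/test_square_dense.perf ... Passed 3.67 sec
7/18 Test #7: dbcsr_perf:inputs/test_square_sparse.perf ... Passed 4.81 sec
8/18 Test #8: dbcsr_perf:inputs/test_square_sparse_bigblocks.perf ... Passed 8.55 sec
"""

# Captures the test name and its wall time from a "Passed" summary line.
LINE_RE = re.compile(r"Test\s+#\d+:\s+(\S+)\s+\.+\s+Passed\s+([\d.]+)\s+sec")


def parse_times(log):
    """Map test name -> wall time in seconds for every 'Passed' line."""
    return {name: float(sec) for name, sec in LINE_RE.findall(log)}


def compare(no_mpi, mpi):
    """Per-test (name, no-MPI, MPI) rows and totals over common tests."""
    common = sorted(no_mpi.keys() & mpi.keys())
    rows = [(name, no_mpi[name], mpi[name]) for name in common]
    total_no_mpi = sum(row[1] for row in rows)
    total_mpi = sum(row[2] for row in rows)
    return rows, total_no_mpi, total_mpi


if __name__ == "__main__":
    rows, t_no_mpi, t_mpi = compare(parse_times(NO_MPI_LOG), parse_times(MPI_LOG))
    for name, a, b in rows:
        print(f"{name:52s} {a:7.2f} s {b:7.2f} s")
    print(f"{'total':52s} {t_no_mpi:7.2f} s {t_mpi:7.2f} s")
```

Over the eight tests both builds completed, the totals come to 33.80 s without MPI and 46.49 s with MPI. Individual tests vary, though: test_rect1_sparse.perf was faster with MPI in these runs (8.58 s vs 13.81 s).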

@alazzaro
Member

For a single node there is no need to use MPI with DBCSR, and the performance is the same (as expected). At this point, my suspicion that OpenMPI is buggy with RMA seems to explain your problem.
I will keep this issue open to investigate further.

@alazzaro alazzaro added this to the v3.0 milestone Nov 11, 2019
@alazzaro alazzaro modified the milestones: v3.0, v2.1 Mar 6, 2020
@alazzaro
Member

alazzaro commented Mar 6, 2020

Closing this issue: CMake now detects the OpenMPI version and skips the RMA test if that OpenMPI version doesn't support it (#307).
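The idea behind that fix can be illustrated in a few lines of Python: read the Open MPI version reported by 'ompi_info' and decide whether to keep the RMA test. This is only a sketch of the concept, not the actual CMake logic from #307; the version cutoff MIN_OMPI_FOR_RMA and all function names are hypothetical placeholders.

```python
import re

# Excerpt of the ompi_info output from the environment section above.
OMPI_INFO_SAMPLE = """\
Package: Open MPI buildd@lcy01-amd64-009 Distribution
Open MPI: 2.1.1
Open MPI repo revision: v2.1.0-100-ga2fdb5b
MPI API: 3.1.0
"""

# HYPOTHETICAL cutoff for illustration only; the real rule from #307
# lives in DBCSR's CMake files and is not reproduced here.
MIN_OMPI_FOR_RMA = (3, 0, 0)


def parse_ompi_info(text):
    """Collect 'Key: value' lines into a dict (first occurrence wins)."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info.setdefault(key.strip(), value.strip())
    return info


def version_tuple(version):
    """'2.1.1' -> (2, 1, 1); missing parts are padded with 0."""
    parts = [int(p) for p in re.findall(r"\d+", version)[:3]]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def should_run_rma_test(info):
    """Skip the RMA test only for Open MPI older than the cutoff."""
    if "Open MPI" not in info:
        return True  # not Open MPI (e.g. MPICH): keep the test
    return version_tuple(info["Open MPI"]) >= MIN_OMPI_FOR_RMA


if __name__ == "__main__":
    info = parse_ompi_info(OMPI_INFO_SAMPLE)
    print(info["Open MPI"], should_run_rma_test(info))
```

On a real system the text would come from running 'ompi_info' itself; with MPICH that tool is absent, so the sketch keeps the test.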

@alazzaro alazzaro closed this as completed Mar 6, 2020