-
Notifications
You must be signed in to change notification settings - Fork 363
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add experimental support for DLA-Future eigensolver #3143
Conversation
Woohoo, it's awesome to see DLA-F on the finish line :-) Do you think it would be possible to also add an installer script for the toolchain? I know it's tedious, but without it we can't test it. Regarding the performance we found that a lot of time is also spend in |
I never used the toolchain beforehand (Spack + CMake works great ;) ), so I'll have to look into it. Before spending time on this, could you please share what are the long-term plans for the toolchain? Is https://github.com/cp2k/cp2k-toolchain still relevant? As an alternative, would it be possible to add a CI build based on #3075?
That's good to know, thanks for the information. Version |
Long-term I really hope that we can ditch the toolchain in favor of Spack. However, we're not there yet. For example, building the dependencies with Spack takes currently twice as long as with the toolchain.
I'm working on that and might have something in a few days. However, even then you'll be missing out one all the other test configurations that we run for the dashboard.
No, it's not. Thanks for the reminder. I just took it private. Likewise, I'm planing to remove the tools/toolchain-spack directory.
Cool! It'll require a bit of refactoring on CP2K's side as well. I'm not sure yet when I'll get to it though. |
Probably a discussion for somewhere else, but is there any way we can help accelerate this process?
That's awesome! I'd suggest to use this when it's ready, to avoid the PR sitting here for long. Supporting DLA-Future via the toolchain would indeed be quite tedious and if the toolchain is bound to go soon(-ish) anyways, it's probably not time well spent. Do you think this would be a reasonable way forward? If really needed, we can eventually think about toolchain support later on. |
Yes, thanks for asking :-) Let's discuss offline.
I've just merged #3149, which migrates our first CI test onto Spack. |
acaa7a4
to
6e1cc33
Compare
@oschuett it is not obvious to me why the |
I've run into those segfaults from the CMake test before. The |
Thanks for the additional information. I was indeed trying to figure that out, since using MPICH 4.1 installed with Spack did work. Do you prefer to have MPICH installed with |
If possible I'd like to use the |
One can now browse the full output of failed builds from the artifacts: make.out. Turns out the actual problem is:
Unfortunately, I created a merge conflict in the process. |
Thanks @oschuett! (No worries at all for the conflict.) |
* DLA-Future (#1) * More cmake fixes Signed-off-by: Dr. Mathieu Taillefumier <mathieu.taillefumier@free.fr> * Add dla-future support for cholesky Signed-off-by: Mathieu Taillefumier <mathieu.taillefumier@free.fr> * More work on cmake * New Keyword: MIN_PAIR_LIST_RADIUS can be used to force non-zero blocks (cp2k#2511) in KS and Overlap matrix. * Regtesting: Fix DeprecationWarning * Docker: Do not flag slow tests in minimal build * PAO: Add equivariant parametrization * PAO: Use correct distribution in pao_calc_AB_equi * PAO: Add missing mp_sum to pao_calc_AB_equi * Regtesting: Fix issue with Python 3.6 * Consider target CPU in OpenBLAS toolchain build This should partially fix cp2k#2517 * Update track imag density (#6) * PAO: Fix unused parameter and CMake build * Keep compatibility with bash v3 * Add info for sourceable arch files and links to HowTos * Refactor allocations of imaginary parts for RTP (cp2k#2531) * unify allocation of imaginary parts of matrix_h_im and matrix_ks_im with their real counterparts * refactor allocations of imaginary parts for RTP * clean-up * More work on cmake * FIST: Add NequIP equivariant neural network potentials * FIST: Fix CMake build and conventions for NequIP * Remove unused parameters * Remove unused label from beta_gamma_psi.F * Toolchain: Add -Werror=unused-parameter and -Werror=unused-label * Docker: Fix typo in cmake variable and add check for warnings * Regtesting: Introduce tests/UNIT_TESTS file * Let nequip_unittest fail when libtorch is missing * Remove some more unused parameters * Regtesting: Add Valgrind option * Fix various simple Valgrind issues * update keyword descriptions for RTP * fix strings * Fix typo in arch/Linux-intel-x86_64.psmp Shared part is not really tested afaict, but for the sake of completeness let's fix this typo. * Add more target mappings for gcc to OpenBLAS * Fix funky libxsmm library order * fix libxc * cleanup dlaf cholesky * dirty water: dirty version working for H2O-32 * dlaf * make preffiy * Revert "make preffiy" This reverts commit d27b375. * revert fftw3 change * add some dlaf output * remove copies using new eigensolver api * port @msimberg improvements * clang formatting * add improvements suggested by @msimberg * rename cholesky_dlaf to dlaf * remove single-threades scope from cp2k, it is now in dlaf * general cleanup * clang format * further cleanup * remove unused single precision eigensolver * remove MKL-specific single threaded blas/lapack scope * use new dlaf C API * fix dlaf calls * improve cuSOLVER-stype integration * make cp2k compilable without dlaf * print pika binding * refactoring and cleanup * remove local grid creation/free * zero eigenvalues buffer * cleanup * add missing licence from c file * refactoring * cholesky wrapper * cholesky wrapper * add scalapack fallback and falback parameter dlaf_neigvec_min * make pretty * add unit test for dlaf * remove spurious file * fix usage of CP2K as a library * remove initialization and pika threads * add dlaf timing without upper to full * actually fix mpi * remove string utilities * prettify * Revert "prettify" This reverts commit cd1e89a. * cleanup: * remove redistribution * manual cleanup * prettify cmake * make pretty exluding dlaf.F * add some details to install and __DLAF so that make pretty does not complain * prettify install * more consistent cabort * remove intrinsic * Revert "remove intrinsic" This reverts commit 2458e00. * format doc * cleanup * revert some changes to MPI init * cleanup * prettify * remove redistribution * pretty-test * pretty-test * pretty-test * pretty-test * pretty-test * pretty-test * pretty-test * Revert "pretty-test" This reverts commit f7fd23c. * Revert "Revert "pretty-test"" This reverts commit b9a9879. * Revert "pretty-test" This reverts commit d582f3f. * pretty-test * pretty-test * pretty-test * pretty-test * Revert "pretty-test" This reverts commit b88b373. * pretty-test * remove snippet * add timers -- workaround for cp2k#3071 * prettify * doc cleanup * Update src/fm/cp_dlaf_utils.F Co-authored-by: Mikael Simberg <mikael.simberg@iki.fi> * Update src/fm/cp_fm_dlaf_api.F Co-authored-by: Mikael Simberg <mikael.simberg@iki.fi> * Update src/fm/cp_dlaf_utils.F Co-authored-by: Mikael Simberg <mikael.simberg@iki.fi> * Update src/fm/cp_dlaf_utils.F Co-authored-by: Mikael Simberg <mikael.simberg@iki.fi> * Update src/fm/cp_dlaf_utils.F Co-authored-by: Mikael Simberg <mikael.simberg@iki.fi> * remove ugly workaround * cmake config * line wrap install * Update src/input_cp2k_global.F Co-authored-by: Mikael Simberg <mikael.simberg@iki.fi> * gitignore * add guards * remove pika:print-bind in favour of environment variable * __dlaf * make pretty * Update src/fm/cp_fm_dlaf_api.F Co-authored-by: Mikael Simberg <mikael.simberg@iki.fi> * actually fix cmake * fix cmake conflicts * Update src/fm/cp_dlaf_utils.F Co-authored-by: Mikael Simberg <mikael.simberg@iki.fi> * Update src/fm/cp_fm_dlaf_api.F Co-authored-by: Mikael Simberg <mikael.simberg@iki.fi> * Update src/fm/cp_dlaf_utils.F Co-authored-by: Mikael Simberg <mikael.simberg@iki.fi> * Update src/fm/cp_dlaf_utils.F Co-authored-by: Mikael Simberg <mikael.simberg@iki.fi> * Update src/fm/cp_dlaf_utils.F Co-authored-by: Mikael Simberg <mikael.simberg@iki.fi> * remove ugly workaround * cmake config * line wrap install * Update src/input_cp2k_global.F Co-authored-by: Mikael Simberg <mikael.simberg@iki.fi> * gitignore * add guards * remove pika:print-bind in favour of environment variable * __dlaf * make pretty * Update src/fm/cp_fm_dlaf_api.F Co-authored-by: Mikael Simberg <mikael.simberg@iki.fi> * actually fix cmake * make pretty * Revert "MPI: Specify imported routines and constants with MPI_F08" This reverts commit aecd330. * Revert "Revert "MPI: Specify imported routines and constants with MPI_F08"" This reverts commit f224465. * fix MPI * initialize HIP/CUDA before MPI * initialize HIP/CUDA before MPI * better mpi init --------- Co-authored-by: Mikael Simberg <mikael.simberg@iki.fi>
Unfortunately, after some testing, both |
In the toolchain we recently switched from |
src/fm/cp_dlaf_utils.c
Outdated
} | ||
|
||
// Cholesky decomposition (double) | ||
// Wrapper without uplo parameter, avoids passing characters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing strings to C is actually rather straightforward. Maybe you don't need these wrappers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strings (char*
) yes, but I was having issues with passing single characters (I'm not a Fortran expert and all the information I could find was for strings...). The DLAF C API requires a single character, not a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The DLAF C API requires a single character, not a string.
The Fortran type for a single character is CHARACTER
- note the absence of the len=
argument.
The C bindings for passing a single character as value is:
CHARACTER(kind=C_CHAR), VALUE :: uplo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I used, and did not seem work: https://github.com/RMeli/fortran-playground/blob/b84b82b7879471f90d4ba6aea36cab2b699c01ed/fortran_to_c_char/bar.F#L22-L31
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC pdpotrf
is only called with 'U'
anyways, that's why I went with this easier solution instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I used, and did not seem work:
Fascinating! This feels like a bug.
I eventually got it to work by passing an integer, which should technically not be a hack.
subroutine foo_char_rubbish(aa)
character :: aa
interface
subroutine foo1_c(aaa) bind(C, name="foo1")
import:: C_SIGNED_CHAR
INTEGER(kind=C_SIGNED_CHAR), value :: aaa
end subroutine
end interface
call foo1_c(IACHAR(aa, C_SIGNED_CHAR))
end subroutine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice workaround. I added it to the reproducer for my future self. ;)
Do you prefer to have this or what I did so far, given that only 'U'
is used anyways? We can always come back to it if 'L'
is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's always good to remove unnecessary code. So, personally I'd remove cp_dlaf_utils.c
. Furthermore, I'd merge cp_dlaf_utils_api.F
into cp_fm_dlaf_api.F
to have that single interface module.
However, since this affects only the DLAF code, I'll leave it up to you.
Otherwise this PR looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used your workaround (thanks!) to remove the wrapper functions. I also removed the .c
file as a whole, as requested. I'm not entirely convinced that passing char**
directly from Fortran is better than a simple wrapper in this case, but I think it's alright.
The separation of the two .F
is to avoid circular dependencies.
src/fm/cp_dlaf_utils.c
Outdated
} | ||
|
||
// Cholesky decomposition (double) | ||
// Wrapper without uplo parameter, avoids passing characters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's always good to remove unnecessary code. So, personally I'd remove cp_dlaf_utils.c
. Furthermore, I'd merge cp_dlaf_utils_api.F
into cp_fm_dlaf_api.F
to have that single interface module.
However, since this affects only the DLAF code, I'll leave it up to you.
Otherwise this PR looks good to me.
This PR introduces DLA-Future, a task-based GPU-enabled dense linear algebra library based on pika (a C++ library for concurrency and parallelism), with focus on eigenvalue problems.
The library is still in development, with focus on performance improvements. Therefore, the integration should be considered experimental.
Correctness
Modification to CP2K to use DLA-Future for all matrices of size 3x3 and larger while running the regression tests:
CP2K regression tests:
Results:
The one failed test is
QS/regtest-pao-2/H2O_pao_equi.inp
but it appears to be a sporadic failure. Re running with--restrictdir QS/regtest-pao-2
concluded successfully. This particular problem is tracked in RMeli#4, and will be further investigated.Performance
Performance optimisation of DLA-Future is still an ongoing effort and the library performance will improve with time. For this PR, I benchmarked version
0.3.1
of the library (with pika version0.20.0
). I added comparisons with ELPA and ScaLAPACK, when available.Benchmark results are reported for Daint (multi-core:
daint-mc
, P100:daint-gpu
) and Alps (multi-core:clariden-mc
, A100:clariden-nvgpu
, Mi200:clariden-amdgpu
).Benchmarks:
QS/H2O-512.inp
andQS/H2O-1024.inp
, modified to use standard diagonalization.Daint
Missing points are due to not enough memory being available for the particular configuration.
MC
GPU
Alps (Clariden)
Missing points are due to MPI (
cray-mpich
) issues we are encountering with large matrices on multiple nodes. We have seen the same issues on LUMI (with the DLA-Future miniapps) and we are currently investigating the issue. Unfortunately we don't have further details at this time.Despite these issues, the experimental integration will facilitate further testing and development, as well as early feedback from users.
AMD GPUs
NVidia GPUs
Spack
Support for
cp2k@master +dlaf build_system=cmake
is already included in Spack (see spack/spack#40702).Locally, I used this Spack environment to build CP2K and run the regression tests:
Credits
While not reflected in the commit history, @mtaillefumier and @msimberg contributed to this PR, especially with early integration of the Cholesky decomposition and general comments/suggestions/reviews.