
Building CP2K with GCC 13.1.0 and MPICH 4.1.2 #2855

Open
mkrack opened this issue Jul 10, 2023 · 12 comments

Comments

@mkrack
Member

mkrack commented Jul 10, 2023

CP2K builds fine with GCC 13.1.0 and MPICH 4.1.2. The regression test suite is clean except for the following FE_DIVBYZERO errors in grid_unittest:

/home/krack/github/mkrack/cp2k/regtesting/Linux-gnu-x86_64/psmp/TEST-Linux-gnu-x86_64-psmp-2023-07-10_16-16-56/UNIT/grid_unittest/grid_unittest.out
Task: /home/krack/github/mkrack/cp2k/src/grid/sample_tasks/ortho_density_l0000.task   Integrate PGF-CPU   Cycles: 1.000000e+00   Max value: 1.181734e+03   Max rel diff: 1.924068e-16   Time: 4.307297e-05 sec
Error: Floating point exception FE_DIVBYZERO.
Task: /home/krack/github/mkrack/cp2k/src/grid/sample_tasks/ortho_density_l0000.task   Integrate PGF-CPU   Cycles: 1.000000e+00   Max value: 1.181734e+03   Max rel diff: 1.924068e-16   Time: 4.449592e-05 sec
Error: Floating point exception FE_DIVBYZERO.
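
FE_DIVBYZERO is one of the C fenv.h exception flags, which grid_unittest apparently tests after each task. For illustration only (this is not the actual unit-test code), the equivalent check can be sketched from Fortran with the intrinsic ieee_exceptions module:

! Minimal sketch: detect a raised divide-by-zero flag after a computation.
program check_fpe
  use, intrinsic :: ieee_exceptions, only: ieee_divide_by_zero, &
                                           ieee_get_flag, ieee_set_flag
  implicit none
  logical :: raised
  real :: x, y
  call ieee_set_flag(ieee_divide_by_zero, .false.)  ! clear the flag first
  y = 0.0
  x = 1.0/y  ! sets the flag (no trap in the default environment)
  call ieee_get_flag(ieee_divide_by_zero, raised)
  if (raised) print *, "Error: Floating point exception FE_DIVBYZERO."
end program check_fpe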

The build is also successful with DO_CHECKS=yes (see here for the details concerning the additional compiler flags), but any run launched with mpiexec requesting more than one MPI rank fails immediately:

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x14e9017c0cff in ???
        at /usr/src/debug/glibc-2.31-150300.46.1.x86_64/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
#1  0x14e900b5194f in ???
#2  0x14e900b06c22 in ???
#3  0x14e900b01e3a in ???
#4  0x14e90150c861 in ???
#5  0x14e90150132a in ???
#6  0x14e9014f4326 in ???
#7  0x14e9022c4773 in MPII_hwtopo_init
        at src/util/mpir_hwtopo.c:206
#8  0x14e90223b394 in MPII_Init_thread
        at src/mpi/init/mpir_init.c:169
#9  0x14e901f871be in internal_Init_thread
        at src/binding/c/init/init_thread.c:54
#10  0x14e901f871be in PMPI_Init_thread
        at src/binding/c/init/init_thread.c:109
#11  0x14e904a342e9 in mpi_init_thread_f08_
        at src/binding/fortran/use_mpi_f08/wrappers_f/f08ts.f90:8487
#12  0x3e3048a in __message_passing_MOD_mp_world_init
        at /home/krack/github/mkrack/cp2k/src/mpiwrap/message_passing.F:1028
#13  0xf7b48b in __f77_interface_MOD_init_cp2k
        at /home/krack/github/mkrack/cp2k/src/f77_interface.F:234
#14  0x88ec34 in cp2k
        at /home/krack/github/mkrack/cp2k/src/start/cp2k.F:284
#15  0x53d148 in main
        at /home/krack/github/mkrack/cp2k/src/start/cp2k.F:44

Obviously, the MPI thread initialisation already causes a SIGFPE when debug flags are enabled.
See also issue #1030
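
For reference, a minimal hypothetical reproducer (not part of CP2K): built with GCC's trapping flags, e.g. -ffpe-trap=invalid,zero,overflow (the kind of flag set that DO_CHECKS=yes enables), the SIGFPE fires inside MPI_Init_thread before any CP2K code runs.

! Hypothetical reproducer: compile with, e.g.,
!   mpif90 -g -ffpe-trap=invalid,zero,overflow repro.F90
! and launch with
!   mpiexec -n 2 ./a.out
! A (normally harmless) division by zero inside the MPI library's
! initialisation then traps before the print statement is reached.
program repro
  use mpi_f08
  implicit none
  integer :: provided
  call MPI_Init_thread(MPI_THREAD_MULTIPLE, provided)
  print *, "provided thread level:", provided
  call MPI_Finalize()
end program repro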

@fstein93
Contributor

fstein93 commented Aug 14, 2023

Sorry for the late reply. I don't know what it could be. I cannot reproduce this issue with the old MPI interface + MPICH + gfortran 13. Which arch file did you use for compilation? Did you explicitly turn on support for the mpi_f08 module in the arch file? I turned it off for MPICH in the toolchain due to a compiler bug in gfortran 11/12. I am currently recompiling and testing with MPICH.

@mkrack
Member Author

mkrack commented Aug 14, 2023

Yes, I added -D__MPI_F08 in the arch file Linux-gnu-x86_64.psmp. So, I understand that this flag is not (yet) working with MPICH and GNU v11 to v13 because of compiler bugs.
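
For context, -D__MPI_F08 selects at preprocessing time which Fortran MPI binding the wrappers use. A minimal sketch of such a switch (illustrative only; CP2K's actual wrapper is src/mpiwrap/message_passing.F and is more involved):

! Illustrative sketch of a preprocessor-selected MPI binding.
module mp_binding
#if defined(__MPI_F08)
  use mpi_f08                  ! MPI-3 Fortran 2008 binding
#else
  use mpi                      ! legacy Fortran 90 binding
#endif
  implicit none
#if defined(__MPI_F08)
  type(MPI_Comm) :: mp_world   ! communicator handle is a derived type
#else
  integer :: mp_world          ! communicator handle is a plain integer
#endif
contains
  subroutine mp_world_init()
    integer :: ierr
    call MPI_Init(ierr)
    mp_world = MPI_COMM_WORLD
  end subroutine mp_world_init
end module mp_binding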

@fstein93
Contributor

I do not know the current state regarding GCC 13 + MPICH; I am currently trying it myself for the first time. We may eventually have to open a ticket on MPICH's GitHub repository. I just know that there is a compiler bug in GCC 12 preventing the use of MPICH's mpi_f08 module.

@hfp
Member

hfp commented Aug 14, 2023

I just know that there is a compiler bug in GCC12

One should avoid GCC 12.1 and question any distribution deploying it under "LTS".

@fstein93
Contributor

I can run the unit tests with GCC 13 and MPICH 4.0.3 (toolchain) and -D__MPI_F08 set in the arch file. On my notebook, I have to configure MPICH with --with-pm=gforker and --with-device=ch3:sock.

@mkrack
Member Author

mkrack commented Aug 14, 2023

Yes, MPICH 4.0.3 is working fine. This issue is about MPICH 4.1+, which obviously requires mpi_f08 for compilation. We will need mpi_f08 as the default in the future to keep CP2K compiling (and working) with recent MPICH releases. It seems that MPICH has now dropped the transitional support (interfaces) for older MPI versions.

@alazzaro
Member

This is a replica of what we did in DBCSR (cp2k/dbcsr#661), and @mkrack is definitely right (unless you want to use Cray pointers and keep supporting the old interface). You need F08 as the default with MPICH 4.1.

@fstein93
Contributor

Cray pointers are not standard-compliant, which is why I do not want to use them. Still, older versions of GCC are around even at supercomputing centers. As such, Daint and potentially other older supercomputers do not support mpi_f08, which is why I would not drop support for the old interface yet; that does not mean I would keep it forever. Related to the MPI interface is also the support for later standards (F2008+TS or F2018), which we could push forward. I suggest discussing this issue at the developers' meeting.

@fstein93
Contributor

fstein93 commented Aug 14, 2023

Still, we should consult the MPICH developers about what to do regarding the issue @mkrack found with their library.

@mkrack
Member Author

mkrack commented Aug 14, 2023

We could make mpi_f08 the default for MPICH > v4.0.

@hfp
Member

hfp commented Aug 14, 2023

Still, older versions of gcc are still around even at supercomputing centers.

Very true; in particular with RHEL's disruptive change, many small clusters are not only stuck with RHEL 7.x (or 8.x) but cannot decide where to go. This software base is aging rapidly and becomes an increasing obstacle to maintaining the status quo, i.e., keeping things working.

We could make mpi_f08 the default for MPICH > v4.0.

Sounds like best effort and the right way to go. If the infrastructure to detect this case at source level (not just in the toolchain) is missing, it sounds worth adding.
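
One conceivable building block, sketched here as a hypothetical runtime guard (not existing CP2K code): MPI_Get_library_version reports the library's name and version string, so a build using the legacy interface could at least warn when it is linked against MPICH. A compile-time check in the toolchain would of course be cleaner.

! Hypothetical runtime guard for builds without -D__MPI_F08.
subroutine warn_if_mpich()
  use mpi  ! legacy binding, i.e. the case to be guarded
  implicit none
  character(len=MPI_MAX_LIBRARY_VERSION_STRING) :: version
  integer :: resultlen, ierr
  call MPI_Get_library_version(version, resultlen, ierr)
  ! Deliberately crude: checks only the library name, not the version number.
  if (index(version(1:resultlen), "MPICH") > 0) then
    print *, "Warning: MPICH detected with the legacy mpi module; " // &
             "MPICH >= 4.1 may require -D__MPI_F08."
  end if
end subroutine warn_if_mpich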

@alazzaro
Member

alazzaro commented Aug 14, 2023

Let me clarify, @fstein93: as we did in DBCSR, MPI F08 is only needed for MPICH 4.1 (and again, I agree with @mkrack's suggestion). I never said to remove the old interface, nor to use Cray pointers (they were mentioned in parentheses as something I will not do; we never did that in the past). Please note that MPICH is now fully compliant with the standard, but that is something you can ask them about.

For instance, along the same lines: pmodels/mpich#2659
