Test suite segfaults with gfortran 9.2.1 #25

Closed
mbanck opened this issue Sep 7, 2019 · 5 comments

Comments

@mbanck

mbanck commented Sep 7, 2019

Running the fast test suite, all tests fail similarly in , as here for t30:

[...]
 ETOT 13  -8.7264004259227    -1.599E-14 8.874E-25 9.869E-22
 prteigrs : about to open file t30o_EIG
 Fermi (or HOMO) energy (hartree) =   0.19736   Average Vxc (hartree)=  -0.35360
 Eigenvalues (hartree) for nkpt=   2  k points:
 kpt#   1, nband=  4, wtk=  0.25000, kpt=  0.2500  0.2500  0.2500 (reduced coord)
  -0.18526    0.07865    0.19736    0.19736
 kpt#   2, nband=  4, wtk=  0.75000, kpt=  0.2500  0.5000  0.5000 (reduced coord)
  -0.11309   -0.01075    0.09006    0.13464
 Fermi (or HOMO) energy (eV) =   5.37042   Average Vxc (eV)=  -9.62196
 Eigenvalues (   eV  ) for nkpt=   2  k points:
 kpt#   1, nband=  4, wtk=  0.25000, kpt=  0.2500  0.2500  0.2500 (reduced coord)
  -5.04123    2.14017    5.37042    5.37042
 kpt#   2, nband=  4, wtk=  0.75000, kpt=  0.2500  0.5000  0.5000 (reduced coord)
  -3.07722   -0.29262    2.45073    3.66369
 scprqt: <Vxc>= -3.5360049E-01 hartree

 At SCF step   13   max residual=  8.87E-25 < tolwfr=  1.00E-24 =>converged.

Program received signal SIGSEGV, Segmentation fault.
0x0000555556141276 in m_forces::forces (atindx1=..., diffor=0, dtefield=..., dtset=..., favg=..., fcart=..., fock=0x0, forold=..., fred=..., grchempottn=..., gresid=..., grewtn=..., grhf=..., grnl=..., 
    grvdw=..., grxc=..., gsqcut=0.4052847345693511, indsym=..., maxfor=0, mgfft=10, mpi_enreg=..., n1xccc=2501, n3xccc=1000, nattyp=..., nfft=1000, ngfft=..., ngrvdw=0, ntypat=1, pawrad=..., pawtab=..., 
    ph1d=..., psps=..., rhog=..., rhor=..., rprimd=..., symrec=..., synlgr=..., usefock=0, vresid=..., vxc=..., wvl=..., wvl_den=..., xred=..., electronpositron=0x0) at m_forces.F90:494
494	 if (usefock==1 .and. associated(fock).and.fock%fock_common%optfor) then
(gdb) bt full      
#0  0x0000555556141276 in m_forces::forces (atindx1=..., diffor=0, dtefield=..., dtset=..., favg=..., fcart=..., fock=0x0, forold=..., fred=..., grchempottn=..., gresid=..., grewtn=..., grhf=..., grnl=..., 
    grvdw=..., grxc=..., gsqcut=0.4052847345693511, indsym=..., maxfor=0, mgfft=10, mpi_enreg=..., n1xccc=2501, n3xccc=1000, nattyp=..., nfft=1000, ngfft=..., ngrvdw=0, ntypat=1, pawrad=..., pawtab=..., 
    ph1d=..., psps=..., rhog=..., rhor=..., rprimd=..., symrec=..., synlgr=..., usefock=0, vresid=..., vxc=..., wvl=..., wvl_den=..., xred=..., electronpositron=0x0) at m_forces.F90:494
        atmrho_dum = <not allocated>
        atmvloc_dum = <not allocated>
        calc_epaw3_forces = .FALSE.
        coredens_method = 2
        dummy6 = (0, 0, 0, 0, 0, 0)
        dyfrlo_dum = <not allocated>
        dyfrn_dum = <not allocated>
        dyfrv_dum = <not allocated>
        dyfrx2_dum = <not allocated>
        eei_dum1 = 0
        eei_dum2 = 9.1995023255640107e-321
        efield_flag = .FALSE.
        eltfrn_dum = <not allocated>
        ep3 = (4.6355756815252366e-310, 2.1219957913605248e-313, 4.0414569829813967e-321)
        epawf3red = <not allocated>
        fdir = -118248
        fin = (( 0, 0, 0) ( 0, 0, 0) )
        fioncart = (1.0185579797819065e-312, 6.9533558006145313e-310, 0)
        fionred = <not allocated>
        gauss_dum = <not allocated>
        gmet = (( 0.028481528154392026, -0.0094938427181306753, -0.0094938427181306753) ( -0.0094938427181306753, 0.028481528154392026, -0.0094938427181306736) ( -0.0094938427181306753, -0.0094938427181306736, 0.028481528154392026) )
        gprimd = (( -0.097436352138874097, 0.097436352138874097, 0.097436352138874097) ( 0.097436352138874097, -0.097436352138874097, 0.097436352138874097) ( 0.097436352138874097, 0.097436352138874097, -0.097436352138874097) )
        grl = (( -6.6613381477509392e-16, 2.2204460492503131e-16, 2.2204460492503131e-16) ( 8.8817841970012523e-16, -4.4408920985006262e-16, -4.4408920985006262e-16) )
        grnl_tmp = <not allocated>
        grtn = (( 1.5668782803424292e-05, 1.5668782823254329e-05, 1.5668782829866912e-05) ( -1.5668782758980094e-05, -1.5668782785622871e-05, -1.5668782779227677e-05) )
        grtn_indx = <not allocated>
        iatom = 3
        idir = -90200
        indx = 3
        ipositron = 21845
        is_hybrid_ncpp = .FALSE.
        itypat = 74
        mu = 1444228441
        optatm = 0
        optdyfr = 0
        opteltfr = 32767
        optgr = -146528
        option = 2
        optn = 1880027648
        optn2 = 32767
        optstr = -146528
        optv = -709758133
        qprtrb_dum = (-147118, 32767, 1880027648)
        rmet = (( 52.665713436049991, 26.332856718024996, 26.332856718024996) ( 26.332856718024996, 52.665713436049991, 26.332856718024996) ( 26.332856718024996, 26.332856718024996, 52.665713436049991) )
        strn_dummy6 = (-nan(0xfffffffffffff), 0, -nan(0xfffffffffffff), 3.9525251667299724e-323, 1.6360587548337309e-311, 6.9533485820324448e-310)
        strv_dummy6 = (2.1218986537067748e-314, 0, 6.9533557981212785e-310, 0, -nan(0xfffffffffffff), 0)
        tsec = (0, 0)
        ucvol = 270.25700511132948
        ucvol_local = 9.8813129168249309e-324
        v_dum = <not allocated>
        vloc_method = 2
        vol_element = 0
        vprtrb_dum = (6.9533522250891695e-310, 6.9533558005818242e-310)
        vxctotg = <not allocated>
        xccc3d_dum = <not allocated>
#1  0x0000555556153f75 in m_forstr::forstr (atindx1=..., cg=..., cprj=..., diffor=0, dtefield=..., dtset=..., eigen=..., electronpositron=0x0, energies=..., favg=..., fcart=..., fock=0x0, forold=..., 
    fred=..., grchempottn=..., gresid=..., grewtn=..., grhf=..., grvdw=..., grxc=..., gsqcut=0.4052847345693511, 
    indsym=<error reading variable: value requires 935902144 bytes, which is more than max-value-size>, kg=..., kxc=..., maxfor=0, mcg=296, mcprj=0, mgfftf=10, mpi_enreg=..., my_natom=2, n3xccc=1000, 
    nattyp=<error reading variable: value requires 4294606496 bytes, which is more than max-value-size>, nfftf=1000, ngfftf=..., ngrvdw=0, nhat=..., nkxc=0, npwarr=..., ntypat=1, nvresid=..., occ=..., 
    optfor=1, optres=0, paw_ij=..., pawang=..., pawfgr=..., pawfgrtab=..., pawrad=..., pawrhoij=..., pawtab=<error reading variable: value requires 1815569456 bytes, which is more than max-value-size>, 
    ph1d=<error reading variable: value requires 4292698624 bytes, which is more than max-value-size>, ph1df=..., psps=..., rhog=..., rhor=..., rprimd=..., stress_needed=1, strsxc=..., strten=..., symrec=..., 
    synlgr=..., ucvol=270.25700511132948, usecprj=0, vhartr=..., vpsp=..., vxc=..., wvl=..., xccc3d=..., xred=..., ylm=..., ylmgr=..., qvpotzero=0) at m_forstr.F90:485
[...]

It seems the fock pointer is null (0x0), though I am not sure whether that is the root of the problem:

(gdb) p fock
$1 = (PTR TO -> ( Type fock_type )) 0x0
(gdb) p *fock
Cannot access memory at address 0x58

The backtrace as printed by the test suite is as follows:

Backtrace for this error:
#0  0x7f0836659b90 in ???
#1  0x7f0836658dc5 in ???
#2  0x7f083631183f in ???
#3  0x55d9a4a23276 in __m_forces_MOD_forces
	at /<<PKGBUILDDIR>>/src/67_common/m_forces.F90:494
#4  0x55d9a4a35f74 in __m_forstr_MOD_forstr
	at /<<PKGBUILDDIR>>/src/67_common/m_forstr.F90:485
#5  0x55d9a40d6c81 in __m_afterscfloop_MOD_afterscfloop
	at /<<PKGBUILDDIR>>/src/94_scfcv/m_afterscfloop.F90:932
#6  0x55d9a40b9dbc in __m_scfcv_core_MOD_scfcv_core
	at /<<PKGBUILDDIR>>/src/94_scfcv/m_scfcv_core.F90:2108
#7  0x55d9a408f014 in scfcv_scfcv
	at /<<PKGBUILDDIR>>/src/94_scfcv/m_scfcv.F90:746
#8  0x55d9a408f5ef in __m_scfcv_MOD_scfcv_run
	at /<<PKGBUILDDIR>>/src/94_scfcv/m_scfcv.F90:536
#9  0x55d9a4077638 in __m_gstate_MOD_gstate
	at /<<PKGBUILDDIR>>/src/95_drive/m_gstate.F90:1330
#10  0x55d9a3e7eeb8 in __m_gstateimg_MOD_gstateimg
	at /<<PKGBUILDDIR>>/src/95_drive/m_gstateimg.F90:550
#11  0x55d9a3e5b142 in __m_driver_MOD_driver
	at /<<PKGBUILDDIR>>/src/95_drive/m_driver.F90:705
#12  0x55d9a3e49185 in abinit
	at /<<PKGBUILDDIR>>/src/98_main/abinit.F90:444
#13  0x55d9a3e4caaf in main
	at /<<PKGBUILDDIR>>/src/98_main/abinit.F90:94
Segmentation fault

@mbanck
Author

mbanck commented Sep 7, 2019

If I comment out if (usefock==1 .and. associated(fock).and.fock%fock_common%optstr) then and the similar checks in m_forces.F90, m_forstr.F90 and m_stress.F90, the test case runs fine (but t08-t12 still fail).

If I step through those functions in gdb, I clearly see usefock = 0, so I am a bit baffled why commenting out those checks makes a difference.

@mbanck
Author

mbanck commented Sep 7, 2019

It runs fine with FCFLAGS=-ffree-line-length-none -g -O2, but I get the segfaults with FCFLAGS=-ffree-line-length-none -g (i.e. gfortran's default -O0).

@mbanck
Author

mbanck commented Sep 7, 2019

So it appears that fock%fock_common%optstr may still be evaluated even when associated(fock) is false, because the Fortran standard does not require short-circuit evaluation of .and. (the compiler is free to evaluate either operand); see https://www.scivision.dev/fortran-short-circuit-logic/

That explains why it works fine at -O2 (and would at -O1 as well with gfortran) but crashes at -O0.

I guess the fix would be to use two nested ifs here, like:

 if (usefock==1 .and. associated(fock)) then
   if (fock%fock_common%optfor) then
     grtn(:,:)=grtn(:,:)+fock%fock_common%forces(:,:)
   end if
 end if
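A self-contained sketch of the same nested-if guard, wrapped in a function so it can be compiled and exercised outside ABINIT. The module, type and function names (fock_guard, fock_t, fock_common_t, fock_forces_enabled) are hypothetical stand-ins for ABINIT's fock_type and are not its actual API; only the optfor flag mirrors the real code. Because the component access sits in an inner if, it is never evaluated when the pointer is null, whatever optimization level is used.

```fortran
! Minimal sketch of a null-safe guard for a pointer whose component is tested.
! All names here are hypothetical stand-ins, not ABINIT's real fock_type API.
module fock_guard
  implicit none
  private
  public :: fock_common_t, fock_t, fock_forces_enabled

  ! Stand-in for fock%fock_common; only the optfor flag is modelled.
  type :: fock_common_t
    logical :: optfor = .false.
  end type fock_common_t

  ! Stand-in for the Fock container that may be passed as a null pointer.
  type :: fock_t
    type(fock_common_t) :: fock_common
  end type fock_t

contains

  ! Returns .true. only if usefock == 1, fock is associated and optfor is set.
  ! The nested ifs guarantee fock%fock_common is never touched when fock is
  ! null, since Fortran does not promise short-circuit evaluation of .and.
  logical function fock_forces_enabled(usefock, fock) result(enabled)
    integer, intent(in) :: usefock
    type(fock_t), pointer, intent(in) :: fock

    enabled = .false.
    if (usefock == 1) then
      if (associated(fock)) then
        enabled = fock%fock_common%optfor
      end if
    end if
  end function fock_forces_enabled

end module fock_guard
```

Keeping the null check inside one helper means each call site needs only a single if, and the ordering of the checks cannot be undone by a later edit that merges them back into one .and. expression.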

@jmbeuken
Contributor

jmbeuken commented Nov 1, 2019

Thank you for pointing out this problem.
We will integrate the correction in version 8.11.12.

In addition, how did you compile ABINIT (which version?) to "trigger" this error?
We have not detected this problem on our test farm.
We compile ABINIT on a bot with GNU 9.2 and

FCFLAGS_EXTRA="-O2 -g -Wall -Wno-maybe-uninitialized -ffpe-trap=invalid,zero,overflow -fbacktrace -pedantic -fcheck=all"
Thank you.

@mbanck
Author

mbanck commented Nov 2, 2019

You need to build it with -O0 to trigger the crash; at -O2, gfortran happens to short-circuit the condition and runs fine.

The Debian packaging was using -O0 (at least for a subset of files that included the problematic source file), so we saw this crash. I do not know why it did not crash with earlier versions (-O0 had been set for a long time); maybe the gfortran behavior changed, but I could not find any substantive information on that when I tried to research it back in September.
