
test failures with POWER10 kernel and GCC 16 #5728

@sharkcz

Description

I am seeing test failures with the POWER10 kernel when OpenBLAS is built with GCC 16 (gcc-16.0.1-0.10.fc45.ppc64le) and run on Power10 hardware. The same tests pass when built with GCC 15 on the same hardware. CTRMM and ZTRMM fail in both the single-threaded and the OMP_NUM_THREADS=2 runs, and the SBGEMV test reports failures as well. Based on previous experience, I would guess that GCC 16 has again become stricter (or more advanced) and that the inline assembly code in the Power10 kernel is no longer fully valid.

...
gfortran  -O2 -Wall -frecursive -fno-optimize-sibling-calls -m64 -fopenmp  -O2 -frecursive -mcpu=power10 -mtune=power10 -fno-fast-math -DUSE_OPENMP -fopenmp -fno-optimize-sibling-calls -fno-tree-vectorize  -o dblat3 dblat3.o ../libopenblas_power10p-r0.3.32.dev.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/ppc64le-redhat-linux/16 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../.. -L/lib -L/usr/lib  -latomic_asneeded -lc 
gfortran  -O2 -Wall -frecursive -fno-optimize-sibling-calls -m64 -fopenmp  -O2 -frecursive -mcpu=power10 -mtune=power10 -fno-fast-math -DUSE_OPENMP -fopenmp -fno-optimize-sibling-calls -fno-tree-vectorize  -o cblat2 cblat2.o ../libopenblas_power10p-r0.3.32.dev.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/ppc64le-redhat-linux/16 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../.. -L/lib -L/usr/lib  -latomic_asneeded -lc 
gfortran  -O2 -Wall -frecursive -fno-optimize-sibling-calls -m64 -fopenmp  -O2 -frecursive -mcpu=power10 -mtune=power10 -fno-fast-math -DUSE_OPENMP -fopenmp -fno-optimize-sibling-calls -fno-tree-vectorize  -o zblat2 zblat2.o ../libopenblas_power10p-r0.3.32.dev.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/ppc64le-redhat-linux/16 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../.. -L/lib -L/usr/lib  -latomic_asneeded -lc 
rm -f ?BLAT2.SUMM
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./test_bgemv > BBLAT2.SUMM
gfortran  -O2 -Wall -frecursive -fno-optimize-sibling-calls -m64 -fopenmp  -O2 -frecursive -mcpu=power10 -mtune=power10 -fno-fast-math -DUSE_OPENMP -fopenmp -fno-optimize-sibling-calls -fno-tree-vectorize  -o cblat3 cblat3.o ../libopenblas_power10p-r0.3.32.dev.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/ppc64le-redhat-linux/16 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../.. -L/lib -L/usr/lib  -latomic_asneeded -lc 
gfortran  -O2 -Wall -frecursive -fno-optimize-sibling-calls -m64 -fopenmp  -O2 -frecursive -mcpu=power10 -mtune=power10 -fno-fast-math -DUSE_OPENMP -fopenmp -fno-optimize-sibling-calls -fno-tree-vectorize  -o zblat3 zblat3.o ../libopenblas_power10p-r0.3.32.dev.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/ppc64le-redhat-linux/16 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../.. -L/lib -L/usr/lib  -latomic_asneeded -lc 
rm -f ?BLAT3.SUMM
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./test_sbgemm > SBBLAT3.SUMM
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./test_bgemm > BBLAT3.SUMM
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./test_sbgemv > SBBLAT2.SUMM
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./sblat3 < ./sblat3.dat
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./dblat3 < ./dblat3.dat
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./cblat3 < ./cblat3.dat
 TESTS OF THE COMPLEX          LEVEL 3 BLAS

 THE FOLLOWING PARAMETER VALUES WILL BE USED:
   FOR N                   0     1     2     3     7    31
   FOR ALPHA          ( 0.0, 0.0)  ( 1.0, 0.0)  ( 0.7,-0.9)  
   FOR BETA           ( 0.0, 0.0)  ( 1.0, 0.0)  ( 1.3,-1.1)  

 ERROR-EXITS WILL NOT BE TESTED

 ROUTINES PASS COMPUTATIONAL TESTS IF TEST RATIO IS LESS THAN   16.00

 RELATIVE MACHINE PRECISION IS TAKEN TO BE  1.2E-07

 CGEMM  PASSED THE COMPUTATIONAL TESTS ( 17496 CALLS)

 CHEMM  PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 CSYMM  PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 ******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
                       EXPECTED RESULT                    COMPUTED RESULT
       1  (    1.57757    ,  -0.324314    )  (    1.57757    ,  -0.324314    )
       2  (  -0.149664    ,   0.581641    )  (  -0.149664    ,   0.581641    )
       3  (  -0.748555    ,   -1.09547    )  (  -0.748555    ,   -1.09547    )
       4  (  -0.604366    ,  -0.895836    )  (  -0.604366    ,  -0.895836    )
       5  (  -0.650925    ,   0.394394    )  (  -0.650925    ,   0.394394    )
       6  (  -0.465727    ,   0.842006    )  (  -0.465727    ,   0.842006    )
       7  (   0.420629    ,   0.597693    )  (  -0.587136E-01,  -0.543813    )
       8  (   0.786457    ,   0.544220E-01)  (  -0.138154E-01,   0.184827    )
       9  (   0.167691    ,   0.207608    )  (   0.167691    ,   0.207608    )
      10  (  -0.321436    ,  -0.667076    )  (  -0.321436    ,  -0.667076    )
      11  (  -0.303583    ,  -0.249012E-01)  (  -0.303584    ,  -0.249011E-01)
      12  (   -1.20584    ,   0.376045    )  (   -1.20584    ,   0.376044    )
      13  (   0.280570    ,   0.680643    )  (   0.280570    ,   0.680643    )
      14  (    1.11913    ,   0.831795    )  (    1.11913    ,   0.831795    )
      15  (  -0.445470    ,   -1.08482    )  (  -0.743962    ,   0.729312    )
      16  (  -0.425975    ,  -0.378074    )  (  -0.964980E-01,  -0.319019    )
      17  (  -0.740210    ,   -1.03159    )  (  -0.740210    ,   -1.03159    )
      18  (    1.00878    ,   0.580040    )  (    1.00878    ,   0.580040    )
      19  (   0.123999    ,  -0.418330    )  (   0.123999    ,  -0.418330    )
      20  (  -0.207821    ,  -0.467468    )  (  -0.207821    ,  -0.467468    )
      21  (  -0.471160    ,   -1.47356    )  (  -0.471160    ,   -1.47356    )
      22  (  -0.329621    ,   0.782363    )  (  -0.329621    ,   0.782364    )
      23  (  -0.248915    ,   0.671276    )  (   0.515318    ,  -0.225023    )
      24  (  -0.154857    ,  -0.108282    )  (  -0.479790E-01,   0.823263E-01)
      25  (   0.327719    ,  -0.149753    )  (   0.327719    ,  -0.149753    )
      26  (   0.104212    ,   0.378216    )  (   0.104212    ,   0.378216    )
      27  (   0.111354    ,  -0.524580E-01)  (   0.111354    ,  -0.524580E-01)
      28  (   0.301476    ,   0.218972E-01)  (   0.301476    ,   0.218972E-01)
      29  (  -0.185482    ,   0.210484    )  (  -0.185482    ,   0.210484    )
      30  (   0.535875    ,   0.368959    )  (   0.535875    ,   0.368959    )
      31  (   0.969031E-01,   0.298701    )  (   0.969031E-01,   0.298701    )
      THESE ARE THE RESULTS FOR COLUMN   1
 ******* CTRMM  FAILED ON CALL NUMBER:
   2450: CTRMM ('L','U','N','U', 31,  7,( 1.0, 0.0), A, 32, B, 32)               .

 CTRSM  PASSED THE COMPUTATIONAL TESTS (  2592 CALLS)

 CHERK  PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 CSYRK  PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 CHER2K PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 CSYR2K PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 END OF TESTS
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./zblat3 < ./zblat3.dat
 TESTS OF THE COMPLEX*16       LEVEL 3 BLAS

 THE FOLLOWING PARAMETER VALUES WILL BE USED:
   FOR N                   0     1     2     3     7    31
   FOR ALPHA          ( 0.0, 0.0)  ( 1.0, 0.0)  ( 0.7,-0.9)  
   FOR BETA           ( 0.0, 0.0)  ( 1.0, 0.0)  ( 1.3,-1.1)  

 ERROR-EXITS WILL NOT BE TESTED

 ROUTINES PASS COMPUTATIONAL TESTS IF TEST RATIO IS LESS THAN   16.00

 RELATIVE MACHINE PRECISION IS TAKEN TO BE  2.2D-16

 ZGEMM  PASSED THE COMPUTATIONAL TESTS ( 17496 CALLS)

 ZHEMM  PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 ZSYMM  PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 ******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
                       EXPECTED RESULT                    COMPUTED RESULT
       1  (  -0.803402E-01,   0.421751    )  (  -0.803402E-01,   0.421751    )
       2  (   0.691964    ,   0.209721    )  (   0.691964    ,   0.209721    )
       3  (   0.553420    ,  -0.312582    )  (   0.440480    ,  -0.729041E-02)
       4  (   0.283286    ,  -0.145302    )  (   0.153001    ,   0.189155    )
       5  (  -0.816776E-01,  -0.546559    )  (  -0.816776E-01,  -0.546559    )
       6  (  -0.270234    ,   0.120707    )  (  -0.270234    ,   0.120707    )
       7  (   0.106893    ,   0.242757    )  (   0.106893    ,   0.242757    )
 ******* ZTRMM  FAILED ON CALL NUMBER:
   1802: ZTRMM ('L','U','N','U',  7,  1,( 1.0, 0.0), A,  8, B,  8)               .

 ZTRSM  PASSED THE COMPUTATIONAL TESTS (  2592 CALLS)

 ZHERK  PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 ZSYRK  PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 ZHER2K PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 ZSYR2K PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 END OF TESTS
rm -f ?BLAT3.SUMM
OMP_NUM_THREADS=2 ./test_sbgemm > SBBLAT3.SUMM
SBGEMV FAILURES: 705118
make[1]: *** [Makefile:149: level2] Error 1
make[1]: *** Waiting for unfinished jobs....
OMP_NUM_THREADS=2 ./test_bgemm > BBLAT3.SUMM
OMP_NUM_THREADS=2 ./sblat3 < ./sblat3.dat
OMP_NUM_THREADS=2 ./dblat3 < ./dblat3.dat
OMP_NUM_THREADS=2 ./cblat3 < ./cblat3.dat
 TESTS OF THE COMPLEX          LEVEL 3 BLAS

 THE FOLLOWING PARAMETER VALUES WILL BE USED:
   FOR N                   0     1     2     3     7    31
   FOR ALPHA          ( 0.0, 0.0)  ( 1.0, 0.0)  ( 0.7,-0.9)  
   FOR BETA           ( 0.0, 0.0)  ( 1.0, 0.0)  ( 1.3,-1.1)  

 ERROR-EXITS WILL NOT BE TESTED

 ROUTINES PASS COMPUTATIONAL TESTS IF TEST RATIO IS LESS THAN   16.00

 RELATIVE MACHINE PRECISION IS TAKEN TO BE  1.2E-07

 CGEMM  PASSED THE COMPUTATIONAL TESTS ( 17496 CALLS)

 CHEMM  PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 CSYMM  PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 ******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
                       EXPECTED RESULT                    COMPUTED RESULT
       1  (    1.57757    ,  -0.324314    )  (    1.57757    ,  -0.324314    )
       2  (  -0.149664    ,   0.581641    )  (  -0.149664    ,   0.581641    )
       3  (  -0.748555    ,   -1.09547    )  (  -0.748555    ,   -1.09547    )
       4  (  -0.604366    ,  -0.895836    )  (  -0.604366    ,  -0.895836    )
       5  (  -0.650925    ,   0.394394    )  (  -0.650925    ,   0.394394    )
       6  (  -0.465727    ,   0.842006    )  (  -0.465727    ,   0.842006    )
       7  (   0.420629    ,   0.597693    )  (  -0.587136E-01,  -0.543813    )
       8  (   0.786457    ,   0.544220E-01)  (  -0.138154E-01,   0.184827    )
       9  (   0.167691    ,   0.207608    )  (   0.167691    ,   0.207608    )
      10  (  -0.321436    ,  -0.667076    )  (  -0.321436    ,  -0.667076    )
      11  (  -0.303583    ,  -0.249012E-01)  (  -0.303584    ,  -0.249011E-01)
      12  (   -1.20584    ,   0.376045    )  (   -1.20584    ,   0.376044    )
      13  (   0.280570    ,   0.680643    )  (   0.280570    ,   0.680643    )
      14  (    1.11913    ,   0.831795    )  (    1.11913    ,   0.831795    )
      15  (  -0.445470    ,   -1.08482    )  (  -0.743962    ,   0.729312    )
      16  (  -0.425975    ,  -0.378074    )  (  -0.964980E-01,  -0.319019    )
      17  (  -0.740210    ,   -1.03159    )  (  -0.740210    ,   -1.03159    )
      18  (    1.00878    ,   0.580040    )  (    1.00878    ,   0.580040    )
      19  (   0.123999    ,  -0.418330    )  (   0.123999    ,  -0.418330    )
      20  (  -0.207821    ,  -0.467468    )  (  -0.207821    ,  -0.467468    )
      21  (  -0.471160    ,   -1.47356    )  (  -0.471160    ,   -1.47356    )
      22  (  -0.329621    ,   0.782363    )  (  -0.329621    ,   0.782364    )
      23  (  -0.248915    ,   0.671276    )  (   0.515318    ,  -0.225023    )
      24  (  -0.154857    ,  -0.108282    )  (  -0.479790E-01,   0.823263E-01)
      25  (   0.327719    ,  -0.149753    )  (   0.327719    ,  -0.149753    )
      26  (   0.104212    ,   0.378216    )  (   0.104212    ,   0.378216    )
      27  (   0.111354    ,  -0.524580E-01)  (   0.111354    ,  -0.524580E-01)
      28  (   0.301476    ,   0.218972E-01)  (   0.301476    ,   0.218972E-01)
      29  (  -0.185482    ,   0.210484    )  (  -0.185482    ,   0.210484    )
      30  (   0.535875    ,   0.368959    )  (   0.535875    ,   0.368959    )
      31  (   0.969031E-01,   0.298701    )  (   0.969031E-01,   0.298701    )
      THESE ARE THE RESULTS FOR COLUMN   1
 ******* CTRMM  FAILED ON CALL NUMBER:
   2450: CTRMM ('L','U','N','U', 31,  7,( 1.0, 0.0), A, 32, B, 32)               .

 CTRSM  PASSED THE COMPUTATIONAL TESTS (  2592 CALLS)

 CHERK  PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 CSYRK  PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 CHER2K PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 CSYR2K PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 END OF TESTS
OMP_NUM_THREADS=2 ./zblat3 < ./zblat3.dat
 TESTS OF THE COMPLEX*16       LEVEL 3 BLAS

 THE FOLLOWING PARAMETER VALUES WILL BE USED:
   FOR N                   0     1     2     3     7    31
   FOR ALPHA          ( 0.0, 0.0)  ( 1.0, 0.0)  ( 0.7,-0.9)  
   FOR BETA           ( 0.0, 0.0)  ( 1.0, 0.0)  ( 1.3,-1.1)  

 ERROR-EXITS WILL NOT BE TESTED

 ROUTINES PASS COMPUTATIONAL TESTS IF TEST RATIO IS LESS THAN   16.00

 RELATIVE MACHINE PRECISION IS TAKEN TO BE  2.2D-16

 ZGEMM  PASSED THE COMPUTATIONAL TESTS ( 17496 CALLS)

 ZHEMM  PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 ZSYMM  PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 ******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
                       EXPECTED RESULT                    COMPUTED RESULT
       1  (  -0.803402E-01,   0.421751    )  (  -0.803402E-01,   0.421751    )
       2  (   0.691964    ,   0.209721    )  (   0.691964    ,   0.209721    )
       3  (   0.553420    ,  -0.312582    )  (   0.440480    ,  -0.729041E-02)
       4  (   0.283286    ,  -0.145302    )  (   0.153001    ,   0.189155    )
       5  (  -0.816776E-01,  -0.546559    )  (  -0.816776E-01,  -0.546559    )
       6  (  -0.270234    ,   0.120707    )  (  -0.270234    ,   0.120707    )
       7  (   0.106893    ,   0.242757    )  (   0.106893    ,   0.242757    )
 ******* ZTRMM  FAILED ON CALL NUMBER:
   1802: ZTRMM ('L','U','N','U',  7,  1,( 1.0, 0.0), A,  8, B,  8)               .

 ZTRSM  PASSED THE COMPUTATIONAL TESTS (  2592 CALLS)

 ZHERK  PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 ZSYRK  PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 ZHER2K PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 ZSYR2K PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 END OF TESTS
make[1]: Leaving directory '/root/projects/OpenBLAS/test'
make: *** [Makefile:176: tests] Error 2
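Both failing calls use the same TRMM variant, SIDE='L', UPLO='U', TRANSA='N', DIAG='U', which computes B := alpha*A*B with A upper triangular and an implicit unit diagonal (alpha is (1.0, 0.0) in both failing calls). To cross-check a single failing call outside the test drivers, a naive reference such as the following sketch could be used. `ref_trmm_lunu` is a hypothetical helper written for illustration, not code from OpenBLAS or the reference BLAS, and it assumes nested Python lists (row-major) rather than Fortran column-major arrays with a leading dimension.

```python
def ref_trmm_lunu(alpha, a, b):
    """Hypothetical naive reference for xTRMM('L','U','N','U'): B := alpha*A*B.

    a: m x m list of rows; only the strictly upper triangle is read,
       the diagonal is treated as 1 and the lower triangle is ignored.
    b: m x n list of rows; overwritten in place and also returned.
    """
    m = len(a)
    n = len(b[0]) if m else 0
    for j in range(n):
        # Go top to bottom: row i only reads rows k > i of column j,
        # which have not been overwritten yet, so in-place update is safe.
        for i in range(m):
            s = b[i][j]  # implicit unit diagonal contributes 1 * b[i][j]
            for k in range(i + 1, m):
                s += a[i][k] * b[k][j]
            b[i][j] = alpha * s
    return b


# Small usage example: the 99s on the diagonal and the lower-triangle
# entries are ignored because DIAG='U' and UPLO='U'.
a = [[99, 2 + 1j, 3], [0, 99, 4], [0, 0, 99]]
b = [[1], [1], [1]]
print(ref_trmm_lunu(1.0, a, b))
```

Comparing such a reference against the library result column by column would show which rows disagree; in the CTRMM output above, the wrong rows come in pairs (7-8, 15-16, 23-24), and in the ZTRMM output rows 3-4 are wrong. The A and B matrices used by the test driver for the failing call would have to be captured from the driver itself; they are not shown in the log.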

Metadata

Labels

Bug in other software: Compiler, Virtual Machine, etc. bug affecting OpenBLAS
Distribution packaging problem: Third party package incompatibilities, inappropriate build flags or unmet dependencies etc.
