test failures with POWER10 kernel and GCC 16 #5728
Closed as not planned
Labels
Bug in other software (Compiler, Virtual Machine, etc. bug affecting OpenBLAS)
Distribution packaging problem (Third party package incompatibilities, inappropriate build flags or unmet dependencies etc)
Description
I am seeing test failures with the POWER10 kernel when OpenBLAS is built with GCC 16 (gcc-16.0.1-0.10.fc45.ppc64le) and run on Power10 hardware. The same tests pass when built with GCC 15 on the same hardware. Based on previous experience, my guess is that GCC 16 has become stricter (or more advanced) again and that the inline assembly code in the Power10 kernel is no longer fully valid.
...
gfortran -O2 -Wall -frecursive -fno-optimize-sibling-calls -m64 -fopenmp -O2 -frecursive -mcpu=power10 -mtune=power10 -fno-fast-math -DUSE_OPENMP -fopenmp -fno-optimize-sibling-calls -fno-tree-vectorize -o dblat3 dblat3.o ../libopenblas_power10p-r0.3.32.dev.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/ppc64le-redhat-linux/16 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../.. -L/lib -L/usr/lib -latomic_asneeded -lc
gfortran -O2 -Wall -frecursive -fno-optimize-sibling-calls -m64 -fopenmp -O2 -frecursive -mcpu=power10 -mtune=power10 -fno-fast-math -DUSE_OPENMP -fopenmp -fno-optimize-sibling-calls -fno-tree-vectorize -o cblat2 cblat2.o ../libopenblas_power10p-r0.3.32.dev.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/ppc64le-redhat-linux/16 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../.. -L/lib -L/usr/lib -latomic_asneeded -lc
gfortran -O2 -Wall -frecursive -fno-optimize-sibling-calls -m64 -fopenmp -O2 -frecursive -mcpu=power10 -mtune=power10 -fno-fast-math -DUSE_OPENMP -fopenmp -fno-optimize-sibling-calls -fno-tree-vectorize -o zblat2 zblat2.o ../libopenblas_power10p-r0.3.32.dev.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/ppc64le-redhat-linux/16 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../.. -L/lib -L/usr/lib -latomic_asneeded -lc
rm -f ?BLAT2.SUMM
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./test_bgemv > BBLAT2.SUMM
gfortran -O2 -Wall -frecursive -fno-optimize-sibling-calls -m64 -fopenmp -O2 -frecursive -mcpu=power10 -mtune=power10 -fno-fast-math -DUSE_OPENMP -fopenmp -fno-optimize-sibling-calls -fno-tree-vectorize -o cblat3 cblat3.o ../libopenblas_power10p-r0.3.32.dev.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/ppc64le-redhat-linux/16 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../.. -L/lib -L/usr/lib -latomic_asneeded -lc
gfortran -O2 -Wall -frecursive -fno-optimize-sibling-calls -m64 -fopenmp -O2 -frecursive -mcpu=power10 -mtune=power10 -fno-fast-math -DUSE_OPENMP -fopenmp -fno-optimize-sibling-calls -fno-tree-vectorize -o zblat3 zblat3.o ../libopenblas_power10p-r0.3.32.dev.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/ppc64le-redhat-linux/16 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../.. -L/lib -L/usr/lib -latomic_asneeded -lc
rm -f ?BLAT3.SUMM
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./test_sbgemm > SBBLAT3.SUMM
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./test_bgemm > BBLAT3.SUMM
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./test_sbgemv > SBBLAT2.SUMM
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./sblat3 < ./sblat3.dat
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./dblat3 < ./dblat3.dat
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./cblat3 < ./cblat3.dat
TESTS OF THE COMPLEX LEVEL 3 BLAS
THE FOLLOWING PARAMETER VALUES WILL BE USED:
FOR N 0 1 2 3 7 31
FOR ALPHA ( 0.0, 0.0) ( 1.0, 0.0) ( 0.7,-0.9)
FOR BETA ( 0.0, 0.0) ( 1.0, 0.0) ( 1.3,-1.1)
ERROR-EXITS WILL NOT BE TESTED
ROUTINES PASS COMPUTATIONAL TESTS IF TEST RATIO IS LESS THAN 16.00
RELATIVE MACHINE PRECISION IS TAKEN TO BE 1.2E-07
CGEMM PASSED THE COMPUTATIONAL TESTS ( 17496 CALLS)
CHEMM PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
CSYMM PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
EXPECTED RESULT COMPUTED RESULT
1 ( 1.57757 , -0.324314 ) ( 1.57757 , -0.324314 )
2 ( -0.149664 , 0.581641 ) ( -0.149664 , 0.581641 )
3 ( -0.748555 , -1.09547 ) ( -0.748555 , -1.09547 )
4 ( -0.604366 , -0.895836 ) ( -0.604366 , -0.895836 )
5 ( -0.650925 , 0.394394 ) ( -0.650925 , 0.394394 )
6 ( -0.465727 , 0.842006 ) ( -0.465727 , 0.842006 )
7 ( 0.420629 , 0.597693 ) ( -0.587136E-01, -0.543813 )
8 ( 0.786457 , 0.544220E-01) ( -0.138154E-01, 0.184827 )
9 ( 0.167691 , 0.207608 ) ( 0.167691 , 0.207608 )
10 ( -0.321436 , -0.667076 ) ( -0.321436 , -0.667076 )
11 ( -0.303583 , -0.249012E-01) ( -0.303584 , -0.249011E-01)
12 ( -1.20584 , 0.376045 ) ( -1.20584 , 0.376044 )
13 ( 0.280570 , 0.680643 ) ( 0.280570 , 0.680643 )
14 ( 1.11913 , 0.831795 ) ( 1.11913 , 0.831795 )
15 ( -0.445470 , -1.08482 ) ( -0.743962 , 0.729312 )
16 ( -0.425975 , -0.378074 ) ( -0.964980E-01, -0.319019 )
17 ( -0.740210 , -1.03159 ) ( -0.740210 , -1.03159 )
18 ( 1.00878 , 0.580040 ) ( 1.00878 , 0.580040 )
19 ( 0.123999 , -0.418330 ) ( 0.123999 , -0.418330 )
20 ( -0.207821 , -0.467468 ) ( -0.207821 , -0.467468 )
21 ( -0.471160 , -1.47356 ) ( -0.471160 , -1.47356 )
22 ( -0.329621 , 0.782363 ) ( -0.329621 , 0.782364 )
23 ( -0.248915 , 0.671276 ) ( 0.515318 , -0.225023 )
24 ( -0.154857 , -0.108282 ) ( -0.479790E-01, 0.823263E-01)
25 ( 0.327719 , -0.149753 ) ( 0.327719 , -0.149753 )
26 ( 0.104212 , 0.378216 ) ( 0.104212 , 0.378216 )
27 ( 0.111354 , -0.524580E-01) ( 0.111354 , -0.524580E-01)
28 ( 0.301476 , 0.218972E-01) ( 0.301476 , 0.218972E-01)
29 ( -0.185482 , 0.210484 ) ( -0.185482 , 0.210484 )
30 ( 0.535875 , 0.368959 ) ( 0.535875 , 0.368959 )
31 ( 0.969031E-01, 0.298701 ) ( 0.969031E-01, 0.298701 )
THESE ARE THE RESULTS FOR COLUMN 1
******* CTRMM FAILED ON CALL NUMBER:
2450: CTRMM ('L','U','N','U', 31, 7,( 1.0, 0.0), A, 32, B, 32) .
CTRSM PASSED THE COMPUTATIONAL TESTS ( 2592 CALLS)
CHERK PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
CSYRK PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
CHER2K PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
CSYR2K PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
END OF TESTS
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./zblat3 < ./zblat3.dat
TESTS OF THE COMPLEX*16 LEVEL 3 BLAS
THE FOLLOWING PARAMETER VALUES WILL BE USED:
FOR N 0 1 2 3 7 31
FOR ALPHA ( 0.0, 0.0) ( 1.0, 0.0) ( 0.7,-0.9)
FOR BETA ( 0.0, 0.0) ( 1.0, 0.0) ( 1.3,-1.1)
ERROR-EXITS WILL NOT BE TESTED
ROUTINES PASS COMPUTATIONAL TESTS IF TEST RATIO IS LESS THAN 16.00
RELATIVE MACHINE PRECISION IS TAKEN TO BE 2.2D-16
ZGEMM PASSED THE COMPUTATIONAL TESTS ( 17496 CALLS)
ZHEMM PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
ZSYMM PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
EXPECTED RESULT COMPUTED RESULT
1 ( -0.803402E-01, 0.421751 ) ( -0.803402E-01, 0.421751 )
2 ( 0.691964 , 0.209721 ) ( 0.691964 , 0.209721 )
3 ( 0.553420 , -0.312582 ) ( 0.440480 , -0.729041E-02)
4 ( 0.283286 , -0.145302 ) ( 0.153001 , 0.189155 )
5 ( -0.816776E-01, -0.546559 ) ( -0.816776E-01, -0.546559 )
6 ( -0.270234 , 0.120707 ) ( -0.270234 , 0.120707 )
7 ( 0.106893 , 0.242757 ) ( 0.106893 , 0.242757 )
******* ZTRMM FAILED ON CALL NUMBER:
1802: ZTRMM ('L','U','N','U', 7, 1,( 1.0, 0.0), A, 8, B, 8) .
ZTRSM PASSED THE COMPUTATIONAL TESTS ( 2592 CALLS)
ZHERK PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
ZSYRK PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
ZHER2K PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
ZSYR2K PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
END OF TESTS
rm -f ?BLAT3.SUMM
OMP_NUM_THREADS=2 ./test_sbgemm > SBBLAT3.SUMM
SBGEMV FAILURES: 705118
make[1]: *** [Makefile:149: level2] Error 1
make[1]: *** Waiting for unfinished jobs....
OMP_NUM_THREADS=2 ./test_bgemm > BBLAT3.SUMM
OMP_NUM_THREADS=2 ./sblat3 < ./sblat3.dat
OMP_NUM_THREADS=2 ./dblat3 < ./dblat3.dat
OMP_NUM_THREADS=2 ./cblat3 < ./cblat3.dat
TESTS OF THE COMPLEX LEVEL 3 BLAS
THE FOLLOWING PARAMETER VALUES WILL BE USED:
FOR N 0 1 2 3 7 31
FOR ALPHA ( 0.0, 0.0) ( 1.0, 0.0) ( 0.7,-0.9)
FOR BETA ( 0.0, 0.0) ( 1.0, 0.0) ( 1.3,-1.1)
ERROR-EXITS WILL NOT BE TESTED
ROUTINES PASS COMPUTATIONAL TESTS IF TEST RATIO IS LESS THAN 16.00
RELATIVE MACHINE PRECISION IS TAKEN TO BE 1.2E-07
CGEMM PASSED THE COMPUTATIONAL TESTS ( 17496 CALLS)
CHEMM PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
CSYMM PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
EXPECTED RESULT COMPUTED RESULT
1 ( 1.57757 , -0.324314 ) ( 1.57757 , -0.324314 )
2 ( -0.149664 , 0.581641 ) ( -0.149664 , 0.581641 )
3 ( -0.748555 , -1.09547 ) ( -0.748555 , -1.09547 )
4 ( -0.604366 , -0.895836 ) ( -0.604366 , -0.895836 )
5 ( -0.650925 , 0.394394 ) ( -0.650925 , 0.394394 )
6 ( -0.465727 , 0.842006 ) ( -0.465727 , 0.842006 )
7 ( 0.420629 , 0.597693 ) ( -0.587136E-01, -0.543813 )
8 ( 0.786457 , 0.544220E-01) ( -0.138154E-01, 0.184827 )
9 ( 0.167691 , 0.207608 ) ( 0.167691 , 0.207608 )
10 ( -0.321436 , -0.667076 ) ( -0.321436 , -0.667076 )
11 ( -0.303583 , -0.249012E-01) ( -0.303584 , -0.249011E-01)
12 ( -1.20584 , 0.376045 ) ( -1.20584 , 0.376044 )
13 ( 0.280570 , 0.680643 ) ( 0.280570 , 0.680643 )
14 ( 1.11913 , 0.831795 ) ( 1.11913 , 0.831795 )
15 ( -0.445470 , -1.08482 ) ( -0.743962 , 0.729312 )
16 ( -0.425975 , -0.378074 ) ( -0.964980E-01, -0.319019 )
17 ( -0.740210 , -1.03159 ) ( -0.740210 , -1.03159 )
18 ( 1.00878 , 0.580040 ) ( 1.00878 , 0.580040 )
19 ( 0.123999 , -0.418330 ) ( 0.123999 , -0.418330 )
20 ( -0.207821 , -0.467468 ) ( -0.207821 , -0.467468 )
21 ( -0.471160 , -1.47356 ) ( -0.471160 , -1.47356 )
22 ( -0.329621 , 0.782363 ) ( -0.329621 , 0.782364 )
23 ( -0.248915 , 0.671276 ) ( 0.515318 , -0.225023 )
24 ( -0.154857 , -0.108282 ) ( -0.479790E-01, 0.823263E-01)
25 ( 0.327719 , -0.149753 ) ( 0.327719 , -0.149753 )
26 ( 0.104212 , 0.378216 ) ( 0.104212 , 0.378216 )
27 ( 0.111354 , -0.524580E-01) ( 0.111354 , -0.524580E-01)
28 ( 0.301476 , 0.218972E-01) ( 0.301476 , 0.218972E-01)
29 ( -0.185482 , 0.210484 ) ( -0.185482 , 0.210484 )
30 ( 0.535875 , 0.368959 ) ( 0.535875 , 0.368959 )
31 ( 0.969031E-01, 0.298701 ) ( 0.969031E-01, 0.298701 )
THESE ARE THE RESULTS FOR COLUMN 1
******* CTRMM FAILED ON CALL NUMBER:
2450: CTRMM ('L','U','N','U', 31, 7,( 1.0, 0.0), A, 32, B, 32) .
CTRSM PASSED THE COMPUTATIONAL TESTS ( 2592 CALLS)
CHERK PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
CSYRK PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
CHER2K PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
CSYR2K PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
END OF TESTS
OMP_NUM_THREADS=2 ./zblat3 < ./zblat3.dat
TESTS OF THE COMPLEX*16 LEVEL 3 BLAS
THE FOLLOWING PARAMETER VALUES WILL BE USED:
FOR N 0 1 2 3 7 31
FOR ALPHA ( 0.0, 0.0) ( 1.0, 0.0) ( 0.7,-0.9)
FOR BETA ( 0.0, 0.0) ( 1.0, 0.0) ( 1.3,-1.1)
ERROR-EXITS WILL NOT BE TESTED
ROUTINES PASS COMPUTATIONAL TESTS IF TEST RATIO IS LESS THAN 16.00
RELATIVE MACHINE PRECISION IS TAKEN TO BE 2.2D-16
ZGEMM PASSED THE COMPUTATIONAL TESTS ( 17496 CALLS)
ZHEMM PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
ZSYMM PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
EXPECTED RESULT COMPUTED RESULT
1 ( -0.803402E-01, 0.421751 ) ( -0.803402E-01, 0.421751 )
2 ( 0.691964 , 0.209721 ) ( 0.691964 , 0.209721 )
3 ( 0.553420 , -0.312582 ) ( 0.440480 , -0.729041E-02)
4 ( 0.283286 , -0.145302 ) ( 0.153001 , 0.189155 )
5 ( -0.816776E-01, -0.546559 ) ( -0.816776E-01, -0.546559 )
6 ( -0.270234 , 0.120707 ) ( -0.270234 , 0.120707 )
7 ( 0.106893 , 0.242757 ) ( 0.106893 , 0.242757 )
******* ZTRMM FAILED ON CALL NUMBER:
1802: ZTRMM ('L','U','N','U', 7, 1,( 1.0, 0.0), A, 8, B, 8) .
ZTRSM PASSED THE COMPUTATIONAL TESTS ( 2592 CALLS)
ZHERK PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
ZSYRK PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
ZHER2K PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
ZSYR2K PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
END OF TESTS
make[1]: Leaving directory '/root/projects/OpenBLAS/test'
make: *** [Makefile:176: tests] Error 2
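To make the failure pattern easier to see, the EXPECTED/COMPUTED tables in the log above can be scanned for the rows that actually differ. The following is only a minimal sketch: the helper name `mismatched_rows` and the relative tolerance of 1e-4 are assumptions chosen for illustration (the tolerance is loose enough to ignore last-digit rounding such as -0.303583 vs -0.303584), and the sample text is the ZTRMM table copied from the log.

```python
import math
import re

# One table row: index, then two complex values printed as "( re , im )".
ROW = re.compile(
    r"^\s*(\d+)\s*"
    r"\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)\s*"
    r"\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)\s*$"
)


def _num(text):
    # Fortran may print exponents with 'D' instead of 'E'.
    return float(text.strip().upper().replace("D", "E"))


def mismatched_rows(log_text, rel_tol=1e-4):
    """Return the row indices whose expected and computed values differ."""
    bad = []
    for line in log_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # not a result row (headers, make output, etc.)
        idx = int(m.group(1))
        exp_re, exp_im = _num(m.group(2)), _num(m.group(3))
        got_re, got_im = _num(m.group(4)), _num(m.group(5))
        same = (
            math.isclose(exp_re, got_re, rel_tol=rel_tol, abs_tol=1e-6)
            and math.isclose(exp_im, got_im, rel_tol=rel_tol, abs_tol=1e-6)
        )
        if not same:
            bad.append(idx)
    return bad


# ZTRMM table from the log above (call 1802).
sample = """
 EXPECTED RESULT COMPUTED RESULT
 1 ( -0.803402E-01, 0.421751 ) ( -0.803402E-01, 0.421751 )
 2 ( 0.691964 , 0.209721 ) ( 0.691964 , 0.209721 )
 3 ( 0.553420 , -0.312582 ) ( 0.440480 , -0.729041E-02)
 4 ( 0.283286 , -0.145302 ) ( 0.153001 , 0.189155 )
 5 ( -0.816776E-01, -0.546559 ) ( -0.816776E-01, -0.546559 )
 6 ( -0.270234 , 0.120707 ) ( -0.270234 , 0.120707 )
 7 ( 0.106893 , 0.242757 ) ( 0.106893 , 0.242757 )
 ******* ZTRMM FAILED ON CALL NUMBER:
 1802: ZTRMM ('L','U','N','U', 7, 1,( 1.0, 0.0), A, 8, B, 8) .
"""

print(mismatched_rows(sample))
```

Read by eye, the CTRMM table in the log differs in rows 7, 8, 15, 16, 23 and 24, and the ZTRMM table in rows 3 and 4; all other rows agree to printing precision.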