Many test failures when compiled with `-march=haswell -ftree-vectorize` #732

bartoldeman · 2022-10-17T12:20:47Z

Description

Tested on master, compiling LAPACK with

cp make.inc.example make.inc
make FC="gfortran-11.3.0" make FFLAGS="-O2 -frecursive -march=haswell -ftree-vectorize" -j 32

brings about many test failures:

                        -->   LAPACK TESTING SUMMARY  <--
                Processing LAPACK Testing output found in the TESTING directory
SUMMARY                 nb test run     numerical error         other error  
================        ===========     =================       ================  
REAL                    1296387         1301    (0.100%)        0       (0.000%)
DOUBLE PRECISION        1304985         1274    (0.098%)        0       (0.000%)
COMPLEX                 748111          1240    (0.166%)        0       (0.000%)
COMPLEX16               761754          561     (0.074%)        0       (0.000%)

--> ALL PRECISIONS      4111237         4376    (0.106%)        0       (0.000%)

There are three categories of failures here:

A bug in GCC, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107254. Fortunately a fix is provided there and it can be trivially backported.
Two minor test failures in REAL and COMPLEX, see stest_rfp: STFSM test is very sensitive to small numeric errors in BLAS STRSM #679
The tests insisting that when e.g. CGEEV is called to provide eigenvectors, it will give the exact same eigenvalues (no numerical tolerance) as when it does not provide eigenvectors.

Applying the patch to GCC, tests go down to:

                        -->   LAPACK TESTING SUMMARY  <--
                Processing LAPACK Testing output found in the TESTING directory
SUMMARY                 nb test run     numerical error         other error  
================        ===========     =================       ================  
REAL                    1310427         1       (0.000%)        0       (0.000%)
DOUBLE PRECISION        1319025         0       (0.000%)        0       (0.000%)
COMPLEX                 762151          254     (0.033%)        0       (0.000%)
COMPLEX16               763938          494     (0.065%)        0       (0.000%)

--> ALL PRECISIONS      4155541         749     (0.018%)        0       (0.000%)

Increasing the tolerance for 2) using

diff --git a/TESTING/ctest_rfp.in b/TESTING/ctest_rfp.in
index d6988f2a7..82e147535 100644
--- a/TESTING/ctest_rfp.in
+++ b/TESTING/ctest_rfp.in
@@ -5,5 +5,5 @@ Data file for testing COMPLEX LAPACK linear equation routines RFP format
 1 2 15                         Values of NRHS (number of right hand sides)
 9                              Number of matrix types (list types on next line if 0 < NTYPES <  9)
 1 2 3 4 5 6 7 8 9              Matrix Types
-30.0                           Threshold value of test ratio
+42.0                           Threshold value of test ratio
 T                              Put T to test the error exits
diff --git a/TESTING/stest_rfp.in b/TESTING/stest_rfp.in
index 08dbf83e6..9b082b7df 100644
--- a/TESTING/stest_rfp.in
+++ b/TESTING/stest_rfp.in
@@ -5,5 +5,5 @@ Data file for testing REAL LAPACK linear equation routines RFP format
 1 2 15                         Values of NRHS (number of right hand sides)
 9                              Number of matrix types (list types on next line if 0 < NTYPES <  9)
 1 2 3 4 5 6 7 8 9              Matrix Types
-30.0                           Threshold value of test ratio
+42.0                           Threshold value of test ratio
 T                              Put T to test the error exits

we get

                        -->   LAPACK TESTING SUMMARY  <--
                Processing LAPACK Testing output found in the TESTING directory
SUMMARY                 nb test run     numerical error         other error  
================        ===========     =================       ================  
REAL                    1318203         0       (0.000%)        0       (0.000%)
DOUBLE PRECISION        1319025         0       (0.000%)        0       (0.000%)
COMPLEX                 769927          253     (0.033%)        0       (0.000%)
COMPLEX16               763938          494     (0.065%)        0       (0.000%)

--> ALL PRECISIONS      4171093         747     (0.018%)        0       (0.000%)

The third one is more complex to analyze. The way I understand it is that gfortran has some freedom in how to apply FMA's to expressions such as a*b+c*d (ie. fma(a,b,c*d) or fma(c,d,a*b)) and these give slightly different results.
The loop bounds for computing and not computing eigenvectors for various loops are different, hence it is often the case that a loop with eigenvectors is vectorized, and without isn't, or for different loop iterations, and the vectorized use of FMA isn't identical to the unvectorized use of FMA. And with complex numbers of course there are even more permutations possible. Adding parentheses forces the compiler to use one variety.

Applying this patch:

diff --git a/SRC/clahqr.f b/SRC/clahqr.f
index dbd848e2f..260764ef9 100644
--- a/SRC/clahqr.f
+++ b/SRC/clahqr.f
@@ -484,7 +484,7 @@
 *           in columns K to I2.
 *
             DO 80 J = K, I2
-               SUM = CONJG( T1 )*H( K, J ) + T2*H( K+1, J )
+               SUM = CONJG( T1 )*H( K, J ) + ( T2*H( K+1, J ) )
                H( K, J ) = H( K, J ) - SUM
                H( K+1, J ) = H( K+1, J ) - SUM*V2
    80       CONTINUE
@@ -493,7 +493,7 @@
 *           matrix in rows I1 to min(K+2,I).
 *
             DO 90 J = I1, MIN( K+2, I )
-               SUM = T1*H( J, K ) + T2*H( J, K+1 )
+               SUM = T1*H( J, K ) + ( T2*H( J, K+1 ) )
                H( J, K ) = H( J, K ) - SUM
                H( J, K+1 ) = H( J, K+1 ) - SUM*CONJG( V2 )
    90       CONTINUE
diff --git a/SRC/claqr5.f b/SRC/claqr5.f
index 0a01cc226..a19800b1e 100644
--- a/SRC/claqr5.f
+++ b/SRC/claqr5.f
@@ -616,7 +616,7 @@
                T3 = T1*CONJG( V( 3, M ) )
                DO 70 J = JTOP, MIN( KBOT, K+3 )
                   REFSUM = H( J, K+1 ) + V( 2, M )*H( J, K+2 )
-     $                     + V( 3, M )*H( J, K+3 )
+     $                     + ( V( 3, M )*H( J, K+3 ) )
                   H( J, K+1 ) = H( J, K+1 ) - REFSUM*T1
                   H( J, K+2 ) = H( J, K+2 ) - REFSUM*T2
                   H( J, K+3 ) = H( J, K+3 ) - REFSUM*T3
diff --git a/SRC/zlahqr.f b/SRC/zlahqr.f
index 9413f20cc..c632f578a 100644
--- a/SRC/zlahqr.f
+++ b/SRC/zlahqr.f
@@ -484,7 +484,7 @@
 *           in columns K to I2.
 *
             DO 80 J = K, I2
-               SUM = DCONJG( T1 )*H( K, J ) + T2*H( K+1, J )
+               SUM = DCONJG( T1 )*H( K, J ) + ( T2*H( K+1, J ) )
                H( K, J ) = H( K, J ) - SUM
                H( K+1, J ) = H( K+1, J ) - SUM*V2
    80       CONTINUE
@@ -493,7 +493,7 @@
 *           matrix in rows I1 to min(K+2,I).
 *
             DO 90 J = I1, MIN( K+2, I )
-               SUM = T1*H( J, K ) + T2*H( J, K+1 )
+               SUM = T1*H( J, K ) + ( T2*H( J, K+1 ) )
                H( J, K ) = H( J, K ) - SUM
                H( J, K+1 ) = H( J, K+1 ) - SUM*DCONJG( V2 )
    90       CONTINUE
diff --git a/SRC/zlaqr5.f b/SRC/zlaqr5.f
index 4fa5ee5b0..656a7ab37 100644
--- a/SRC/zlaqr5.f
+++ b/SRC/zlaqr5.f
@@ -446,7 +446,7 @@
                T2 = T1*V( 2, M22 )
                DO 40 J = K+1, JBOT
                   REFSUM = H( K+1, J ) +
-     $                     DCONJG( V( 2, M22 ) )*H( K+2, J )
+     $                     ( DCONJG( V( 2, M22 ) )*H( K+2, J ) )
                   H( K+1, J ) = H( K+1, J ) - REFSUM*T1
                   H( K+2, J ) = H( K+2, J ) - REFSUM*T2
    40          CONTINUE

makes the test failures go to 0.

But it looks quite fragile, in that it takes trial and error to see which expressions need braces.

Would it be ok instead to replace test 5, e.g.

lapack/TESTING/EIG/cdrvev.f

Lines 799 to 805 in 28f7e83

    
           * 
        
           *              Do Test (5) 
        
           * 
        
                          DO 150 J = 1, N 
        
                             IF( W( J ).NE.W1( J ) ) 
        
                $               RESULT( 5 ) = ULPINV 
        
             150          CONTINUE

by a test that uses tolerances, like the one used below, or would that go against some specifications?

lapack/TESTING/EIG/cchkhs.f

Line 87 in 28f7e83

*> (8) | W(Z computed) - W(Z not computed) | / ( |W| ulp )

lapack/TESTING/EIG/cchkhs.f

Lines 825 to 835 in 28f7e83

    
           * 
        
           *           Do Test 8: | W3 - W1 | / ( max(|W1|,|W3|) ulp ) 
        
           * 
        
                       TEMP1 = ZERO 
        
                       TEMP2 = ZERO 
        
                       DO 130 J = 1, N 
        
                          TEMP1 = MAX( TEMP1, ABS( W1( J ) ), ABS( W3( J ) ) ) 
        
                          TEMP2 = MAX( TEMP2, ABS( W1( J )-W3( J ) ) ) 
        
             130       CONTINUE 
        
           * 
        
                       RESULT( 8 ) = TEMP2 / MAX( UNFL, ULP*MAX( TEMP1, TEMP2 ) )

Checklist

I've included a minimal example to reproduce the issue
I'd be willing to make a PR to solve this issue

The text was updated successfully, but these errors were encountered:

angsch · 2022-10-31T18:34:52Z

Aggressive compiler optimization and/or the usage of a highly optimized BLAS is known sources for failures in the stringent LAPACK tests. See for example the LAPACK test summary shown in OpenMathLib/OpenBLAS#3609. The failures do not necessarily mean that something is fundamentally wrong.

How do I interpret LAPACK testing failures?
[...] Testing failures can be divided into two categories. Minor testing failures, and major testing failures. A minor testing failure is one in which the test ratio reported in the LAPACK/TESTING/.out file slightly exceeds the threshold (specified in the associated LAPACK/TESTING/.in file). The cause of such failures can mainly be attributed to differences in the implementation of math libraries (square root, absolute value, complex division, complex absolute value, etc). These failures are negligible, and do not affect the proper functioning of the library.

The failures related to laqr fall into the category of the minor failures. The flag march=haswell activates vectorization and heavy FMA usage. With aggressive compiler optimization, many tests are slightly above the threshold. When you apply the laqr patch, one instruction is saved per update. That means one instruction less to accumulate errors - just enough to fall below the tight error threshold. This was actually one of the intentions behind this patch.

bartoldeman added the Type: Bug label Oct 17, 2022

bartoldeman mentioned this issue Nov 10, 2022

{numlib}[GCC/12.2.0] FlexiBLAS v3.2.1, BLIS v0.9.0, OpenBLAS v0.3.21 easybuilders/easybuild-easyconfigs#16602

Merged

martin-frbg mentioned this issue Apr 18, 2023

One failure in the tests for the generalized non-symmetric eigenvalue problem ZGGEV #744

Open

martin-frbg mentioned this issue May 5, 2023

LAPACK test fails with Error code from DDRGES3 = 9 on AMD Genoa OpenMathLib/OpenBLAS#4032

Open

martin-frbg mentioned this issue Aug 9, 2023

Lapack test failures for OpenBLAS 0.3.21 on ARM Neoverse_v1 OpenMathLib/OpenBLAS#4187

Closed

martin-frbg mentioned this issue Nov 21, 2023

elevated lapack test failures on Arm Neoverse v1 OpenMathLib/OpenBLAS#4335

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Many test failures when compiled with `-march=haswell -ftree-vectorize` #732

Many test failures when compiled with `-march=haswell -ftree-vectorize` #732

bartoldeman commented Oct 17, 2022

angsch commented Oct 31, 2022

Many test failures when compiled with -march=haswell -ftree-vectorize #732

Many test failures when compiled with -march=haswell -ftree-vectorize #732

Comments

bartoldeman commented Oct 17, 2022

angsch commented Oct 31, 2022

Many test failures when compiled with `-march=haswell -ftree-vectorize` #732

Many test failures when compiled with `-march=haswell -ftree-vectorize` #732