Description
I have the following matrix:
10,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,10,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,10,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,5.6353902507748659,0,0,-5.6353902507748659,0,0,0,0,0,0,0,
0,0,0,0,5.6353902507748659,0,0,-5.6353902507748659,0,0,0,0,0,0,
0,0,0,0,0,5.6353902507748659,0,0,-5.6353902507748659,0,0,0,0,0,
0,0,0,-5.6353902507748659,0,0,15.635391250774864,0,0,-10,-9.9999999999999995e-07,0,0,0,
0,0,0,0,-5.6353902507748659,0,0,25.635391250774866,0,-10,-10,-9.9999999999999995e-07,0,0,
0,0,0,0,0,-5.6353902507748659,0,0,25.635390250774865,-10,0,-10,0,0,
0,0,0,0,0,0,-10,-10,-10,30,0,0,0,0,
0,0,0,0,0,0,-9.9999999999999995e-07,-10,0,0,20.000001999999999,-10,-9.9999999999999995e-07,0,
0,0,0,0,0,0,0,-9.9999999999999995e-07,-10,0,-10,30.001000860019595,0,-10.000999860019597,
0,0,0,0,0,0,0,0,0,0,-9.9999999999999995e-07,0,5.635391250774866,-5.6353902507748659,
0,0,0,0,0,0,0,0,0,0,0,-10.000999860019597,-5.6353902507748659,15.636390110794462,
...which should be singular. At least getting its inverse in GNU Octave shows that it's singular to machine precision.
If I try to do LU decomposition for it using OpenBLAS:
#include <stdio.h>
#define lapack_int int
#define LAPACK_dgetrf dgetrf_
void LAPACK_dgetrf(const lapack_int*, const lapack_int*, double*, const lapack_int*, lapack_int*, lapack_int*);
double M[] = {
10,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,10,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,10,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,5.6353902507748659,0,0,-5.6353902507748659,0,0,0,0,0,0,0,
0,0,0,0,5.6353902507748659,0,0,-5.6353902507748659,0,0,0,0,0,0,
0,0,0,0,0,5.6353902507748659,0,0,-5.6353902507748659,0,0,0,0,0,
0,0,0,-5.6353902507748659,0,0,15.635391250774864,0,0,-10,-9.9999999999999995e-07,0,0,0,
0,0,0,0,-5.6353902507748659,0,0,25.635391250774866,0,-10,-10,-9.9999999999999995e-07,0,0,
0,0,0,0,0,-5.6353902507748659,0,0,25.635390250774865,-10,0,-10,0,0,
0,0,0,0,0,0,-10,-10,-10,30,0,0,0,0,
0,0,0,0,0,0,-9.9999999999999995e-07,-10,0,0,20.000001999999999,-10,-9.9999999999999995e-07,0,
0,0,0,0,0,0,0,-9.9999999999999995e-07,-10,0,-10,30.001000860019595,0,-10.000999860019597,
0,0,0,0,0,0,0,0,0,0,-9.9999999999999995e-07,0,5.635391250774866,-5.6353902507748659,
0,0,0,0,0,0,0,0,0,0,0,-10.000999860019597,-5.6353902507748659,15.636390110794462,
};
int main(int argc, char **argv)
{
const int n = 14;
int ipiv[14];
int info = 0;
LAPACK_dgetrf(&n, &n, M, &n, ipiv, &info);
printf("Info %d\n", info);
return 0;
}
...it prints info 0. However, if I set the environment variable OPENBLAS_NUM_THREADS to 1, it prints 14.
So it incorrectly detects the matrix as decomposable if I run it with many threads, and only detects it correctly as singular if I run it with one thread. The separator seems to be 3 vs 4 threads: with 3 threads it prints info 14 but with 4 threads it prints info 0.
I have OpenBLAS 0.3.8. I understand that in newer versions of OpenBLAS, small matrices are always handled with only one thread so it improves performance, so testing the issue with a new OpenBLAS isn't even possible. However, since the calculation happens differently depending on the number of threads, I think there's an issue somewhere that should be tracked down. I would expect OpenBLAS to give identical results no matter how many threads are in use.