Skip to content

Parallelized dgetrf doesn't detect matrix as singular #4505

Closed
@jmtilli

Description

@jmtilli

I have the following matrix:

10,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,10,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,10,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,5.6353902507748659,0,0,-5.6353902507748659,0,0,0,0,0,0,0,
0,0,0,0,5.6353902507748659,0,0,-5.6353902507748659,0,0,0,0,0,0,
0,0,0,0,0,5.6353902507748659,0,0,-5.6353902507748659,0,0,0,0,0,
0,0,0,-5.6353902507748659,0,0,15.635391250774864,0,0,-10,-9.9999999999999995e-07,0,0,0,
0,0,0,0,-5.6353902507748659,0,0,25.635391250774866,0,-10,-10,-9.9999999999999995e-07,0,0,
0,0,0,0,0,-5.6353902507748659,0,0,25.635390250774865,-10,0,-10,0,0,
0,0,0,0,0,0,-10,-10,-10,30,0,0,0,0,
0,0,0,0,0,0,-9.9999999999999995e-07,-10,0,0,20.000001999999999,-10,-9.9999999999999995e-07,0,
0,0,0,0,0,0,0,-9.9999999999999995e-07,-10,0,-10,30.001000860019595,0,-10.000999860019597,
0,0,0,0,0,0,0,0,0,0,-9.9999999999999995e-07,0,5.635391250774866,-5.6353902507748659,
0,0,0,0,0,0,0,0,0,0,0,-10.000999860019597,-5.6353902507748659,15.636390110794462,

...which should be singular. At least getting its inverse in GNU Octave shows that it's singular to machine precision.

If I try to do LU decomposition for it using OpenBLAS:

#include <stdio.h>

#define lapack_int int
#define LAPACK_dgetrf dgetrf_
void LAPACK_dgetrf(const lapack_int*, const lapack_int*, double*, const lapack_int*, lapack_int*, lapack_int*);

double M[] = {
10,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,10,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,10,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,5.6353902507748659,0,0,-5.6353902507748659,0,0,0,0,0,0,0,
0,0,0,0,5.6353902507748659,0,0,-5.6353902507748659,0,0,0,0,0,0,
0,0,0,0,0,5.6353902507748659,0,0,-5.6353902507748659,0,0,0,0,0,
0,0,0,-5.6353902507748659,0,0,15.635391250774864,0,0,-10,-9.9999999999999995e-07,0,0,0,
0,0,0,0,-5.6353902507748659,0,0,25.635391250774866,0,-10,-10,-9.9999999999999995e-07,0,0,
0,0,0,0,0,-5.6353902507748659,0,0,25.635390250774865,-10,0,-10,0,0,
0,0,0,0,0,0,-10,-10,-10,30,0,0,0,0,
0,0,0,0,0,0,-9.9999999999999995e-07,-10,0,0,20.000001999999999,-10,-9.9999999999999995e-07,0,
0,0,0,0,0,0,0,-9.9999999999999995e-07,-10,0,-10,30.001000860019595,0,-10.000999860019597,
0,0,0,0,0,0,0,0,0,0,-9.9999999999999995e-07,0,5.635391250774866,-5.6353902507748659,
0,0,0,0,0,0,0,0,0,0,0,-10.000999860019597,-5.6353902507748659,15.636390110794462,
};

int main(int argc, char **argv)
{
        const int n = 14;
        int ipiv[14];
        int info = 0;
        LAPACK_dgetrf(&n, &n, M, &n, ipiv, &info);
        printf("Info %d\n", info);
        return 0;
}

...it prints info 0. However, if I set the environment variable OPENBLAS_NUM_THREADS to 1, it prints 14.

So it incorrectly detects the matrix as decomposable if I run it with many threads, and only detects it correctly as singular if I run it with one thread. The separator seems to be 3 vs 4 threads: with 3 threads it prints info 14 but with 4 threads it prints info 0.

I have OpenBLAS 0.3.8. I understand that in newer versions of OpenBLAS, small matrices are always handled with only one thread so it improves performance, so testing the issue with a new OpenBLAS isn't even possible. However, since the calculation happens differently depending on the number of threads, I think there's an issue somewhere that should be tracked down. I would expect OpenBLAS to give identical results no matter how many threads are in use.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions