Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DGELSD gives incorrect results on v0.3.6 with SKYLAKEX #2187

Closed
apataki opened this issue Jul 17, 2019 · 7 comments
Closed

DGELSD gives incorrect results on v0.3.6 with SKYLAKEX #2187

apataki opened this issue Jul 17, 2019 · 7 comments

Comments

@apataki
Copy link

apataki commented Jul 17, 2019

It seems like the DGELSD least squares solver via SVD produces incorrect results when built with TARGET=SKYLAKEX or DYNAMIC_ARCH=1 and run on a SKYLAKEX node. The processor we use is the Xeon 6148.

Here is a sample code trying to recover an exact result via fitting a small polynomial:

#include <stdio.h>

#define N 512
#define M 8

double X[N];
double Y[N];
double A[M][N];
double S[M];
double work[N*100];
int lwork = N*100;
int iwork[N*100];

void dgelsd_(int *MM, int *NN, int *NRHS, double *A, int *LDA, double *B, int *LDB,
             double *S, double *RCOND, int *RANK, double *WORK, int *LWORK, int *IWORK, int *INFO);

int main()
{
    int i, j;
    double x, dx, xn;

    dx = 1.0 / (N+1);
    for (i = 0; i < N; i++) {
        x = i*dx;
        Y[i] = 2*x + 3*x*x*x;

        // Generate the Vandermonde matrix
        xn = 1.0;
        for (j = 0; j < M; j++) {
            A[j][i] = xn;
            xn *= x;
        }
    }

    int NN = N;
    int MM = M;
    int NRHS = 1;
    int LDB = N;
    double RCOND = 1e-10;
    int r;
    int info;
    dgelsd_(&NN, &MM, &NRHS, A, &NN, Y, &LDB, S, &RCOND, &r, work, &lwork, iwork, &info);

    printf("info: %d\n", info);

    for (int i = 0; i < M; i++) {
        printf("    %9.6f\n", Y[i]);
    }

    return 0;
}

The correct output (on Broadwell for example):

info: 0
    -0.000000
     2.000000
    -0.000000
     3.000000
    -0.000000
     0.000000
    -0.000000
     0.000000

On the 6148 with TARGET=SKYLAKEX I get:

info: 0
    -1.842238
    -1.542870
     1.389274
    -1.695241
     0.884713
    -0.855344
    -0.331920
    -0.538012

If I reduce the polynomial so that M < 8, the relevant matrix dimension becomes < 8 and the code works correctly. 8 being the magic number for AVX-512 and doubles, I wonder if the AVX-512 implementation of some subfunction is incorrect. Interestingly enough, the singular values computed are correct (not printed here).

@martin-frbg
Copy link
Collaborator

This is almost certainly another manifestation of the DGEMM bug (#2029), hopefully this new testcase will help to track it down. (Current develop branch simply removes the AVX512 dgemm and helper functions so should be correct again).

@apataki
Copy link
Author

apataki commented Jul 18, 2019

Indeed - confirmed that the development branch doesn't show this problem.

@apataki
Copy link
Author

apataki commented Jul 18, 2019

I'm a bit confused about how the DGEMM+Skylake issue has been outstanding for almost half a year, and how the current stable release version has such a huge unpatched bug in it for the processor that is currently actively sold for new servers (and the current stable release has a release date that is months newer than the discovery date of this bug). While it is nice that the AVX-512 DGEMM is disabled on the development branch, I doubt that most people would build the development branch for general use.

Almost the most trivial DGEMM example reproduces the problem for me on a Skylake CPU. Note: all the indexing is done in a FORTRAN manner in this test:

#include <stdio.h>
#include <math.h>

#define L  8
#define M  8
#define N  8

double A[M][L];
double B[N][M];

double C1[N][L];
double C2[N][L];

void dgemm_(char *TRANSA, char *TRANSB,
            int *MM, int *NN, int *KK,
            double *ALPHA, double *A, int *LDA, double *B, int *LDB,
            double *BETA, double *C, int *LDC);


int main()
{
    int i, j, k;
    double tmp, err;

    // Fill the inputs with something
    for (i = 0; i < M; i++) {
        for (j = 0; j < L; j++) {
            A[i][j] = i+2*j;
        }
    }
    for (i = 0; i < N; i++) {
        for (j = 0; j < M; j++) {
            B[i][j] = i+2*j;
        }
    }

    // Manual matrix multiplication (FORTRAN indexing order!)
    for (j = 0; j < N; j++) {
        for (i = 0; i < L; i++) {
            tmp = 0;
            for (k = 0; k < M; k++) {
                tmp += A[k][i] * B[j][k];
            }
            C2[j][i] = tmp;
        }
    }

    // DGEMM multiplication
    double alpha = 1.0;
    double beta = 0.0;
    int LL = L;
    int MM = M;
    int NN = N;
    dgemm_("N", "N", &LL, &NN, &MM, &alpha, A, &LL, B, &MM, &beta, C1, &LL);

    // Comparison
    err = 0.0;
    for (i = 0; i < N; i++) {
        for (j = 0; j < L; j++) {
            tmp = C1[i][j] - C2[i][j];
            err += tmp*tmp;
        }
    }
    printf("Error: %10.2e\n", sqrt(err));

    return 0;
}

The output on Broadwell:

Error:   0.00e+00

on Skylake (Xeon Gold 6148):

Error:   1.38e+03

@martin-frbg
Copy link
Collaborator

Simply put, there is currently no sizeable team or organization behind OpenBLAS (and never has been), despite the importance that the project appears to have gained over the years. The AVX512 code was contributed by someone (working for Intel no less) who has not commented here since april, the bug reports available at that time were from big packages (R, computational chemistry and the like) with no simple reproducer and based on them I had mistakenly assumed that disabling only parts of the AVX512 code was sufficient to work around the problem in 0.3.6. I had meant to release
0.3.7 about two months ago, but as I am doing this in my spare time that date has unfortunately slipped.

@aaichsmn
Copy link

I'm happy to wait and want to commend you for the quality of the distributions I've installed. The fact that they nearly drop in and pass their self tests up to this point has saved me inestimable time that I can use on other projects.

Again, thanks for your effort and good luck getting this one figured out. My assembly coding days ended with the 808x or I'd try to spot the problem myself :-).

@martin-frbg
Copy link
Collaborator

@apataki the pure DGEMM bug demonstrated by your latest reproducer may actually be caused by my misguided attempt to fix things (at least I cannot reproduce it in SDE with all parts of the original contribution active). One confusing datapoint from this whole mess is that the AVX512 code appeared to fail when called as DSYMM although it seemed to work fine in the general (DGEMM) case.

@martin-frbg
Copy link
Collaborator

Fixed by wjc404's new AVX512 DGEMM kernel from #2286

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants