
Add two-stage algorithms #25

Open
dmbates opened this issue Oct 16, 2020 · 25 comments
@dmbates
Contributor

dmbates commented Oct 16, 2020

Motivated in part by https://discourse.julialang.org/t/wraping-a-mkl-handle-in-julia/48241 I looked in more detail at the two-stage (inspector-executor) algorithms in recent MKL versions. They are of interest to me, specifically some of the multiplication and solve routines. I could create a PR on this repository or create a separate Julia package to wrap them.

Because the two-stage algorithms go beyond some of the BLAS functions I am leaning toward creating a separate package. However, I can be convinced that they would fit in with this package. Any opinions?

@pablosanjose
Member

I don't have a strong opinion either way, but would very much love to see progress towards interfacing Julia with MKL's Inspector-executor routines. I think I would lean towards integrating them in this package, which was actually meant to go beyond BLAS, according to the README, right?

@pablosanjose
Member

Also, see #17

@dmbates
Contributor Author

dmbates commented Oct 18, 2020

To avoid cross-talk between the two formulations I will create a new package MKLsparseinspector to experiment with the inspector-executor routines.

@dmbates
Contributor Author

dmbates commented Oct 19, 2020

Having looked at this a bit, I think it may be possible to reuse much of the structure of the existing package. Working on this would be worthwhile because the new formulation has many advantages.

Many of the execution routines can use the MKL CSC format directly, so there is no need to keep track of an implicit adjoint, as we must when mapping Julia's SparseMatrixCSC format onto the MKL CSR format. (Unfortunately mkl_sparse_?_syrkd, one of the routines I am interested in, does not accept MKL CSC.)

The way I am approaching the easy cases is to define a mutable empty struct in Julia (MKLcsc), parameterized by element type, with a conversion function that takes a SparseMatrixCSC and returns a pointer to the struct. The simple routines just create the pointer, pass it to, e.g., mkl_sparse_d_mv, and call mkl_sparse_destroy on it (to free the memory allocated by MKL) before returning.

I have taken the liberty of creating a branch of this repository to implement this approach.

@dmbates
Contributor Author

dmbates commented Oct 20, 2020

I have pushed some changes to the dmb/matrix_sparse branch implementing what I had suggested for the mv, mm, sv, and sm operations. Some of the tests in the triangular test set are failing and I am looking into that but I thought it would be best to get something on the repository for others to look at.

@dmbates
Contributor Author

dmbates commented Oct 20, 2020

The test failures are very weird. Multiplications of a vector by a triangular sparse matrix on the left that go through cscmv!, and hence the C function mkl_sparse_d_mv, give incorrect answers, whereas converting the vector to an n-by-1 matrix, so that the calculation goes through cscmm! and mkl_sparse_d_mm, gives the correct answer. Yet in the matrix-vector multiplication test set the cscmv! -> mkl_sparse_d_mv path works. I'm having difficulty deciding what could be different in the triangular matrix case.

@pablosanjose
Member

I'm not familiar with MKL's mechanism of inspector-executor. Perhaps ping some of the other linalg Julia folk?

@dmbates
Contributor Author

dmbates commented Oct 21, 2020

General call for help given in https://discourse.julialang.org/t/help-with-debugging-some-code-for-the-mklsparse-package/48764

@andreasnoack
Member

I think there might be a bug in MKL here although it seems unlikely. This program

#include <mkl_spblas.h>
#include <stdio.h>

#define M 2
#define N 2
#define NNZ 3

int main(int argc, char const *argv[])
{
    struct matrix_descr descrA;
    sparse_matrix_t cscA;
    sparse_status_t err;

    MKL_INT cscColPtr[] = {0, 2, 3};
    MKL_INT cscRowInd[] = {0, 1, 1};
    double  cscVal[] = {1.0, 1.0, 1.0};

    double x[] = {1.0, 1.0};
    double y[] = {0.0, 0.0};

    mkl_sparse_d_create_csc (
        &cscA,
        SPARSE_INDEX_BASE_ZERO,
        M,  // number of rows
        N,  // number of columns
        cscColPtr,
        cscColPtr + 1,
        cscRowInd,
        cscVal );
    /* code */

    descrA.type = SPARSE_MATRIX_TYPE_GENERAL;

    err = mkl_sparse_d_mv (
        SPARSE_OPERATION_NON_TRANSPOSE,
        1.0,
        cscA,
        descrA,
        x,
        0.0,
        y );

    printf("general sparse-dense matvec\n");
    printf("%f %f\n", y[0], y[1]);

    descrA.type = SPARSE_MATRIX_TYPE_TRIANGULAR;
    descrA.mode = SPARSE_FILL_MODE_LOWER;
    descrA.diag = SPARSE_DIAG_NON_UNIT;

    err = mkl_sparse_d_mv (
        SPARSE_OPERATION_NON_TRANSPOSE,
        1.0,
        cscA,
        descrA,
        x,
        0.0,
        y );

    printf("\ntriangular sparse-dense matvec with SPARSE_FILL_MODE_LOWER\n");
    printf("%f %f\n", y[0], y[1]);


    descrA.mode = SPARSE_FILL_MODE_UPPER;

    err = mkl_sparse_d_mv (
        SPARSE_OPERATION_NON_TRANSPOSE,
        1.0,
        cscA,
        descrA,
        x,
        0.0,
        y );

    printf("\ntriangular sparse-dense matvec with SPARSE_FILL_MODE_UPPER\n");
    printf("%f %f\n", y[0], y[1]);


    descrA.mode = SPARSE_FILL_MODE_LOWER;

    mkl_sparse_d_mm (
        SPARSE_OPERATION_NON_TRANSPOSE,
        1.0,
        cscA,
        descrA,
        SPARSE_LAYOUT_COLUMN_MAJOR,
        x,
        1,
        1,
        0.0,
        y,
        1 );

    printf("\ntriangular sparse-dense matmat with SPARSE_FILL_MODE_LOWER\n");
    printf("%f %f\n", y[0], y[1]);

    return 0;
}

gives

general sparse-dense matvec
1.000000 2.000000

triangular sparse-dense matvec with SPARSE_FILL_MODE_LOWER
1.000000 1.000000

triangular sparse-dense matvec with SPARSE_FILL_MODE_UPPER
1.000000 2.000000

triangular sparse-dense matmat with SPARSE_FILL_MODE_LOWER
1.000000 2.000000

so it looks like they might have flipped the storage branches for the mv routine.

@dmbates
Contributor Author

dmbates commented Oct 22, 2020

Thanks for checking that by writing the C code, @andreasnoack. I was beginning to convince myself that the problem might be in MKL, but I kept thinking that such a basic routine couldn't possibly have a bug that got past their tests undetected. It may be that they haven't checked the CSC version as closely as the CSR version.

Would you be willing to write to the MKL discussion forums about this? Information is at
https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/How-to-post-in-the-Intel-oneAPI-Math-Kernel-Library-forum/td-p/1209982

@andreasnoack
Member

Yes. I can do that once Intel's registration system works again. Just tried and it currently isn't working.

@dmbates
Contributor Author

dmbates commented Oct 22, 2020

Thank you very much for making that report, @andreasnoack. I haven't been able to view that link yet, but I imagine that is because the moderator must release it.

I am rather disappointed in the quality of this release of MKL. I have been working on incorporating mkl_sparse_?_syrkd into this branch as the rankUpdate operation is where I was hoping to gain speed in a blocked Cholesky factorization. For some of my models one of the off-diagonal blocks will be a large, sparse, rectangular matrix and I need to downdate the diagonal block from it. I did get it working and my timings show that mkl_sparse_?_syrkd is about 3 times slower than the single-threaded, non-SIMD Julia code I am using. Even worse, the answers are not correct in that it evaluates the entire symmetric update but only multiplies the upper triangular part of C by β. So repeated calls end up with junk in the lower triangle of C.

@dmbates
Contributor Author

dmbates commented Nov 11, 2020

@andreasnoack Has there been any response from the MKL maintainers about the bug report? I have not succeeded in viewing that issue.

@andreasnoack
Member

Not really. On 30 October, I was told that

We will check the issue ASAP and get back to you.

and I haven't heard more. I've just asked for an update in the support request.

@andreasnoack
Member

Intel has just gotten back to me to tell me that they were sorry for not replying earlier. No further details. To be continued.

@pablosanjose
Member

No further news?

@andreasnoack
Member

Unfortunately not. Actually, I asked for an update four days ago but haven't heard back from Intel.

@andreasnoack
Member

Some progress. They just replied that they have run the reproducer I shared, that they are able to reproduce the bug, and that

The issue has been escalated to the MKL engineering and will be addressed.

I'll follow up.

@andreasnoack
Member

Just got the following update from Intel

The issue is in progress.

@andreasnoack
Member

Today's news from Intel

The Issue has been fixed and will be available with MKL 2021.U3

@BacAmorim

MKL_jll.jl currently ships MKL v2021.1.1+2. For this issue to be solved, MKL_jll would have to ship the latest MKL version, correct?

@BacAmorim

MKL_jll.jl has been updated in the latest release to MKL v2022.0.0+0. So this issue can now be solved, correct?

@dmbates
Contributor Author

dmbates commented Jan 31, 2022

By the way, I recently re-checked the speed of their mkl_sparse_?_syrkd against our single-threaded code, and the single-threaded code is still faster. It appears that their inspector-executor code for sparse matrices hasn't had quite the same level of effort invested in it as the dense matrix code.

@amontoison
Member

I regenerated src/libmklsparse.jl with MKL_jll.jl v2024.1.0. Do you need additional routines?
I can easily add other headers if needed: https://github.com/JuliaSparse/MKLSparse.jl/blob/master/gen/wrapper.jl#L43
