How to implement a new routine CLBlastSDgemm() to get the float and double type of Matrix C #427

Open
TaihuLight opened this issue Nov 10, 2021 · 4 comments

@TaihuLight

TaihuLight commented Nov 10, 2021

For our development, we need a higher-precision matrix multiplication of float A and float B, i.e., a result matrix C of type double (cd_buffer). The computation is as follows:

CLBlastSDgemm(..., const cl_mem a_buffer, const size_t a_offset, const size_t a_ld,
                               const cl_mem b_buffer, const size_t b_offset, const size_t b_ld,
                               const float beta,
                               cl_mem c_buffer, const size_t c_offset, const size_t c_ld,
                               cl_mem cd_buffer, ...)

# First, use the double-typed CD[i][j] to accumulate the sum of the element products of matrices A and B
CD (cd_buffer) = alpha * A * B + beta * CD (cd_buffer) 
# Second, convert the double-typed matrix CD into the float-typed matrix C,
#  which is stored in the float buffer of matrix C (c_buffer).
C (c_buffer) = CD (cd_buffer) 
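
The two steps above can be written down as a small host-side reference, which may be handy for validating a modified kernel later. This is only a sketch under stated assumptions: `sdgemm_reference` is a hypothetical name, not a CLBlast function, and it assumes row-major storage, no transposition, no buffer offsets, and a separate leading dimension `cd_ld` for the double matrix (the proposed signature above does not show one).

```c
#include <stddef.h>

/* Hypothetical reference for the proposed computation (not CLBlast code).
   Assumptions: row-major, A is m x k, B is k x n, C and CD are m x n. */
void sdgemm_reference(size_t m, size_t n, size_t k, float alpha,
                      const float *a, size_t a_ld,
                      const float *b, size_t b_ld,
                      float beta,
                      float *c, size_t c_ld,
                      double *cd, size_t cd_ld)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            /* Step 1: accumulate the float products in a double */
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p) {
                acc += (double)a[i * a_ld + p] * (double)b[p * b_ld + j];
            }
            /* CD = alpha * A * B + beta * CD, kept in double */
            double result = (double)alpha * acc + (double)beta * cd[i * cd_ld + j];
            cd[i * cd_ld + j] = result;
            /* Step 2: C = CD, rounded to float */
            c[i * c_ld + j] = (float)result;
        }
    }
}
```

Comparing `cd` from this loop against the contents of `cd_buffer` read back from the device would be one way to check a hacked kernel, assuming the same layout on both sides.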

How can I add new code to implement the newly introduced routine CLBlastSDgemm in CLBlast?
@kpot @matze @umar456 @tholu @CNugteren

@CNugteren
Owner

I'm not sure what you mean exactly. Are you asking how to add the proposed computation to the CLBlast library (with the purpose of upstreaming it and making it available to all users), or are you asking how to modify your own copy of CLBlast locally to achieve this computation?

If you mean the former, I would say it is not general enough to benefit other users, so I would rather leave it out. If you mean the latter, then you'll need to introduce new code in both the indirect GEMM here and the direct GEMM here. And of course modify the data-types and function arguments. Let me know if you need more help and for what specifically.

@TaihuLight
Author

TaihuLight commented Nov 11, 2021

I mean the latter case, i.e., how to modify my own copy of CLBlast locally to achieve this new computation.
Could you help me and provide demo code for implementing this new computation?
Which functions do I need to copy and modify?
First, I need to modify the following code in https://github.com/CNugteren/CLBlast/blob/ef5176dd968dbf9da7c94506fc0d5f8bd463b293/src/kernels/common.opencl
to implement double c += float a * float b.

// The scalar multiply-add function
#if PRECISION == 3232 || PRECISION == 6464
  #define MultiplyAdd(c,a,b) c.x += MulReal(a,b); c.y += MulImag(a,b)
#else
  #if USE_CL_MAD == 1
    #define MultiplyAdd(c,a,b) c = mad(a, b, c)
  #else
    #define MultiplyAdd(c,a,b) c += a * b
  #endif
#endif

But how do I modify the above to perform double c += float a * float b?
Then, how to add a double Cpm to store the accumulation (Cpm += Apm * Bpm) for
Second, I add a double-typed cpm to hold the values of the double Cpm, placed next to the code at Line 87 in the function https://github.com/CNugteren/CLBlast/blob/master/src/kernels/level3/xgemm_part3.opencl#L306
'''
double16 cpm[NWI*(MWI/VWM)]; // NWI * MWI
'''
Third, the functions StoreResults() and StoreResultsChecked() are modified to store double-typed values to cgm.
Fourth, what code needs to be copied and modified to implement the new function CLBlastSDgemm()?
Thank you for your kind help.

@CNugteren

@CNugteren
Owner

CNugteren commented Nov 11, 2021

Hmm lots of detailed questions, not sure if I can answer them all. If I were you I would just hack it in and not really create a new function, assuming the regular single-precision case as a starting point. Then use a separate CLBlast for the regular functions. Would that work?

How to modify the above to perform double c += float a * float b?

That depends a bit on what you want to achieve. You could do either of:

#define MultiplyAdd(c,a,b) c += (double)a * (double)b
#define MultiplyAdd(c,a,b) c += (double)(a * b)

Note that I'm not sure how expensive this cast is and what exactly your precision requirements are. You could choose to accumulate a few values and only then cast to doubles.
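
To make the difference between the two macros concrete, here is a minimal host-side C sketch (plain C rather than OpenCL C, with hypothetical function names). It relies on one fact: a float has a 24-bit significand, so the product of two floats needs at most 48 bits and is exactly representable in a double. Casting the operands first therefore loses nothing in the multiplication, while casting the product keeps the rounding error of the float multiply. The third function sketches the "accumulate a few values and only then cast" idea; the block size is an assumption to tune, not a recommendation.

```c
#include <assert.h>
#include <stddef.h>

/* Models c += (double)a * (double)b: the product is exact in double. */
double dot_widen_operands(const float *a, const float *b, size_t k) {
    double c = 0.0;
    for (size_t p = 0; p < k; ++p) {
        c += (double)a[p] * (double)b[p];
    }
    return c;
}

/* Models c += (double)(a * b): the product is rounded to float first. */
double dot_widen_product(const float *a, const float *b, size_t k) {
    double c = 0.0;
    for (size_t p = 0; p < k; ++p) {
        /* volatile keeps the compiler from holding extra precision here */
        volatile float prod = a[p] * b[p];
        c += (double)prod;
    }
    return c;
}

/* Accumulates `block` products in float, then adds the partial sum to a
   double. Fewer conversions, but less accurate than the versions above. */
double dot_blocked(const float *a, const float *b, size_t k, size_t block) {
    assert(block > 0);
    double c = 0.0;
    for (size_t start = 0; start < k; start += block) {
        size_t end = (start + block < k) ? start + block : k;
        volatile float partial = 0.0f;
        for (size_t p = start; p < end; ++p) {
            partial += a[p] * b[p];
        }
        c += (double)partial;
    }
    return c;
}
```

For example, with a = b = 4097.0f the exact product is 16785409, which does not fit in a float: the first function returns 16785409.0, while the second returns 16785408.0 under round-to-nearest-even. Whether that difference matters depends on the precision requirements mentioned above.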

Then, how to add a double Cpm to store the accumulation (Cpm += Apm * Bpm) for

I would just replace all definitions of Cpm with the double data-type. So e.g. here just replace realM with doubleM (with whatever M is, obtained from the tuner's result). Note that this will break all other cases. You could also put all your changes inside some new #ifdef .... block. You might have to do a similar change in other places as well. If you want to do it better you'll probably need to introduce a new doubleM data-type, defined similarly to the realM here.
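
A possible shape for such a doubleM type is sketched below. It is an assumption modelled on the realM idea described above, not code taken from CLBlast: VWM is taken to be the vector width in the M dimension, and the default of 1 is only there so the fragment stands alone. The `#else` branch with struct stand-ins exists purely so the fragment can also be compiled as plain host C; inside a kernel only the OpenCL branch would apply.

```c
/* Sketch of a doubleM vector type (hypothetical, not CLBlast code). */
#ifndef VWM
  #define VWM 1  /* assumption for this sketch; normally set via the tuner */
#endif

#ifdef __OPENCL_VERSION__
  /* Double precision is an optional OpenCL feature; enable it if present */
  #pragma OPENCL EXTENSION cl_khr_fp64 : enable
#else
  /* Host stand-ins so the fragment can be checked as plain C */
  typedef struct { double s[2]; } double2;
  typedef struct { double s[4]; } double4;
  typedef struct { double s[8]; } double8;
  typedef struct { double s[16]; } double16;
#endif

#if VWM == 1
  typedef double doubleM;
#elif VWM == 2
  typedef double2 doubleM;
#elif VWM == 4
  typedef double4 doubleM;
#elif VWM == 8
  typedef double8 doubleM;
#elif VWM == 16
  typedef double16 doubleM;
#else
  #error "Unsupported VWM for this sketch"
#endif
```

With a type like this, the accumulator from the earlier comment would presumably be declared as `doubleM cpm[NWI*(MWI/VWM)]` rather than with a fixed `double16`, so that it follows whatever vector width the tuner selected. The device also has to support double precision for any of this to work.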

Third, the functions StoreResults() and StoreResultsChecked() are modified to store double-typed values to cgm.

Indeed, as I said above, also there you might have to change the data-types. Perhaps introducing this doubleM above and replacing realM with doubleM at the right places will solve this.

Fourth, what code needs to be copied and modified to implement the new function CLBlastSDgemm()?

I wouldn't do that, just replace the original one. If you really want to you'll need to modify https://github.com/CNugteren/CLBlast/blob/master/src/routines/level3/xgemm.cpp to support that new case, which isn't easy, because it is currently templated on a single data-type. And you'll mix two, so that won't work in the current design. You can copy that whole thing of course and make a separate version specifically for your CLBlastSDgemm case. But much easier to hack it into either CLBlastSgemm (PRECISION=32) or CLBlastDgemm (PRECISION=64), depending on which one you don't use.

@TaihuLight
Author

TaihuLight commented Nov 13, 2021

  1. For double c += float a * float b, I want to achieve
#define MultiplyAdd(c,a,b) c += (double)(a * b)
  2. "Added GEMM temp buffer C API #251" adds a GEMM temp buffer named temp_buffer. If temp_buffer can be used to store the values of matrix C, it would be easier to change the data type of temp_buffer to obtain a double-typed GEMM result.
    Is temp_buffer used to store the values of matrix C?
    @CNugteren
