
New routine for the stride of 0 for C in CLBlastSgemmStridedBatched() is needed #347

Open
TaihuLight opened this issue Jan 9, 2019 · 2 comments


@TaihuLight

#346
However, a stride of 0 for C in CLBlastSgemmStridedBatched() would also be useful in deep learning applications, such as the backward pass of a convolution layer. It can be implemented in two steps:
(1) use CLBlastSgemmStridedBatched() to compute the batched matrices C;
(2) add a new routine (e.g., named StridedBatchedAddMatrix) to compute the element-wise sum over the batches of C computed in step (1).
In other words, the new routine would reduce the results of CLBlastSgemmStridedBatched() by adding all batched matrices element-wise. That is, it computes the sum of the batched matrices as follows:

for (int i = 0; i < batch_count; i++) {
    SUM = SUM + β * (C + i * c_stride)
}

where c_stride is the stride between two consecutive batches of the C matrix, and the batched C matrices are computed with CLBlastSgemmStridedBatched(). (A screenshot with an example was attached here.)
Therefore, the new routine for adding matrices would be similar to an xAXPYStridedBatched routine, i.e. a StridedBatched version of AXPY for adding vectors.
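For reference, the proposed StridedBatchedAddMatrix could already be emulated with the existing AXPY routine by looping over the batches on the host, at the cost of batch_count kernel launches. Below is a minimal sketch against the CLBlast C API; the helper name sum_batched_c, the zero-initialised sum_buffer, and the c_size/c_stride arguments are illustrative assumptions, not part of CLBlast:

#include <clblast_c.h>

// Emulates the proposed reduction: SUM += beta * C_i for every batch i,
// where C_i starts at offset i * c_stride in c_buffer and has c_size elements.
// Assumes c_buffer was filled by CLBlastSgemmStridedBatched() and sum_buffer
// holds c_size floats initialised to zero.
CLBlastStatusCode sum_batched_c(cl_mem c_buffer, cl_mem sum_buffer,
                                const size_t c_size, const size_t c_stride,
                                const size_t batch_count, const float beta,
                                cl_command_queue* queue) {
  for (size_t i = 0; i < batch_count; ++i) {
    const CLBlastStatusCode status =
        CLBlastSaxpy(c_size, beta,
                     c_buffer, i * c_stride, 1,   // x = the i-th batch of C
                     sum_buffer, 0, 1,            // y = the running sum
                     queue, NULL);
    if (status != CLBlastSuccess) { return status; }
  }
  return CLBlastSuccess;
}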

@CNugteren (Owner)

Sorry for my late reply.

I don't think 'batched' is the right wording here. That is typically used to indicate an operation that is repeated multiple times but on independent data. In your case the SUM variable is shared, right? So the iterations of the 'batched' loop are not actually independent of each other.

I think what you are looking for is perhaps something like the XSUM routine from CLBlast, but then going from 3D (batches of 2D matrices) to 2D (a single 2D matrix) rather than from 1D (a vector) to 0D (a scalar). Perhaps if you view your 2D matrices as flat vectors and re-organize your data, something like #349 could fit your needs?

@CNugteren (Owner)

Could you have a look at the latest reply in #349 regarding a solution with GEMV? I think this solves your issue as well: you can just use GEMV, set the x vector to all 1's with a size equal to the number of values you want to sum, and use the other dimension (either m or n, depending on how the data is currently laid out in memory) as the size of the matrix. For example, set a_transposed = true, m = num_batches (the number of sums you want to do), and n = height_of_C * width_of_C (the matrix C flattened).

Could you let me know if this works for you?
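A minimal sketch of the GEMV-based reduction described above, using the CLBlast C API. It assumes the batched C matrices are stored contiguously (so the stride between batches equals height_c * width_c) and that ones_buffer already holds num_batches floats set to 1.0f; the helper and variable names are illustrative:

#include <clblast_c.h>

// Element-wise sum over the batches of C: sum[j] = sum_i C_i[j].
// The batched C data is viewed as a row-major num_batches x (height_c * width_c)
// matrix, so A^T * ones reduces over the batch dimension.
CLBlastStatusCode sum_batches_with_gemv(cl_mem c_buffer, cl_mem ones_buffer,
                                        cl_mem sum_buffer,
                                        const size_t num_batches,
                                        const size_t height_c, const size_t width_c,
                                        cl_command_queue* queue) {
  const size_t c_size = height_c * width_c;  // also the stride between batches
  return CLBlastSgemv(CLBlastLayoutRowMajor, CLBlastTransposeYes,
                      num_batches, c_size,   // m = num_batches, n = flattened C
                      1.0f,
                      c_buffer, 0, c_size,   // A: the batched C matrices, ld = c_size
                      ones_buffer, 0, 1,     // x: vector of num_batches ones
                      0.0f,
                      sum_buffer, 0, 1,      // y: the summed matrix, c_size elements
                      queue, NULL);
}

The β factor from the loop in the original post would correspond to the alpha argument of the GEMV call.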
