#346
However, a stride of 0 for C in CLBlastSgemmStridedBatched() is also useful in deep learning applications, such as implementing the backward pass of a convolution layer. It can be implemented in two steps:
(1) use CLBlastSgemmStridedBatched() to compute the batched matrices C;
(2) add a new routine (e.g., named StridedBatchedAddMatrix) to compute the element-wise sum over all batches of the matrices C computed in step (1).
In other words, the new routine (e.g., StridedBatchedAddMatrix) would perform an element-wise reduction over the batched results of CLBlastSgemmStridedBatched(). That is, it would compute the sum of the batched matrices as follows:
for (int i = 0; i < batch_count; i++) {
    SUM = SUM + β * (C + i * c_stride);
}
where c_stride is the stride between two consecutive batches of the C matrix, and the batched C matrices are computed with CLBlastSgemmStridedBatched().
The new routine for adding matrices would therefore be similar to an xAXPYStridedBatched routine: a StridedBatched version of AXPY for adding vectors.
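A minimal sketch of the proposed reduction semantics, modeled in NumPy (the routine StridedBatchedAddMatrix is hypothetical; the names batch_count, c_stride, and beta follow the loop above):

```python
import numpy as np

def strided_batched_add_matrix(c_flat, batch_count, c_stride, m, n, beta=1.0):
    """Hypothetical StridedBatchedAddMatrix: element-wise sum of the
    batch_count m-by-n matrices produced by CLBlastSgemmStridedBatched(),
    stored in the flat buffer c_flat with stride c_stride between batches."""
    total = np.zeros((m, n))
    for i in range(batch_count):
        # each batch is m*n values starting at offset i * c_stride
        batch = c_flat[i * c_stride : i * c_stride + m * n].reshape(m, n)
        total += beta * batch
    return total

# Example: 3 batches of 2x2 matrices stored contiguously (c_stride = 4)
c = np.arange(12, dtype=np.float64)
s = strided_batched_add_matrix(c, batch_count=3, c_stride=4, m=2, n=2)
```

This only models the math on the host; a real implementation would be an OpenCL kernel operating on the device buffer.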
TaihuLight changed the title to "New routine for the stride of 0 for C in CLBlastSgemmStridedBatched() is needed" on Jan 9, 2019.
I don't think 'batched' is the right wording here. That is typically used to indicate an operation that is repeated multiple times but on independent data. In your case the SUM variable is shared, right? So the iterations of the 'batched' loop are not actually independent of each other.
I think what you are looking for is perhaps something like the XSUM routine from CLBlast, but going from 3D (batches of 2D matrices) to 2D (a single 2D matrix) rather than from 1D (a vector) to 0D (a scalar). Perhaps if you view your 2D matrices as a flat vector and re-organize your data, something like in #349 could fit your need?
Could you have a look at the latest reply in #349 regarding a solution with GEMV? I think this solves your issue as well, since you can just use GEMV: set the x vector to all 1's, equal in size to the number of values you want to sum, and use the other dimension (either m or n, depending on how the data is currently laid out in memory) as the size of the matrix. For example, set a_transposed = true, m = num_batches (the number of sums you want to do), and n = height_of_C * width_of_C (the matrix C flattened).
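The GEMV trick above can be sketched in NumPy to show the math (this models the dimensions suggested in the reply; it does not use actual CLBlast calls):

```python
import numpy as np

# Sum num_batches matrices of shape (h, w) with a single GEMV:
# view the batched data as an (num_batches, h*w) matrix A, then
# multiply A^T by a vector of ones of length num_batches.
num_batches, h, w = 3, 2, 2
batched_c = np.arange(num_batches * h * w, dtype=np.float64)
A = batched_c.reshape(num_batches, h * w)   # m = num_batches, n = h*w

ones = np.ones(num_batches)                 # x vector of all 1's
summed = A.T @ ones                         # GEMV with a_transposed = true
result = summed.reshape(h, w)               # the element-wise sum of all batches
```

The same computation maps onto CLBlast's Gemv routine with the transpose flag set, since summing over batches is just a matrix-vector product with a ones vector.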