According to this comment, the current `SpGEMM` implementation may return `CUSPARSE_STATUS_INSUFFICIENT_RESOURCES` for some specific inputs, so I tried the `cusparseScsrgemm2` routine instead. However, I find that `cusparseScsrgemm2` is quite slow. For example, for two 600,000 x 600,000 matrices `A` and `B`, where `A` contains 40,000,000 entries and `B` is a diagonal matrix, `cusparseScsrgemm2` took several seconds to compute the product of `A` and `B`, much slower than `SpGEMM`, which took only tens of milliseconds. I used CUDA 11.3 and a Tesla V100. The input matrices can be downloaded here. The program is as follows.
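For reference, the generic-API `SpGEMM` path mentioned above follows a fixed multi-phase workflow. The sketch below is a minimal outline based on the cuSPARSE documentation, not the poster's actual program (which is only linked, not shown): it assumes `matA`, `matB`, and `matC` are CSR descriptors already created with `cusparseCreateCsr`, the function name `spgemm_sketch` is ours, and all error checking is omitted for brevity.

```c
// Minimal sketch of the cuSPARSE generic SpGEMM workflow (CUDA 11.x).
// Assumes matA/matB/matC are pre-created CSR descriptors; no error checks.
#include <cusparse.h>
#include <cuda_runtime.h>
#include <stddef.h>

void spgemm_sketch(cusparseHandle_t handle,
                   cusparseSpMatDescr_t matA,
                   cusparseSpMatDescr_t matB,
                   cusparseSpMatDescr_t matC)
{
    float alpha = 1.0f, beta = 0.0f;
    cusparseOperation_t op = CUSPARSE_OPERATION_NON_TRANSPOSE;
    cusparseSpGEMMDescr_t spgemmDesc;
    cusparseSpGEMM_createDescr(&spgemmDesc);

    size_t bufSize1 = 0, bufSize2 = 0;
    void *dBuf1 = NULL, *dBuf2 = NULL;

    // Phase 1: estimate workspace (query size, allocate, call again).
    cusparseSpGEMM_workEstimation(handle, op, op, &alpha, matA, matB,
                                  &beta, matC, CUDA_R_32F,
                                  CUSPARSE_SPGEMM_DEFAULT, spgemmDesc,
                                  &bufSize1, NULL);
    cudaMalloc(&dBuf1, bufSize1);
    cusparseSpGEMM_workEstimation(handle, op, op, &alpha, matA, matB,
                                  &beta, matC, CUDA_R_32F,
                                  CUSPARSE_SPGEMM_DEFAULT, spgemmDesc,
                                  &bufSize1, dBuf1);

    // Phase 2: compute the structure and values of C = A * B.
    cusparseSpGEMM_compute(handle, op, op, &alpha, matA, matB, &beta, matC,
                           CUDA_R_32F, CUSPARSE_SPGEMM_DEFAULT, spgemmDesc,
                           &bufSize2, NULL);
    cudaMalloc(&dBuf2, bufSize2);
    cusparseSpGEMM_compute(handle, op, op, &alpha, matA, matB, &beta, matC,
                           CUDA_R_32F, CUSPARSE_SPGEMM_DEFAULT, spgemmDesc,
                           &bufSize2, dBuf2);

    // Phase 3: after allocating C's CSR arrays (cusparseSpMatGetSize +
    // cusparseCsrSetPointers), copy the result into matC.
    cusparseSpGEMM_copy(handle, op, op, &alpha, matA, matB, &beta, matC,
                        CUDA_R_32F, CUSPARSE_SPGEMM_DEFAULT, spgemmDesc);

    cusparseSpGEMM_destroyDescr(spgemmDesc);
    cudaFree(dBuf1);
    cudaFree(dBuf2);
}
```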
I have the following questions:

1. Is the low performance of `cusparseScsrgemm2` caused by not being able to exploit the architecture of the V100?
2. Are there any alternatives to `cusparseScsrgemm2` and `SpGEMM`?

Thanks.

No, it is not related to the specific GPU architecture. The low performance of `cusparseScsrgemm2` is due to the sparsity pattern of the input matrix.

There are no alternatives. You may try `cusparseSpGEMMreuse`. This routine makes sense when the cost of the preprocessing steps can be amortized over multiple runs, since `cusparseSpGEMMreuse_compute` is very fast.
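The amortization suggested above splits the work into a one-time structural analysis and a repeatable numeric phase. A rough sketch of that split, based on the cuSPARMSE-documented `cusparseSpGEMMreuse` workflow (descriptor creation, `matC` array allocation, and error checking omitted; the function name `spgemm_reuse_sketch` and the `num_value_updates` parameter are illustrative, not from the original discussion):

```c
// Sketch: amortizing SpGEMM preprocessing with cusparseSpGEMMreuse
// (CUDA 11.3+). The symbolic phase runs once; only the cheap
// cusparseSpGEMMreuse_compute call repeats when A/B values change
// while their sparsity patterns stay fixed.
#include <cusparse.h>
#include <cuda_runtime.h>
#include <stddef.h>

void spgemm_reuse_sketch(cusparseHandle_t handle,
                         cusparseSpMatDescr_t matA,
                         cusparseSpMatDescr_t matB,
                         cusparseSpMatDescr_t matC,
                         int num_value_updates)
{
    float alpha = 1.0f, beta = 0.0f;
    cusparseOperation_t op = CUSPARSE_OPERATION_NON_TRANSPOSE;
    cusparseSpGEMMAlg_t alg = CUSPARSE_SPGEMM_DEFAULT;
    cusparseSpGEMMDescr_t desc;
    cusparseSpGEMM_createDescr(&desc);

    size_t s1 = 0, s2 = 0, s3 = 0, s4 = 0, s5 = 0;
    void *b1 = NULL, *b2 = NULL, *b3 = NULL, *b4 = NULL, *b5 = NULL;

    // One-time preprocessing: workspace estimation (query, alloc, repeat).
    cusparseSpGEMMreuse_workEstimation(handle, op, op, matA, matB, matC,
                                       alg, desc, &s1, NULL);
    cudaMalloc(&b1, s1);
    cusparseSpGEMMreuse_workEstimation(handle, op, op, matA, matB, matC,
                                       alg, desc, &s1, b1);

    // One-time symbolic phase: compute the structure (nnz) of C.
    cusparseSpGEMMreuse_nnz(handle, op, op, matA, matB, matC, alg, desc,
                            &s2, NULL, &s3, NULL, &s4, NULL);
    cudaMalloc(&b2, s2); cudaMalloc(&b3, s3); cudaMalloc(&b4, s4);
    cusparseSpGEMMreuse_nnz(handle, op, op, matA, matB, matC, alg, desc,
                            &s2, b2, &s3, b3, &s4, b4);

    // ...allocate matC's CSR arrays here (cusparseSpMatGetSize +
    // cusparseCsrSetPointers), then copy the fixed structure once.
    cusparseSpGEMMreuse_copy(handle, op, op, matA, matB, matC, alg, desc,
                             &s5, NULL);
    cudaMalloc(&b5, s5);
    cusparseSpGEMMreuse_copy(handle, op, op, matA, matB, matC, alg, desc,
                             &s5, b5);

    // Amortized part: fast numeric phase, repeated per value update.
    for (int i = 0; i < num_value_updates; ++i) {
        cusparseSpGEMMreuse_compute(handle, op, op, &alpha, matA, matB,
                                    &beta, matC, CUDA_R_32F, alg, desc);
    }

    cusparseSpGEMM_destroyDescr(desc);
    cudaFree(b1); cudaFree(b2); cudaFree(b3); cudaFree(b4); cudaFree(b5);
}
```

The design point is that the expensive steps depend only on the sparsity patterns, so if the same product is recomputed many times with new values, their cost is paid once.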