fix conversion of 0x0 CuSparseMatrixCSC <-> CSR #2806
base: master
Conversation
Your PR requires formatting changes to meet the project's style guidelines. Suggested changes:

diff --git a/lib/cusparse/conversions.jl b/lib/cusparse/conversions.jl
index e477e049d..511ad2b81 100644
--- a/lib/cusparse/conversions.jl
+++ b/lib/cusparse/conversions.jl
@@ -332,7 +332,7 @@ for elty in (:Float32, :Float64, :ComplexF32, :ComplexF64)
@eval begin
function CuSparseMatrixCSC{$elty}(csr::CuSparseMatrixCSR{$elty}; index::SparseChar='O', action::cusparseAction_t=CUSPARSE_ACTION_NUMERIC, algo::cusparseCsr2CscAlg_t=CUSPARSE_CSR2CSC_ALG1)
m,n = size(csr)
- colPtr = (index == 'O') ? CUDA.ones(Cint, n+1) : CUDA.zeros(Cint, n+1)
+ colPtr = (index == 'O') ? CUDA.ones(Cint, n + 1) : CUDA.zeros(Cint, n + 1)
rowVal = CUDA.zeros(Cint, nnz(csr))
nzVal = CUDA.zeros($elty, nnz(csr))
function bufferSize()
@@ -352,7 +352,7 @@ for elty in (:Float32, :Float64, :ComplexF32, :ComplexF64)
function CuSparseMatrixCSR{$elty}(csc::CuSparseMatrixCSC{$elty}; index::SparseChar='O', action::cusparseAction_t=CUSPARSE_ACTION_NUMERIC, algo::cusparseCsr2CscAlg_t=CUSPARSE_CSR2CSC_ALG1)
m,n = size(csc)
- rowPtr = (index == 'O') ? CUDA.ones(Cint, m+1) : CUDA.zeros(Cint, m+1)
+ rowPtr = (index == 'O') ? CUDA.ones(Cint, m + 1) : CUDA.zeros(Cint, m + 1)
colVal = CUDA.zeros(Cint,nnz(csc))
nzVal = CUDA.zeros($elty,nnz(csc))
function bufferSize()
@@ -379,7 +379,7 @@ for (elty, welty) in ((:Float16, :Float32),
@eval begin
function CuSparseMatrixCSC{$elty}(csr::CuSparseMatrixCSR{$elty}; index::SparseChar='O', action::cusparseAction_t=CUSPARSE_ACTION_NUMERIC, algo::cusparseCsr2CscAlg_t=CUSPARSE_CSR2CSC_ALG1)
m,n = size(csr)
- colPtr = (index == 'O') ? CUDA.ones(Cint, n+1) : CUDA.zeros(Cint, n+1)
+ colPtr = (index == 'O') ? CUDA.ones(Cint, n + 1) : CUDA.zeros(Cint, n + 1)
rowVal = CUDA.zeros(Cint, nnz(csr))
nzVal = CUDA.zeros($elty, nnz(csr))
if $elty == Float16 #broken for ComplexF16?
@@ -405,7 +405,7 @@ for (elty, welty) in ((:Float16, :Float32),
function CuSparseMatrixCSR{$elty}(csc::CuSparseMatrixCSC{$elty}; index::SparseChar='O', action::cusparseAction_t=CUSPARSE_ACTION_NUMERIC, algo::cusparseCsr2CscAlg_t=CUSPARSE_CSR2CSC_ALG1)
m,n = size(csc)
- rowPtr = (index == 'O') ? CUDA.ones(Cint, m+1) : CUDA.zeros(Cint, m+1)
+ rowPtr = (index == 'O') ? CUDA.ones(Cint, m + 1) : CUDA.zeros(Cint, m + 1)
colVal = CUDA.zeros(Cint,nnz(csc))
nzVal = CUDA.zeros($elty,nnz(csc))
if $elty == Float16 #broken for ComplexF16?
diff --git a/test/libraries/cusparse/conversions.jl b/test/libraries/cusparse/conversions.jl
index f946fd196..171b30f10 100644
--- a/test/libraries/cusparse/conversions.jl
+++ b/test/libraries/cusparse/conversions.jl
@@ -119,13 +119,17 @@ for (n, bd, p) in [(100, 5, 0.02), (5, 1, 0.8), (4, 2, 0.5), (0, 1, 0.0)]
A = sprand(n, n, p)
blockdim = bd
for CuSparseMatrixType1 in (CuSparseMatrixCSC, CuSparseMatrixCSR, CuSparseMatrixCOO, CuSparseMatrixBSR)
- if CuSparseMatrixType1 == CuSparseMatrixBSR && n == 0 continue end # TODO: conversion to CuSparseMatrixBSR breaks with (0x0) matrices
+ if CuSparseMatrixType1 == CuSparseMatrixBSR && n == 0
+ continue
+ end # TODO: conversion to CuSparseMatrixBSR breaks with (0x0) matrices
dA1 = CuSparseMatrixType1 == CuSparseMatrixBSR ? CuSparseMatrixType1(A, blockdim) : CuSparseMatrixType1(A)
@testset "conversion $CuSparseMatrixType1 --> SparseMatrixCSC" begin
@test SparseMatrixCSC(dA1) ≈ A
end
for CuSparseMatrixType2 in (CuSparseMatrixCSC, CuSparseMatrixCSR, CuSparseMatrixCOO, CuSparseMatrixBSR)
- if CuSparseMatrixType2 == CuSparseMatrixBSR && n == 0 continue end # TODO: conversion to CuSparseMatrixBSR breaks with (0x0) matrices
+ if CuSparseMatrixType2 == CuSparseMatrixBSR && n == 0
+ continue
+ end # TODO: conversion to CuSparseMatrixBSR breaks with (0x0) matrices
CuSparseMatrixType1 == CuSparseMatrixType2 && continue
dA2 = CuSparseMatrixType2 == CuSparseMatrixBSR ? CuSparseMatrixType2(dA1, blockdim) : CuSparseMatrixType2(dA1)
@testset "conversion $CuSparseMatrixType1 --> $CuSparseMatrixType2" begin
CUDA.jl Benchmarks
| Benchmark suite | Current: f565385 | Previous: 66ab572 | Ratio |
|---|---|---|---|
| latency/precompile | 43389661483 ns | 43493459423 ns | 1.00 |
| latency/ttfp | 7120385666 ns | 7080947094 ns | 1.01 |
| latency/import | 3451201809 ns | 3436925241 ns | 1.00 |
| integration/volumerhs | 9608460 ns | 9607544 ns | 1.00 |
| integration/byval/slices=1 | 147671 ns | 147107 ns | 1.00 |
| integration/byval/slices=3 | 426306 ns | 425588 ns | 1.00 |
| integration/byval/reference | 145509 ns | 145074 ns | 1.00 |
| integration/byval/slices=2 | 286705 ns | 286272 ns | 1.00 |
| integration/cudadevrt | 103894 ns | 103409 ns | 1.00 |
| kernel/indexing | 14361 ns | 14466 ns | 0.99 |
| kernel/indexing_checked | 15017.5 ns | 15124 ns | 0.99 |
| kernel/occupancy | 741.0542635658915 ns | 706.0551724137931 ns | 1.05 |
| kernel/launch | 2219.5555555555557 ns | 2284.6666666666665 ns | 0.97 |
| kernel/rand | 14947 ns | 14928 ns | 1.00 |
| array/reverse/1d | 19835 ns | 19689 ns | 1.01 |
| array/reverse/2d | 24990.5 ns | 23897 ns | 1.05 |
| array/reverse/1d_inplace | 10753 ns | 10329 ns | 1.04 |
| array/reverse/2d_inplace | 12581 ns | 11801 ns | 1.07 |
| array/copy | 21065 ns | 21231 ns | 0.99 |
| array/iteration/findall/int | 158829 ns | 159142 ns | 1.00 |
| array/iteration/findall/bool | 139575 ns | 139322.5 ns | 1.00 |
| array/iteration/findfirst/int | 163076.5 ns | 164128.5 ns | 0.99 |
| array/iteration/findfirst/bool | 163490 ns | 161608.5 ns | 1.01 |
| array/iteration/scalar | 71836 ns | 73479 ns | 0.98 |
| array/iteration/logical | 218120.5 ns | 218722 ns | 1.00 |
| array/iteration/findmin/1d | 46885 ns | 46880 ns | 1.00 |
| array/iteration/findmin/2d | 97393.5 ns | 97416 ns | 1.00 |
| array/reductions/reduce/1d | 36047 ns | 36474 ns | 0.99 |
| array/reductions/reduce/2d | 41436 ns | 41172 ns | 1.01 |
| array/reductions/mapreduce/1d | 34404 ns | 34723 ns | 0.99 |
| array/reductions/mapreduce/2d | 41474 ns | 51622 ns | 0.80 |
| array/broadcast | 20978 ns | 21035 ns | 1.00 |
| array/copyto!/gpu_to_gpu | 12614 ns | 11372 ns | 1.11 |
| array/copyto!/cpu_to_gpu | 216953 ns | 219501 ns | 0.99 |
| array/copyto!/gpu_to_cpu | 283758 ns | 286844 ns | 0.99 |
| array/accumulate/1d | 109582 ns | 109787 ns | 1.00 |
| array/accumulate/2d | 80457 ns | 80534 ns | 1.00 |
| array/construct | 1295.7 ns | 1267.7 ns | 1.02 |
| array/random/randn/Float32 | 44809 ns | 48385.5 ns | 0.93 |
| array/random/randn!/Float32 | 25203 ns | 25222 ns | 1.00 |
| array/random/rand!/Int64 | 27351.5 ns | 27316 ns | 1.00 |
| array/random/rand!/Float32 | 8914.333333333334 ns | 8940.666666666666 ns | 1.00 |
| array/random/rand/Int64 | 30147 ns | 30144 ns | 1.00 |
| array/random/rand/Float32 | 13285.5 ns | 13204 ns | 1.01 |
| array/permutedims/4d | 60803 ns | 61641 ns | 0.99 |
| array/permutedims/2d | 55537 ns | 55255 ns | 1.01 |
| array/permutedims/3d | 56341.5 ns | 56276 ns | 1.00 |
| array/sorting/1d | 2777460 ns | 2779172 ns | 1.00 |
| array/sorting/by | 3368859 ns | 3370605.5 ns | 1.00 |
| array/sorting/2d | 1085660 ns | 1087247 ns | 1.00 |
| cuda/synchronization/stream/auto | 1025.5714285714287 ns | 1019.1 ns | 1.01 |
| cuda/synchronization/stream/nonblocking | 7665.5 ns | 8133.4 ns | 0.94 |
| cuda/synchronization/stream/blocking | 849.1919191919192 ns | 829.5851063829788 ns | 1.02 |
| cuda/synchronization/context/auto | 1148.5 ns | 1202 ns | 0.96 |
| cuda/synchronization/context/nonblocking | 8148.1 ns | 7569.299999999999 ns | 1.08 |
| cuda/synchronization/context/blocking | 884.5 ns | 910.7058823529412 ns | 0.97 |
This comment was automatically generated by workflow using github-action-benchmark.
Can you elaborate where the `[1]` / `[0]` should come from, and why the API does not initialize it?
> Why should the [1] or [0] be there?

By definition of the CSR/CSC format: the colPtr/rowPtr always has (number of cols/rows) + 1 elements, and its last element is nnz + 1 for one-based indexing (nnz for zero-based). So for a 0x0 matrix with one-based indexing, colPtr must be the one-element array [1].
> Why is it not initialized by the API?

From the CUSPARSE manual (https://docs.nvidia.com/cuda/cusparse/index.html?highlight=CUSPARSE_CSR2CSC_ALG1#cusparsecsr2cscex2): the API does not do anything for 0x0 matrices. So if we initialize the colPtr/rowPtr with zeros for one-based indexing, the resulting CSC data structure is invalid, with colPtr left at [0] instead of [1].
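For illustration, a small CPU-side sketch of the valid vs. invalid structure using SparseArrays (not part of this PR):

```julia
using SparseArrays

# A valid 0x0 CSC matrix: colPtr has n + 1 = 1 element, and with
# one-based indexing that element must be 1.
A = spzeros(Float64, 0, 0)
A.colptr == [1]                                # true

# The same matrix with an all-zero column pointer is not a valid CSC
# structure; recent SparseArrays versions reject it outright.
SparseMatrixCSC(0, 0, [0], Int[], Float64[])   # throws ArgumentError
```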
Thanks.
I interpreted this as not reading the input pointers, while still properly initializing the output offset array. Anyhow, I'm not too familiar with the intricacies of sparse array representations, so LGTM. Let's see if anybody else wants to chime in.
m, n = size(coo)
nnz(coo) == 0 && return CuSparseMatrixCSR{Tv}(CUDA.ones(Cint, m+1), coo.colInd, nonzeros(coo), size(coo))
csrRowPtr = (index == 'O') ? CUDA.ones(Cint, m + 1) : CUDA.zeros(Cint, m + 1)
Just stumbled across these lines. For consistency, the initialization should probably be treated similarly to the CSC/CSR case (see the sketch below).

Should we open a ticket?
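A hypothetical sketch of that consistent treatment, mirroring the fixed CSC/CSR methods (not part of this PR; the names `coo`, `m`, `index`, and `Tv` come from the COO conversion method shown above):

```julia
# Hypothetical: make the nnz == 0 early return honor the requested index base,
# just like the CSC/CSR conversion methods do.
if nnz(coo) == 0
    csrRowPtr = (index == 'O') ? CUDA.ones(Cint, m + 1) : CUDA.zeros(Cint, m + 1)
    return CuSparseMatrixCSR{Tv}(csrRowPtr, coo.colInd, nonzeros(coo), size(coo))
end
```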
Yeah, I'll file a bug report.
The conversion between `CuSparseMatrixCSC` and `CuSparseMatrixCSR` creates an invalid `colPtr`/`rowPtr` field for the edge case of 0x0 matrices. MWE:
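(A sketch along these lines; it assumes CUDA.jl with a working GPU and is not necessarily the exact reproducer.)

```julia
using CUDA, CUDA.CUSPARSE, SparseArrays

A = spzeros(Float32, 0, 0)          # 0x0 sparse matrix on the CPU

dA_csr = CuSparseMatrixCSR(A)       # upload as CSR
dA_csc = CuSparseMatrixCSC(dA_csr)  # convert CSR -> CSC on the GPU

# With a zero-initialized colPtr the one-based CSC structure is invalid;
# a valid 0x0 CSC needs colPtr == [1].
Array(dA_csc.colPtr)

# Converting back to a CPU matrix then fails or yields an invalid object.
SparseMatrixCSC(dA_csc)
```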
I fixed this by initializing with `CUDA.ones` instead of `CUDA.zeros` in the conversion methods.

I also added this edge case to the conversion tests, but had to exclude the BSR format for the `n = 0` case, where the conversion already errors. The error for BSR is okay, IMO. But for CSC and CSR, where the conversion works, the final object should be a valid sparse matrix.