fix conversion of 0x0 CuSparseMatrixCSC <-> CSR #2806


Open · wants to merge 3 commits into master

Conversation

@tam724 tam724 commented Jul 1, 2025

The conversion between CuSparseMatrixCSC and CuSparseMatrixCSR creates an invalid colPtr/rowPtr field for the edge case of 0x0 matrices.

MWE:

julia> using SparseArrays, CUDA, CUDA.CUSPARSE

julia> A = cu(sprand(0, 0, 1.0))
0×0 CuSparseMatrixCSC{Float32, Int32} with 0 stored entries

julia> A_CSR = CUDA.CUSPARSE.CuSparseMatrixCSR(A)
0×0 CuSparseMatrixCSR{Float32, Int32} with 0 stored entries

julia> collect(A_CSR)
ERROR: ArgumentError: 1 == colptr[1] != 1
[...]

julia> A_CSC = CUDA.CUSPARSE.CuSparseMatrixCSC(A_CSR)
0×0 CuSparseMatrixCSC{Float32, Int32} with 0 stored entries

julia> collect(A_CSC)
ERROR: ArgumentError: 1 == colptr[1] != 1
[...]

julia> A_CSC.colPtr[:] .= 1 # fix by hand
1-element CuArray{Int32, 1, CUDA.DeviceMemory}:
 1

julia> collect(A_CSC)
0×0 Matrix{Float32}

I fixed this by initializing the colPtr/rowPtr arrays with CUDA.ones instead of CUDA.zeros in the conversion methods.
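The idea in a minimal sketch (the initial_ptr helper below is hypothetical, just to illustrate the pattern the conversion methods use):

using CUDA

# Hypothetical helper mirroring the fix: the initial pointer array has to be
# valid on its own, because for m == 0 or n == 0 CUSPARSE returns early and
# never writes to it; for non-empty matrices the routine overwrites it anyway.
initial_ptr(len::Integer, index::Char) =
    index == 'O' ? CUDA.ones(Cint, len + 1) :  # one-based: all offsets start at 1
                   CUDA.zeros(Cint, len + 1)   # zero-based: all offsets start at 0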

I also added this edge case to the conversion tests, but had to exclude the BSR format for the n = 0 case, where the conversion itself already errors:

julia> A_BSR = CUDA.CUSPARSE.CuSparseMatrixBSR(A, 1)
ERROR: CUSPARSEError: invalid value (code 3, CUSPARSE_STATUS_INVALID_VALUE)
[...]

The error for BSR is okay, IMO. But for CSC and CSR, where the conversion succeeds, the resulting object should be a valid sparse matrix.

github-actions bot commented Jul 1, 2025

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic master) to apply these changes.

Suggested changes:
diff --git a/lib/cusparse/conversions.jl b/lib/cusparse/conversions.jl
index e477e049d..511ad2b81 100644
--- a/lib/cusparse/conversions.jl
+++ b/lib/cusparse/conversions.jl
@@ -332,7 +332,7 @@ for elty in (:Float32, :Float64, :ComplexF32, :ComplexF64)
     @eval begin
         function CuSparseMatrixCSC{$elty}(csr::CuSparseMatrixCSR{$elty}; index::SparseChar='O', action::cusparseAction_t=CUSPARSE_ACTION_NUMERIC, algo::cusparseCsr2CscAlg_t=CUSPARSE_CSR2CSC_ALG1)
             m,n = size(csr)
-            colPtr = (index == 'O') ? CUDA.ones(Cint, n+1) : CUDA.zeros(Cint, n+1)
+            colPtr = (index == 'O') ? CUDA.ones(Cint, n + 1) : CUDA.zeros(Cint, n + 1)
             rowVal = CUDA.zeros(Cint, nnz(csr))
             nzVal = CUDA.zeros($elty, nnz(csr))
             function bufferSize()
@@ -352,7 +352,7 @@ for elty in (:Float32, :Float64, :ComplexF32, :ComplexF64)
 
         function CuSparseMatrixCSR{$elty}(csc::CuSparseMatrixCSC{$elty}; index::SparseChar='O', action::cusparseAction_t=CUSPARSE_ACTION_NUMERIC, algo::cusparseCsr2CscAlg_t=CUSPARSE_CSR2CSC_ALG1)
             m,n    = size(csc)
-            rowPtr = (index == 'O') ? CUDA.ones(Cint, m+1) : CUDA.zeros(Cint, m+1)
+            rowPtr = (index == 'O') ? CUDA.ones(Cint, m + 1) : CUDA.zeros(Cint, m + 1)
             colVal = CUDA.zeros(Cint,nnz(csc))
             nzVal  = CUDA.zeros($elty,nnz(csc))
             function bufferSize()
@@ -379,7 +379,7 @@ for (elty, welty) in ((:Float16, :Float32),
     @eval begin
         function CuSparseMatrixCSC{$elty}(csr::CuSparseMatrixCSR{$elty}; index::SparseChar='O', action::cusparseAction_t=CUSPARSE_ACTION_NUMERIC, algo::cusparseCsr2CscAlg_t=CUSPARSE_CSR2CSC_ALG1)
             m,n = size(csr)
-            colPtr = (index == 'O') ? CUDA.ones(Cint, n+1) : CUDA.zeros(Cint, n+1)
+            colPtr = (index == 'O') ? CUDA.ones(Cint, n + 1) : CUDA.zeros(Cint, n + 1)
             rowVal = CUDA.zeros(Cint, nnz(csr))
             nzVal = CUDA.zeros($elty, nnz(csr))
             if $elty == Float16 #broken for ComplexF16?
@@ -405,7 +405,7 @@ for (elty, welty) in ((:Float16, :Float32),
 
         function CuSparseMatrixCSR{$elty}(csc::CuSparseMatrixCSC{$elty}; index::SparseChar='O', action::cusparseAction_t=CUSPARSE_ACTION_NUMERIC, algo::cusparseCsr2CscAlg_t=CUSPARSE_CSR2CSC_ALG1)
             m,n    = size(csc)
-            rowPtr = (index == 'O') ? CUDA.ones(Cint, m+1) : CUDA.zeros(Cint, m+1)
+            rowPtr = (index == 'O') ? CUDA.ones(Cint, m + 1) : CUDA.zeros(Cint, m + 1)
             colVal = CUDA.zeros(Cint,nnz(csc))
             nzVal  = CUDA.zeros($elty,nnz(csc))
             if $elty == Float16 #broken for ComplexF16?
diff --git a/test/libraries/cusparse/conversions.jl b/test/libraries/cusparse/conversions.jl
index f946fd196..171b30f10 100644
--- a/test/libraries/cusparse/conversions.jl
+++ b/test/libraries/cusparse/conversions.jl
@@ -119,13 +119,17 @@ for (n, bd, p) in [(100, 5, 0.02), (5, 1, 0.8), (4, 2, 0.5), (0, 1, 0.0)]
         A = sprand(n, n, p)
         blockdim = bd
         for CuSparseMatrixType1 in (CuSparseMatrixCSC, CuSparseMatrixCSR, CuSparseMatrixCOO, CuSparseMatrixBSR)
-            if CuSparseMatrixType1 == CuSparseMatrixBSR && n == 0 continue end # TODO: conversion to CuSparseMatrixBSR breaks with (0x0) matrices
+            if CuSparseMatrixType1 == CuSparseMatrixBSR && n == 0
+                continue
+            end # TODO: conversion to CuSparseMatrixBSR breaks with (0x0) matrices
             dA1 = CuSparseMatrixType1 == CuSparseMatrixBSR ? CuSparseMatrixType1(A, blockdim) : CuSparseMatrixType1(A)
             @testset "conversion $CuSparseMatrixType1 --> SparseMatrixCSC" begin
                 @test SparseMatrixCSC(dA1) ≈ A
             end
             for CuSparseMatrixType2 in (CuSparseMatrixCSC, CuSparseMatrixCSR, CuSparseMatrixCOO, CuSparseMatrixBSR)
-                if CuSparseMatrixType2 == CuSparseMatrixBSR && n == 0 continue end # TODO: conversion to CuSparseMatrixBSR breaks with (0x0) matrices
+                if CuSparseMatrixType2 == CuSparseMatrixBSR && n == 0
+                    continue
+                end # TODO: conversion to CuSparseMatrixBSR breaks with (0x0) matrices
                 CuSparseMatrixType1 == CuSparseMatrixType2 && continue
                 dA2 = CuSparseMatrixType2 == CuSparseMatrixBSR ? CuSparseMatrixType2(dA1, blockdim) : CuSparseMatrixType2(dA1)
                 @testset "conversion $CuSparseMatrixType1 --> $CuSparseMatrixType2" begin

@github-actions github-actions bot left a comment

CUDA.jl Benchmarks

Benchmark suite Current: f565385 Previous: 66ab572 Ratio
latency/precompile 43389661483 ns 43493459423 ns 1.00
latency/ttfp 7120385666 ns 7080947094 ns 1.01
latency/import 3451201809 ns 3436925241 ns 1.00
integration/volumerhs 9608460 ns 9607544 ns 1.00
integration/byval/slices=1 147671 ns 147107 ns 1.00
integration/byval/slices=3 426306 ns 425588 ns 1.00
integration/byval/reference 145509 ns 145074 ns 1.00
integration/byval/slices=2 286705 ns 286272 ns 1.00
integration/cudadevrt 103894 ns 103409 ns 1.00
kernel/indexing 14361 ns 14466 ns 0.99
kernel/indexing_checked 15017.5 ns 15124 ns 0.99
kernel/occupancy 741.0542635658915 ns 706.0551724137931 ns 1.05
kernel/launch 2219.5555555555557 ns 2284.6666666666665 ns 0.97
kernel/rand 14947 ns 14928 ns 1.00
array/reverse/1d 19835 ns 19689 ns 1.01
array/reverse/2d 24990.5 ns 23897 ns 1.05
array/reverse/1d_inplace 10753 ns 10329 ns 1.04
array/reverse/2d_inplace 12581 ns 11801 ns 1.07
array/copy 21065 ns 21231 ns 0.99
array/iteration/findall/int 158829 ns 159142 ns 1.00
array/iteration/findall/bool 139575 ns 139322.5 ns 1.00
array/iteration/findfirst/int 163076.5 ns 164128.5 ns 0.99
array/iteration/findfirst/bool 163490 ns 161608.5 ns 1.01
array/iteration/scalar 71836 ns 73479 ns 0.98
array/iteration/logical 218120.5 ns 218722 ns 1.00
array/iteration/findmin/1d 46885 ns 46880 ns 1.00
array/iteration/findmin/2d 97393.5 ns 97416 ns 1.00
array/reductions/reduce/1d 36047 ns 36474 ns 0.99
array/reductions/reduce/2d 41436 ns 41172 ns 1.01
array/reductions/mapreduce/1d 34404 ns 34723 ns 0.99
array/reductions/mapreduce/2d 41474 ns 51622 ns 0.80
array/broadcast 20978 ns 21035 ns 1.00
array/copyto!/gpu_to_gpu 12614 ns 11372 ns 1.11
array/copyto!/cpu_to_gpu 216953 ns 219501 ns 0.99
array/copyto!/gpu_to_cpu 283758 ns 286844 ns 0.99
array/accumulate/1d 109582 ns 109787 ns 1.00
array/accumulate/2d 80457 ns 80534 ns 1.00
array/construct 1295.7 ns 1267.7 ns 1.02
array/random/randn/Float32 44809 ns 48385.5 ns 0.93
array/random/randn!/Float32 25203 ns 25222 ns 1.00
array/random/rand!/Int64 27351.5 ns 27316 ns 1.00
array/random/rand!/Float32 8914.333333333334 ns 8940.666666666666 ns 1.00
array/random/rand/Int64 30147 ns 30144 ns 1.00
array/random/rand/Float32 13285.5 ns 13204 ns 1.01
array/permutedims/4d 60803 ns 61641 ns 0.99
array/permutedims/2d 55537 ns 55255 ns 1.01
array/permutedims/3d 56341.5 ns 56276 ns 1.00
array/sorting/1d 2777460 ns 2779172 ns 1.00
array/sorting/by 3368859 ns 3370605.5 ns 1.00
array/sorting/2d 1085660 ns 1087247 ns 1.00
cuda/synchronization/stream/auto 1025.5714285714287 ns 1019.1 ns 1.01
cuda/synchronization/stream/nonblocking 7665.5 ns 8133.4 ns 0.94
cuda/synchronization/stream/blocking 849.1919191919192 ns 829.5851063829788 ns 1.02
cuda/synchronization/context/auto 1148.5 ns 1202 ns 0.96
cuda/synchronization/context/nonblocking 8148.1 ns 7569.299999999999 ns 1.08
cuda/synchronization/context/blocking 884.5 ns 910.7058823529412 ns 0.97

This comment was automatically generated by workflow using github-action-benchmark.

@maleadt maleadt requested review from kshyatt and amontoison July 3, 2025 06:19
@maleadt maleadt added bugfix This gets something working again. cuda libraries Stuff about CUDA library wrappers. labels Jul 3, 2025

maleadt commented Jul 3, 2025

Can you elaborate where the [1] comes from? Shouldn't the API initialize those arrays?


tam724 commented Jul 3, 2025

Why should the [1] or [0] be there?

By definition of the CSR/CSC format: the colPtr/rowPtr array always has (number of columns/rows) + 1 elements, and the last element is always set to (last index in nzVal) + 1. For 0x0 matrices (so without stored entries) that is (see the sketch below the list):

  • for zero-based indexing: (last index in nzVal) + 1 = -1 + 1 = 0
  • for one-based indexing: (last index in nzVal) + 1 = 0 + 1 = 1
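Concretely, as a plain-Julia sketch of the only valid pointer arrays for a 0x0 matrix:

colptr_zero_based = Int32[0]  # zero-based: length 0 + 1 = 1, entry -1 + 1 = 0
colptr_one_based  = Int32[1]  # one-based:  length 0 + 1 = 1, entry  0 + 1 = 1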

Why is it not initialized by the API?

From the CUSPARSE manual (https://docs.nvidia.com/cuda/cusparse/index.html?highlight=CUSPARSE_CSR2CSC_ALG1#cusparsecsr2cscex2):

If m == 0 or n == 0, the pointers are not checked and the routine returns CUSPARSE_STATUS_SUCCESS.

So the API does not do anything for 0x0 matrices. If we initialize the colPtr/rowPtr with zeros for one-based indexing, the resulting CSC data structure is (colPtr = [0], rowVal = [], nzVal = []), which is not valid.
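The CPU-side constructor enforces the same invariant, which is presumably where the ArgumentError in the MWE above originates; a small SparseArrays sketch (behavior as of recent SparseArrays versions):

using SparseArrays

# valid 0x0 CSC: colptr has n + 1 = 1 element and colptr[end] - 1 == nnz == 0
SparseMatrixCSC(0, 0, Int32[1], Int32[], Float32[])  # 0x0 matrix with 0 stored entries

# colptr = [0] violates the invariant and is rejected with an ArgumentError
SparseMatrixCSC(0, 0, Int32[0], Int32[], Float32[])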


maleadt commented Jul 3, 2025

Thanks.

If m == 0 or n == 0, the pointers are not checked and the routine returns CUSPARSE_STATUS_SUCCESS.

I interpreted this as not reading the input pointers, while still properly initializing the output offset array.

Anyhow, I'm not too familiar with the intricacies of sparse array representations, so LGTM. Let's see if anybody else wants to chime in.

Review thread on the COO → CSR conversion in lib/cusparse/conversions.jl:

m,n = size(coo)
nnz(coo) == 0 && return CuSparseMatrixCSR{Tv}(CUDA.ones(Cint, m+1), coo.colInd, nonzeros(coo), size(coo))
m, n = size(coo)
csrRowPtr = (index == 'O') ? CUDA.ones(Cint, m + 1) : CUDA.zeros(Cint, m + 1)
tam724 (Author):

Just stumbled across these lines. For consistency, the initialization here should probably be handled the same way as in the CSC/CSR case.
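A hedged sketch of what that could look like; the index-aware guard on the empty early return is my suggestion, mirroring the CSC/CSR methods above (coo, index and Tv are the surrounding method's variables, as in the quoted lines):

# compute the row pointer first, honoring the index base, so the
# nnz == 0 early return also produces a valid pointer array
m, n = size(coo)
csrRowPtr = (index == 'O') ? CUDA.ones(Cint, m + 1) : CUDA.zeros(Cint, m + 1)
nnz(coo) == 0 && return CuSparseMatrixCSR{Tv}(csrRowPtr, coo.colInd, nonzeros(coo), size(coo))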

amontoison commented:


Should we open a ticket?
AMD does this initialization correctly in rocSPARSE.


maleadt commented Jul 9, 2025

Yeah, I'll file a bug report.
