fix conversion of 0x0 CuSparseMatrixCSC <-> CSR #2806


Open · wants to merge 3 commits into master

Conversation

@tam724 tam724 commented Jul 1, 2025

The conversion between CuSparseMatrixCSC and CuSparseMatrixCSR creates an invalid colPtr/rowPtr field for the edge case of 0x0 matrices.

MWE:

julia> using SparseArrays, CUDA, CUDA.CUSPARSE

julia> A = cu(sprand(0, 0, 1.0))
0×0 CuSparseMatrixCSC{Float32, Int32} with 0 stored entries

julia> A_CSR = CUDA.CUSPARSE.CuSparseMatrixCSR(A)
0×0 CuSparseMatrixCSR{Float32, Int32} with 0 stored entries

julia> collect(A_CSR)
ERROR: ArgumentError: 1 == colptr[1] != 1
[...]

julia> A_CSC = CUDA.CUSPARSE.CuSparseMatrixCSC(A_CSR)
0×0 CuSparseMatrixCSC{Float32, Int32} with 0 stored entries

julia> collect(A_CSC)
ERROR: ArgumentError: 1 == colptr[1] != 1
[...]

julia> A_CSC.colPtr[:] .= 1 # fix by hand
1-element CuArray{Int32, 1, CUDA.DeviceMemory}:
 1

julia> collect(A_CSC)
0×0 Matrix{Float32}

I fixed this by initializing the colPtr/rowPtr arrays with CUDA.ones instead of CUDA.zeros in the conversion methods.
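The idea in a minimal sketch (the initial_ptr helper below is hypothetical, just to illustrate the pattern the conversion methods use):

using CUDA

# Hypothetical helper mirroring the fix: the initial pointer array has to be
# valid on its own, because for m == 0 or n == 0 CUSPARSE returns early and
# never writes to it; for non-empty matrices the routine overwrites it anyway.
initial_ptr(len::Integer, index::Char) =
    index == 'O' ? CUDA.ones(Cint, len + 1) :  # one-based: all offsets start at 1
                   CUDA.zeros(Cint, len + 1)   # zero-based: all offsets start at 0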

I also added this edge case to the conversion tests, but had to exclude the BSR format for the n = 0 case, where the conversion itself already errors:

julia> A_BSR = CUDA.CUSPARSE.CuSparseMatrixBSR(A, 1)
ERROR: CUSPARSEError: invalid value (code 3, CUSPARSE_STATUS_INVALID_VALUE)
[...]

The error for BSR is okay, IMO. But for CSC and CSR, where the conversion succeeds, the resulting object should be a valid sparse matrix.

github-actions bot commented Jul 1, 2025

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic master) to apply these changes.

Suggested changes:
diff --git a/lib/cusparse/conversions.jl b/lib/cusparse/conversions.jl
index e477e049d..511ad2b81 100644
--- a/lib/cusparse/conversions.jl
+++ b/lib/cusparse/conversions.jl
@@ -332,7 +332,7 @@ for elty in (:Float32, :Float64, :ComplexF32, :ComplexF64)
     @eval begin
         function CuSparseMatrixCSC{$elty}(csr::CuSparseMatrixCSR{$elty}; index::SparseChar='O', action::cusparseAction_t=CUSPARSE_ACTION_NUMERIC, algo::cusparseCsr2CscAlg_t=CUSPARSE_CSR2CSC_ALG1)
             m,n = size(csr)
-            colPtr = (index == 'O') ? CUDA.ones(Cint, n+1) : CUDA.zeros(Cint, n+1)
+            colPtr = (index == 'O') ? CUDA.ones(Cint, n + 1) : CUDA.zeros(Cint, n + 1)
             rowVal = CUDA.zeros(Cint, nnz(csr))
             nzVal = CUDA.zeros($elty, nnz(csr))
             function bufferSize()
@@ -352,7 +352,7 @@ for elty in (:Float32, :Float64, :ComplexF32, :ComplexF64)
 
         function CuSparseMatrixCSR{$elty}(csc::CuSparseMatrixCSC{$elty}; index::SparseChar='O', action::cusparseAction_t=CUSPARSE_ACTION_NUMERIC, algo::cusparseCsr2CscAlg_t=CUSPARSE_CSR2CSC_ALG1)
             m,n    = size(csc)
-            rowPtr = (index == 'O') ? CUDA.ones(Cint, m+1) : CUDA.zeros(Cint, m+1)
+            rowPtr = (index == 'O') ? CUDA.ones(Cint, m + 1) : CUDA.zeros(Cint, m + 1)
             colVal = CUDA.zeros(Cint,nnz(csc))
             nzVal  = CUDA.zeros($elty,nnz(csc))
             function bufferSize()
@@ -379,7 +379,7 @@ for (elty, welty) in ((:Float16, :Float32),
     @eval begin
         function CuSparseMatrixCSC{$elty}(csr::CuSparseMatrixCSR{$elty}; index::SparseChar='O', action::cusparseAction_t=CUSPARSE_ACTION_NUMERIC, algo::cusparseCsr2CscAlg_t=CUSPARSE_CSR2CSC_ALG1)
             m,n = size(csr)
-            colPtr = (index == 'O') ? CUDA.ones(Cint, n+1) : CUDA.zeros(Cint, n+1)
+            colPtr = (index == 'O') ? CUDA.ones(Cint, n + 1) : CUDA.zeros(Cint, n + 1)
             rowVal = CUDA.zeros(Cint, nnz(csr))
             nzVal = CUDA.zeros($elty, nnz(csr))
             if $elty == Float16 #broken for ComplexF16?
@@ -405,7 +405,7 @@ for (elty, welty) in ((:Float16, :Float32),
 
         function CuSparseMatrixCSR{$elty}(csc::CuSparseMatrixCSC{$elty}; index::SparseChar='O', action::cusparseAction_t=CUSPARSE_ACTION_NUMERIC, algo::cusparseCsr2CscAlg_t=CUSPARSE_CSR2CSC_ALG1)
             m,n    = size(csc)
-            rowPtr = (index == 'O') ? CUDA.ones(Cint, m+1) : CUDA.zeros(Cint, m+1)
+            rowPtr = (index == 'O') ? CUDA.ones(Cint, m + 1) : CUDA.zeros(Cint, m + 1)
             colVal = CUDA.zeros(Cint,nnz(csc))
             nzVal  = CUDA.zeros($elty,nnz(csc))
             if $elty == Float16 #broken for ComplexF16?
diff --git a/test/libraries/cusparse/conversions.jl b/test/libraries/cusparse/conversions.jl
index f946fd196..171b30f10 100644
--- a/test/libraries/cusparse/conversions.jl
+++ b/test/libraries/cusparse/conversions.jl
@@ -119,13 +119,17 @@ for (n, bd, p) in [(100, 5, 0.02), (5, 1, 0.8), (4, 2, 0.5), (0, 1, 0.0)]
         A = sprand(n, n, p)
         blockdim = bd
         for CuSparseMatrixType1 in (CuSparseMatrixCSC, CuSparseMatrixCSR, CuSparseMatrixCOO, CuSparseMatrixBSR)
-            if CuSparseMatrixType1 == CuSparseMatrixBSR && n == 0 continue end # TODO: conversion to CuSparseMatrixBSR breaks with (0x0) matrices
+            if CuSparseMatrixType1 == CuSparseMatrixBSR && n == 0
+                continue
+            end # TODO: conversion to CuSparseMatrixBSR breaks with (0x0) matrices
             dA1 = CuSparseMatrixType1 == CuSparseMatrixBSR ? CuSparseMatrixType1(A, blockdim) : CuSparseMatrixType1(A)
             @testset "conversion $CuSparseMatrixType1 --> SparseMatrixCSC" begin
                 @test SparseMatrixCSC(dA1) ≈ A
             end
             for CuSparseMatrixType2 in (CuSparseMatrixCSC, CuSparseMatrixCSR, CuSparseMatrixCOO, CuSparseMatrixBSR)
-                if CuSparseMatrixType2 == CuSparseMatrixBSR && n == 0 continue end # TODO: conversion to CuSparseMatrixBSR breaks with (0x0) matrices
+                if CuSparseMatrixType2 == CuSparseMatrixBSR && n == 0
+                    continue
+                end # TODO: conversion to CuSparseMatrixBSR breaks with (0x0) matrices
                 CuSparseMatrixType1 == CuSparseMatrixType2 && continue
                 dA2 = CuSparseMatrixType2 == CuSparseMatrixBSR ? CuSparseMatrixType2(dA1, blockdim) : CuSparseMatrixType2(dA1)
                 @testset "conversion $CuSparseMatrixType1 --> $CuSparseMatrixType2" begin

@github-actions github-actions bot left a comment

CUDA.jl Benchmarks

Benchmark suite Current: f565385 Previous: 66ab572 Ratio
latency/precompile 43389661483 ns 43493459423 ns 1.00
latency/ttfp 7120385666 ns 7080947094 ns 1.01
latency/import 3451201809 ns 3436925241 ns 1.00
integration/volumerhs 9608460 ns 9607544 ns 1.00
integration/byval/slices=1 147671 ns 147107 ns 1.00
integration/byval/slices=3 426306 ns 425588 ns 1.00
integration/byval/reference 145509 ns 145074 ns 1.00
integration/byval/slices=2 286705 ns 286272 ns 1.00
integration/cudadevrt 103894 ns 103409 ns 1.00
kernel/indexing 14361 ns 14466 ns 0.99
kernel/indexing_checked 15017.5 ns 15124 ns 0.99
kernel/occupancy 741.0542635658915 ns 706.0551724137931 ns 1.05
kernel/launch 2219.5555555555557 ns 2284.6666666666665 ns 0.97
kernel/rand 14947 ns 14928 ns 1.00
array/reverse/1d 19835 ns 19689 ns 1.01
array/reverse/2d 24990.5 ns 23897 ns 1.05
array/reverse/1d_inplace 10753 ns 10329 ns 1.04
array/reverse/2d_inplace 12581 ns 11801 ns 1.07
array/copy 21065 ns 21231 ns 0.99
array/iteration/findall/int 158829 ns 159142 ns 1.00
array/iteration/findall/bool 139575 ns 139322.5 ns 1.00
array/iteration/findfirst/int 163076.5 ns 164128.5 ns 0.99
array/iteration/findfirst/bool 163490 ns 161608.5 ns 1.01
array/iteration/scalar 71836 ns 73479 ns 0.98
array/iteration/logical 218120.5 ns 218722 ns 1.00
array/iteration/findmin/1d 46885 ns 46880 ns 1.00
array/iteration/findmin/2d 97393.5 ns 97416 ns 1.00
array/reductions/reduce/1d 36047 ns 36474 ns 0.99
array/reductions/reduce/2d 41436 ns 41172 ns 1.01
array/reductions/mapreduce/1d 34404 ns 34723 ns 0.99
array/reductions/mapreduce/2d 41474 ns 51622 ns 0.80
array/broadcast 20978 ns 21035 ns 1.00
array/copyto!/gpu_to_gpu 12614 ns 11372 ns 1.11
array/copyto!/cpu_to_gpu 216953 ns 219501 ns 0.99
array/copyto!/gpu_to_cpu 283758 ns 286844 ns 0.99
array/accumulate/1d 109582 ns 109787 ns 1.00
array/accumulate/2d 80457 ns 80534 ns 1.00
array/construct 1295.7 ns 1267.7 ns 1.02
array/random/randn/Float32 44809 ns 48385.5 ns 0.93
array/random/randn!/Float32 25203 ns 25222 ns 1.00
array/random/rand!/Int64 27351.5 ns 27316 ns 1.00
array/random/rand!/Float32 8914.333333333334 ns 8940.666666666666 ns 1.00
array/random/rand/Int64 30147 ns 30144 ns 1.00
array/random/rand/Float32 13285.5 ns 13204 ns 1.01
array/permutedims/4d 60803 ns 61641 ns 0.99
array/permutedims/2d 55537 ns 55255 ns 1.01
array/permutedims/3d 56341.5 ns 56276 ns 1.00
array/sorting/1d 2777460 ns 2779172 ns 1.00
array/sorting/by 3368859 ns 3370605.5 ns 1.00
array/sorting/2d 1085660 ns 1087247 ns 1.00
cuda/synchronization/stream/auto 1025.5714285714287 ns 1019.1 ns 1.01
cuda/synchronization/stream/nonblocking 7665.5 ns 8133.4 ns 0.94
cuda/synchronization/stream/blocking 849.1919191919192 ns 829.5851063829788 ns 1.02
cuda/synchronization/context/auto 1148.5 ns 1202 ns 0.96
cuda/synchronization/context/nonblocking 8148.1 ns 7569.299999999999 ns 1.08
cuda/synchronization/context/blocking 884.5 ns 910.7058823529412 ns 0.97

This comment was automatically generated by workflow using github-action-benchmark.

@maleadt maleadt requested review from kshyatt and amontoison July 3, 2025 06:19
@maleadt maleadt added bugfix This gets something working again. cuda libraries Stuff about CUDA library wrappers. labels Jul 3, 2025

maleadt commented Jul 3, 2025

Can you elaborate where the [1] comes from? Shouldn't the API initialize those arrays?


tam724 commented Jul 3, 2025

Why should the [1] or [0] be there?

By definition of the CSR/CSC format: the colPtr/rowPtr array always has (number of columns/rows) + 1 elements, and the last element is always set to (last index in nzVal) + 1. For 0x0 matrices (so without stored entries) that is (see the sketch below the list):

  • for zero-based indexing: (last index in nzVal) + 1 = -1 + 1 = 0
  • for one-based indexing: (last index in nzVal) + 1 = 0 + 1 = 1
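Concretely, as a plain-Julia sketch of the only valid pointer arrays for a 0x0 matrix:

colptr_zero_based = Int32[0]  # zero-based: length 0 + 1 = 1, entry -1 + 1 = 0
colptr_one_based  = Int32[1]  # one-based:  length 0 + 1 = 1, entry  0 + 1 = 1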

Why is it not initialized by the API?

From the CUSPARSE manual (https://docs.nvidia.com/cuda/cusparse/index.html?highlight=CUSPARSE_CSR2CSC_ALG1#cusparsecsr2cscex2):

If m == 0 or n == 0, the pointers are not checked and the routine returns CUSPARSE_STATUS_SUCCESS.

So the API does not do anything for 0x0 matrices. If we initialize the colPtr/rowPtr with zeros for one-based indexing, the resulting CSC data structure is (colPtr = [0], rowVal = [], nzVal = []), which is not valid.
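The CPU-side constructor enforces the same invariant, which is presumably where the ArgumentError in the MWE above originates; a small SparseArrays sketch (behavior as of recent SparseArrays versions):

using SparseArrays

# valid 0x0 CSC: colptr has n + 1 = 1 element and colptr[end] - 1 == nnz == 0
SparseMatrixCSC(0, 0, Int32[1], Int32[], Float32[])  # 0x0 matrix with 0 stored entries

# colptr = [0] violates the invariant and is rejected with an ArgumentError
SparseMatrixCSC(0, 0, Int32[0], Int32[], Float32[])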


maleadt commented Jul 3, 2025

Thanks.

If m == 0 or n == 0, the pointers are not checked and the routine returns CUSPARSE_STATUS_SUCCESS.

I interpreted this as not reading the input pointers, while still properly initializing the output offset array.

Anyhow, I'm not too familiar with the intricacies of sparse array representations, so LGTM. Let's see if anybody else wants to chime in.

Review thread on the COO → CSR conversion in lib/cusparse/conversions.jl:

m,n = size(coo)
nnz(coo) == 0 && return CuSparseMatrixCSR{Tv}(CUDA.ones(Cint, m+1), coo.colInd, nonzeros(coo), size(coo))
m, n = size(coo)
csrRowPtr = (index == 'O') ? CUDA.ones(Cint, m + 1) : CUDA.zeros(Cint, m + 1)
tam724 (Author):

Just stumbled across these lines. For consistency, the initialization here should probably be handled the same way as in the CSC/CSR case.
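A hedged sketch of what that could look like; the index-aware guard on the empty early return is my suggestion, mirroring the CSC/CSR methods above (coo, index and Tv are the surrounding method's variables, as in the quoted lines):

# compute the row pointer first, honoring the index base, so the
# nnz == 0 early return also produces a valid pointer array
m, n = size(coo)
csrRowPtr = (index == 'O') ? CUDA.ones(Cint, m + 1) : CUDA.zeros(Cint, m + 1)
nnz(coo) == 0 && return CuSparseMatrixCSR{Tv}(csrRowPtr, coo.colInd, nonzeros(coo), size(coo))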

amontoison commented:


Should we open a ticket?
AMD does this initialization correctly in rocSPARSE.


maleadt commented Jul 9, 2025

Yeah, I'll file a bug report.
