Missing implementation of right multiply for QR decomposition #1738

evelyne-ringoot · 2023-01-20T14:25:08Z

Hi, There appears to be missing an implementation of right multiply for QR decomposition.

using LinearAlgebra, CUDA
a=CUDA.randn(3,3)
b=CUDA.randn(3,3)
q=qr(a).Q
q * b # works fine
b * q #results in an error

It results in this error:

ERROR: ArgumentError: cannot take the CPU address of a CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}
Stacktrace:
 [1] unsafe_convert(#unused#::Type{Ptr{Float32}}, x::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})
 [2] ormqr!(side::Char, trans::Char, A::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, tau::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, C::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})
 [3] rmul!(A::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, B::LinearAlgebra.QRPackedQ{Float32, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}})
 [4] *(A::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Q::LinearAlgebra.QRPackedQ{Float32, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}})

Version info

Julia Version 1.8.0
CUDA toolkit 11.7, artifact installation
NVIDIA driver 516.54.0, for CUDA 11.7
CUDA driver 11.7

Libraries:
- CUBLAS: 11.10.1
- CURAND: 10.2.10
- CUFFT: 10.7.1
- CUSOLVER: 11.3.5
- CUSPARSE: 11.7.3
- CUPTI: 17.0.0
- NVML: 11.0.0+516.54
- CUDNN: 8.30.2 (for CUDA 11.5.0)
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)

Toolchain:
- Julia: 1.8.0
- LLVM: 13.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

1 device:
  0: NVIDIA GeForce RTX 3050 Laptop GPU (sm_86, 3.370 GiB / 4.000 GiB available)

The text was updated successfully, but these errors were encountered:

maleadt · 2023-01-20T15:10:27Z

Adding this seems easy enough, #1739, but this isn't my domain so I'm not sure how to test this (e.g. what are the properties of x * qr(y).Q). We already have a bunch of qr tests, maybe you can suggest how to cover the added rmul! functionality?

CUDA.jl/test/cusolver/dense.jl

Lines 404 to 469 in 72b0796

    
           @testset "qr" begin 
        
               tol = min(m, n)*eps(real(elty))*(1 + (elty <: Complex)) 
        
               A              = rand(elty, m, n) 
        
               qra            = qr(A) 
        
               d_A            = CuArray(A) 
        
               d_F            = qr(d_A) 
        
               d_RR           = d_F.Q'*d_A 
        
               @test d_RR[1:n,:] ≈ d_F.R atol=tol*norm(A) 
        
               @test norm(d_RR[n+1:end,:]) < tol*norm(A) 
        
               @test size(d_F) == size(A) 
        
               @test size(d_F.Q, 1) == size(A, 1) 
        
               @test det(d_F.Q) ≈ det(collect(d_F.Q * CuMatrix{elty}(I, size(d_F.Q)))) atol=tol*norm(A) 
        
               CUDA.@allowscalar begin 
        
                   qval = d_F.Q[1, 1] 
        
                   @test qval ≈ qra.Q[1, 1] 
        
                   qrstr = sprint(show, MIME"text/plain"(), d_F) 
        
                   if VERSION >= v"1.8-" 
        
                       @test qrstr == "$(typeof(d_F))\nQ factor:\n$(sprint(show, MIME"text/plain"(), d_F.Q))\nR factor:\n$(sprint(show, MIME"text/plain"(), d_F.R))" 
        
                   else 
        
                       @test qrstr == "$(typeof(d_F)) with factors Q and R:\n$(sprint(show, d_F.Q))\n$(sprint(show, d_F.R))" 
        
                   end 
        
               end 
        
               dQ, dR = d_F 
        
               @test collect(dQ*dR) ≈ A 
        
               A              = rand(elty, n, m) 
        
               d_A            = CuArray(A) 
        
               d_F            = qr(d_A) 
        
               @test d_F.Q'*d_A ≈ d_F.R atol=tol*norm(A) 
        
               @test det(d_F.Q) ≈ det(collect(d_F.Q * CuMatrix{elty}(I, size(d_F.Q)))) atol=tol*norm(A) 
        
               A              = rand(elty, m, n) 
        
               d_A            = CuArray(A) 
        
               h_q, h_r       = qr(d_A) 
        
               q, r           = qr(A) 
        
               @test Array(h_q) ≈ Array(q) 
        
               @test collect(CuArray(h_q)) ≈ Array(q) 
        
               @test Array(h_r) ≈ Array(r) 
        
               @test CuArray(h_q) ≈ convert(typeof(d_A), h_q) 
        
               A              = rand(elty, n, m) 
        
               d_A            = CuArray(A) 
        
               h_q, h_r       = qr(d_A) # FixMe! Use iteration protocol when implemented 
        
               q, r           = qr(A) 
        
               @test Array(h_q) ≈ Array(q) 
        
               @test Array(h_r) ≈ Array(r) 
        
               A              = rand(elty, n)  # A and B are vectors 
        
               d_A            = CuArray(A) 
        
               M              = qr(A) 
        
               h_M            = qr(d_A) 
        
               B              = rand(elty, n) 
        
               d_B            = CuArray(B) 
        
               @test Array(M \ B) ≈ Array(h_M \ d_B) 
        
               A              = rand(elty, m, n)  # A is a matrix and B is a vector 
        
               d_A            = CuArray(A) 
        
               M              = qr(A) 
        
               h_M            = qr(d_A) 
        
               B              = rand(elty, m) 
        
               d_B            = CuArray(B) 
        
               @test Array(M \ B) ≈ Array(h_M \ d_B) 
        
               A              = rand(elty, m, n)  # A and B are matrices 
        
               d_A            = CuArray(A) 
        
               M              = qr(A) 
        
               h_M            = qr(d_A) 
        
               B              = rand(elty, m, n) 
        
               d_B            = CuArray(B) 
        
               @test Array(M \ B) ≈ Array(h_M \ d_B) 
        
           end

evelyne-ringoot added the bug Something isn't working label Jan 20, 2023

maleadt mentioned this issue Jan 20, 2023

Implement rmul for qr. #1739

Merged

kshyatt added the cuda libraries Stuff about CUDA library wrappers. label Jan 22, 2023

evelyne-ringoot added a commit to evelyne-ringoot/CUDA.jl that referenced this issue Jan 24, 2023

Added tests for qr rmul and lmul JuliaGPU#1738

4cbfe49

maleadt pushed a commit that referenced this issue Jan 25, 2023

Added tests for qr rmul and lmul #1738

639e78c

maleadt closed this as completed in #1739 Jan 26, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Missing implementation of right multiply for QR decomposition #1738

Missing implementation of right multiply for QR decomposition #1738

evelyne-ringoot commented Jan 20, 2023 •

edited

maleadt commented Jan 20, 2023

Missing implementation of right multiply for QR decomposition #1738

Missing implementation of right multiply for QR decomposition #1738

Comments

evelyne-ringoot commented Jan 20, 2023 • edited

maleadt commented Jan 20, 2023

evelyne-ringoot commented Jan 20, 2023 •

edited