Skip to content

Conversation

@jishnub
Copy link
Member

@jishnub jishnub commented Feb 8, 2022

Currently the real matrix is converted to complex, followed by a complex matmul. This is sub-optimal (given that the imaginary part is all zeros), and we may instead split it as A * (Br + im * Bi) = (A*Br) + im*(A*Bi), where we carry out two matrix multiplications, but both are real. This is what is implemented in this PR.

There are two versions that I could come up with, as listed below, but I've currently gone with the allocating but fastest version. Open to changing this, if the allocations seem excessive.

julia> using LinearAlgebra, BenchmarkTools

julia> function mul2(A::StridedMatrix{<:LinearAlgebra.BlasReal}, B::StridedMatrix{Complex{T}}) where {T<:LinearAlgebra.BlasReal}
           TS = promote_type(eltype(A), eltype(B))
           TSR = promote_type(eltype(A), T)
           C = similar(B, TS, (size(A, 1), size(B, 2)))
           Cr = reinterpret(reshape, TSR, C)
           AB = similar(A, TSR, size(A, 1), size(B, 2))
           BRI = real(B)
           mul!(AB, A, BRI)
           Cr[1, :, :] .= AB
           BRI .= imag.(B)
           mul!(AB, A, BRI)
           Cr[2, :, :] .= AB
           return C
       end
mul2 (generic function with 1 method)

julia> mul3(A, B) = (A * real(B)) .+ im .* (A * imag(B))
mul3 (generic function with 1 method)

julia> function multests(f, A, B)
           C1 = @btime $f($A, $B)
           C2 = @btime $f($A, view($B, :, :))
           C3 = @btime $f(view($A, :, :), $B)
           C4 = @btime $f(view($A, :, :), view($B, :, :))
           C5 = @btime $f(view($A, :, 1:2:size($A, 2)), view($B, 1:2:size($B,1), :))
           return C1, C2, C3, C4, C5
       end;

julia> const A = rand(1000, 1000); const B = complex.(A);


julia> AB1 = multests(*, A, B);
  176.926 ms (4 allocations: 30.52 MiB)
  174.206 ms (4 allocations: 30.52 MiB)
  173.648 ms (4 allocations: 30.52 MiB)
  171.779 ms (4 allocations: 30.52 MiB)
  987.113 ms (7 allocations: 22.92 MiB)

julia> AB2 = multests(mul2, A, B);
  145.858 ms (6 allocations: 30.52 MiB)
  144.687 ms (6 allocations: 30.52 MiB)
  144.297 ms (6 allocations: 30.52 MiB)
  145.443 ms (6 allocations: 30.52 MiB)
  95.931 ms (6 allocations: 26.70 MiB)

julia> AB3 = multests(mul3, A, B);
  106.106 ms (10 allocations: 45.78 MiB)
  105.187 ms (10 allocations: 45.78 MiB)
  106.942 ms (10 allocations: 45.78 MiB)
  121.555 ms (10 allocations: 45.78 MiB)
  65.900 ms (10 allocations: 38.15 MiB)

julia> mapreduce(isapprox, &, AB1, AB2)
true

julia> mapreduce(isapprox, &, AB1, AB3)
true

mul2 already gets us some speed-up, which is considerable (albeit with some allocations) if the matrix is non-contiguous. mul3 is significantly faster than both, so I've used it in this PR.

@dkarrasch dkarrasch added linear algebra Linear algebra performance Must go faster labels Feb 8, 2022
jishnub and others added 2 commits February 8, 2022 16:25
Co-authored-by: Daniel Karrasch <daniel.karrasch@posteo.de>
@dkarrasch dkarrasch merged commit f6def1a into JuliaLang:master Feb 8, 2022
antoine-levitt pushed a commit to antoine-levitt/julia that referenced this pull request Feb 17, 2022
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

linear algebra Linear algebra performance Must go faster

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants