RFC: Performance improvement for Diagonal' * Vector|Matrix #21302
Conversation
First Julia PR - please be gentle :-)
base/linalg/diagonal.jl
Outdated
```diff
@@ -225,6 +225,9 @@ A_mul_B!(A::AbstractMatrix,B::Diagonal) = scale!(A,B.diag)
 A_mul_Bt!(A::AbstractMatrix,B::Diagonal) = scale!(A,B.diag)
 A_mul_Bc!(A::AbstractMatrix,B::Diagonal) = scale!(A,conj(B.diag))
+Ac_mul_B(A::Diagonal,B::AbstractVector) = conj.(A.diag) .* B
```
Our `(c)transpose` is recursive, so it should be `ctranspose.(A.diag) .* B`. It would be great if you could make the same fix for `Ac/t_mul_B!`.
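The distinction matters once the diagonal holds matrix blocks. A minimal sketch of the difference (0.6-era API, where `ctranspose` is what later became `adjoint`; the values are illustrative):

```julia
# For a Diagonal of matrix blocks, the adjoint of the whole matrix must
# take the adjoint of each block; conj only conjugates entries and never
# transposes inside a block.
D = Diagonal([[1 2; 3 4], [5 6; 7 8]])

conj.(D.diag)        # real entries, so each block is unchanged: [1 2; 3 4], ...
ctranspose.(D.diag)  # each block recursively adjointed: [1 3; 2 4], ...
```

For scalar element types the two agree, which is why the bug only shows up for block diagonals.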
Ah, for block diagonal matrices? New PR tomorrow covering both points.
Yes. You can find a couple of issues where the details have been discussed. Please just update this PR instead of opening a new one.
Fixed for block matrices and a new test added. For `Ac/t_mul_B!`, I notice that `A_mul_B!` is actually even slower:

```julia
begin
    D = Diagonal(randn(10000))
    v = randn(10000)
    vv = similar(v)
    @time A_mul_B!(vv, D, v)
    @time Ac_mul_B!(vv, D, v)
    @time At_mul_B!(vv, D, v)
    @time vv .= D * v
    @time vv .= Ac_mul_B(D, v)
    @time vv .= At_mul_B(D, v)
    nothing
end
```
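One caveat about timings like these: the first `@time` of a call includes JIT compilation, which can dwarf the actual run time. A hedged sketch of a fairer measurement (assumes the BenchmarkTools package is installed; 0.6-era names):

```julia
using BenchmarkTools  # assumed installed; not part of Base

D = Diagonal(randn(10000))
v = randn(10000)
vv = similar(v)

A_mul_B!(vv, D, v)            # warm-up call, forces compilation
@time A_mul_B!(vv, D, v)      # now times execution only
@btime A_mul_B!($vv, $D, $v)  # many samples; $ avoids global-variable overhead
```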
Should I do a change for that too? BTW, should the following work? I wanted to use it in writing my test for transposes.

```julia
julia> D = Diagonal([[1 2; 3 4], [1 2; 3 4]])
2×2 Diagonal{Array{Int64,2}}:
 [1 2; 3 4]       ⋅
      ⋅      [1 2; 3 4]

julia> full(D)
ERROR: MethodError: no method matching zero(::Type{Array{Int64,2}})
Closest candidates are:
  zero(::Type{Base.LibGit2.GitHash}) at libgit2/oid.jl:106
  zero(::Type{Base.Pkg.Resolve.VersionWeights.VWPreBuildItem}) at pkg/resolve/versionweight.jl:80
  zero(::Type{Base.Pkg.Resolve.VersionWeights.VWPreBuild}) at pkg/resolve/versionweight.jl:120
  ...
Stacktrace:
 [1] diagm(::Array{Array{Int64,2},1}, ::Int64) at ./linalg/dense.jl:251
 [2] full(::Diagonal{Array{Int64,2}}) at ./linalg/diagonal.jl:56
```
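The error arises because `full` goes through `diagm`, which needs `zero(T)` for the off-diagonal entries, and `zero(::Type{Array{Int64,2}})` cannot know what size matrix to produce. A hedged workaround for a test (illustrative, 0.6-era code): call `zero` on a block *instance*, which does carry its size, and assemble the dense matrix by hand:

```julia
D = Diagonal([[1 2; 3 4], [1 2; 3 4]])
n = length(D.diag)
# zero(D.diag[j]) works because the instance knows its own size,
# unlike zero(::Type{Array{Int64,2}})
rows = [hcat((i == j ? D.diag[j] : zero(D.diag[j]) for j in 1:n)...) for i in 1:n]
dense = vcat(rows...)  # 4×4 matrix with the blocks on the diagonal
```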
Now with perf fixes for `Ac/t_mul_B` for Vector and Diagonal. All tests pass. Will push again with squashed commits if you're happy with this.
base/linalg/diagonal.jl
Outdated
```julia
Ac_mul_B(A::Diagonal, v::AbstractVector) = ctranspose.(A.diag) .* v
At_mul_B(A::Diagonal, v::AbstractVector) = transpose.(A.diag) .* v

A_mul_B!(vout::AbstractVector, A::Diagonal, vin::AbstractVector) = vout .= A * vin
```
Couldn't this just be `vout .= A.diag .* vin`, thereby avoiding the allocation?
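The allocation difference, sketched: `A * vin` on the right-hand side materializes a temporary vector before it is copied into `vout`, whereas broadcasting over `A.diag` fuses into a single loop that writes straight into the output (illustrative, 0.6-era code):

```julia
D = Diagonal(randn(1000))
vin = randn(1000)
vout = similar(vin)

vout .= D * vin        # D * vin allocates a temporary, then copies into vout
vout .= D.diag .* vin  # fused broadcast: one loop, no intermediate array
```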
Indeed, and then you don't need to have separate non-`!` versions. I tried that briefly before but missed that I was measuring compilation time in there too :-( Notes on performance here: https://gist.github.com/georgemarrows/678876779e7292a954a344aa3addeaf1
Did you confirm that this version still fixes #21286? I'm not sure that …
Latest push covers `D' * Matrix` as well as `D' * Vector`, and adds tests for this combination too. It doesn't address sparse matrices, which were also raised in #21286 - I think that's better as a separate PR (and perhaps a separate bug report). I ran tests/linalg/diagonal.jl, and this gist shows that performance is hugely improved in both the Vector and Matrix cases, so that the transpose and ctranspose cases are now similar to vanilla multiplication.
Nudging for review when time is available...
LGTM from a brief review! Thanks for the great first PR @georgemarrows! :)
Includes tests. Only addresses the diagonal part of #21286, not sparse matrices.
Additional thoughts @andreasnoack? Mergeworthy? Best!
```diff
@@ -225,6 +225,16 @@ A_mul_B!(A::AbstractMatrix,B::Diagonal) = scale!(A,B.diag)
 A_mul_Bt!(A::AbstractMatrix,B::Diagonal) = scale!(A,B.diag)
 A_mul_Bc!(A::AbstractMatrix,B::Diagonal) = scale!(A,conj(B.diag))
+# Get ambiguous method if try to unify AbstractVector/AbstractMatrix here using AbstractVecOrMat
+A_mul_B!(out::AbstractVector, A::Diagonal, in::AbstractVector) = out .= A.diag .* in
```
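The ambiguity mentioned in that comment is ordinary Julia dispatch behavior: when one method is more specific in one argument and another method is more specific in a different argument, neither dominates. A hypothetical sketch (the names are illustrative, not Base's actual methods):

```julia
# Neither method is strictly more specific than the other:
# the first wins on the second argument, the second wins on the first.
f(out::AbstractVecOrMat, A::Diagonal)       = "diagonal path"
f(out::AbstractVector,   A::AbstractMatrix) = "vector path"

f([1.0], Diagonal([1.0]))  # ambiguous: both match, neither is more specific
```

Defining separate `AbstractVector` and `AbstractMatrix` methods for `out`, as the PR does, sidesteps the clash.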
we really shouldn't be syntax-highlighting in ...
Thanks @andreasnoack, I appreciate your guidance and reviews while I thrashed towards a solution.
You are welcome. Thanks for the contribution.
Added tests
Only addresses diagonal part of #21286, not sparse matrices