RFC: Performance improvement for Diagonal' * Vector|Matrix #21302

Merged: 1 commit into JuliaLang:master on Apr 18, 2017

Conversation

georgemarrows (Contributor)

Added tests
Only addresses diagonal part of #21286, not sparse matrices

julia> include("test.jl")
  0.000014 seconds (6 allocations: 78.359 KiB)
  0.000012 seconds (6 allocations: 78.359 KiB)
  0.000012 seconds (6 allocations: 78.359 KiB)

from

begin
	D = Diagonal(randn(10000))
	v = randn(10000)
	@time D * v
	@time D' * v
	@time D.' * v
end

@georgemarrows (Contributor, Author)

First Julia PR - please be gentle :-)
Potential problems, as far as I can see:

  • tests A'*B rather than Ac/t_mul_B directly
  • no performance tests
  • limited generality: covers Diagonal' * Vector only

@@ -225,6 +225,9 @@ A_mul_B!(A::AbstractMatrix,B::Diagonal) = scale!(A,B.diag)
A_mul_Bt!(A::AbstractMatrix,B::Diagonal) = scale!(A,B.diag)
A_mul_Bc!(A::AbstractMatrix,B::Diagonal) = scale!(A,conj(B.diag))

Ac_mul_B(A::Diagonal,B::AbstractVector) = conj.(A.diag) .* B
Member:

Our (c)transpose is recursive so it should be ctranspose.(A.diag) .* B. It would be great if you could make the same fix for Ac/t_mul_B!.
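
A minimal illustration of why the recursion matters (hypothetical block values, not from this PR): when the Diagonal's entries are themselves matrices, conj only conjugates each block, while ctranspose also transposes it, which is what D' * v actually needs.

# Hypothetical block-diagonal example, not from the PR
D = Diagonal([[1 2im; 3 4], [5 6; 7im 8]])
v = [[1, 1], [1, 1]]
conj.(D.diag) .* v        # blocks conjugated but not transposed - wrong for D' * v
ctranspose.(D.diag) .* v  # conjugate-transposes each block - what D' * v requires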

georgemarrows (Contributor, Author):

Ah, for block diagonal matrices? New PR tomorrow covering both points.

Member:

Yes. You can find a couple of issues where the details have been discussed. Please just update this PR instead of opening a new one.

@georgemarrows (Contributor, Author)

Fixed for block matrices and a new test added.

For Ac/t_mul_B!, I notice that A_mul_B! is actually even slower:

begin
	D = Diagonal(randn(10000))
	v = randn(10000)
	vv = similar(v)
	@time A_mul_B!(vv, D, v)
	@time Ac_mul_B!(vv, D, v)
	@time At_mul_B!(vv, D, v)

	@time vv .= D * v
	@time vv .= Ac_mul_B(D, v)
	@time vv .= At_mul_B(D, v)
	nothing
end
  1.570805 seconds (4 allocations: 160 bytes)
  1.095022 seconds (4 allocations: 160 bytes)
  1.093878 seconds (4 allocations: 160 bytes)
  0.000030 seconds (6 allocations: 78.359 KiB)
  0.000019 seconds (6 allocations: 78.359 KiB)
  0.000017 seconds (6 allocations: 78.359 KiB)

Should I make a change for that too?
Are the vv .= Ac_mul_B(D, v)-style implementations OK? They're way faster but allocate more.
Should block diagonals work here too?

BTW, should the following work? I wanted to use it when writing my test for transposes.

julia> D = Diagonal([[1 2; 3 4], [1 2; 3 4]])
2×2 Diagonal{Array{Int64,2}}:
 [1 2; 3 4]           
            [1 2; 3 4]

julia> full(D)
ERROR: MethodError: no method matching zero(::Type{Array{Int64,2}})
Closest candidates are:
  zero(::Type{Base.LibGit2.GitHash}) at libgit2/oid.jl:106
  zero(::Type{Base.Pkg.Resolve.VersionWeights.VWPreBuildItem}) at pkg/resolve/versionweight.jl:80
  zero(::Type{Base.Pkg.Resolve.VersionWeights.VWPreBuild}) at pkg/resolve/versionweight.jl:120
  ...
Stacktrace:
 [1] diagm(::Array{Array{Int64,2},1}, ::Int64) at ./linalg/dense.jl:251
 [2] full(::Diagonal{Array{Int64,2}}) at ./linalg/diagonal.jl:56
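
(For what it's worth: full calls diagm, which needs zero(Array{Int64,2}) to fill the off-diagonal entries, and a zero matrix's size can't be inferred from the element type alone. A hypothetical manual workaround, not part of this PR:)

# Hypothetical workaround, not part of the PR: assemble the dense matrix by hand
D = Diagonal([[1 2; 3 4], [1 2; 3 4]])
fullD = zeros(Int, 4, 4)
fullD[1:2, 1:2] = D.diag[1]
fullD[3:4, 3:4] = D.diag[2]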

@georgemarrows (Contributor, Author)

Now with perf fixes for Ac/t_mul_B for Vector and Diagonal. All tests pass. Will push again with squashed commits if you're happy with this.

Ac_mul_B(A::Diagonal, v::AbstractVector) = ctranspose.(A.diag) .* v
At_mul_B(A::Diagonal, v::AbstractVector) = transpose.(A.diag) .* v

A_mul_B!(vout::AbstractVector, A::Diagonal, vin::AbstractVector) = vout .= A * vin
Member:

Couldn't this just be vout .= A.diag .* vin, thereby avoiding the allocation?

@georgemarrows (Contributor, Author)

Indeed, and then you don't need separate non-! versions. I tried that briefly before, but missed that I was also measuring compilation time :-(

Notes on performance here: https://gist.github.com/georgemarrows/678876779e7292a954a344aa3addeaf1
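
For concreteness, a hedged sketch of the fused in-place pattern being discussed (0.6-era names; the definitions that actually land in this PR may differ):

# Hedged sketch, not necessarily the merged code: the fused broadcasts write
# straight into vout, so no temporary vector is allocated.
import Base: A_mul_B!, Ac_mul_B!, At_mul_B!
A_mul_B!(vout::AbstractVector, A::Diagonal, vin::AbstractVector)  = vout .= A.diag .* vin
Ac_mul_B!(vout::AbstractVector, A::Diagonal, vin::AbstractVector) = vout .= ctranspose.(A.diag) .* vin
At_mul_B!(vout::AbstractVector, A::Diagonal, vin::AbstractVector) = vout .= transpose.(A.diag) .* vin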

@andreasnoack (Member)

Did you confirm that this version still fixes #21286? I'm not sure that D::Diagonal' * B::Matrix dispatches to the two-argument Ac_mul_B!s in the current version.

@georgemarrows (Contributor, Author) commented Apr 11, 2017

Latest push covers D' * Matrix as well as D' * Vector, and adds tests for this combination too. It doesn't address sparse matrices, which were also raised in #21286 - I think that's better as a separate PR (and perhaps a separate bug report).

I ran tests/linalg/diagonal.jl, and this gist shows that performance is hugely improved in both the Vector and Matrix cases, so the transpose and ctranspose cases are now on par with plain multiplication.
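
For reference, a hedged sketch of how the Matrix case can reuse the same broadcast trick (illustrative only; the methods actually merged may be structured differently). Broadcasting the length-n diag vector against an n×m matrix scales row i by the (c)transposed i-th diagonal entry, which is exactly what D' * B computes.

# Hedged sketch, not necessarily the merged code
import Base: Ac_mul_B, At_mul_B
Ac_mul_B(A::Diagonal, B::AbstractMatrix) = ctranspose.(A.diag) .* B
At_mul_B(A::Diagonal, B::AbstractMatrix) = transpose.(A.diag) .* B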

@georgemarrows (Contributor, Author)

Nudging for review when time is available...

@Sacha0 (Member) left a comment:

LGTM from a brief review! Thanks for the great first PR @georgemarrows! :)

Includes tests
Only addresses diagonal part of 21286, not sparse matrices
@georgemarrows (Contributor, Author)

Thanks @Sacha0. I've rebased and squashed commits.
make test is all green locally, and we still have the performance improvements shown in this gist.
PR title changed to "RFC:".

@georgemarrows changed the title from "WIP: Fix #21286, improve perf of Diagonal' * Vector" to "RFC: Performance improvement for Diagonal' * Vector|Matrix" on Apr 18, 2017
@Sacha0 (Member) commented Apr 18, 2017

Additional thoughts @andreasnoack? Mergeworthy? Best!

@@ -225,6 +225,16 @@ A_mul_B!(A::AbstractMatrix,B::Diagonal) = scale!(A,B.diag)
A_mul_Bt!(A::AbstractMatrix,B::Diagonal) = scale!(A,B.diag)
A_mul_Bc!(A::AbstractMatrix,B::Diagonal) = scale!(A,conj(B.diag))

# Get ambiguous method if try to unify AbstractVector/AbstractMatrix here using AbstractVecOrMat
A_mul_B!(out::AbstractVector, A::Diagonal, in::AbstractVector) = out .= A.diag .* in
Contributor:

we really shouldn't be syntax-highlighting in...

@andreasnoack merged commit b0c7084 into JuliaLang:master on Apr 18, 2017
@georgemarrows (Contributor, Author)

Thanks @andreasnoack, I appreciate your guidance and reviews while I thrashed towards a solution.

@andreasnoack (Member)

You are welcome. Thanks for the contribution.

Labels: domain:linear algebra, performance