Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add in-place kron #31069

Merged
merged 1 commit into from
May 19, 2020
Merged

add in-place kron #31069

merged 1 commit into from
May 19, 2020

Conversation

Roger-luo
Copy link
Contributor

This is useful to avoid extra allocation when A, B is large in kron(A, B). Since the implementation itself is exactly the same with original kron, I guess it's better to have it directly in stdlibs

stdlib/LinearAlgebra/src/dense.jl Outdated Show resolved Hide resolved
end

function kron(A::AbstractMatrix{T}, B::Diagonal{S}) where {T<:Number, S<:Number}
@inline function kron!(C::AbstractMatrix, A::AbstractMatrix, B::Diagonal)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also have a @propagate_inbounds, I suppose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, yes!


Like [`kron`](@ref), but stores the results in `C`.

!!! tip
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a recognized Documenter thing? I might go with !!! info to be safe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, tip will be a green block, but info is good as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh nice, I didn't know that. Either is fine then.

stdlib/LinearAlgebra/src/dense.jl Outdated Show resolved Hide resolved
stdlib/LinearAlgebra/src/dense.jl Outdated Show resolved Hide resolved
stdlib/LinearAlgebra/src/dense.jl Outdated Show resolved Hide resolved
@dkarrasch
Copy link
Member

It would be nice to get this finished. @Roger-luo Could you please rebase this onto current master and resolve the merge conflicts? One thing I noticed is that there are occurrences of @inline @Base.propagate_inbounds. If I'm not mistaken, then the second macro already implies that the function is inlined, at least according to its docstring, so the first one would be redundant.

@Roger-luo Roger-luo changed the title add in-place kron WIP: add in-place kron May 6, 2020
Copy link
Member

@dkarrasch dkarrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments that I noticed. Will take a more careful look later.

ptr_range = (1:lB) .+ (colptrC[col]-1)
colptrC[col+1] = colptrC[col] + lA*lB
ptr_range = (1:lB) .+ (C.colptr[col]-1)
C.colptr[col+1] = C.colptr[col] + lA*lB
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, there is no C defined in this function...? Or did you intend to rename this to kron! and add a first C argument here? Independent from that, I think one is no longer supposed to refer to colptr via the dot notation, but rather via getcolptr(C), in case you'll need it. Otherwise, CI will throw and fail.

return z
end

kron!(C::SparseMatrixCSC, A::SparseMatrixCSC, x::SparseVector) = kron!(C, A, SparseMatrixCSC(x))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and below, I think all the SparseMatrixCSC can be relaxed to AbstractSparseMatrixCSC.

Bounds checking can be disabled by `@inbounds`, but you need to take care of the shape
of `C`, `A`, `B` yourself.
"""
@inline Base.@propagate_inbounds function kron!(C::AbstractMatrix, A::AbstractMatrix, B::AbstractMatrix)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@inline Base.@propagate_inbounds function kron!(C::AbstractMatrix, A::AbstractMatrix, B::AbstractMatrix)
Base.@propagate_inbounds function kron!(C::AbstractMatrix, A::AbstractMatrix, B::AbstractMatrix)

should inline already, same below.

@Roger-luo
Copy link
Contributor Author

Hi sorry, I found the kron seems to changed a lot since last year, thus I rewrite this PR and made force push to this branch (so I don't need to rebase I guess) Let me know if this looks good now.

I also polished the Base.@propagate_inbounds part of the kron!, but I'm actually not sure this is right way to do this, since other in place functions such as generic_matmul! etc. actually just leave the bounds checks inside the implementation. Maybe I should just remove all the Base.@propagate_inbounds in this PR?

@Roger-luo Roger-luo changed the title WIP: add in-place kron add in-place kron May 7, 2020
@elisno
Copy link

elisno commented May 7, 2020

This is impressive work!

I've been looking for ways to repeatedly compute large Kronecker sums with a fixed structure. kron! will definitely help with this task.

Below are some benchmarks showing the speedup kron! offers when calculating a single term in a Kronecker sum.

Below are some benchmarks for calculating a single term in a Kronecker sum.
kron!(C::AbstractMatrix, A::AbstractMatrix, B::Diagonal) gives a 36x speedup for dense products!

Dense

using BenchmarkTools
using LinearAlgebra
using SparseArrays

for N in 2 .^ [0,1,2,3,4,5,6,7]
    A = rand(N,N); B = Diagonal(oneunit(A)); C = kron(A,B);
    println("Out-of-place N=$N")
    @btime kron($A,$B)
    println("In-place N=$N")
    @btime kron!($C,$A,$B)
    println("")
end
Click to expand timings
Out-of-place N=1
  45.244 ns (1 allocation: 96 bytes)
In-place N=1
  15.453 ns (0 allocations: 0 bytes)

Out-of-place N=2
  60.918 ns (1 allocation: 208 bytes)
In-place N=2
  25.499 ns (0 allocations: 0 bytes)

Out-of-place N=4
  569.928 ns (1 allocation: 2.13 KiB)
In-place N=4
  87.182 ns (0 allocations: 0 bytes)

Out-of-place N=8
  4.079 μs (2 allocations: 32.08 KiB)
In-place N=8
  617.090 ns (0 allocations: 0 bytes)

Out-of-place N=16
  64.967 μs (2 allocations: 512.08 KiB)
In-place N=16
  13.821 μs (0 allocations: 0 bytes)

Out-of-place N=32
  1.149 ms (2 allocations: 8.00 MiB)
In-place N=32
  162.917 μs (0 allocations: 0 bytes)

Out-of-place N=64
  53.302 ms (2 allocations: 128.00 MiB)
In-place N=64
  2.081 ms (0 allocations: 0 bytes)

Out-of-place N=128
  754.902 ms (2 allocations: 2.00 GiB)
In-place N=128
  20.843 ms (0 allocations: 0 bytes)

Sparse

p = 1.0
for N in 2 .^ [0,1,2,3,4,5,6,7]
    A = sprand(N,N,p); B = oneunit(A); C = kron(A,B);
    println("Out-of-place N=$N")
    @btime kron($A,$B)
    println("In-place N=$N")
    @btime kron!($C,$A,$B)
    println("")
end
Click to expand timings
Out-of-place N=1
  132.470 ns (3 allocations: 288 bytes)
In-place N=1
  27.849 ns (0 allocations: 0 bytes)

Out-of-place N=2
  200.434 ns (3 allocations: 416 bytes)
In-place N=2
  83.813 ns (0 allocations: 0 bytes)

Out-of-place N=4
  659.133 ns (3 allocations: 1.44 KiB)
In-place N=4
  435.384 ns (0 allocations: 0 bytes)

Out-of-place N=8
  4.371 μs (3 allocations: 8.86 KiB)
In-place N=8
  2.997 μs (0 allocations: 0 bytes)

Out-of-place N=16
  29.972 μs (5 allocations: 66.34 KiB)
In-place N=16
  22.710 μs (0 allocations: 0 bytes)

Out-of-place N=32
  226.747 μs (5 allocations: 520.34 KiB)
In-place N=32
  182.200 μs (0 allocations: 0 bytes)

Out-of-place N=64
  1.797 ms (6 allocations: 4.03 MiB)
In-place N=64
  1.420 ms (0 allocations: 0 bytes)

Out-of-place N=128
  14.451 ms (6 allocations: 32.13 MiB)
In-place N=128
  11.239 ms (0 allocations: 0 bytes)

@Roger-luo
Copy link
Contributor Author

Roger-luo commented May 8, 2020

Hi, @elisno I'm glad this helps.

Regarding large Kronecker sum, I'm not sure about your problem, but if your matrices are more special (e.g general permutation matrices) we have an even faster implementation (as a summation of quantum circuits) in Yao. You may want to check that out.

@MichielStock
Copy link

I would love to use this in my Kronecker package, as suggested in this issue. What would be the best approach to having access to this before it is shipped in stable Julia version?

@dkarrasch
Copy link
Member

I would love to use this in my Kronecker package, as suggested in this issue. What would be the best approach to having access to this before it is shipped in stable Julia version?

First this needs to get merged, and then maybe someone could add those parts of this PR that are a new feature to Compat.jl? Or you have a if VERSION < "" ... end block in which you define what you need.

@Roger-luo
Copy link
Contributor Author

bump. I think I've fixed all the tests. any chance to review this? @dkarrasch

Copy link
Member

@dkarrasch dkarrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very nice work, indeed, and consistent with all the *-mul! business. I only have a couple of non-functional comments, so this should be quick. I'd only like to ask not to overwrite the PR by push-forced, so that I can quickly review afterwards.

Comment on lines +545 to +546
function kron! end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this good for? For stack traces or something like that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is because kron is defined in Base (https://github.com/JuliaLang/julia/blob/master/base/operators.jl#L533), thus all other stdlibs need to import Base: kron, I think it'd be more consistent to define the generic function in Base instead of LinearAlgebra.

stdlib/LinearAlgebra/src/dense.jl Outdated Show resolved Hide resolved
stdlib/LinearAlgebra/src/dense.jl Outdated Show resolved Hide resolved
stdlib/LinearAlgebra/src/diagonal.jl Show resolved Hide resolved
stdlib/LinearAlgebra/src/diagonal.jl Outdated Show resolved Hide resolved
stdlib/SparseArrays/src/linalg.jl Outdated Show resolved Hide resolved
@dkarrasch
Copy link
Member

Some changes in strings/search probably unintentionally leaked into this PR, and there is a trailing whitespace issue.

base/strings/search.jl Outdated Show resolved Hide resolved
@Roger-luo
Copy link
Contributor Author

the failed test seems unrelated to this PR. I've reverted the extra commit from this PR.

@dkarrasch dkarrasch closed this May 12, 2020
@dkarrasch
Copy link
Member

Let's try again.

@dkarrasch dkarrasch reopened this May 12, 2020
@Roger-luo
Copy link
Contributor Author

looks good now, all tests pass.

Copy link
Member

@dkarrasch dkarrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM. I think we may want to announce this in NEWS.md, since this is of interest to external packages as well. I'll leave this open for a few days, hoping that another pair of eyes takes a look.

@dkarrasch
Copy link
Member

@Roger-luo Could you please add an item to NEWS.md, announcing this feature? You may need to rebase the PR, but I'm not sure, to add to the Julia v1.6 version of NEWS.md.

@Roger-luo Roger-luo force-pushed the in-place-kron branch 4 times, most recently from 5d08842 to 23bfba1 Compare May 14, 2020 11:10
add sparse arrays
@Roger-luo
Copy link
Contributor Author

The failed test on MacOS seems unrelated.

@Roger-luo
Copy link
Contributor Author

bump, anyone else could take a look at this PR maybe?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants