
Operations under which block-matrices return block-matrices #31

Closed
willtebbutt opened this issue Aug 24, 2017 · 19 comments


@willtebbutt

I was wondering whether there is interest in someone (me?) implementing operations under which block matrices are potentially closed (i.e. which may return another block matrix), such as multiplication, inversion, transposition, and certain factorisations, at the level of AbstractBlockMatrix and/or AbstractBlockVector. For example, it would be really nice if in the following

B = BlockArray(Matrix{Float64}, [2, 2], [2, 2])
# fill in the blocks with something.
A = B.'B

both A and B.' were also block matrices. At the moment these operations appear to return regular dense matrices.

This would be very useful for block matrices whose blocks have special structure that is preserved under such operations (I'm particularly interested in operating on block matrices whose blocks are circulant and having sensible things happen).
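
For concreteness, here is a minimal sketch of the kind of closure I have in mind. It is illustrative only: block_transpose is a hypothetical helper, and mortar/blocksize/Block are assumed from the BlockArrays API.

using BlockArrays

# The (j, i) block of the result is the transpose of the (i, j) block of B,
# so the block structure is preserved.
function block_transpose(B::AbstractBlockMatrix)
    m, n = blocksize(B)
    mortar([permutedims(B[Block(i, j)]) for j in 1:n, i in 1:m])
end

B = mortar([rand(2, 2) for _ in 1:2, _ in 1:2])  # 2×2-blocked 4×4 matrix
A = block_transpose(B) * B                       # B'B, ideally also blocked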

@KristofferC
Collaborator

Yeah, operations that are not specialized will fall back to the AbstractArray versions and will return standard Arrays. One thing that really needs to be done is to implement the broadcasting interface (but that is a bit tricky I think) and the reduction interface (mapreduce & co).
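
For reference, a minimal sketch of what hooking into the broadcasting interface involves. The names BlockStyle and find_block are illustrative (the real design would live in BlockArrays.jl itself), and this simplified version assumes a BlockArray appears directly among the broadcast arguments:

using BlockArrays

struct BlockStyle <: Broadcast.AbstractArrayStyle{2} end
BlockStyle(::Val{2}) = BlockStyle()
Base.BroadcastStyle(::Type{<:BlockArray{<:Any,2}}) = BlockStyle()

# Allocate the output with the block structure of the first BlockArray argument.
find_block(args) = args[1] isa BlockArray ? args[1] : find_block(Base.tail(args))
Base.similar(bc::Broadcast.Broadcasted{BlockStyle}, ::Type{T}) where T =
    similar(find_block(bc.args), T)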

@willtebbutt
Author

Hmm yeah. It's not even clear to me what should define broadcast-able arrays in this context. Do we view a block array as a special representation of a regular array, meaning that you could broadcast a 4 x 1 matrix with a block matrix like

B = BlockArray(Matrix{Float64}, [2, 2], [2, 2])

or is it more appropriate to say that it would be valid to broadcast B with something like

A = BlockArray(Matrix{Float64}, [2, 2], [2])

since in some sense it's a 2 x 1 block matrix? Or is it reasonable for both to be valid?

I guess there are similar questions to be asked about the reduction interface: for example, does a reduction over a block matrix whose blocks are all of size 2 x 2 return a 2 x 2 block? Or does it return a scalar? Or, again, is it sensible to somehow support both?
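
To make the two candidate semantics concrete, here are hypothetical helpers (sum_blocks and sum_scalar are not existing APIs; blocksize and Block are assumed from BlockArrays, and all blocks are assumed to have the same size):

# (1) Reduce across blocks, combining blocks elementwise: a block matrix of
#     2 x 2 blocks reduces to a single 2 x 2 block.
sum_blocks(A) = sum(A[Block(i, j)] for i in 1:blocksize(A, 1), j in 1:blocksize(A, 2))

# (2) Ordinary elementwise reduction to a scalar, as for any AbstractArray.
sum_scalar(A) = sum(A)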

I think certain unary linear algebra operations are quite simple to do, e.g. the transpose of a block matrix is always a block matrix, so it seems pretty clear-cut what should happen there. Similarly with matrix multiplication: the result will be a block matrix if each pair of blocks that has to be multiplied is conformable, but otherwise the result should probably be dense (if you start with dense blocks; I'm not sure about sparse).
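
A sketch of how that dispatch might look (block_mul is a hypothetical helper; blocklengths, blocksize, mortar, and Block are assumed from the BlockArrays API):

using BlockArrays

# Return a block matrix when the inner block sizes conform,
# otherwise fall back to a dense result.
function block_mul(A::AbstractBlockMatrix, B::AbstractBlockMatrix)
    blocklengths(axes(A, 2)) == blocklengths(axes(B, 1)) || return Matrix(A) * Matrix(B)
    m, k = blocksize(A)
    n = blocksize(B, 2)
    mortar([sum(A[Block(i, l)] * B[Block(l, j)] for l in 1:k) for i in 1:m, j in 1:n])
end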

@KristofferC
Collaborator

With broadcast I meant stuff like sin.(B) using the dot notation: https://julialang.org/blog/2017/01/moredots

Linear algebra would be nice to have.

@willtebbutt
Author

Oh okay, so you're just interested in the unary case for now? I was referring to the more general case, so things like A .+ sin.(B) where A and B are different sizes.

If you have no objections, I'll put together a PR for transposition to start with.

@KristofferC
Collaborator

Sounds good!

@dlfivefifty
Member

I'd add matrix algebra to this. For example:

julia> A = BlockArray(rand(4,4), [2,2], [2,2])
2×2-blocked 4×4 BlockArrays.BlockArray{Float64,2,Array{Float64,2}}:
 0.302873  0.193358    │  0.411812   0.830986
 0.658837  0.932453    │  0.0245708  0.415494
 ──────────────────────┼─────────────────────
 0.973492  0.00193905  │  0.731272   0.953296
 0.292634  0.343203    │  0.105658   0.417385

julia> B = BlockArray(rand(3,4), [1,1,1], [2,2])
3×2-blocked 3×4 BlockArrays.BlockArray{Float64,2,Array{Float64,2}}:
 0.506225  0.752294  │  0.534993  0.347013
 ────────────────────┼────────────────────
 0.398445  0.591069  │  0.699823  0.783644
 ────────────────────┼────────────────────
 0.189798  0.712026  │  0.485311  0.626346

julia> B*A # expected BlockArray(..., [1,1,1], [2,2])
3×4 Array{Float64,2}:
 1.27132  0.919495  0.654844  1.38808
 1.42069  0.898492  0.773167  1.57091
 1.18233  0.916535  0.516729  1.17764

I'm not sure what to do if the block sizes are not compatible. Maybe make them compatible by taking the "union" of the block sizes? That is, add blocks to get the "least common divisor"-analogue of the block sizes?
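
A small sketch of that idea (refine_blocks is a hypothetical helper, not an existing API): the common refinement's block boundaries are just the union of the two sets of cumulative block boundaries.

# Refine two blockings of the same total length into their common refinement.
function refine_blocks(s1::Vector{Int}, s2::Vector{Int})
    @assert sum(s1) == sum(s2)
    cuts = sort!(union(cumsum(s1), cumsum(s2)))
    diff(vcat(0, cuts))
end

refine_blocks([2, 2], [1, 3])  # => [1, 1, 2]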

@dlfivefifty
Member

Here's another one:
A[Block.(1:3)]
should probably also return a BlockArray.

@JimMajor

What is the status of this work? I am especially interested in efficient BlockMatrix*BlockMatrix multiplication.
I need it for my work, so I will write the code regardless; if you are not in the middle of implementing it right now, I think I can contribute.

@dlfivefifty
Member

BlockBandedMatrices.jl has a nice template for how to implement this. It would be good to have it here.

@dlfivefifty
Member

Do you require 5-argument variants? (E.g., analogues to BLAS.gemv!?) Do you need Adjoints of block banded matrices as well? This is all implemented in BlockBandedMatrices.jl, but requires LazyArrays.jl, and we should hesitate before that dependency is added here.

The alternative is to overload mul!(::BlockMatrix, ::BlockMatrix, ::BlockMatrix), but the catch is that you end up with hundreds of overloads for Adjoint for full functionality.
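
For concreteness, the plain overload would look roughly like this. This is a simplified sketch: it assumes blocksize/Block/view from BlockArrays, conforming block structures, and the 5-argument mul! (Julia >= 1.3); the real version would also need all the Adjoint/view variants mentioned above.

using LinearAlgebra, BlockArrays

function LinearAlgebra.mul!(C::BlockMatrix, A::BlockMatrix, B::BlockMatrix,
                            α::Number, β::Number)
    m, k = blocksize(A)
    n = blocksize(B, 2)
    for i in 1:m, j in 1:n
        Cij = view(C, Block(i, j))
        rmul!(Cij, β)                      # Cij = β * Cij
        for l in 1:k
            # Cij += α * A[Block(i, l)] * B[Block(l, j)]
            mul!(Cij, view(A, Block(i, l)), view(B, Block(l, j)), α, true)
        end
    end
    C
end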

@JimMajor

I need the 5-argument variants. Judging from your previous comment I thought that you wanted LazyArrays.jl to be used. I have started reading that package, but it is still not really clear to me, so if it is OK with you I would prefer to use mul!.
I cannot see what the problem with adjoints is.

@dlfivefifty
Member

The reason we would want LazyArrays.jl is that it is much more flexible. For example, the high-performance implementation of mul! would automatically work for things like views of block ranges:

A = BlockMatrix(...)
B = BlockMatrix(...)
V = view(A, Block.(1:3), Block.(3:4)) # take a subrange of blocks (transposed below via V')
C .= α.*Mul(V, B) .+ β.*C # alternative to mul!(C, V, B, α, β), automatically calls fast block multiply (BLAS) routines
C .= Mul(V',B) # alternative to mul!(C, V', B), automatically calls fast block multiply (BLAS) routines

Accomplishing this with standard mul! is virtually impossible, as the number of combinations of view, Adjoint, etc. is combinatorially large. Note that with LazyArrays.jl there would still be support for the standard syntax mul!(y, A::BlockBandedMatrix, x::BlockBandedMatrix), so this is an implementation detail hidden from users.

LazyArrays.jl is a bit complicated if you are not familiar with the Broadcasted style of code. But it would only take me an hour or so to implement, which I'm happy to do. The bigger issue is that LazyArrays.jl has not "stabilised" yet, so it may be premature.

If you only need mul!(::BlockMatrix, ::BlockMatrix, ::BlockMatrix) (that is, you don't need to work with subviews or adjoints), then I'm happy to accept a PR that adds that, which can be replaced with the more general LazyArrays.jl version down the line.

@tkf
Member

tkf commented Nov 25, 2018

I also need some related operations like mul!(::BlockVector, ::BlockMatrix, ::BlockVector) and block-aware (per-block) broadcast on BlockVectors.

I think another benefit of going the LazyArrays.jl way is that we can "amortize" the block size analysis if the same multiply-add is computed repeatedly, that is:

ma = MulAdd(α, A, B, β, C)  # block sizes are analyzed here
for _ in a_lot
    my_compute_input!(ma.B)
    materialize!(ma)
end

A similar trick can be used with a BroadcastArray of BlockVectors. (Update: actually this works with the standard Broadcasted as well.)

But it would only take me an hour or so to implement, which I'm happy to do.

That would be great!

The bigger issue is that LazyArrays.jl has not "stabilised" yet, so it may be premature.

It would be a good bait for getting alpha testers :)

@dlfivefifty
Member

"amortize" block size analysis

I don't quite see what you want to analyse. The expensive part of block size creation is done when the matrix A is created, so the only cost in multiplying blocks is to check that the sizes are compatible. This should be negligible compared to multiplication (and essentially zero cost once lazy block sizes are supported).

It would be a good bait for getting alpha testers :)

Maybe I'll prepare a PR and we can have a conversation about whether it's premature or not. The only serious outstanding issue that I can think of is whether MulAdd(α, A, B, β, C) should actually be MulAdd(C, A, B, α, β). My logic for the current order is that it's a lazy representation of α * A * B + β * C, not a lazy representation of the assignment operation mul!(C, A, B, α, β).

@tkf
Member

tkf commented Nov 26, 2018

"amortize" block size analysis

I don't quite see what you want to analyse.

I was referring to the "least common divisor" analogue you mentioned in #31 (comment)

This should be negligible compared to multiplication

That would depend on the ratio of the number of blocks to the average block size, right? Especially when the matrix is block diagonal (which is actually what I have in mind). I know we shouldn't worry about performance before any benchmarks, but I just thought it worth mentioning that there is an optimization opportunity.

whether MulAdd(α, A, B, β, C) should actually be MulAdd(C, A, B, α, β).

It seems MulAdd is not exported so I guess you don't need to worry about API compatibility?

If you want to expose an API, how about a macro (say @lair) that builds a "LazyArray intermediate representation", so that users don't need to construct MulAdd explicitly? What I mean is that @lair α .* A * B .+ β * C would evaluate to MulAdd(α, A, B, β, C). The same macro could also build a BroadcastArray.

@dlfivefifty
Member

block diagonal

BlockDiagonal could also live in BlockBandedMatrices.jl.

It seems MulAdd is not exported so I guess you don't need to worry about API compatibility?

It still means that to change it you have to put upper bounds in METADATA on dependent packages, but it's not that big a deal.

how about a macro

There already is the shorthand α .* Mul(A,B) .+ β .* C that lowers to MulAdd, so a macro doesn't add much. A unicode version of Mul(A,B) is preferable to me.

@dlfivefifty
Member

FYI it looks like @jagot just wrote a "block join" multiplication code: JuliaLinearAlgebra/BlockBandedMatrices.jl#21

@tkf
Member

tkf commented Nov 26, 2018

There already is the shorthand α .* Mul(A,B) .+ β .* C that lowers to MulAdd

But isn't it impossible to evaluate such an expression without calling materialize or materialize!? I thought you were talking about the MulAdd API as a way to let users construct the lazy representation object so that it can be materialized later.

@dlfivefifty
Member

I just added support for broadcasting. Let's get that merged before moving on to Mul/mul!.
