
Structured opnorm #30138

Open · wants to merge 14 commits into master

Conversation

@mcognetta
Contributor

opnorm falls back to generic methods for structured matrices (Diagonal, Bidiagonal, Tridiagonal, SymTridiagonal). For the p=1 and p=Inf cases, the norm is just a maximum of sums over the columns or rows, which can be sped up by visiting only the stored bands. The p=2 case falls back to svdvals!, for which there are specialized methods for each of these types except Tridiagonal and SymTridiagonal (see #30137).

This implements faster p=1 and p=Inf opnorms for structured matrices but leaves the p=2 case broken for Tridiagonal and SymTridiagonal.
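
For intuition, the Diagonal case collapses entirely: every row and column holds a single stored entry, so both the p=1 and p=Inf operator norms reduce to the largest entry norm. A minimal sketch of that idea (illustrative only, not this PR's code; `_diag_opnorm1Inf` is a hypothetical name):

using LinearAlgebra

# Each row/column of a Diagonal holds one stored entry, so the induced
# 1-norm and Inf-norm are both the maximum entry norm.
_diag_opnorm1Inf(D::Diagonal) = maximum(norm, D.diag)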

Some timings:

v1.0

julia> D
5000×5000 Diagonal{Float64,Array{Float64,1}}

julia> B
5000×5000 Bidiagonal{Float64,Array{Float64,1}}

julia> T
5000×5000 Tridiagonal{Float64,Array{Float64,1}}

julia> S
5000×5000 SymTridiagonal{Float64,Array{Float64,1}}

# DM, BM, TM, SM are the Matrix(A) versions

julia> @time opnorm(D, 1)
  0.036138 seconds (5 allocations: 176 bytes)
0.9999918087169

julia> @time opnorm(DM, 1)
  0.043063 seconds (5 allocations: 176 bytes)
0.9999918087169

julia> @time opnorm(D, Inf)
  0.047056 seconds (5 allocations: 176 bytes)
0.9999918087169

julia> @time opnorm(DM, Inf)
  0.220518 seconds (5 allocations: 176 bytes)
0.9999918087169

julia> @time opnorm(B, 1)
  0.124054 seconds (5 allocations: 176 bytes)
1.9697821341527078

julia> @time opnorm(BM, 1)
  0.060772 seconds (5 allocations: 176 bytes)
1.9697821341527078

julia> @time opnorm(B, Inf)
  0.119326 seconds (5 allocations: 176 bytes)
1.9814976295253526

julia> @time opnorm(BM, Inf)
  0.221310 seconds (5 allocations: 176 bytes)
1.9814976295253526

julia> @time opnorm(T, 1)
  0.041897 seconds (5 allocations: 176 bytes)
2.937180967575177

julia> @time opnorm(TM, 1)
  0.041170 seconds (5 allocations: 176 bytes)
2.937180967575177

julia> @time opnorm(T, Inf)
  0.044132 seconds (5 allocations: 176 bytes)
2.926084489740691

julia> @time opnorm(TM, Inf)
  0.217616 seconds (5 allocations: 176 bytes)
2.926084489740691

julia> @time opnorm(S, 1)
  0.050122 seconds (5 allocations: 176 bytes)
2.93492627844363

julia> @time opnorm(SM, 1)
  0.042291 seconds (5 allocations: 176 bytes)
2.93492627844363

julia> @time opnorm(S, Inf)
  0.045469 seconds (5 allocations: 176 bytes)
2.93492627844363

julia> @time opnorm(SM, Inf)
  0.218678 seconds (5 allocations: 176 bytes)
2.93492627844363

This PR

julia> @time opnorm(D, 1)
  0.000024 seconds (6 allocations: 192 bytes)
0.9998363448571781

julia> @time opnorm(DM, 1)
  0.039525 seconds (5 allocations: 176 bytes)
0.9998363448571781

julia> @time opnorm(D, Inf)
  0.000019 seconds (6 allocations: 192 bytes)
0.9998363448571781

julia> @time opnorm(DM, Inf)
  0.213459 seconds (5 allocations: 176 bytes)
0.9998363448571781

julia> @time opnorm(B, 1)
  0.000020 seconds (5 allocations: 176 bytes)
1.9953172509905421

julia> @time opnorm(BM, 1)
  0.036928 seconds (5 allocations: 176 bytes)
1.9953172509905421

julia> @time opnorm(B, Inf)
  0.000018 seconds (5 allocations: 176 bytes)
1.9847639686427956

julia> @time opnorm(BM, Inf)
  0.212399 seconds (5 allocations: 176 bytes)
1.9847639686427956

julia> @time opnorm(T, 1)
  0.000056 seconds (5 allocations: 176 bytes)
2.8767832379755576

julia> @time opnorm(TM, 1)
  0.042825 seconds (5 allocations: 176 bytes)
2.8767832379755576

julia> @time opnorm(T, Inf)
  0.000028 seconds (5 allocations: 176 bytes)
2.874610513400908

julia> @time opnorm(TM, Inf)
  0.212551 seconds (5 allocations: 176 bytes)
2.874610513400908

julia> @time opnorm(S, 1)
  0.000020 seconds (5 allocations: 176 bytes)
2.8414045244513093

julia> @time opnorm(SM, 1)
  0.036743 seconds (5 allocations: 176 bytes)
2.8414045244513093

julia> @time opnorm(S, Inf)
  0.000017 seconds (5 allocations: 176 bytes)
2.8414045244513093

julia> @time opnorm(SM, Inf)
  0.212119 seconds (5 allocations: 176 bytes)
2.8414045244513093

return convert(Tnorm, nrm)
end

function _loopnorm1(A::Bidiagonal{T}) where T

@mcognetta
Contributor Author
The lower-bidiagonal p=1 norm is the same code as the upper-bidiagonal p=Inf norm (and likewise for upper p=1 and lower p=Inf). These could be combined, but that would reduce readability in my opinion. Thoughts on how to make this cleaner?

One option is to do _upnormInf(A::Bidiagonal) = _loopnorm1(A) and _lonormInf(A::Bidiagonal) = _upnorm1(A) and document it well.
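
The identity behind this aliasing holds for any matrix: the Inf-norm is the maximum row sum of entry norms and the 1-norm the maximum column sum, so opnorm(A, Inf) equals opnorm(transpose(A), 1), and transposing a Bidiagonal flips upper and lower. A quick sanity check (not part of the PR):

using LinearAlgebra

B = Bidiagonal(rand(5), rand(4), :U)  # upper bidiagonal
L = Bidiagonal(B.dv, B.ev, :L)        # same data, lower bidiagonal
opnorm(Matrix(B), Inf) ≈ opnorm(Matrix(L), 1)  # true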

@andreasnoack
Member

Thanks for adding these optimizations. However, I think it would be good to write them with a mapreduce since it avoids some code replication. It might also make it possible for these to work on e.g. DistributedArrays without too much work. An example for the Bidiagonal case would be something like

function _opnorm1Inf(B::Bidiagonal, p)
    # p == 1 takes column sums, p == Inf row sums; together with whether B is
    # upper or lower, this decides which end of the diagonal has no
    # off-diagonal partner
    case = xor(p == 1, istriu(B))
    normd1, normdend = norm(first(B.dv)), norm(last(B.dv))
    # zero out whichever endpoint is already paired in the zip below
    normd1, normdend = case ? (zero(normd1), normdend) : (normd1, zero(normdend))
    # each zipped pair is one row/column sum; fold the unpaired endpoint back in
    return max(mapreduce(t -> sum(norm, t), max, zip(view(B.dv, (1:length(B.ev)) .+ !case), B.ev)), normd1, normdend)
end

What do you think? Currently, there is a slight overhead from using this formulation, but it's not much, and it's nothing compared to the current situation. It is also likely that the difference will vanish over time as more optimizations are added to the language.

@mcognetta
Contributor Author

> Thanks for adding these optimizations. However, I think it would be good to write them with a mapreduce since it avoids some code replication. It might also make it possible for these to work on e.g. DistributedArrays without too much work. An example for the Bidiagonal case would be something like
>
>     function _opnorm1Inf(B::Bidiagonal, p)
>         case = xor(p == 1, istriu(B))
>         normd1, normdend = norm(first(B.dv)), norm(last(B.dv))
>         normd1, normdend = case ? (zero(normd1), normdend) : (normd1, zero(normdend))
>         return max(mapreduce(t -> sum(norm, t), max, zip(view(B.dv, (1:length(B.ev)) .+ !case), B.ev)), normd1, normdend)
>     end
>
> What do you think? Currently, there is a slight overhead from using this formulation, but it's not much, and it's nothing compared to the current situation. It is also likely that the difference will vanish over time as more optimizations are added to the language.

That is quite nice, thanks for the suggestion. There are some edge cases (it fails for a 1×1 matrix), but these can be worked around.

One question that I have is how this will be better for DistributedArrays. Is it solely due to the use of mapreduce? The readability of this is (in my opinion) lower than before, and if the primary concern is reducing code at the cost of a slight overhead, an easy method could be made that simply loops over A[i,i-1], A[i,i], A[i,i+1] (and likewise for the other norm) and would work for all of the structured matrices.
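
For concreteness, a sketch of that banded loop (hypothetical, not this PR's code; `_bandednorm1` relies on getindex returning zero outside the stored bands, which holds for all four structured types):

using LinearAlgebra

# p=1 norm for any matrix whose nonzeros lie in the tridiagonal band:
# each column sum needs at most the three entries A[j-1,j], A[j,j], A[j+1,j].
function _bandednorm1(A::AbstractMatrix{T}) where T
    n = size(A, 2)
    nrm = norm(zero(T))
    for j in 1:n
        s = norm(zero(T))
        for i in max(1, j - 1):min(n, j + 1)
            s += norm(A[i, j])
        end
        nrm = max(nrm, s)
    end
    return nrm
end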

@andreasnoack
Member

> One question that I have is how this will be better for DistributedArrays. Is it solely due to the use of mapreduce?

Yes. Using explicit loops and indexing is often only useful for <:StridedMatrix. If an algorithm can be expressed in terms of higher-order functions or, say, BLAS operations, then there is a chance that it will work for a much broader class of matrix types. E.g. explicit index operations won't fly for DistributedArrays and CuArrays.

> The readability of this is (in my opinion) lower than before

I agree to some extent. A loop is very easy to read, but all the (necessary) promotion nonsense (of which I've written a lot myself) is a distraction and is nice to hide in other functions when possible. It is also a lot of repetition of almost the same code. As you mention, it is possible to reuse some of the methods by flipping the upper/lower and 1 to Inf. However, the main argument for using mapreduce is the potential for broader applicability.
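
As an illustration of that style, the Tridiagonal p=1 norm can be written with broadcasts and a reduction instead of scalar indexing (a sketch only, not this PR's code; `_tri_opnorm1` is a hypothetical name):

using LinearAlgebra

# Column j of a Tridiagonal touches du[j-1], d[j], and dl[j]; padding the
# off-diagonal norm vectors with one zero aligns all three for a broadcast.
function _tri_opnorm1(A::Tridiagonal)
    d, l, u = norm.(A.d), norm.(A.dl), norm.(A.du)
    z = zero(eltype(d))
    return maximum(d .+ [l; z] .+ [z; u])
end

Because it only uses map, broadcast, and reduce, a formulation like this has a chance of working for array-backed types that forbid scalar indexing.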

@andreasnoack
Member

Bump. It would be great to get this merged in some form. Do you think it can be rewritten to use higher-order functions? Otherwise please try to reuse as much as possible as you suggested.

@mcognetta
Contributor Author

@andreasnoack Hi, yes, I will try to rewrite it. Are you aiming for this to be included in the upcoming backports? I am a bit busy until the end of this month.

@andreasnoack
Member

The plan is to have another minor release in about four months so you can just finish this when time allows. It would just be a shame if it stalls completely.

@StefanKarpinski
Sponsor Member

Having it in January would be great; then it will be easily included in 1.2 with plenty of time for testing.

@mcognetta
Contributor Author

@andreasnoack Do you mind taking a look at this?

@mcognetta
Contributor Author

Bump

@StefanKarpinski
Sponsor Member

@andreasnoack: do you have bandwidth to take a look at this? @mcognetta, can you rebase this and fix the conflict so that we can merge if it passes CI?

@mcognetta
Contributor Author

Got it.

There is a stylistic change that may be nice to consider. There are currently 4 nearly identical methods like:

function opnorm(B::Bidiagonal, p::Real=2)
    if p == 2
        return opnorm2(B)
    elseif p == 1 || p == Inf
        return _opnorm1Inf(B, p)
    else
        throw(ArgumentError("invalid p-norm p=$p. Valid: 1, 2, Inf"))
    end
end

This could be reduced to 2 (or even 1, if we change the arguments for the symmetric structured matrices) by making something similar to:

function opnorm(A::Union{Tridiagonal, Bidiagonal}, p::Real=2)
    if p == 2
        return opnorm2(A)
    elseif p == 1 || p == Inf
        return _opnorm1Inf(A, p)
    else
        throw(ArgumentError("invalid p-norm p=$p. Valid: 1, 2, Inf"))
    end
end

Not sure where I would place those methods though.

@andreasnoack
Member

It would be great to combine the methods when possible. You can put them in special.jl.

@mcognetta
Contributor Author

I combined the callers for Bidiagonal, Tridiagonal, and SymTridiagonal, but left the one for Diagonal separate so that I could handle block-diagonal matrices specially. The previous commits did not support block-diagonal matrices.

Commit: making a test case use `isapprox` instead of `==`.
@mcognetta
Contributor Author

Sorry for another bump.

@StefanKarpinski
Sponsor Member

StefanKarpinski commented May 1, 2019

Not at all. Thank you for sticking with it! Can we get a final review on this and merge it?

@mcognetta
Contributor Author

Bumping this. It seems #32093 added svd (and thus opnorm for p=2) for SymTridiagonal matrices. I know this has been discussed before, but maybe we can just add the same thing for Tridiagonal as well, for completeness. Then I will remove all of the test_skips.
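
For reference, once a specialized eigensolver is available, one possible p=2 route (a sketch, not necessarily what #32093 implements) uses the fact that the 2-norm of a real symmetric matrix is its largest eigenvalue magnitude:

using LinearAlgebra

# Sketch: opnorm(S, 2) for a real SymTridiagonal via its specialized eigvals
_symtri_opnorm2(S::SymTridiagonal{<:Real}) = maximum(abs, eigvals(S))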

@StefanKarpinski
Sponsor Member

Needs conflicts resolved as well.

@mcognetta
Contributor Author

I resolved the merge conflicts. I will start a PR for the svd method (as I mentioned in #30138) so that this one can go through.

@dgleich

dgleich commented Jun 3, 2021

Anything I can do to help here? This bit me today...

A = SymTridiagonal(randn(1000000), randn(999999)); @time opnorm(A, 1)

is a fun command to run...

@mcognetta
Contributor Author

Oh, I thought that this was done. I'll finish it up and get it submitted. @dgleich, any other optimizations you can see here? Open to suggestions.

@dgleich

dgleich commented Jun 4, 2021

Just some high-level comments... (I don't advocate for any of the following changes, I'm merely pointing them out. In my own work, I struggle with "good enough to ship" vs. "as good as I can make it".)

1. The norm commands are also equally slow and fall back to the generic implementations. [Most important comment.]

2. Note that I believe opnorm for diagonal matrices can be defined for general p >= 1. It's just the largest norm value of the diagonal. [But if you intend to do that, let me check far more closely.]

3. To improve code reuse, it seems like the SymTridiagonal case could be unified with the Tridiagonal case by just passing the internal arrays (e.g. _opnorm1Inf(d, l, u)).

4. The generic code defers to opnorm1, opnorm2, and opnormInf, so you could consider simply implementing these routines directly instead of the entire opnorm function (which then defers to yours), e.g.

       opnorm1(A::Union{SymTridiagonal,Bidiagonal,Tridiagonal}) = _opnorm1Inf(A, 1)
       opnormInf(A::Union{SymTridiagonal,Bidiagonal,Tridiagonal}) = _opnorm1Inf(A, Inf)

   This would eliminate some of your code. [2nd most important comment.]
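
A sketch of point 2 (hypothetical, and as noted above it would deserve a close check): for a Diagonal with scalar entries, ||D*x||_p <= (max_i |d_i|) * ||x||_p, with equality at the standard basis vector for the maximizing index, so the induced p-norm is the same value for every p >= 1.

using LinearAlgebra

# Hypothetical: the induced p-norm of a scalar-entry Diagonal is attained
# at a standard basis vector, so it equals max_i norm(d_i) for all p >= 1.
_diag_opnormp(D::Diagonal{<:Number}, p::Real) = maximum(norm, D.diag)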

@mbauman added the domain:linear algebra label (Jun 4, 2021)