Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement cbrt(A::AbstractMatrix{<:Real}) #50661

Merged
merged 35 commits into from
Dec 5, 2023

Conversation

aravindh-krishnamoorthy
Copy link
Contributor

@aravindh-krishnamoorthy aravindh-krishnamoorthy commented Jul 24, 2023

This PR resolves #47513.
Baseline: 1.10.0-alpha1 (rebased to master)

Summary of the Implementation

  1. cbrt(A::AbstractMatrix) is delegated to A^(1//3). (Method removed.)
  2. For real-valued Diagonal and Symmetric matrices, straightforward implementations based on scalar cbrt and eigen are provided.
  3. For real-valued arbitrary matrices, a schur decomposition and [1, Algorithm 4.3]-based algorithm (_cbrt_quasi_triu!) is provided.
  4. For real-valued upper and lower triangular matrices, straightforward solutions based on _cbrt_quasi_triu! are provided.

Note: See comment below for details on the algorithm.

References

[1]   Smith, M. I. (2003). A Schur Algorithm for Computing Matrix pth Roots. In SIAM Journal on Matrix Analysis and Applications (Vol. 24, Issue 4, pp. 971–989). https://doi.org/10.1137/s0895479801392697

Pending Items

  • Memory and MIPS optimisations (trade-off) for _cbrt_quasi_triu!.

Tests

File: stdlib/LinearAlgebra/test/dense.jl
Testset: @testset "cbrt(A::AbstractMatrix{T})"

Test Summary:              | Pass  Total  Time
cbrt(A::AbstractMatrix{T}) |   17     17  4.1s

Memory and MIPS

julia> A = randn(1000,1000) ;
julia> @btime cbrt(A) ;
  836.035 ms (132377 allocations: 42.71 MiB)
julia> @btime A^(1//3) ;
  6.878 s (180 allocations: 1.07 GiB)
julia> @btime sqrt(A) ;
  888.409 ms (24 allocations: 91.86 MiB)
  • Target was to get lower than sqrt wrt MIPS and memory consumption for large $N,$ which is done now 🎉🎉🎉

@aravindh-krishnamoorthy
Copy link
Contributor Author

aravindh-krishnamoorthy commented Jul 24, 2023

Algorithm overview: Cube root of a quasi triangular matrix

Cube root of a quasi triangular matrix is the crucial part of this PR. The implementation consists of three functions in stdlib/LinearAlgebra/src/triangular.jl. See "Files Changed" tab for details.

  • _cbrt_2x2!: Cube root of a 2x2 real-valued matrix with complex conjugate eigenvalues and equal diagonal values. Permalink to method.
  • _cbrt_blkdiag_1x1_2x2!: Cube root of a block diagonal matrix consisting of $1\times1$ and $2\times2$ blocks (output of Schur decomposition) -- first step of Algorithm 4.3 in [2]. Permalink to method.
  • _cbrt_quasi_triu!: Cube root of a quasi-triangular matrix, i.e., remainder of Algorithm 4.3 in [1]. Permalink to method.

A first version of the _cbrt_quasi_triu! function for real-valued quasi-triangular matrices per Algorithm 4.3 in [1] (see PR text), can be accessed via the permalinks above or via the "Files Changed" tab. The algorithm can be optimised further for both execution time and memory usage. Please see my implementation notes below. An example run is as follows:

julia> A = randn(1000,1000) ;
julia> T = schur(A).T ;
julia> U = LinearAlgebra._cbrt_quasi_triu!(copy(T)) ;
julia> typeof(U)
Matrix{Float64} (alias for Array{Float64, 2})
julia> norm(U*U*U-T)
1.9153193181259216e-11
julia> @btime LinearAlgebra._cbrt_quasi_triu!(copy(T)) ;
  289.430 ms (205971 allocations: 22.91 MiB)

julia> V = T^(1/3) ;
julia> typeof(V)
Matrix{ComplexF64} (alias for Array{Complex{Float64}, 2})
julia> norm(V*V*V-T)
2.787726919534373e-10
julia> @btime V = T^(1/3) evals=1 samples=100 ;
  4.754 s (155 allocations: 900.61 MiB)

Implementation Notes:

  • In principle, the algorithm is in place because the intermediate matrices $\boldsymbol{R}_{ij}$ are stored in the lower-triangular parts.
  • The present memory overheads are due to allocation of intermediate values. These can likely be optimised away to some extent. (Presently about 25 MiB.)
  • The current version is for <:BlasFloat and uses gemm in two places. Perhaps this part can also be rewritten in Julia efficiently. See pending items in the PR text. (Removed BLAS dependency)
  • Although [2] reports $p$-th root of a matrix, the present implementation is unfortunately? optimised for cube roots. A general solution will require a lot more memory and operation, but can return real-valued matrices whenever the $p$-th root of the block diagonal matrix (analogous to _cbrt_blkdiag_1x1_2x2!) is real valued.
  • Presently, the algorithm works on $1\times 1$ or $2\times 2$ diagonal blocks. Contiguous $1\times 1$ blocks can be combined together to form larger upper triangular blocks, which may be handled more efficiently than $1\times 1$ and $2\times 2$ blocks. (Only advantageous in a few specialised cases. In general, the present algorithm reads well and also has an acceptable performance in terms of MIPS and memory utilisation.)

@aravindh-krishnamoorthy aravindh-krishnamoorthy marked this pull request as ready for review July 25, 2023 12:14
@aravindh-krishnamoorthy aravindh-krishnamoorthy marked this pull request as draft July 25, 2023 22:36
@aravindh-krishnamoorthy aravindh-krishnamoorthy marked this pull request as ready for review July 25, 2023 23:04
@dkarrasch dkarrasch added the domain:linear algebra Linear algebra label Jul 26, 2023
@aravindh-krishnamoorthy
Copy link
Contributor Author

aravindh-krishnamoorthy commented Jul 26, 2023

Hello @dkarrasch, at your convenience, could you please suggest a solution for the following. I have a $4\times 4$ chunk of memory in

A = zeros(T, 4, 4)

What would be a good way to get views into it with sizes $\{1,2,4\} \times \{1,2,4\},$ i.e., any combination of two of three numbers (e.g., $1\times 2,$ $4\times 1,$ $2\times 4,$ $4\times 4$), such that:

  • The view is contiguous (column major) and can be passed to mul! (or LAPACK),
  • The view is (almost nearly) overhead free,
  • Bounds check works throughout,
  • The view can be zero initialized with minimal overheads, if necessary.

Such a solution will help me reduce the memory usage of _cbrt_quasi_triu! even further. As you see now, the fastest solution seems to be to allocate these small matrices every loop:

S₁ = zeros(T,s₁,s₁)
S₂ = zeros(T,s₂,s₂)
Bᵢⱼ⁽⁰⁾ = zeros(T,s₁,s₂)
Bᵢⱼ⁽¹⁾ = zeros(T,s₁,s₂)

GC doesn't seem to run at the end of every loop so they seem to accumulate. Maximum memory utilization is about $70$ MiB (for $1000\times 1000$ input matrices), so I've left it as is for now.

Note: The solution in @views L₀ = L₀ᴹ[1:s₁*s₂,1:s₁*s₂] is non-contiguous in some cases, but is Ok temporarily since kron! is implemented in Julia. I'd like to change this to contiguous memory locations as well.


Update: I used the following. Comments and suggestions are welcome.

@views S₁ = reshape(M_S₁, s₁, :)[:, 1:s₁]
@views S₂ = reshape(M_S₂, s₂, :)[:, 1:s₂]
@views L₀ = reshape(M_L₀, s₁*s₂, :)[:,1:s₁*s₂]
@views L₁ = reshape(M_L₁, s₁*s₂, :)[:,1:s₁*s₂]
@views Bᵢⱼ⁽⁰⁾ = reshape(M_Bᵢⱼ⁽⁰⁾, s₁, :)[:, 1:s₂]
@views Bᵢⱼ⁽¹⁾ = reshape(M_Bᵢⱼ⁽¹⁾, s₁, :)[:, 1:s₂]

stdlib/LinearAlgebra/src/triangular.jl Outdated Show resolved Hide resolved
stdlib/LinearAlgebra/src/dense.jl Outdated Show resolved Hide resolved
stdlib/LinearAlgebra/src/dense.jl Outdated Show resolved Hide resolved
stdlib/LinearAlgebra/src/dense.jl Outdated Show resolved Hide resolved
stdlib/LinearAlgebra/src/diagonal.jl Outdated Show resolved Hide resolved
stdlib/LinearAlgebra/src/symmetric.jl Outdated Show resolved Hide resolved
@aravindh-krishnamoorthy
Copy link
Contributor Author

It might be a good idea to add similar docs as we have for sqrt?

Hello @barucden, kindly let me know if you feel I should copy the format from sqrt and adapt it to cbrt already as a part of this PR? Or, we let this live a while and add it later? Or, is there a team doing the documentation?

@barucden
Copy link
Contributor

barucden commented Aug 3, 2023

I've been under the impression that a documentation is (or should be) an integral part of any exported functionality. So I'd say it should be part of this PR. BUT I am not a maintainer! I just noticed that the documentation is missing here, so I thought I should point it out. In any case, there is no special team working on the documentation.

@aravindh-krishnamoorthy
Copy link
Contributor Author

I've been under the impression that a documentation is (or should be) an integral part of any exported functionality. So I'd say it should be part of this PR. BUT I am not a maintainer! I just noticed that the documentation is missing here, so I thought I should point it out. In any case, there is no special team working on the documentation.

Ok. Thank you. Then, I'll wait for the opinion of one of the members before making any change.

@stevengj stevengj added the needs docs Documentation for this change is required label Aug 3, 2023
@stevengj
Copy link
Member

stevengj commented Aug 4, 2023

Base.Math.cbrt
should be changed to Base.Math.cbrt(::Number) so it only documents the scalar version, similar to sqrt a few lines above.

And a reference to cbrt(::AbstractMatrix) should be added around here so that it shows up in the LinearAlgebra manual:

LinearAlgebra.sqrt(::StridedMatrix)

@aravindh-krishnamoorthy
Copy link
Contributor Author

Base.Math.cbrt

should be changed to Base.Math.cbrt(::Number) so it only documents the scalar version, similar to sqrt a few lines above.
And a reference to cbrt(::AbstractMatrix) should be added around here so that it shows up in the LinearAlgebra manual:

LinearAlgebra.sqrt(::StridedMatrix)

Thank you for the tip. This is now done and the docs build well on my PC.

@aravindh-krishnamoorthy aravindh-krishnamoorthy changed the title Implement cbrt(A::AbstractMatrix{T}) Implement cbrt(A::AbstractMatrix{<:Real}) Aug 5, 2023
@stevengj stevengj removed the needs docs Documentation for this change is required label Aug 7, 2023
@aravindh-krishnamoorthy
Copy link
Contributor Author

@stevengj Please let me know if anything more is required for this PR from my side. Is this PR good to merge? I was just wondering if I missed something as this was still hanging around and unmerged since a few months...

@stevengj
Copy link
Member

@dkarrasch, I think this is ready to merge?

@dkarrasch dkarrasch merged commit 8d0eec9 into JuliaLang:master Dec 5, 2023
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

missing cbrt(A::AbstractMatrix)
5 participants