LinearAlgebra: improve type-inference in Symmetric/Hermitian matmul #54303

jishnub · 2024-04-29T11:54:51Z

Matrix multiplication for wrapper types such as Hermitian currently uses an unwrapping mechanism that assigns a character based on the type of the wrapper. However, the character isn't determined by the type alone: for Hermitian/Symmetric types, it also depends on the uplo field, which usually isn't known at compile time.

julia/stdlib/LinearAlgebra/src/LinearAlgebra.jl

Lines 523 to 525 in 6023ad6

    wrapper_char(A::Hermitian) = A.uplo == 'U' ? 'H' : 'h'
    wrapper_char(A::Hermitian{<:Real}) = A.uplo == 'U' ? 'S' : 's'
    wrapper_char(A::Symmetric) = A.uplo == 'U' ? 'S' : 's'

An example of a badly inferred function call because of this:

julia> @descend_code_warntype (A -> LinearAlgebra.wrap(parent(A), LinearAlgebra.wrapper_char(A)))(Symmetric(rand(2,2)))
(::var"#1#2")(A) @ Main REPL[2]:1
┌ Warning: couldn't retrieve source of (::var"#1#2")(A) @ Main REPL[2]:1
└ @ TypedSyntax ~/.julia/packages/TypedSyntax/cH1Nu/src/node.jl:36
Variables
  #self#::Core.Const(var"#1#2"())
  A::Symmetric{Float64, Matrix{Float64}}

Body::Union{Adjoint{Float64, Matrix{Float64}}, Hermitian{Float64, Matrix{Float64}}, Symmetric{Float64, Matrix{Float64}}, Transpose{Float64, Matrix{Float64}}, Matrix{Float64}}
    @ REPL[2]:1 within `#1`
1 ─ %1 = LinearAlgebra.wrap::Core.Const(LinearAlgebra.wrap)
│   %2 = Main.parent::Core.Const(parent)
│   %3 = (%2)(A)::Matrix{Float64}
│   %4 = LinearAlgebra.wrapper_char::Core.Const(LinearAlgebra.wrapper_char)
│   %5 = (%4)(A)::Char
│   %6 = (%1)(%3, %5)::Union{Adjoint{Float64, Matrix{Float64}}, Hermitian{Float64, Matrix{Float64}}, Symmetric{Float64, Matrix{Float64}}, Transpose{Float64, Matrix{Float64}}, Matrix{Float64}}
└──      return %6
Select a call to descend into or ↩ to ascend. [q]uit. [b]ookmark.
Toggles: [w]arn, [h]ide type-stable statements, [t]ype annotations, [s]yntax highlight for Source/LLVM/Native, [j]ump to source always.
Show: [S]ource code, [A]ST, [T]yped code, [L]LVM IR, [N]ative code
Actions: [E]dit source code, [R]evise and redisplay
 • %3 = parent(::Symmetric{Float64, Matrix{Float64}})::Matrix{Float64}
   %5 = wrapper_char(::Symmetric{Float64, Matrix{Float64}})::Char
   %6 = wrap(::Matrix{Float64},::Char)::…

The output type is inferred as a large union, which complicates type inference downstream. Often the impact of the runtime dispatch is minimal thanks to function barriers, but we can avoid the runtime dispatch altogether.
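To make the inference problem concrete, here is a minimal sketch using hypothetical stand-ins (toy_char_old, toy_wrap_old and f_old are illustrative names, not the LinearAlgebra internals). A single Char encodes both the wrapper type and the triangle, so the type of the result depends on a runtime value and inference can only report a non-concrete type.

```julia
using LinearAlgebra

# Hypothetical stand-in for wrapper_char: one Char encodes both the wrapper
# type ('S' = Symmetric) and the triangle (uppercase = upper, lowercase = lower).
toy_char_old(A::Symmetric) = A.uplo == 'U' ? 'S' : 's'

# Hypothetical stand-in for wrap: the type of the result depends on the
# runtime *value* of `c`, so inference can only return a Union here.
function toy_wrap_old(A::AbstractMatrix, c::Char)
    if c == 'S'
        return Symmetric(A, :U)
    elseif c == 's'
        return Symmetric(A, :L)
    elseif c == 'T'
        return transpose(A)
    else
        return A
    end
end

f_old(A) = toy_wrap_old(parent(A), toy_char_old(A))

# Ask inference for the return type of f_old on a Symmetric argument.
rt_old = only(Base.return_types(f_old, (Symmetric{Float64, Matrix{Float64}},)))
println(rt_old)
println(isconcretetype(rt_old))  # false: several wrapper types are possible
```

The runtime result is still correct; the cost is that every caller of f_old sees a Union and may need to dispatch dynamically on it.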

This PR separates the uplo character from the character for the wrapper type, storing both in a newly defined struct. With this approach, the type information may be constant-propagated even if the uplo isn't, and the return type may be inferred concretely. After this,

julia> @descend_code_warntype (A -> LinearAlgebra.wrap(parent(A), LinearAlgebra.wrapper_char(A)))(Symmetric(rand(2,2)))
(::var"#3#4")(A) @ Main REPL[3]:1
┌ Warning: couldn't retrieve source of (::var"#3#4")(A) @ Main REPL[3]:1
└ @ TypedSyntax ~/.julia/packages/TypedSyntax/cH1Nu/src/node.jl:36
Variables
  #self#::Core.Const(var"#3#4"())
  A::Symmetric{Float64, Matrix{Float64}}

Body::Symmetric{Float64, Matrix{Float64}}
    @ REPL[3]:1 within `unknown scope`
1 ─ %1 = LinearAlgebra.wrap::Core.Const(LinearAlgebra.wrap)
│   %2 = Main.parent::Core.Const(parent)
│   %3 = (%2)(A)::Matrix{Float64}
│   %4 = LinearAlgebra.wrapper_char::Core.Const(LinearAlgebra.wrapper_char)
│   %5 = (%4)(A)::Core.PartialStruct(LinearAlgebra.WrapperChar, Any[Core.Const('S'), Bool])
│   %6 = (%1)(%3, %5)::Symmetric{Float64, Matrix{Float64}}
└──      return %6
Select a call to descend into or ↩ to ascend. [q]uit. [b]ookmark.
Toggles: [w]arn, [h]ide type-stable statements, [t]ype annotations, [s]yntax highlight for Source/LLVM/Native, [j]ump to source always.
Show: [S]ource code, [A]ST, [T]yped code, [L]LVM IR, [N]ative code
Actions: [E]dit source code, [R]evise and redisplay
 • %3 = parent(::Symmetric{Float64, Matrix{Float64}})::Matrix{Float64}
   %5 = wrapper_char(::Symmetric{Float64, Matrix{Float64}})::Core.PartialStruct(LinearAlgebra.WrapperChar, Any[Core.Const('S'), Bool])
   %6 = < constprop > wrap(::Matrix{Float64},::Core.PartialStruct(LinearAlgebra.WrapperChar, Any[Core.Const('S'), Bool]))::…
   ↩
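The idea above can be sketched with a toy struct. This is a minimal illustration under stated assumptions, not the actual LinearAlgebra.WrapperChar: ToyWrapperChar, toy_wrapper_char, toy_wrap and f_new are hypothetical names, and the sketch assumes a Julia version whose inference forwards partially constant structs (the PartialStruct seen in the output above) between calls.

```julia
using LinearAlgebra

# Hypothetical sketch (not LinearAlgebra.WrapperChar): the wrapper-type
# character and the triangle live in separate fields.
struct ToyWrapperChar
    wrapperchar::Char        # determined by the wrapper type alone
    isuppertriangular::Bool  # taken from `uplo`, usually a runtime value
end

# The first field is a literal, so inference can track it as a constant even
# though the second field is only known at runtime.
toy_wrapper_char(A::Symmetric) = ToyWrapperChar('S', A.uplo == 'U')
toy_wrapper_char(A::Hermitian) = ToyWrapperChar('H', A.uplo == 'U')

# Aggressive constant propagation lets the compiler resolve the branch on
# `wrapperchar` from the partially known argument; only `uplo` stays dynamic.
Base.@constprop :aggressive function toy_wrap(A::AbstractMatrix, wc::ToyWrapperChar)
    uplo = wc.isuppertriangular ? :U : :L
    if wc.wrapperchar == 'S'
        return Symmetric(A, uplo)
    elseif wc.wrapperchar == 'H'
        return Hermitian(A, uplo)
    else
        return A
    end
end

f_new(A) = toy_wrap(parent(A), toy_wrapper_char(A))

rt_new = only(Base.return_types(f_new, (Symmetric{Float64, Matrix{Float64}},)))
println(rt_new)
println(isconcretetype(rt_new))
```

The intent is that rt_new is the concrete Symmetric{Float64, Matrix{Float64}}, because the branch is decided by the constant field while the Bool field only selects an argument value, not a type.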

This change should be compatible with existing code, as the new struct subtypes AbstractChar and may be converted to and compared with a Char as before.
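A minimal sketch of how subtyping AbstractChar keeps such a struct usable where a Char was expected. CompatWrapperChar is a hypothetical name and its layout is an assumption; the sketch relies only on Base's generic AbstractChar methods (Char(x), ==, convert), which are built on codepoint. Defining uppercase to return the type character mirrors the "convert the chars to uppercase to potentially unwrap a WrapperChar" comment in the review below.

```julia
# Hypothetical sketch (not LinearAlgebra.WrapperChar): subtyping AbstractChar
# lets a two-field character still act like a plain Char.
struct CompatWrapperChar <: AbstractChar
    wrapperchar::Char        # type character, stored in uppercase ('S', 'H', ...)
    isuppertriangular::Bool  # true for uplo == 'U'
end

# Base's generic AbstractChar methods go through `codepoint`: the combined
# character is uppercase for the upper triangle and lowercase for the lower
# one, matching the old single-Char convention.
function Base.codepoint(w::CompatWrapperChar)
    c = w.isuppertriangular ? w.wrapperchar : lowercase(w.wrapperchar)
    return codepoint(c)
end

# `uppercase` drops the triangle information and returns just the type character.
Base.uppercase(w::CompatWrapperChar) = w.wrapperchar

w = CompatWrapperChar('S', false)
println(Char(w))           # s
println(w == 's')          # true
println(convert(Char, w))  # s
println(uppercase(w))      # S
```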

Fixes #53951.

dkarrasch · 2024-04-29T14:40:09Z

Should we even backport this to v1.10?

dkarrasch · 2024-04-29T14:43:23Z

So, the main idea is not to confuse the compiler unnecessarily with uppercase/lowercase H or S when, for the result type, that distinction is irrelevant anyway, right? Because that distinction is just the value of a field, and not encoded into the type?

jishnub · 2024-04-29T16:06:30Z

Yes, that's the idea.

Backporting to v1.10 might require some manual intervention, but should be a good idea.

jishnub · 2024-04-30T14:52:23Z

The last few commits improve type stability and ensure constant propagation in various checks in the matmul functions. They introduce a new function _in that parallels in, but uses two-valued logic and is defined recursively. This allows checks like tA in ('T', 'N', 'C') to be evaluated at compile time, which should remove branches from the code. Ideally, in should already be doing this, but I don't know enough about the compile-time implications in the general case; for our specific case, this shouldn't matter much. The commits also define a function all_in that acts like all(in(..)) but ensures constant propagation by unrolling the loop over the arguments. This is no longer used, as all(map(in(..), ...)) achieves constant propagation without the need for special helper functions.
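Both patterns can be sketched as follows. The PR description does not show the body of _in, so toy_in is a hypothetical reconstruction: it combines == with ||, which is one reading of "two-valued logic" (a missing comparison would throw instead of propagating). Recursing on Base.tail gives the compiler a fixed-length chain it can unroll for a literal tuple. The all(map(in(...), ...)) form uses only Base functions.

```julia
# Hypothetical recursive membership test over a tuple (not the actual `_in`).
# `||` requires a Bool, so the result can only be true or false.
toy_in(x, ::Tuple{}) = false
toy_in(x, t::Tuple) = x == first(t) || toy_in(x, Base.tail(t))

# A single-character check of the kind used in the matmul code.
valid_char(c::Char) = toy_in(c, ('N', 'T', 'C'))

# The all(map(...)) form: `in(coll)` is a curried predicate, and `map` over a
# 2-tuple applies it to tA and tB individually instead of looping.
both_valid(tA::Char, tB::Char) = all(map(in(('N', 'T', 'C')), (tA, tB)))

println(valid_char('T'))       # true
println(valid_char('S'))       # false
println(both_valid('N', 'C'))  # true
println(both_valid('N', 'H'))  # false
```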

Fixes #53951 after the recent set of commits. After this,

julia> using LinearAlgebra

julia> using BenchmarkTools

julia> A = Hermitian([1.0 2.0; 2.0 3.0])
2×2 Hermitian{Float64, Matrix{Float64}}:
 1.0  2.0
 2.0  3.0

julia> B = [4.0 5.0; 6.0 7.0]
2×2 Matrix{Float64}:
 4.0  5.0
 6.0  7.0

julia> Y = similar(B)
2×2 Matrix{Float64}:
   6.0e-323  6.4e-323
 NaN         0.0

julia> @btime mul!($Y, $A, $B)
  127.843 ns (0 allocations: 0 bytes)
2×2 Matrix{Float64}:
 16.0  19.0
 26.0  31.0

julia> Badj = B'
2×2 adjoint(::Matrix{Float64}) with eltype Float64:
 4.0  6.0
 5.0  7.0

julia> @btime mul!($Y, $A, $Badj)
  44.311 ns (0 allocations: 0 bytes)
2×2 Matrix{Float64}:
 14.0  20.0
 23.0  33.0

stdlib/LinearAlgebra/src/matmul.jl

stdlib/LinearAlgebra/src/LinearAlgebra.jl

dkarrasch · 2024-05-07T08:33:47Z

stdlib/LinearAlgebra/src/matmul.jl

+    # We convert the chars to uppercase to potentially unwrap a WrapperChar,
+    # and extract the char corresponding to the wrapper type
+    tA_uc, tB_uc = uppercase(tA), uppercase(tB)
+    # the map in all ensures constprop by acting on tA and tB individually, instead of looping over them.


If this is true, this should be a giant contribution to the reduction of compile times, right? If we land in this branch, then we don't need to compile symm and hemm, or in the other case syrk/herk/gemm_wrapper.

There is some compile-time improvement indeed, although it's not dramatic.
Each execution is in a separate session in the following:

julia> A = rand(2,2); B = rand(2,2); C = zeros(2,2);

julia> @time mul!(C, A, B);
  0.847057 seconds (3.39 M allocations: 171.963 MiB, 24.97% gc time, 100.00% compilation time) # nightly
  0.757433 seconds (3.94 M allocations: 202.922 MiB, 4.65% gc time, 100.00% compilation time) # This PR

julia> A = rand(2,2); B = Symmetric(rand(2,2)); C = zeros(2,2);

julia> @time mul!(C, A, B);
  1.098831 seconds (3.68 M allocations: 189.159 MiB, 24.52% gc time, 99.99% compilation time) # nightly
  0.687847 seconds (4.72 M allocations: 238.864 MiB, 7.04% gc time, 99.99% compilation time) # This PR

Descending into generic_matmatmul! using Cthulhu does seem to indicate that unused branches are eliminated, and e.g. in the first case, only gemm_wrapper! is being compiled, and in the second, only BLAS.symm! is compiled.

The code_typed for the first case (gemm) is identical between this PR and nightly:

julia> A = rand(2,2); B = rand(2,2); C = zeros(2,2);

julia> @code_typed mul!(C, A, B)
CodeInfo(
1 ─ %1 = invoke LinearAlgebra.gemm_wrapper!(C::Matrix{Float64}, 'N'::Char, 'N'::Char, A::Matrix{Float64}, B::Matrix{Float64}, $(QuoteNode(LinearAlgebra.MulAddMul{true, true, Bool, Bool}(true, false)))::LinearAlgebra.MulAddMul{true, true, Bool, Bool})::Matrix{Float64}
└──      return %1
) => Matrix{Float64}

I'm not certain why there's a compile-time improvement here (perhaps noise?). In this case, the all is already being folded despite the loop over the characters. I suspect the loop is being unrolled entirely, as the characters are all Chars that are fully known at compile time.

The second case (symm) is where the major improvement comes in:

julia> A = rand(2,2); B = Symmetric(rand(2,2)); C = zeros(2,2);

julia> @code_typed mul!(C, A, B)
CodeInfo(
1 ── %1 = Base.getfield(B, :uplo)::Char
│    %2 = Base.bitcast(Base.UInt32, %1)::UInt32
│    %3 = Base.bitcast(Base.UInt32, 'U')::UInt32
│    %4 = (%2 === %3)::Bool
│    %5 = Base.getfield(B, :data)::Matrix{Float64}
└───      goto #3 if not %4
2 ──      goto #4
3 ──      goto #4
4 ┄─ %9 = φ (#2 => 'S', #3 => 's')::Char
│    %10 = Base.bitcast(Base.UInt32, %9)::UInt32
│    %11 = Base.bitcast(Base.UInt32, 'S')::UInt32
│    %12 = (%10 === %11)::Bool
└───      goto #5
5 ──      goto #7 if not %12
6 ──      goto #8
7 ──      nothing::Nothing
8 ┄─ %17 = φ (#6 => 'U', #7 => 'L')::Char
│    %18 = invoke LinearAlgebra.BLAS.symm!('R'::Char, %17::Char, 1.0::Float64, %5::Matrix{Float64}, A::Matrix{Float64}, 0.0::Float64, C::Matrix{Float64})::Matrix{Float64}
└───      goto #9
9 ──      goto #10
10 ─      goto #11
11 ─      return %18
) => Matrix{Float64}

The BLAS.symm! branch that is being followed is "inlined" now. This is the case where the loop is not unrolled ordinarily, but using the all(map(..)) combination permits constant propagation.

dkarrasch · 2024-05-07T08:38:30Z

I love it. I always dreamt of the day when that character stuff would be inferred, or constant-propagated far enough. Is this ready to go now? I think we should merge this first, and then the "stabilize MulAddMul strategically" PR, to give this one a chance to be backported to v1.10, though I'm not sure if that is a bit too ambitious.

With this, `isuppercase`/`islowercase` are evaluated at compile-time for `Char` arguments:

```julia
julia> @code_typed (() -> isuppercase('A'))()
CodeInfo(
1 ─     return true
) => Bool

julia> @code_typed (() -> islowercase('A'))()
CodeInfo(
1 ─     return false
) => Bool
```

This would be useful in #54303, where the case of the character indicates which triangular half of a matrix is filled, and may be constant-propagated downstream.

---------

Co-authored-by: Shuhei Kadowaki <40514306+aviatesk@users.noreply.github.com>
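As a small illustration of the downstream use described here, the case of a wrapper character can be mapped to the BLAS uplo argument, in the spirit of the 'S' => 'U' / 's' => 'L' φ-node in the symm! output above. toy_blas_uplo is a hypothetical helper; isuppercase is the Base function. Whether a constant argument folds completely depends on the Julia version including the :foldable change.

```julia
# Hypothetical helper: choose the BLAS `uplo` argument from the case of a
# wrapper character ('S'/'H' -> upper triangle, 's'/'h' -> lower triangle).
toy_blas_uplo(c::Char) = isuppercase(c) ? 'U' : 'L'

println(toy_blas_uplo('S'))  # U
println(toy_blas_uplo('s'))  # L

# With isuppercase foldable for Char, a constant argument should reduce to a
# constant return value; this can be inspected in the REPL with, e.g.,
#     @code_typed (() -> toy_blas_uplo('S'))()
```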

jishnub · 2024-05-07T10:31:14Z

Yes, this is ready from my side.

jishnub added the domain:linear algebra and backport 1.11 labels Apr 29, 2024

jishnub added the backport 1.10 label Apr 29, 2024

jishnub commented Apr 30, 2024

stdlib/LinearAlgebra/src/matmul.jl

jishnub commented May 1, 2024

stdlib/LinearAlgebra/src/LinearAlgebra.jl

jishnub added 13 commits May 2, 2024 23:52

LinearAlgebra: improve type-inference in Symmetric/Hermitian matmul

fda8c87

Add inference tests

3073352

LinearAlgbebra: constant propagate character in generic_matmatmul checks

54a4038

Remove unused variable

108db04

Use wrapperchar in checks

a2dd315

Aggressive constprop annotation in gemm_wrapper

5db39fb

uppercase(::WrapperChar) instead of accessor function

7a8bb75

Consistent variable name

1e06a1e

Constprop in syrk_wrapper/herk_wrapper

cefa1b1

Fix typo

28d4fd8

Update comment

b24cc1e

Remove some unnecessary type conversions

9e66d1a

Use all(map(...)) instead of all_in

541a76d

jishnub force-pushed the jishnub/linalgwrapperchar branch from ae7fc6c to 541a76d May 2, 2024 19:13

WrapperChar Constructor from Char

31692d5

jishnub mentioned this pull request May 3, 2024

Assume :foldable in isuppercase/islowercase for Char #54346

Merged

KristofferC mentioned this pull request May 6, 2024

Backports for 1.11.0-beta2 #54112

Open

57 tasks

Preserve case in WrapperChar constructor

311382b

dkarrasch reviewed May 7, 2024

dkarrasch merged commit c77671a into master May 7, 2024
7 checks passed

dkarrasch deleted the jishnub/linalgwrapperchar branch May 7, 2024 16:19

KristofferC mentioned this pull request May 8, 2024

Backports for 1.10.4 #54416

Open

17 tasks

jishnub commented Apr 29, 2024 • edited by dkarrasch

dkarrasch commented Apr 29, 2024

dkarrasch commented Apr 29, 2024

jishnub commented Apr 29, 2024 • edited

jishnub commented Apr 30, 2024 • edited

dkarrasch May 7, 2024

jishnub May 7, 2024

jishnub May 7, 2024 • edited

dkarrasch commented May 7, 2024

jishnub commented May 7, 2024
