
fix perf in exp(::Matrix) (for smallish matrices) #29116

Merged 1 commit into master on Sep 12, 2018
Conversation

@KristofferC (Member) commented Sep 10, 2018

Fixes #29113

Before:

julia> @btime exp([1. 0; 2 0])
  18.912 μs (73 allocations: 4.80 KiB)

After:

julia> @btime exp([1. 0; 2 0])
  3.529 μs (25 allocations: 2.30 KiB)

Edit: The text below is no longer relevant:

Should broadcasting be able to do better here? cc @mbauman. Here is an MWE:

f(A, A2, A4, A6, CC, Inn)      =  A * (A6 * (CC[14]*A6  + CC[12]*A4  + CC[10]*A2)  + CC[8]*A6  + CC[6]*A4  + CC[4]*A2  + CC[2]*Inn)
f_dots(A, A2, A4, A6, CC, Inn) =  A * (A6 * (CC[14]*A6 .+ CC[12]*A4 .+ CC[10]*A2) .+ CC[8]*A6 .+ CC[6]*A4 .+ CC[4]*A2 .+ CC[2]*Inn)
CC = rand(100); A = rand(2,2); A2 = A*A; A4 = A2*A2; A6 = A2 * A4; Inn = rand(2,2);

@btime f($A, $A2, $A4, $A6, $CC, $Inn)
  6.949 μs (28 allocations: 1.69 KiB)

@btime f_dots($A, $A2, $A4, $A6, $CC, $Inn)
  565.278 ns (11 allocations: 1.20 KiB)

Update, since it is being discussed: using full fusion:

f_ddots(A, A2, A4, A6, CC, Inn) =  A * (A6 * @.((CC[14]*A6  + CC[12]*A4  + CC[10]*A2)  + CC[8]*A6  + CC[6]*A4  + CC[4]*A2  + CC[2]*Inn))

@btime f_ddots($A, $A2, $A4, $A6, $CC, $Inn)
  1.787 μs (7 allocations: 464 bytes)

@KristofferC added the performance (Must go faster) and domain:linear algebra (Linear algebra) labels on Sep 10, 2018
Review comments on the following diff hunk:

U = A * (A6 * (CC[14]*A6 .+ CC[12]*A4 .+ CC[10]*A2) .+
CC[8]*A6 + CC[6]*A4 + CC[4]*A2 + CC[2]*Inn)
V = A6 * (CC[13]*A6 + CC[11]*A4 + CC[9]*A2) +
CC[7]*A6 + CC[5]*A4 + CC[3]*A2 + CC[1]*Inn
Member:
CC[14]*A6 etcetera should be CC[14].*A6 in order to fuse. Probably cleaner to just do @.(CC[14]*A6 + CC[12]*A4 + …)
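For concreteness, a small illustration of the fusion point above (A, B, c, d are placeholder names, not from the PR): a plain scalar multiply like c*A allocates a temporary matrix before the .+ can fuse anything, whereas c .* A stays lazy and folds into the surrounding broadcast, so the whole right-hand side runs as a single loop with one result allocation.

using BenchmarkTools

A, B = rand(2, 2), rand(2, 2)
c, d = 1.3, 2.7

g_mixed(c, d, A, B) = c*A .+ d*B        # two temporaries from the scalar multiplies
g_fused(c, d, A, B) = c .* A .+ d .* B  # fully fused: only the result is allocated

@btime g_mixed($c, $d, $A, $B);
@btime g_fused($c, $d, $A, $B);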

Member:
(I'm not sure why CC is an array rather than a bunch of scalars. Not that any of these micro-optimizations matter for large arrays.)

@KristofferC (Author):
That makes the code three times slower, so I'd rather not do that.

@KristofferC (Author):
Like, I put an MWE at the top so we can talk about actual performance instead of theoretical performance :P

@mbauman (Member) commented Sep 10, 2018:
That looks like #28126, particularly since when I tried it I saw a 2x gain. :-\

julia> f_dots2(A, A2, A4, A6, CC, Inn) =  A * (A6 * (CC[14].*A6 .+ CC[12].*A4 .+ CC[10].*A2) .+ CC[8]*A6 .+ CC[6].*A4 .+ CC[4].*A2 .+ CC[2].*Inn)
f_dots2 (generic function with 1 method)

julia> @btime f_dots2($A, $A2, $A4, $A6, $CC, $Inn)
  254.430 ns (5 allocations: 560 bytes)
2×2 Array{Float64,2}:
 0.516647  0.311652
 0.186972  0.116884

@KristofferC (Author):
Ok, I also get that, but not when using the macro... Did I mess up?

@KristofferC (Author):
Anyway, just pushed it with explicit dots.

Member:
Ah, the difference appears to be in the n-ary operations. The parser doesn't use special n-ary parsing for .+, but the @. macro does.
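A rough way to see the parsing difference described above (assumes a stock Julia 1.x session; the lowered output is paraphrased in the comments rather than reproduced verbatim):

a, b, c = rand(2, 2), rand(2, 2), rand(2, 2)

Meta.@lower a .+ b .+ c    # chained .+ lowers to nested binary broadcasted calls
Meta.@lower @. a + b + c   # @. turns the n-ary + into a single broadcasted(+, a, b, c)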

Review comment on the updated hunk:

U = A * (A6 * (CC[14]*A6 .+ CC[12]*A4 .+ CC[10]*A2) .+
CC[8]*A6 .+ CC[6]*A4 .+ CC[4]*A2 .+ CC[2]*Inn)
V = A6 * (CC[13]*A6 .+ CC[11]*A4 .+ CC[9]*A2) .+
CC[7]*A6 .+ CC[5]*A4 .+ CC[3]*A2 .+ CC[1]*Inn
@stevengj (Member) commented Sep 10, 2018:
If we really wanted to we could split out the + CC[2]*Inn operation into a separate loop that just updates the diagonals in-place.

In general, it looks like there are lots of opportunities for avoiding temporary array allocations in this routine if we care, both by re-using arrays and by using things like rmul!.

It will only matter for tiny arrays like the one in #29113, however, since for larger arrays the Θ(n³) operations will dominate.
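A minimal sketch of the in-place diagonal update suggested above (add_scaled_identity! is a hypothetical helper for illustration, not code from this PR): instead of allocating CC[2]*Inn and adding an entire matrix, one could bump the diagonal of the already-computed sum directly.

using LinearAlgebra

function add_scaled_identity!(U::AbstractMatrix, c::Number)
    n = LinearAlgebra.checksquare(U)  # throws unless U is square
    @inbounds for i in 1:n
        U[i, i] += c                  # in-place version of U + c*I, no temporaries
    end
    return U
end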

@KristofferC changed the title from "fix perf in exp(::Matrix)" to "fix perf in exp(::Matrix) (for smallish matrices)" on Sep 10, 2018
@Keno (Member) commented Sep 10, 2018:

Do we have this as a nanosoldier benchmark and didn't notice? If not, add it?

@KristofferC added the kind:potential benchmark (Could make a good benchmark in BaseBenchmarks) label on Sep 10, 2018
@KristofferC (Author):
The benchmark label is on there, so anyone should feel free to add it.

@KristofferC merged commit 6acaa10 into master on Sep 12, 2018
@mbauman deleted the kc/perf_exp branch on September 12, 2018
KristofferC added a commit that referenced this pull request Sep 12, 2018
(cherry picked from commit 6acaa10)
@KristofferC mentioned this pull request on Sep 12, 2018
@KristofferC removed the kind:potential benchmark (Could make a good benchmark in BaseBenchmarks) label on Oct 16, 2018
KristofferC added a commit that referenced this pull request Feb 11, 2019
(cherry picked from commit 6acaa10)
Labels: domain:linear algebra (Linear algebra), performance (Must go faster)

4 participants