Utilize lu for det, inv, solve, expm #424
Conversation
Yeah, I'm generally a little worried about the compile times for the LU code; testing the LU stuff probably takes about 80% of total test time.
Indeed. It was a nice exercise and it's cool that it's working, but it might not be too practical. What I think we can do on 0.7, with the better allocation elision, is to convert the input to an
Cool! Any analysis of speed and/or accuracy? Yes, I do agree with the comments above. If it's true that … doesn't …
Yes - I'd really love to see proof of this working. It would be really cool and would be the "correct" way of dealing with larger structures. In fact, we could use BLAS and LAPACK a lot more - we just pass stack pointers to inputs and outputs. In the original code (way back before StaticArrays was even released) I had
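For reference, here is a minimal stdlib-only sketch of the kind of LAPACK call that could be wrapped, on a plain `Matrix` via `LinearAlgebra.LAPACK.getrf!` (which the standard library does provide); the `MMatrix`/stack-pointer plumbing alluded to above is not shown and would be additional work:

```julia
using LinearAlgebra

# LU with partial pivoting via the stdlib LAPACK wrapper (dgetrf).
A = [4.0 3.0; 6.0 3.0]
F = copy(A)                               # getrf! overwrites its argument
F, ipiv, info = LinearAlgebra.LAPACK.getrf!(F)
@assert info == 0                         # 0 means the factorization succeeded

# F now packs unit-lower L and upper U together; det(A) = (±1)*prod(diag(U)),
# where each off-diagonal pivot recorded in ipiv flips the sign.
s = (-1.0)^count(i -> ipiv[i] != i, eachindex(ipiv))
d = s * prod(diag(F))                     # d == det(A) == -6.0 here
```

This is exactly the packed-factor format that `det`, `inv`, and `solve` can be built on top of.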
Not a particularly exhaustive study (especially in terms of accuracy), but here's a quick/dirty look:

```julia
using StaticArrays, BenchmarkTools, DataFrames, DataStructures

Nmax = 20
unary = (det, inv, expm)
binary = (\,)

data = OrderedDict{Symbol,Any}()
data[:SIZE] = vcat(([i, "", "", ""] for i in 1:Nmax)...)
data[:STAT] = [stat for sz in 1:Nmax for stat in ("compile time (s)", "StaticArrays (μs)", "Base (μs)", "max error")]

for f in unary
    f_data = Float64[]
    for N in 1:Nmax
        print("\r$((f,N))")
        SA = @SMatrix rand(N,N)
        A = Array(SA)
        push!(f_data, @elapsed f(SA))          # first call: compile time
        push!(f_data, 1e6*@belapsed $f($SA))   # StaticArrays runtime (μs)
        push!(f_data, 1e6*@belapsed $f($A))    # Base runtime (μs)
        push!(f_data, maximum([begin           # max error over 1000 trials
            SA = @SMatrix rand(N,N)
            A = Array(SA)
            norm(f(A) - f(SA))
        end for i in 1:1000]))
    end
    data[Symbol(f)] = f_data
end

for f in binary
    f_data = Float64[]
    for N in 1:Nmax
        print("\r$((f,N))")
        SA = @SMatrix rand(N,N)
        A = Array(SA)
        SB = @SMatrix rand(N,N)
        B = Array(SB)
        push!(f_data, @elapsed f(SA,SB))
        push!(f_data, 1e6*@belapsed $f($SA,$SB))
        push!(f_data, 1e6*@belapsed $f($A,$B))
        push!(f_data, maximum([begin
            SA = @SMatrix rand(N,N)
            A = Array(SA)
            SB = @SMatrix rand(N,N)
            B = Array(SB)
            norm(f(A,B) - f(SA,SB))
        end for i in 1:1000]))
    end
    data[Symbol(f)] = f_data
end

df = DataFrame(data...)
```

The compile time does continue to increase for
Oh yes, that's correct: https://github.com/JuliaArrays/StaticArrays.jl/blob/master/src/matrix_multiply.jl#L126.
Little experiment concerning the LU implementation (and sorry for hijacking this PR):

```julia
function mylu(A::StaticMatrix{T}, ::Val{Pivot}=Val(true)) where {T,Pivot}
    A = MMatrix(A)
    m, n = size(A)
    ipiv = MVector{m,Int}()
    info = 0  # index of the first zero pivot, if any (was missing in the original snippet)
    minmn = min(m,n)
    @inbounds begin # this whole loop copied from stdlib/LinearAlgebra/src/lu.jl
        for k = 1:minmn
            # find index max
            kp = k
            if Pivot
                amax = abs(zero(T))
                for i = k:m
                    absi = abs(A[i,k])
                    if absi > amax
                        kp = i
                        amax = absi
                    end
                end
            end
            ipiv[k] = kp
            if !iszero(A[kp,k])
                if k != kp
                    # Interchange
                    for i = 1:n
                        tmp = A[k,i]
                        A[k,i] = A[kp,i]
                        A[kp,i] = tmp
                    end
                end
                # Scale first column
                Akkinv = inv(A[k,k])
                for i = k+1:m
                    A[i,k] *= Akkinv
                end
            elseif info == 0
                info = k
            end
            # Update the rest
            for j = k+1:n
                for i = k+1:m
                    A[i,j] -= A[i,k]*A[k,j]
                end
            end
        end
    end
    return SMatrix(A), SVector(ipiv)
end
```

Then I get (on 0.7.0-alpha.0):

```julia
julia> @btime mylu($(SMatrix{10,10}(rand(10,10))));
  606.908 ns (0 allocations: 0 bytes)
```

Compare with (on 0.6.3; cannot meaningfully benchmark on 0.7 due to deprecations):

```julia
julia> @btime StaticArrays.__lu($(SMatrix{10,10}(rand(10,10))), Val{true});
  652.778 ns (0 allocations: 0 bytes)
```

It's not a 100% fair comparison (results in slightly different formats, different Julia versions, runtime depends on random values (different pivoting), ...), but they're definitely close. And approximately ten times faster than

However, on 0.6.3, i.e. without the allocation elimination:

```julia
julia> @btime mylu($(SMatrix{10,10}(rand(10,10))));
  138.708 μs (2505 allocations: 57.20 KiB)
```
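To show how `solve` (and hence `inv` and `\`) falls out of factors in this packed format, here is a hedged plain-`Array` sketch; `lu_solve` is a made-up name, and the pivot handling assumes the sequential-interchange `ipiv` convention that both `mylu` above and the stdlib `lu` use:

```julia
using LinearAlgebra

# Hypothetical helper: solve A*x = b given packed LU factors and pivots.
function lu_solve(LU::AbstractMatrix, ipiv::AbstractVector{<:Integer}, b::AbstractVector)
    n = length(b)
    x = float.(copy(b))
    @inbounds for k in 1:n            # apply the recorded row interchanges to b
        if ipiv[k] != k
            x[k], x[ipiv[k]] = x[ipiv[k]], x[k]
        end
    end
    @inbounds for j in 1:n            # forward substitution with unit-diagonal L
        for i in j+1:n
            x[i] -= LU[i, j] * x[j]
        end
    end
    @inbounds for j in n:-1:1         # back substitution with U
        x[j] /= LU[j, j]
        for i in 1:j-1
            x[i] -= LU[i, j] * x[j]
        end
    end
    return x
end

# Usage against the stdlib factorization:
A = [2.0 1.0 1.0; 4.0 3.0 3.0; 8.0 7.0 9.0]
b = [1.0, 2.0, 3.0]
F = lu(A)
x = lu_solve(F.factors, F.ipiv, b)    # should match A \ b
```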
andyferris left a comment:
This looks like some solid work. Thanks @schmrlng :)
I think we should merge this and play with the idea of mutating algorithms more when v0.7 is released around the time we drop v0.6 support.
I'll leave this open for another day or so for any more feedback before merging.
Finally - @schmrlng feel free to add your benchmarking and/or fuzz testing code to the repository, such as in the perf directory - it is very useful, and even if this stuff does tend to go stale after a while it can always be revived.
Force-pushed from 7289dd3 to da34ded.
Notes:

- The size threshold for falling back to `Base` versions of functions (inherited from `lu`) may be worth canonizing somewhere instead of appearing repeatedly as a magic number.
- `solve` and `\` currently only work for square matrices on the LHS, though the same was true of the `inv(a) * b` stopgap previously in place (i.e., no functionality has been lost). Maybe `solve` should be folded into the `ldiv` family of functions in some future update.
- There is no fallback to `Base` in `expm`. Compile times can get into the minutes beyond 15 x 15; I think this is a consequence of `mul!` never being dispatched to its `Base` version.
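For context on the `expm` note: the algorithm behind `expm` is scaling and squaring, which is dominated by repeated matrix multiplies, so how `mul!`/`*` dispatch (and their compile time) scales with size matters a lot. A rough stdlib-only sketch of the idea, using a truncated Taylor series where real implementations use a Padé approximant with error control:

```julia
using LinearAlgebra

# Scaling and squaring: exp(A) = exp(A / 2^s)^(2^s), with the scaled
# exponential approximated by a truncated Taylor series. Simplified
# sketch only; the real expm uses a Padé approximant.
function expm_sketch(A::AbstractMatrix; terms::Int = 16)
    nrm = opnorm(A, 1)
    s = nrm == 0 ? 0 : max(0, ceil(Int, log2(nrm)) + 1)  # make ‖A/2^s‖ small
    B = A / 2^s
    E = Matrix{float(eltype(A))}(I, size(A)...)
    term = copy(E)
    for k in 1:terms                    # Taylor series for exp(B)
        term = term * B / k             # repeated matrix multiply: the hot spot
        E += term
    end
    for _ in 1:s                        # undo the scaling by repeated squaring
        E = E * E
    end
    return E
end
```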