While doing some timings I have noticed a couple of places that could be rewritten to speed things up a bit.
When fitting a linear mixed model the bulk of the time is spent in calls to `updateL!`, defined in `src/linearmixedmodel.jl`. For a model with crossed vector-valued random effects, some timings show quite a bit of the time being spent in the `lmulΛ!` method defined around line 192 of `src/remat.jl` (line positions are those in the `explicitloops` branch). I think this occurs in the multiplication on the left by Λ₂' of the (2,2) block of L. The point is that this block has a block-diagonal structure at the time of the multiplication, even though it is stored as a dense matrix. It only becomes a dense lower-triangular matrix after the next stage of the Cholesky factorization, because crossed random-effects terms create fill-in in the (2,2) block.
Probably the best way around this is to combine the `copyto!` operation with the left and right Λ multiplications, at least for the diagonal blocks. If A₂,₂ is uniform block diagonal then Λ₂'A₂,₂Λ₂ is also uniform block diagonal, and those update operations can be done much faster by exploiting this structure.
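As a rough sketch of that idea (the function name and storage layout here are assumptions for illustration, not the package's actual code), the combined scale-and-inflate update for a uniform block-diagonal A₂,₂ with k×k diagonal blocks only needs to touch the diagonal blocks, since the off-diagonal blocks of Λ₂'A₂,₂Λ₂ are zero:

```julia
using LinearAlgebra

# Hypothetical sketch: form L ← Λ₂'A₂₂Λ₂ + I in place in the dense storage
# of the (2,2) block, where A₂₂ is uniform block diagonal with k×k blocks
# and Λ₂ is the block-diagonal repetition of the k×k factor λ.  Each block
# update is a k×k operation instead of an n×n one.
function scaleinflate_blockdiag!(L::Matrix{T}, A::Matrix{T},
                                 λ::LowerTriangular{T}) where {T}
    k = size(λ, 1)
    n = size(A, 1)
    @assert size(A) == size(L) && iszero(rem(n, k))
    fill!(L, zero(T))           # off-diagonal blocks of Λ₂'A₂₂Λ₂ are zero
    for offset in 0:k:(n - k)
        rows = offset .+ (1:k)
        blk = λ' * view(A, rows, rows) * λ   # k×k update of one diagonal block
        for j in 1:k, i in 1:k
            L[offset + i, offset + j] = blk[i, j] + (i == j ? one(T) : zero(T))
        end
    end
    return L
end
```

For n total rows and q diagonal blocks this does q small k×k triangular multiplications rather than two dense n×n ones, which is where the savings would come from.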
Another place where I think things can be sped up is in the `rmul!` and `rdiv!` operations that were converted to explicit loops in the `explicitloops` branch. In all of these operations a small triangular matrix operates in place on a block of columns of a dense matrix. This is a natural application for multi-threading. I tried the naive approach of wrapping the outer loop with `Threads.@threads`, but that just managed to make things much, much slower. I suspect this is because I don't understand how `@threads` works.
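One hedged guess at a better-behaved threaded version (the function name and the fixed block width `k` are assumptions, not the package's code): give each task a whole block of columns, so the per-task work amortizes the scheduling overhead, rather than spawning work column by column:

```julia
using LinearAlgebra

# Sketch only: apply an in-place triangular right-division to each k-column
# block of a dense matrix, with one @threads iteration per block.  The blocks
# are independent, so no task writes to another task's columns.
function threaded_rdiv!(A::Matrix{T}, U::UpperTriangular{T}, k::Int) where {T}
    n = size(A, 2)
    @assert size(U, 1) == k && iszero(rem(n, k))
    Threads.@threads for b in 1:div(n, k)
        cols = ((b - 1) * k + 1):(b * k)
        rdiv!(view(A, :, cols), U)   # in-place solve on this block of columns
    end
    return A
end
```

Whether this actually wins will depend on the block sizes involved; for very small blocks the overhead of `@threads` can still dominate, which may be what the naive attempt ran into.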
If anyone wants to attack either of these, please assign yourself. I do plan to get to these, but I am not sure when.
Hmm! It seems I may already have done the first one. The diagonal block is handled by a call to a `scaleinflate!` method, so it must be the `lmulΛ!` multiplication of the (1,2) block that is taking up the time.
Another possible enhancement is to define the `λ` matrix in the `ReMat` struct to be `Union{Diagonal,LowerTriangular}`. Using a `Diagonal` λ when appropriate may make the `lmulΛ!` and `rmulΛ!` methods cleaner and faster. It may be worthwhile incorporating the type of `λ` in the type of the `ReMat` so that methods can dispatch on it.
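A toy sketch of what that dispatch could look like (the struct name and signatures are made up for illustration, and the real `ReMat` has more fields than this):

```julia
using LinearAlgebra

# Illustration only, not the package's definitions: parameterizing the struct
# on the type of λ lets methods dispatch on Diagonal vs LowerTriangular.
struct ReMatSketch{L<:Union{Diagonal,LowerTriangular}}
    λ::L    # relative covariance factor
end

# Diagonal λ: left-multiplying by λ' reduces to a row scaling.
function lmulΛ!(re::ReMatSketch{<:Diagonal}, B::AbstractMatrix)
    d = re.λ.diag
    @inbounds for j in axes(B, 2), i in axes(B, 1)
        B[i, j] *= d[i]
    end
    return B
end

# LowerTriangular λ: fall back to a genuine triangular multiplication.
lmulΛ!(re::ReMatSketch{<:LowerTriangular}, B::AbstractMatrix) = lmul!(re.λ', B)
```

The scalar-random-effects case (a 1×1 λ, or several scalar terms) is exactly where the `Diagonal` fast path would pay off.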