While doing some timings I have noticed a couple of places that could be rewritten to speed things up a bit.
When fitting a linear mixed model the bulk of the time is spent in calls to `updateL!`, defined in `src/linearmixedmodel.jl`. For a model with crossed vector-valued random effects, some timings show quite a bit of the time being spent in the `lmulΛ!` method defined around line 192 of `src/remat.jl` (line positions are those in the `explicitloops` branch). I think this occurs in the multiplication on the left by Λ₂' of the (2,2) block of L. The point is that this block has a block-diagonal structure at the time of the multiplication, even though it is stored as a dense matrix. It only becomes a dense lower-triangular matrix after the next stage of the Cholesky factorization, because crossed random-effects terms create fill-in in the (2,2) block.
Probably the best way around this is to combine the `copyto!` operation with the left and right Λ multiplications, at least for the diagonal blocks. If A₂,₂ is uniform block diagonal then Λ₂'A₂,₂Λ₂ is also uniform block diagonal, and those update operations can be done much faster by exploiting this structure.
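As a rough sketch of that idea (the function name and storage layout here are assumptions for illustration, not the package's actual code), the combined scale-and-inflate update for a uniform block-diagonal A₂,₂ with k×k diagonal blocks only needs to touch the diagonal blocks, since the off-diagonal blocks of Λ₂'A₂,₂Λ₂ are zero:

```julia
using LinearAlgebra

# Hypothetical sketch: form L ← Λ₂'A₂₂Λ₂ + I in place in the dense storage
# of the (2,2) block, where A₂₂ is uniform block diagonal with k×k blocks
# and Λ₂ is the block-diagonal repetition of the k×k factor λ.  Each block
# update is a k×k operation instead of an n×n one.
function scaleinflate_blockdiag!(L::Matrix{T}, A::Matrix{T},
                                 λ::LowerTriangular{T}) where {T}
    k = size(λ, 1)
    n = size(A, 1)
    @assert size(A) == size(L) && iszero(rem(n, k))
    fill!(L, zero(T))           # off-diagonal blocks of Λ₂'A₂₂Λ₂ are zero
    for offset in 0:k:(n - k)
        rows = offset .+ (1:k)
        blk = λ' * view(A, rows, rows) * λ   # k×k update of one diagonal block
        for j in 1:k, i in 1:k
            L[offset + i, offset + j] = blk[i, j] + (i == j ? one(T) : zero(T))
        end
    end
    return L
end
```

For n total rows and q diagonal blocks this does q small k×k triangular multiplications rather than two dense n×n ones, which is where the savings would come from.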
Another place where I think things can be sped up is in the `rmul!` and `rdiv!` operations that were converted to explicit loops in the `explicitloops` branch. In all of these operations a small triangular matrix operates in place on a block of columns of a dense matrix. This is a natural application for multi-threading. I tried the naive approach of wrapping the outer loop with `Threads.@threads`, but that just managed to make things much, much slower. I suspect this is because I don't understand how `@threads` works.
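One hedged guess at a better-behaved threaded version (the function name and the fixed block width `k` are assumptions, not the package's code): give each task a whole block of columns, so the per-task work amortizes the scheduling overhead, rather than spawning work column by column:

```julia
using LinearAlgebra

# Sketch only: apply an in-place triangular right-division to each k-column
# block of a dense matrix, with one @threads iteration per block.  The blocks
# are independent, so no task writes to another task's columns.
function threaded_rdiv!(A::Matrix{T}, U::UpperTriangular{T}, k::Int) where {T}
    n = size(A, 2)
    @assert size(U, 1) == k && iszero(rem(n, k))
    Threads.@threads for b in 1:div(n, k)
        cols = ((b - 1) * k + 1):(b * k)
        rdiv!(view(A, :, cols), U)   # in-place solve on this block of columns
    end
    return A
end
```

Whether this actually wins will depend on the block sizes involved; for very small blocks the overhead of `@threads` can still dominate, which may be what the naive attempt ran into.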
If anyone wants to attack either of these, please assign yourself. I do plan to get to these, but I am not sure when.
Hmm! It seems I may already have done the first one. The diagonal block is handled by a call to a `scaleinflate!` method, so it must be the `lmulΛ!` multiplication of the (1,2) block that is taking up the time.
Another possible enhancement is to define the `λ` matrix in the `ReMat` struct to be `Union{Diagonal,LowerTriangular}`. Using a `Diagonal` λ when appropriate may make the `lmulΛ!` and `rmulΛ!` methods cleaner and faster. It may be worthwhile incorporating the type of `λ` in the type of the `ReMat` so that methods can dispatch on it.
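A toy sketch of what that dispatch could look like (the struct name and signatures are made up for illustration, and the real `ReMat` has more fields than this):

```julia
using LinearAlgebra

# Illustration only, not the package's definitions: parameterizing the struct
# on the type of λ lets methods dispatch on Diagonal vs LowerTriangular.
struct ReMatSketch{L<:Union{Diagonal,LowerTriangular}}
    λ::L    # relative covariance factor
end

# Diagonal λ: left-multiplying by λ' reduces to a row scaling.
function lmulΛ!(re::ReMatSketch{<:Diagonal}, B::AbstractMatrix)
    d = re.λ.diag
    @inbounds for j in axes(B, 2), i in axes(B, 1)
        B[i, j] *= d[i]
    end
    return B
end

# LowerTriangular λ: fall back to a genuine triangular multiplication.
lmulΛ!(re::ReMatSketch{<:LowerTriangular}, B::AbstractMatrix) = lmul!(re.λ', B)
```

The scalar-random-effects case (a 1×1 λ, or several scalar terms) is exactly where the `Diagonal` fast path would pay off.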