Speeding things up #234

Closed

dmbates opened this issue Dec 27, 2019 · 3 comments

@dmbates
Collaborator

dmbates commented Dec 27, 2019

While doing some timings I have noticed a couple of places that could be rewritten to speed things up a bit.

When fitting a linear mixed model, the bulk of the time is spent in calls to updateL!, defined in src/linearmixedmodel.jl. For a model with crossed vector-valued random effects, some timings show quite a bit of the time being spent in the lmulΛ! method defined around line 192 of src/remat.jl (line positions are those in the explicitloops branch). I think this occurs in the multiplication on the left by Λ₂' of the (2,2) block of L. The point is that this block is block diagonal at the time of the multiplication, even though it is stored as a dense matrix. It only becomes a dense lower-triangular matrix after the next stage of the Cholesky factorization, because crossed random-effects terms create fill-in in the (2,2) block.

Probably the best way around this is to combine the copyto! operation with the left and right multiplications by Λ, for the diagonal blocks at least. If A₂₂ is uniform block diagonal then Λ₂'A₂₂Λ₂ is also uniform block diagonal, and those update operations can be done much faster by exploiting this structure.
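A minimal sketch of what that fused update could look like, assuming the block is stored dense with S×S diagonal blocks and Λ₂ = I ⊗ λ for a small lower-triangular λ. The function name and signature are illustrative only, not the package's API:

```julia
using LinearAlgebra

# Illustrative sketch, not the package's API: overwrite each S×S diagonal
# block of L22 with λ'Bλ, where B is the corresponding block of A22, fusing
# the copyto! with the left and right multiplications by λ.  The off-diagonal
# blocks are zero at this stage and are never touched, so the cost is
# O(n·S³) rather than the O(n²·S) of dense triangular multiplications.
function fused_scale!(L22::Matrix{T}, A22::Matrix{T},
                      λ::LowerTriangular{T}, S::Int) where {T}
    for offset in 0:S:(size(A22, 1) - S)
        rows = offset .+ (1:S)
        blk = view(L22, rows, rows)
        copyto!(blk, view(A22, rows, rows))
        lmul!(λ', blk)   # λ'B
        rmul!(blk, λ)    # (λ'B)λ
    end
    L22
end
```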

Another place where I think things can be sped up is in the rmul! and rdiv! operations that were converted to explicit loops in the explicitloops branch. In all of these operations there is a small triangular matrix operating in place on a block of columns in a dense matrix. This is a natural application for multi-threading. I tried the naive approach of wrapping the outer loop with Threads.@threads, but that just managed to make things much, much slower; I suspect because I don't understand how @threads works.
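One plausible reason the naive @threads was slower is that a task per small block is too fine-grained, so the scheduling overhead swamps the work. A hedged sketch of a coarser split, where each thread handles one contiguous range of column blocks (the function name and the choice of rdiv! placement are illustrative):

```julia
using LinearAlgebra, Base.Threads

# Illustrative sketch: right-divide k-column blocks of a dense matrix by a
# small lower-triangular factor in place, chunking the blocks so each thread
# gets one contiguous range of columns instead of one tiny task per block.
function threaded_rdiv!(B::Matrix{T}, L::LowerTriangular{T}) where {T}
    k = size(L, 1)
    nblocks, r = divrem(size(B, 2), k)
    @assert iszero(r)
    chunks = collect(Iterators.partition(1:nblocks, cld(nblocks, nthreads())))
    @threads for chunk in chunks
        for j in chunk
            cols = (j - 1) * k .+ (1:k)
            rdiv!(view(B, :, cols), L)   # in-place B[:, cols] / L
        end
    end
    B
end
```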

If anyone wants to attack either of these, please assign yourself. I do plan to get to these but I am not sure when.

@dmbates
Collaborator Author

dmbates commented Dec 27, 2019

Hmm! It seems I may already have done the first one. The diagonal block is handled by a call to a scaleinflate! method, so it must be the lmulΛ! multiplication of the (1,2) block that is taking up the time.

@dmbates
Collaborator Author

dmbates commented Dec 28, 2019

Another possible enhancement is to allow the λ matrix in the ReMat struct to be either Diagonal or LowerTriangular (i.e. a Union{Diagonal,LowerTriangular} field). Using a Diagonal λ when appropriate may make the lmulΛ! and rmulΛ! methods cleaner and faster. It may be worthwhile to incorporate the type of λ into the type of ReMat so that methods can dispatch on it.
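A hedged sketch of the dispatch idea, with illustrative struct and field names; the real ReMat in src/remat.jl has more fields and the real lmulΛ! signatures differ:

```julia
using LinearAlgebra

# Illustrative sketch: carry the type of λ as a parameter of the struct so
# that methods can dispatch on Diagonal versus LowerTriangular λ.
struct ReMatSketch{T,M<:Union{Diagonal{T},LowerTriangular{T}}}
    λ::M
end

# Diagonal λ: with Λ = I ⊗ λ, multiplication by Λ' reduces to a row scaling,
# so no triangular kernel is needed at all (and λ' == λ).
function lmulΛ!(re::ReMatSketch{T,<:Diagonal{T}}, B::AbstractMatrix{T}) where {T}
    d = re.λ.diag
    S = length(d)
    @inbounds for j in axes(B, 2), i in axes(B, 1)
        B[i, j] *= d[mod1(i, S)]
    end
    B
end

# LowerTriangular λ: apply λ' blockwise down the rows.
function lmulΛ!(re::ReMatSketch{T,<:LowerTriangular{T}}, B::AbstractMatrix{T}) where {T}
    λ = re.λ
    S = size(λ, 1)
    for offset in 0:S:(size(B, 1) - S)
        lmul!(λ', view(B, offset .+ (1:S), :))
    end
    B
end
```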

@palday palday added this to the Big Future milestone Oct 15, 2020
@dmbates
Collaborator Author

dmbates commented Aug 19, 2021

Done.

@dmbates dmbates closed this as completed Aug 19, 2021