
Allow LMM to init GLMM #588

Merged
palday merged 7 commits into main from pa/init on Jan 6, 2022

Conversation

@palday (Member) commented Jan 4, 2022

Benchmarking suggests that the GLM inits we're currently using are the best for small models (although at a few hundred millisecond total difference in the fit, it's realistically a wash). But things get more interesting if we look at big models.

Here's an example from the English Lexicon Project data -- I've split it into two plots because the scale changes pretty dramatically.

[Plot 1: objective over the first 50 iterations (counted after the LMM was fit, for the LMM-initialized models)]

[Plot 2: objective over all iterations after the first 50]

Here's the dataframe showing the progress (wrapped in a zip file to make GitHub happy):
glmm_fitlog_by_init.arrow.zip

The relevant timing info

Minimizing 973   Time: 1:42:50 ( 6.34  s/it) # GLM init
6179.284576 seconds (1.23 M allocations: 1.374 GiB, 0.00% gc time, 0.00% compilation time)
Minimizing 38    Time: 0:00:04 ( 0.12  s/it) # LMM being fit....
Minimizing 113   Time: 0:11:16 ( 5.98  s/it) # beta and theta from LMM
689.532497 seconds (285.49 k allocations: 1.341 GiB, 0.06% gc time)
Minimizing 38    Time: 0:00:04 ( 0.12  s/it) 
Minimizing 556   Time: 1:00:04 ( 6.48  s/it) # theta from LMM
3618.443314 seconds (767.97 k allocations: 1.358 GiB, 0.01% gc time)
Minimizing 38    Time: 0:00:04 ( 0.12  s/it)
Minimizing 113 Time: 0:12:10 ( 6.47  s/it) # beta from LMM
744.312527 seconds (285.59 k allocations: 1.341 GiB, 0.08% gc time)

It could be interesting to plot the norm of the successive differences in the parameter vector, so we can see how much movement we're getting -- and maybe how much of that is change in beta and how much is change in theta.
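
Something along these lines could compute that movement -- a minimal sketch, assuming the fitlog is a vector of (parameter vector, objective) pairs and that the leading p entries of each parameter vector are the fixed-effects coefficients (both assumptions, not the package API):

using LinearAlgebra

# `fitlog` assumed to hold (params, objective) pairs; `p` = number of fixed effects.
params   = first.(fitlog)                # parameter vector at each iteration
movement = norm.(diff(params))           # distance moved between successive iterations

# Splitting the movement into beta and theta components (assumes beta comes first):
beta_movement  = norm.(diff([x[1:p]         for x in params]))
theta_movement = norm.(diff([x[(p + 1):end] for x in params]))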

@dmbates

codecov bot commented Jan 4, 2022

Codecov Report

Merging #588 (1aa42af) into main (abca4ea) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #588      +/-   ##
==========================================
+ Coverage   96.22%   96.23%   +0.01%     
==========================================
  Files          28       28              
  Lines        2516     2523       +7     
==========================================
+ Hits         2421     2428       +7     
  Misses         95       95              
Impacted Files                       Coverage Δ
src/generalizedlinearmixedmodel.jl   90.15% <100.00%> (+0.22%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update abca4ea...1aa42af.

@dmbates (Collaborator) commented Jan 4, 2022

Thanks for doing this and for cleaning up some of our earlier code. This definitely looks worth pursuing.

I'm not sure I understand the plots, particularly the objective axis. What is the scaling between the first and second plots?

Where are the values for lmm-β on the second plot? The oscillating behavior for lmm-βθ is alarming, as is the difference in the eventual objective values for the different starting-value methods.

Lots of good stuff here to contemplate.

@dmbates (Collaborator) commented Jan 4, 2022

P.S. Let me know when you want a review of this.

@palday (Member, Author) commented Jan 4, 2022

@dmbates I messed up the first plot -- I was trying to see if I could get everything on one scale, so I log-scaled the objective on the first plot but not on the second.

For the second plot, I think lmm-β is overplotted by glm-init.

I'm a bit concerned by the difference in the final objectives as well -- it's 75 points of log-likelihood! I think part of it is that we're looking at a very large dataset: even if the change in log-likelihood per observation is tiny, it is summed over a huge number of observations, so small perturbations in the parameters can still produce a big change in the total.
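
As a back-of-the-envelope illustration (the numbers below are invented for scale, not taken from this fit):

n_obs      = 750_000     # hypothetical order of magnitude for a large dataset
per_obs_ll = 1e-4        # tiny per-observation difference in log-likelihood
per_obs_ll * n_obs       # ≈ 75, on the order of the gap between the final objectives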

Taken together, it seems that the random effects dominate the optimization in larger models.

Do you have a better name for the kwarg? I'm leaning towards adding a note to the docstring that this feature and its kwarg are experimental and may disappear or change without being considered a breaking change. Then we can merge and use this code for further experimentation elsewhere.

@dmbates (Collaborator) commented Jan 4, 2022

I should have thought of a logarithmic scale.

Are you referring to the thin kwarg? I agree that it is a poor choice. In retrospect, I think it would be sufficient to make the argument a Boolean rather than a peculiar numeric value where smaller means more information, until you get to 0, where it doesn't. If someone wants to prune the values after the fact, they can just go ahead and do it.
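
To make the contrast concrete, here is a sketch of the two interfaces; `form` and `data` are placeholders, and the Boolean keyword name used below is hypothetical, not an existing API:

using MixedModels

# Current style (sketch): `thin` is an integer stride, where smaller values
# record more of the optimization history.
fit(MixedModel, form, data; thin=1)

# Suggested alternative: a plain Boolean toggle (the name `fitlog` here is
# hypothetical); anyone who wants a thinner record can prune it afterwards.
fit(MixedModel, form, data; fitlog=true)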

By the way, we currently have the initial value repeated because we start the structure with the initial value and then also record the first iteration. If we want to allow ourselves breaking changes later, we should fix that and maybe eliminate a few of the fields of the OptSummary struct that would then redundantly record the starting values, the initial objective, the final objective, and the converged values.

@palday (Member, Author) commented Jan 4, 2022

For this PR, I meant the lmminit kwarg -- maybe init_from_lmm? Clearer but longer.
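
For reference, a sketch of what a call might look like under that name; the keyword and the values it accepts are exactly what's under discussion, and `form`/`data` are placeholders:

using MixedModels

# Hypothetical usage of the proposed keyword: initialize beta and theta from a
# preliminary LMM fit before starting the GLMM optimization.
glmm = fit(MixedModel, form, data, Bernoulli(); init_from_lmm=[:β, :θ])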

For OptSummary, we could potentially change some fields to be properties that reference the fitlog. That wouldn't be breaking in terms of the API, and we could define our own serialization methods, so changing the way the Serialization stdlib serializes that struct wouldn't be breaking IMHO.
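
A rough sketch of that idea; the field names and the (parameter vector, objective) layout of the fitlog are assumptions for illustration, not the actual struct definition:

# Derive the redundant OptSummary fields from the fitlog instead of storing them.
function Base.getproperty(s::OptSummary, name::Symbol)
    fl = getfield(s, :fitlog)                        # assumed: (params, objective) pairs
    name === :initial  && return first(first(fl))   # starting parameter values
    name === :finitial && return last(first(fl))    # initial objective
    name === :final    && return first(last(fl))    # converged parameter values
    name === :fmin     && return last(last(fl))     # final objective
    return getfield(s, name)
end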

palday marked this pull request as ready for review January 6, 2022 03:11
palday requested a review from dmbates January 6, 2022 03:11
@palday (Member, Author) commented Jan 6, 2022

@dmbates I think this is good to go.

@dmbates (Collaborator) left a review comment

Looks good. Thanks for doing this.

palday merged commit df95a29 into main Jan 6, 2022
palday deleted the pa/init branch January 6, 2022 15:52