Use init_from_lmm as default in fit! method for GLMM #750

Open · dmbates opened this issue Mar 4, 2024 · 9 comments

@dmbates (Collaborator) commented Mar 4, 2024

I have been much more successful using init_from_lmm when fitting GLMMs with PRIMA.bobyqa, but I have only tried it with Bernoulli responses. Shall we make it the default?
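
A minimal sketch of the current opt-in usage (the kwarg and the verbagg call are the same ones that appear in the benchmarks further down; progress=false is just to quiet the output):

julia> using MixedModels

julia> using MixedModels: dataset

julia> fit(MixedModel,
           @formula(r2 ~ 1 + anger + gender + btype + situ + (1|subj) + (1|item)),
           dataset(:verbagg), Bernoulli();
           init_from_lmm=(:β, :θ), progress=false);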

@dmbates (Collaborator, Author) commented Mar 5, 2024

The "goldstein" test set in test/pirls.jl is an example where init_from_lmm provides much different, and superior (in the sense of non-trivially smaller deviance), parameter estimates.

julia> using MixedModels, PooledArrays

julia> goldstein = (
           group = PooledArray(repeat(string.('A':'J'), outer=10)),
           y = [
               83, 3, 8, 78, 901, 21, 4, 1, 1, 39,
               82, 3, 2, 82, 874, 18, 5, 1, 3, 50,
               87, 7, 3, 67, 914, 18, 0, 1, 1, 38,
               86, 13, 5, 65, 913, 13, 2, 0, 0, 48,
               90, 5, 5, 71, 886, 19, 3, 0, 2, 32,
               96, 1, 1, 87, 860, 21, 3, 0, 1, 54,
               83, 2, 4, 70, 874, 19, 5, 0, 4, 36,
               100, 11, 3, 71, 950, 21, 6, 0, 1, 40,
               89, 5, 5, 73, 859, 29, 3, 0, 2, 38,
               78, 13, 6, 100, 852, 24, 5, 0, 1, 39
               ],
           );

julia> gform = @formula(y ~ 1 + (1|group));

julia> m1 = GeneralizedLinearMixedModel(gform, goldstein, Poisson());

julia> fit!(m1).optsum                   # using default starting values
Initial parameter vector: [4.727210823648169, 1.0]
Initial objective value:  246.12019299931663

Optimizer (from NLopt):   LN_BOBYQA
Lower bounds:             [-Inf, 0.0]
ftol_rel:                 1.0e-12
ftol_abs:                 1.0e-8
xtol_rel:                 0.0
xtol_abs:                 [1.0e-10]
initial_step:             [4.727210823648169, 0.75]
maxfeval:                 -1
maxtime:                  -1.0

Function evaluations:     31
Final parameter vector:   [4.1921964390775655, 1.83824520173986]
Final objective value:    193.55873023848017
Return code:              FTOL_REACHED


julia> m1 = GeneralizedLinearMixedModel(gform, goldstein, Poisson());

julia> fit!(m1; init_from_lmm=Set((:β, :θ))).optsum
Initial parameter vector: [4.727210811908346, 2.22268824817508]
Initial objective value:  194.70703488515588

Optimizer (from NLopt):   LN_BOBYQA
Lower bounds:             [-Inf, 0.0]
ftol_rel:                 1.0e-12
ftol_abs:                 1.0e-8
xtol_rel:                 0.0
xtol_abs:                 [1.0e-10]
initial_step:             [4.727210811908346, 1.6670161861313098]
maxfeval:                 -1
maxtime:                  -1.0

Function evaluations:     31
Final parameter vector:   [4.2195828270624585, 3.847141752689735]
Final objective value:    192.001698988138
Return code:              FTOL_REACHED

This is simulated, and probably unrealistic, data with a very large standard deviation for the random effects. The average response in group "E" (the fifth value in each row of the literal above) is nearly 900, whereas the average response in group "H" is less than 1.
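
A quick check of that imbalance, reusing goldstein from above (a sketch; only Statistics is assumed). Group "E" averages about 888 and group "H" about 0.3:

julia> using Statistics

julia> [g => mean(goldstein.y[goldstein.group .== g]) for g in unique(goldstein.group)]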

Nevertheless, this is an example where the LMM-based initial values are non-trivially better.

@dmbates (Collaborator, Author) commented Mar 5, 2024

When I initially looked at this, I thought the starting values came from an unweighted LMM fit to the same formula/data combination as the GLMM, which made me wonder what happened to the scale parameter of the Gaussian distribution in the LMM. On closer examination I found that the LMM uses the working weights from the GLM fit, so the starting values for θ are in the right range.
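
A conceptual sketch of that mechanism, reusing gform and goldstein from above (this mimics, rather than reproduces, the package internals; for a Poisson GLM with the canonical log link the IRLS working weights equal the fitted means, which is what w stands in for):

julia> using GLM

julia> gm = glm(@formula(y ~ 1), goldstein, Poisson());  # fixed-effects-only GLM

julia> w = fitted(gm);                                   # log link: working weights are the fitted means

julia> lmm = fit(MixedModel, gform, goldstein; wts=w, progress=false);

julia> lmm.θ  # a starting value for the GLMM's covariance parameter in the right range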

@palday (Member) commented Mar 6, 2024

IIRC the grouseticks example has traditionally been hard to fit -- does it perform better with init_from_lmm?

@dmbates (Collaborator, Author) commented Mar 11, 2024

You have to scroll to the right to see the calls to fit, but the bottom line is that without init_from_lmm the fit takes about 3.6 seconds and with it about 1.6 seconds.

julia> @b (dataset(:grouseticks), @formula(ticks ~ 1+year+height+ (1|index) + (1|brood) + (1|location)), Poisson()) fit(MixedModel, _[2], first(_), last(_); contrasts, progress=false) seconds=10
3.605 s (387416 allocs: 134.871 MiB, 0.12% gc time)

julia> @b (dataset(:grouseticks), @formula(ticks ~ 1+year+height+ (1|index) + (1|brood) + (1|location)), Poisson()) fit(MixedModel, _[2], first(_), last(_); contrasts, progress=false) seconds=15
3.605 s (387416 allocs: 134.871 MiB, 0.13% gc time)

julia> @b (dataset(:grouseticks), @formula(ticks ~ 1+year+height+ (1|index) + (1|brood) + (1|location)), Poisson()) fit(MixedModel, _[2], first(_), last(_); contrasts, progress=false, fast=true) seconds=15
649.160 ms (73875 allocs: 24.894 MiB)

julia> @b (dataset(:grouseticks), @formula(ticks ~ 1+year+height+ (1|index) + (1|brood) + (1|location)), Poisson()) fit(MixedModel, _[2], first(_), last(_); contrasts, progress=false, init_from_lmm=(:β, :θ)) seconds=15
1.681 s (179238 allocs: 63.296 MiB)
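
(For anyone reproducing these timings: @b is presumably the Chairmarks.jl benchmark macro, dataset is MixedModels.dataset, and the contrasts Dict is never shown in the thread, so the setup below is a hedged guess with an empty placeholder.)

julia> using Chairmarks, MixedModels

julia> using MixedModels: dataset

julia> contrasts = Dict{Symbol,Any}();  # placeholder; the actual Dict used above is not shown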

@dmbates (Collaborator, Author) commented Mar 11, 2024

There is also a roughly 4-fold speedup from init_from_lmm on the verbagg example.

julia> @b fit(MixedModel, @formula(r2 ~ 1 + anger + gender + btype + situ + (1|subj) + (1|item)), dataset(:verbagg), Bernoulli(); contrasts, progress=false)  seconds=5
2.368 s (436902 allocs: 21.895 MiB)

julia> @b fit(MixedModel, @formula(r2 ~ 1 + anger + gender + btype + situ + (1|subj) + (1|item)), dataset(:verbagg), Bernoulli(); contrasts, init_from_lmm=(:β, :θ), progress=false)  seconds=5
624.775 ms (150194 allocs: 11.311 MiB)

@palday (Member) commented Mar 11, 2024

Do we have an intuition about when things would be worse? Maybe in a model with random slopes? In that case, the fit time for the LMM might be nontrivial.
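
(One concrete case to time would be the contra model with vector-valued random effects that appears in the table below; a sketch, with the formula and kwargs as assumptions:)

julia> form = @formula(use ~ 1 + age + abs2(age) + urban + (livch ≠ "0") + (1 + urban | dist));

julia> @time fit(MixedModel, form, dataset(:contra), Bernoulli(); init_from_lmm=(:β, :θ), progress=false);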

@palday (Member) commented Mar 11, 2024

(FWIW I'm increasingly in favor of shifting the default, but want to make sure we have some intuitions about potential failure modes.)

@dmbates (Collaborator, Author) commented Mar 11, 2024

Here's a collection of model fits, including one with vector-valued random effects.

julia> runglbmk(gltbl; init_from_lmm=(:β, :θ))
Table with 4 columns and 4 rows:
     bmk                                            dsnm         dist                       frm
   ┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 1 │ Sample(time=0.191018, allocs=97529, bytes=56…  contra       Bernoulli{Float64}(p=0.5)  use ~ 1 + age + :(abs2(age)) + urban + livch…
 2 │ Sample(time=0.187686, allocs=79237, bytes=45…  contra       Bernoulli{Float64}(p=0.5)  use ~ 1 + age + :(abs2(age)) + urban + :(((…
 3 │ Sample(time=0.627815, allocs=150259, bytes=1…  verbagg      Bernoulli{Float64}(p=0.5)  r2 ~ 1 + anger + gender + btype + situ + :(1…
 4 │ Sample(time=1.6794, allocs=179262, bytes=663…  grouseticks  Poisson{Float64}(λ=1.0)    ticks ~ 1 + year + height + :(1 | index) + :…

julia> runglbmk(gltbl)
Table with 4 columns and 4 rows:
     bmk                                            dsnm         dist                       frm
   ┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 1 │ Sample(time=0.122951, allocs=65664, bytes=42…  contra       Bernoulli{Float64}(p=0.5)  use ~ 1 + age + :(abs2(age)) + urban + livch…
 2 │ Sample(time=0.201536, allocs=83628, bytes=47…  contra       Bernoulli{Float64}(p=0.5)  use ~ 1 + age + :(abs2(age)) + urban + :(((…
 3 │ Sample(time=2.3971, allocs=436827, bytes=229…  verbagg      Bernoulli{Float64}(p=0.5)  r2 ~ 1 + anger + gender + btype + situ + :(1…
 4 │ Sample(time=3.59434, allocs=387440, bytes=14…  grouseticks  Poisson{Float64}(λ=1.0)    ticks ~ 1 + year + height + :(1 | index) + :…

julia> gltbl
Table with 4 columns and 4 rows:
     dsnm         secs  dist                       frm
   ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 1 │ contra       2.0   Bernoulli{Float64}(p=0.5)  use ~ 1 + age + :(abs2(age)) + urban + livch + :(1 | urban & dist)
 2 │ contra       2.0   Bernoulli{Float64}(p=0.5)  use ~ 1 + age + :(abs2(age)) + urban + :(((≠)("0"))(livch)) + :((1 + urban) | dist)
 3 │ verbagg      15.0  Bernoulli{Float64}(p=0.5)  r2 ~ 1 + anger + gender + btype + situ + :(1 | subj) + :(1 | item)
 4 │ grouseticks  15.0  Poisson{Float64}(λ=1.0)    ticks ~ 1 + year + height + :(1 | index) + :(1 | brood) + :(1 | location)

I have been thinking of trying to fit a model to the correct/incorrect response in the EnglishLexicon data, which would be a stress test.

@dmbates (Collaborator, Author) commented Mar 11, 2024

The code is in bench/runbenchmarks.jl on the db/RegressionTests branch.
