Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LRT for comparing non-mixed to mixed models #508

Merged
merged 20 commits into from Apr 15, 2021
Merged

LRT for comparing non-mixed to mixed models #508

merged 20 commits into from Apr 15, 2021

Conversation

palday
Copy link
Member

@palday palday commented Apr 12, 2021

Closes #491.

This only allows one non-mixed model and it has to be the first argument. That's fairly restrictive, but generally speaking, I want to avoid testing a massive series of increasingly complicated fixed-effects models and then adding random-effects onto that series.

Comment on lines +62 to +66
!!! note
For comparisons between mixed and non-mixed models, the deviance for the non-mixed
model is taken to be -2 log likelihood, i.e. omitting the additive constant for the
fully saturated model. This is in line with the computation of the deviance for mixed
models.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmbates I can feel you looking over my shoulder and shaking your head when I write something like "the deviance isn't really the deviance". 🙂 Do you have a better way of handling / labelling this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The important point is that the difference in -2 log likelihood is the same as the difference in deviances would be if they could be calculated. Of course, now that I have written that I wonder if it makes sense when comparing a LinearModel and a LinearMixedModel. We know what the saturated model for a LinearModel is but we don't for a LinearMixedModel. My statement seems to indicate that the deviance for the saturated model in a LinearMixedModel would then have to be the same as that of a LinearModel but I am not sure that makes sense. I am starting to confuse myself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad I'm not the only one who struggles with some of this stuff. 😄

I think the relevant bit here is actually that the test statistic for LRT is -2 log(L_1/L2) which then is just the difference of -2 log likelihood. Using the true deviance was always just a convenience because the additive constants cancel out, when both models have the same constant.

All that said, my question is: should we change the label of that column and/or the internal fields? I would prefer to keep the internal fieldnames, but changing the show method's output is trivial.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. For a linear mixed model it would be more accurate to label the column as -2 loglik. For a GLMM I think it should be deviance. I believe that for a Binomial GLMM that is not also Bernoulli (e.g. cbpp) the quantity in that column will not be -2 loglik

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, that is true. And indeed the loglikelihood and deviance for GLMM are defined independently.

Let me see what it would take to add a conditional to the show method --I think we'll need to add something to the LikelihoodRatioTest struct because right now there is no notion of which model it came from.

@codecov
Copy link

codecov bot commented Apr 12, 2021

Codecov Report

Merging #508 (cdd1369) into main (715fac5) will increase coverage by 0.14%.
The diff coverage is 98.63%.

❗ Current head cdd1369 differs from pull request most recent head dae16e2. Consider uploading reports for the commit dae16e2 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##             main     #508      +/-   ##
==========================================
+ Coverage   93.99%   94.14%   +0.14%     
==========================================
  Files          26       26              
  Lines        2164     2236      +72     
==========================================
+ Hits         2034     2105      +71     
- Misses        130      131       +1     
Impacted Files Coverage Δ
src/linearmixedmodel.jl 95.47% <96.15%> (+0.02%) ⬆️
src/likelihoodratiotest.jl 98.47% <100.00%> (+0.79%) ⬆️
src/optsummary.jl 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 715fac5...dae16e2. Read the comment docs.

Copy link
Member

@kleinschmidt kleinschmidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand why true nesting isn't checked (it's hard to do in general) but it feels like a potential footgun...I suggested a few places where you could be clearer in the docstrings and/or add some additional heuristics for non-formula-bearing GLMs.

src/likelihoodratiotest.jl Show resolved Hide resolved
src/likelihoodratiotest.jl Outdated Show resolved Hide resolved
src/likelihoodratiotest.jl Outdated Show resolved Hide resolved
@dmbates
Copy link
Collaborator

dmbates commented Apr 13, 2021

I think I understand why true nesting isn't checked (it's hard to do in general) but it feels like a potential footgun...I suggested a few places where you could be clearer in the docstrings and/or add some additional heuristics for non-formula-bearing GLMs.

There is a way of checking nesting - approximately checking due to floating point, of course. The columns of the model matrix for the linear model should be in the column space of m.X where m is the linear mixed model. If you take a QR decomposition of m.X and multiply the linear model's model matrix by Q' then in each of the columns of the product, the squared length of the last n - rank(m.X) positions should be negligible compared to the squared length of the first rank(m.X) positions.

@palday
Copy link
Member Author

palday commented Apr 13, 2021

There is a way of checking nesting - approximately checking due to floating point, of course. The columns of the model matrix for the linear model should be in the column space of m.X where m is the linear mixed model. If you take a QR decomposition of m.X and multiply the linear model's model matrix by Q' then in each of the columns of the product, the squared length of the last n - rank(m.X) positions should be negligible compared to the squared length of the first rank(m.X) positions.

I like this idea (and that's what the tolerance kwarg is for, I suppose), but I think that should be a different PR because we can add that to the nesting checks for two MixedModels.

@dmbates
Copy link
Collaborator

dmbates commented Apr 15, 2021

The CI failures on Julia v1.4 are because the newly added code calls contains which doesn't exist in v1.4.

Should we continue to test on 1.4? Is it because that is currently the LTS release?

Copy link
Member

@kleinschmidt kleinschmidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't look closely at the matrix checking stuff because it's a bit over my head but otherwise looks good.

src/likelihoodratiotest.jl Outdated Show resolved Hide resolved
src/likelihoodratiotest.jl Show resolved Hide resolved
src/likelihoodratiotest.jl Outdated Show resolved Hide resolved
src/likelihoodratiotest.jl Outdated Show resolved Hide resolved
NEWS.md Outdated Show resolved Hide resolved
src/likelihoodratiotest.jl Outdated Show resolved Hide resolved
Co-authored-by: Douglas Bates <dmbates@gmail.com>
@palday
Copy link
Member Author

palday commented Apr 15, 2021

@dmbates The current LTS is 1.0! I'm happy to drop 1.4 and 1.5 with MixedModels 4.0, if for no other reason than CI speed. If/when 1.6 is declared LTS, my preference would be to have to LTS to change out tiered support architecture to "LTS" and "current release".

@palday palday merged commit 365e5f7 into main Apr 15, 2021
@palday palday deleted the pa/unmixedlrt branch April 15, 2021 18:06
palday added a commit that referenced this pull request Apr 15, 2021
* LRT  for comparing non-mixed to mixed models

* nesting for LM/GLMM and GLM/LMM

* NEWS entry

* use _iscomparable instead of isnested

* add missing dep

* inner has no more preds than outer, thx @kleinschmidt

* column span check

* oops

* julia 1.4 compat

* condition LRT show method on G/LMM

* use deviance for GLM and -2 loglikelihood for LMM

* contains -> occursin (Julia 1.4 compat)

* Apply suggestions from code review

Co-authored-by: Dave Kleinschmidt <dave.f.kleinschmidt@gmail.com>

* copy-paste error

* reenable a shortcut

* Apply suggestions from code review

Co-authored-by: Douglas Bates <dmbates@gmail.com>

Co-authored-by: Dave Kleinschmidt <dave.f.kleinschmidt@gmail.com>
Co-authored-by: Douglas Bates <dmbates@gmail.com>
(cherry picked from commit 365e5f7)
palday added a commit that referenced this pull request Apr 15, 2021
* LRT  for comparing non-mixed to mixed models (#508)

* LRT  for comparing non-mixed to mixed models

* nesting for LM/GLMM and GLM/LMM

* NEWS entry

* use _iscomparable instead of isnested

* add missing dep

* inner has no more preds than outer, thx @kleinschmidt

* column span check

* oops

* julia 1.4 compat

* condition LRT show method on G/LMM

* use deviance for GLM and -2 loglikelihood for LMM

* contains -> occursin (Julia 1.4 compat)

* Apply suggestions from code review

Co-authored-by: Dave Kleinschmidt <dave.f.kleinschmidt@gmail.com>

* copy-paste error

* reenable a shortcut

* Apply suggestions from code review

Co-authored-by: Douglas Bates <dmbates@gmail.com>

Co-authored-by: Dave Kleinschmidt <dave.f.kleinschmidt@gmail.com>
Co-authored-by: Douglas Bates <dmbates@gmail.com>
(cherry picked from commit 365e5f7)

* patch bump

* add Distribution and Link methods
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Likelihood ratio test to compare Linear[Mixed]Model
3 participants