
null deviance and null loglikelihood for GLM #352

Closed
wants to merge 11 commits

Conversation

pdeffebach
Contributor

This is a PR to fix #256, noting that GLM lacks nulldeviance and nullloglikelihood functions. It's important to have these because likelihood-ratio tests and R^2 are computed from the null deviance.

I am not 100% sure this is correct. So far my results match R's glm with the gaussian() and binomial() families, but to be honest I am still taking econometrics courses, so my knowledge of this is slim.

I will add nullloglikelihood next, then tests for everything I can, then add r2 functions and anything else that might use them.
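For context, the null deviance is the deviance of the intercept-only model, which (when the model has an intercept and no offset) predicts the sample mean for every observation. A minimal Python sketch of the formulas involved (not GLM.jl code; the function names are made up for illustration):

```python
import math

def gaussian_deviance(y, mu):
    # Gaussian deviance: the residual sum of squares.
    return sum((yi - mi) ** 2 for yi, mi in zip(y, mu))

def binomial_deviance(y, mu):
    # Bernoulli deviance; terms with y in {0, 1} use the
    # convention 0 * log(0) = 0.
    total = 0.0
    for yi, mi in zip(y, mu):
        if yi > 0:
            total += yi * math.log(yi / mi)
        if yi < 1:
            total += (1 - yi) * math.log((1 - yi) / (1 - mi))
    return 2 * total

def nulldeviance(y, deviance):
    # The null model predicts the sample mean everywhere
    # (valid only with an intercept and no offset).
    ybar = sum(y) / len(y)
    return deviance(y, [ybar] * len(y))
```

For y = [1.0, 2.0, 3.0] the Gaussian null deviance is just the total sum of squares, 2.0.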

@codecov-io

codecov-io commented Jan 6, 2020

Codecov Report

Merging #352 into master will decrease coverage by 2.37%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #352      +/-   ##
==========================================
- Coverage   79.56%   77.18%   -2.38%     
==========================================
  Files           7        7              
  Lines         685      653      -32     
==========================================
- Hits          545      504      -41     
- Misses        140      149       +9     
Impacted Files Coverage Δ
src/glmfit.jl 77.35% <0.00%> (-0.02%) ⬇️
src/lm.jl 78.88% <0.00%> (-15.37%) ⬇️
src/glmtools.jl 68.42% <0.00%> (-7.21%) ⬇️
src/linpred.jl 62.10% <0.00%> (-0.53%) ⬇️
src/ftest.jl 98.43% <0.00%> (-0.03%) ⬇️
src/negbinfit.jl 92.72% <0.00%> (+11.03%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7e2b1b9...f385a65. Read the comment docs.

@nalimilan (Member) left a comment

Thanks! Should be OK if it matches R, but can you check the other families too? You can reproduce examples from the tests.

src/glmfit.jl (2 resolved review threads, outdated)
@pdeffebach
Contributor Author

I realized I forgot about this; I apologize. I will pick this up slowly, starting with a rebase and then adding more checks against R as I continue writing tests.

@codecov-commenter

codecov-commenter commented Jun 8, 2020

Codecov Report

Merging #352 into master will decrease coverage by 2.37%. The diff coverage is 0.00%.
Powered by Codecov. Last update 7e2b1b9...77c5b3b. Read the comment docs.

@pdeffebach pdeffebach changed the base branch from master to docsupdate June 8, 2020 15:45
@pdeffebach pdeffebach changed the base branch from docsupdate to master June 8, 2020 15:46
@pdeffebach
Contributor Author

Bumping this for @andreasnoack and @nalimilan. I forgot about this PR for a long time but am determined to finish it.

It just adds a nulldeviance method; previously this was defined for lm but not for glm.

I think I need some help getting this over the finish line. As you can see in the test file, I don't know how to add tests for models with offset and for negative binomial models.

The original intention of this PR was to add nullloglikelihood as well. I don't quite know the way forward on that, but hopefully it is easy. My RCall testing setup has been very useful, but it was also a lot of work to set up. I would rather not merge this PR, delete all the RCall code, and then have to rewrite it for the nullloglikelihood tests.

@nalimilan
Member

Thanks. Looks good. For offsets, I think you can just add + offset(x) to the formula in R (there's an example in ?glm) to check whether results are consistent.

For the negative binomial, I think there are two different cases:

  • If theta is fixed (as in two testsets), you can just fit the null model in R with the same value using glm(formula, family = MASS::negative.binomial(2), ...). That should give the same results as in Julia.
  • If theta is not fixed (as in the two testsets calling negbin), the null deviance will correspond to the null model but with the same theta value as the considered model, which is not equivalent to fitting the null model separately. You can check that the null log-likelihood is correct by fitting the model with theta fixed to the value that negbin estimated.
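For the fixed-theta case, the negative-binomial deviance has a closed form in the MASS parameterization, so the null deviance can be sketched directly. A hedged Python illustration (not GLM.jl code; the names are invented, and 0 · log(0) is taken as 0):

```python
import math

def negbin_deviance(y, mu, theta):
    # Negative-binomial deviance with fixed dispersion theta
    # (same parameterization as MASS::negative.binomial):
    # d_i = 2 * [y log(y/mu) - (y + theta) log((y + theta)/(mu + theta))]
    total = 0.0
    for yi, mi in zip(y, mu):
        if yi > 0:
            total += yi * math.log(yi / mi)
        total -= (yi + theta) * math.log((yi + theta) / (mi + theta))
    return 2 * total

def negbin_nulldeviance(y, theta):
    # Null model: every observation predicted by the sample mean,
    # with theta held at the full model's value.
    ybar = sum(y) / len(y)
    return negbin_deviance(y, [ybar] * len(y), theta)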

Regarding nullloglikelihood, I think you can just take the definition of loglikelihood and adjust it to use the mean as the prediction, as you did for nulldeviance.
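Concretely, that suggestion amounts to evaluating the log-likelihood at mu_i = ȳ. For the Gaussian family with the variance at its MLE (which is what R's logLik reports for gaussian fits), a Python sketch (illustrative only, not GLM.jl code):

```python
import math

def gaussian_nullloglik(y):
    # Log-likelihood of the intercept-only Gaussian model, with the
    # variance at its MLE (RSS/n):
    # ll = -(n/2) * (log(2*pi*sigma^2) + 1)
    n = len(y)
    ybar = sum(y) / n
    rss = sum((yi - ybar) ** 2 for yi in y)
    sigma2 = rss / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
```

For y = [1.0, 2.0, 3.0] this gives about -3.6486, which should agree with the intercept-only fit in R.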

@pdeffebach
Contributor Author

I just took a look at this again and spent a few hours trying to get offset to work. No luck.

I tried replicating what R does here but couldn't get numbers to match. In particular, I played around with gm6 in the runtests.jl file.

Additionally, I noticed that R computes a different null deviance depending on whether or not there is an intercept. However, GLM stores whether a model has an intercept only in the top-level object gm6, not in the AbstractGLM model or the GlmResp object. It's not clear where user-land starts and which functions are internal.

I will take another look at this soon and see if things start making sense.
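One likely source of the mismatch: with an offset, the null model's fitted mean is no longer simply ȳ, because the intercept must be re-estimated with the offset kept in the linear predictor. For Poisson with a log link there happens to be a closed form; a Python sketch (illustrative, not GLM.jl code, with a made-up name):

```python
import math

def poisson_null_mu_with_offset(y, offset):
    # Intercept-only Poisson model with log link and offset:
    # the score equation sum(y_i) = sum(exp(b0 + off_i)) has the
    # closed-form solution b0 = log(sum(y) / sum(exp(off))).
    b0 = math.log(sum(y) / sum(math.exp(o) for o in offset))
    return [math.exp(b0 + o) for o in offset]
```

With zero offsets this reduces to mu_i = ȳ, but for other families and links there is generally no closed form, so the null model must be refit numerically.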

@pdeffebach
Contributor Author

pdeffebach commented Aug 21, 2020

I guess a concrete question: we have nested structs starting with

m = glm(...)
  1. TableRegressionModel (m)
  2. AbstractGLM (m.model)
  3. GlmResp (m.model.rr)

Currently, nulldeviance takes in only (3). But to depend on the intercept term we really need m to go into the function, and then pass down into the inner functions whether or not the model has an intercept. Is that acceptable?

@nalimilan
Member

It's probably OK to have the functions take an AbstractGLM object. TableRegressionModel is supposed to go away at some point, in which case the relevant information would be available directly from AbstractGLM. But we allow fitting models from raw matrices, without going through a formula/TableRegressionModel, so it would be better to detect whether the model matrix has a constant column. That should be quite trivial, since you just have to check whether the first column is all 1s.

If offset doesn't work, can you just throw an error when an offset has been specified?

@kleinschmidt Opinions?

@kleinschmidt
Member

One possibility is extending the use of hasintercept which right now just applies to terms

https://github.com/JuliaStats/StatsModels.jl/blob/04a0ccfb80f6ef7fe4a618556e9d3d21d1c445d1/src/terms.jl#L562-L565

The default would check the formula stored with a model and packages can implement their own specializations as needed...

@nalimilan
Member

We're in GLM here so we don't have to follow any public interface. :-)

@kleinschmidt
Member

Well, at the moment, in order to support GLM models as they are usually fit by folks, you'd have to define a nulldeviance(::TableRegressionModel) method to unwrap the model, which is pretty extreme type piracy... although now that I think about it, I don't see how my solution solves that problem either... and this PR isn't changing anything there.

The advantage of using hasintercept is that it's potentially more efficient (once we require that models store the formula) than checking all the values in the model matrix. But that's probably a pretty minor gain.

@pdeffebach
Contributor Author

But we allow fitting models from raw matrices, without going through a formula/TableRegressionModel, so it would be better to detect whether the model matrix has a constant.

Following discussion on Slack, I will write a little function that checks whether some column of the matrix is constant. This should be fast, since if the matrix was constructed via a formula the first column will be all ones.
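The check described above could look like this. A Python sketch of the idea (GLM.jl would do this on the model matrix in Julia; the function name here is made up):

```python
def has_constant_column(X):
    # X is a list of rows. A model "has an intercept" for nulldeviance
    # purposes if some column is constant and nonzero (all ones when
    # the matrix was built from a formula with an intercept).
    ncols = len(X[0])
    for j in range(ncols):
        col = [row[j] for row in X]
        if all(v == col[0] for v in col) and col[0] != 0:
            return True
    return False
```

A column of zeros is excluded, since it cannot represent an intercept; the common formula-built case (first column all ones) is caught on the first iteration.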

After this I will add a few tests of models without intercepts.

I am going to abandon my effort to get offsets to work. If someone using offsets sees the error "nulldeviance with offsets not yet implemented", hopefully they know more than I do and can implement it themselves.

I also still have to work on negative binomial models.

@nalimilan
Member

I ended up reimplementing this at #479. The handling of offsets is indeed tricky. Closing in favor of the other PR.

@nalimilan nalimilan closed this Apr 27, 2022
Successfully merging this pull request may close these issues.

nulldeviance function
5 participants