Implement `nulldeviance` and `nullloglikelihood` for GLMs #479

nalimilan · 2022-04-18T20:37:38Z

No description provided.

Models with offets are not supported for now.

src/glmfit.jl

palday · 2022-04-20T05:47:30Z

From McCullagh and Nelder:

For the Binomial, m is the number of Bernoulli trials, i.e. the first parameter in the usual formulation.

nalimilan · 2022-04-20T07:45:17Z

What am I supposed to conclude from this excerpt? :-D AFAICT the call to devresid is equivalent to these formulas, right? But I'm still not sure how to handle offsets.

palday · 2022-04-20T13:56:03Z

Yeah, the excerpt was also just for my own note-taking as the closest thing I found to an efficient closed-form formula for the null deviance.

I think offsets are just always awkward to handle. The offset is added to mu, right? Then we have mean(mu+offset) = mean(mu) + mean(offset) and can just use that in all our calculations, which will give the same problems you mentioned in R (#477 (comment)).

For models without an intercept, the null model is actually the same as forcing the mean to zero, right? I'm not sure if that will work numerically for some distributions, but maybe we should warn/error in those cases.

nalimilan · 2022-04-20T21:02:12Z

I think offsets are just always awkward to handle. The offset is added to mu, right? Then we have mean(mu+offset) = mean(mu) + mean(offset) and can just use that in all our calculations, which will give the same problems you mentioned in R (#477 (comment)).

Not exactly, but close: the offset is included in eta, i.e. eta = XB + offset.

After some more investigations, I've made some progress: I'm able to compute the null deviance for a Poisson model with a log link and an offset. I'm not sure why it doesn't work for a Normal model with a log link and an offset. I'll see whether I can find a solution, and if not I will at least push what I have. EDIT: the Normal distribution is probably one of the cases that the R docs refer to with "this will be incorrect if the link function depends on the data other than through the fitted mean". Maybe we should just throw an error for distributions with a dispersion parameter, as it's always better than returning incorrect results.

For models without an intercept, the null model is actually the same as forcing the mean to zero, right? I'm not sure if that will work numerically for some distributions, but maybe we should warn/error in those cases.

I wondered the same thing. The null model is defined as the one with only the intercept in the docstring currently. For models without an intercept, it could make sense to define it as the model for which all predicted values are 0. Though these things are confusing, so throwing an error would be safer, at least as long as users don't request that feature. This matters especially because people may call r2 for GLMs without realizing what null model is taken as the reference.

codecov-commenter · 2022-04-24T19:36:12Z

Codecov Report

Merging #479 (a58f8e6) into master (f149f2b) will increase coverage by 0.28%.
The diff coverage is 92.30%.

@@            Coverage Diff             @@
##           master     #479      +/-   ##
==========================================
+ Coverage   87.16%   87.44%   +0.28%     
==========================================
  Files           7        7              
  Lines         849      900      +51     
==========================================
+ Hits          740      787      +47     
- Misses        109      113       +4

Impacted Files	Coverage Δ
src/glmfit.jl	`81.25% <92.30%> (+1.95%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f149f2b...a58f8e6. Read the comment docs.

nalimilan · 2022-04-24T19:50:50Z

I had code that computes the null deviance/log-likelihood for Poisson models with offsets, but I'm afraid it loses precision significantly in some cases due to the back and forth between operations on the response scale and operations on the linear predictor scale. For a Binomial model with large offsets, this gave an infinite deviance. Given that we need a fallback that does fit the null model for other distributions anyway, I figured it would be simpler to always use it when there are offsets. I've kept the intermediate commits if you want to have a look.

nalimilan · 2022-04-24T19:42:42Z

src/glmfit.jl

+                    maxiter=m.maxiter, minstepfac=m.minstepfac,
+                    atol=m.atol, rtol=m.rtol)


I'm not sure whether that passing the tolerance is really needed, as the null model should converge much more quickly than more complex models. Maybe we don't need to use a low tolerance just to compute the deviance.

Note that this requires storing these values in new GeneralizedLinearModel fields (which could be useful in its own right).

nalimilan · 2022-04-24T20:24:08Z

Another issue is that for models fitted with negbin, the null model is taken to be the one with the same theta parameter, which will have a worse fit than calling negbin again. Maybe negbin should fill a field in the model object to indicate that theta was chosen automatically so that we throw an error from nulldeviance and nullloglikelihood instead of returning misleading results? (We can't always throw an error for NegativeBinomial as users can specify theta manually.)

src/glmfit.jl

palday · 2022-06-01T20:02:53Z

src/glmfit.jl

+        nullm = fit(GeneralizedLinearModel,
+                    fill(1.0, length(y), 1), y, d, L(), wts=wts, offset=offset,
+                    maxiter=m.maxiter, minstepfac=m.minstepfac,
+                    atol=m.atol, rtol=m.rtol)
+        dev = deviance(nullm)


I feel like we're missing an obvious optimization here, but I'm also unwilling to spend too much time optimizing a code path for something that I view as statistically a bit dubious/problematic weird.

nalimilan · 2022-06-11T14:11:17Z

Given that we now have a clear definition the null model when theere's no intercept in StatsAPI (JuliaStats/StatsAPI.jl#14) and an implementation for linear models (#481), I've pushed changes to support the no intercept case.

src/glmfit.jl

palday · 2022-06-14T17:21:07Z

src/glmfit.jl

            @inbounds for i in eachindex(y)
                dev += devresid(d, y[i], mu)
            end
        end
    else
+        X = fill(1.0, length(y), hasint ? 1 : 0)


Suggested change

X = fill(1.0, length(y), hasint ? 1 : 0)

X = fill(1.0, length(y), hasint)

src/glmfit.jl

nalimilan · 2022-07-01T07:43:28Z

@palday Good to go?

Implement nulldeviance and nullloglikelihood for GLMs

f423fe6

Models with offets are not supported for now.

nalimilan mentioned this pull request Apr 18, 2022

Do not mention pseudo-R² in manual #477

Closed

palday reviewed Apr 20, 2022

View reviewed changes

src/glmfit.jl Show resolved Hide resolved

nalimilan added 4 commits April 20, 2022 23:03

WIP offsets (correct nulldeviance)

83d26c5

WIP

1ddd717

Fit null model when there are offsets

4a3ab86

Throw or warn for models without an intercept

49c4a52

Missing tests

44221e2

nalimilan commented Apr 24, 2022

View reviewed changes

This was referenced Apr 27, 2022

Negative r2 for linear model with no intercept #481

Closed

null deviance and null loglikelihood for GLM #352

Closed

nalimilan requested a review from palday May 29, 2022 17:12

palday approved these changes Jun 1, 2022

View reviewed changes

nalimilan added 6 commits June 11, 2022 15:13

Support models with no intercept

86527eb

Drop redundant weights call

a185e21

Merge branch 'master' into nl/null

c2a57d9

Use r.link

4fdbf11

Cleanup

c8a231d

More cleanup

a58f8e6

nalimilan requested a review from palday June 11, 2022 14:11

palday reviewed Jun 14, 2022

View reviewed changes

src/glmfit.jl Show resolved Hide resolved

palday reviewed Jun 14, 2022

View reviewed changes

src/glmfit.jl Show resolved Hide resolved

palday reviewed Jun 14, 2022

View reviewed changes

src/glmfit.jl Show resolved Hide resolved

palday approved these changes Jul 1, 2022

View reviewed changes

nalimilan merged commit ebc82cc into master Jul 1, 2022

nalimilan deleted the nl/null branch July 1, 2022 15:52

nalimilan mentioned this pull request Jul 1, 2022

r2 for GLM #475

Closed

nalimilan mentioned this pull request Aug 26, 2022

svyglm field status xKDR/Survey.jl#12

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement `nulldeviance` and `nullloglikelihood` for GLMs #479

Implement `nulldeviance` and `nullloglikelihood` for GLMs #479

nalimilan commented Apr 18, 2022 •

edited

Loading

palday commented Apr 20, 2022 •

edited

Loading

nalimilan commented Apr 20, 2022

palday commented Apr 20, 2022

nalimilan commented Apr 20, 2022 •

edited

Loading

codecov-commenter commented Apr 24, 2022 •

edited

Loading

nalimilan commented Apr 24, 2022

nalimilan Apr 24, 2022 •

edited

Loading

nalimilan commented Apr 24, 2022

palday Jun 1, 2022

nalimilan commented Jun 11, 2022

palday Jun 14, 2022

nalimilan commented Jul 1, 2022

		maxiter=m.maxiter, minstepfac=m.minstepfac,
		atol=m.atol, rtol=m.rtol)

	X = fill(1.0, length(y), hasint ? 1 : 0)
	X = fill(1.0, length(y), hasint)

Implement nulldeviance and nullloglikelihood for GLMs #479

Implement nulldeviance and nullloglikelihood for GLMs #479

Conversation

nalimilan commented Apr 18, 2022 • edited Loading

palday commented Apr 20, 2022 • edited Loading

nalimilan commented Apr 20, 2022

palday commented Apr 20, 2022

nalimilan commented Apr 20, 2022 • edited Loading

codecov-commenter commented Apr 24, 2022 • edited Loading

Codecov Report

nalimilan commented Apr 24, 2022

nalimilan Apr 24, 2022 • edited Loading

Choose a reason for hiding this comment

nalimilan commented Apr 24, 2022

palday Jun 1, 2022

Choose a reason for hiding this comment

nalimilan commented Jun 11, 2022

palday Jun 14, 2022

Choose a reason for hiding this comment

nalimilan commented Jul 1, 2022

Implement `nulldeviance` and `nullloglikelihood` for GLMs #479

Implement `nulldeviance` and `nullloglikelihood` for GLMs #479

nalimilan commented Apr 18, 2022 •

edited

Loading

palday commented Apr 20, 2022 •

edited

Loading

nalimilan commented Apr 20, 2022 •

edited

Loading

codecov-commenter commented Apr 24, 2022 •

edited

Loading

nalimilan Apr 24, 2022 •

edited

Loading