-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Evaluate inverse link and its derivative simultaneously #192
Conversation
Modify tests to use approximate comparisons. The ftest tables now fail locally, perhaps due to tab/spaces inconsistency. I will change them to doctests.
test/runtests.jl
Outdated
""" | ||
Res. DOF DOF ΔDOF SSR ΔSSR R² ΔR² F* p(>F) | ||
Model 1 10 3 0.1283 0.9603 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trailing whitespace here will need to be preserved since it's inside of a string. Below (line 391) as well.
jldoctest chunks are failing because GLM namespace is not visible. Help would be appreciated.
.travis.yml
Outdated
@@ -5,7 +5,7 @@ os: | |||
julia: | |||
- 0.5 | |||
- 0.6 | |||
- nightly | |||
# - nightly Still a problem with StatsFuns package on nightly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was just fixed minutes ago when the Compat.jl PR to METADATA got merged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I'll wait to see if this version gets through CI cleanly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On my machine the package passes its tests on nightly with the Compat PR merged.
@andreasnoack @ararslan Can this be merged now? |
I'll try to get through this PR over the weekend. |
Likewise |
I was able to do a preliminary look over the changes and they look good in general. However, the code blocks in the docstrings are annotated as |
end | ||
|
||
linkfun(::InverseSquareLink, μ) = inv(abs2(μ)) | ||
linkinv(::InverseSquareLink, η) = inv(sqrt(η)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little confused by the theory here. In Table 2.1 of McCullagh and Nelder, they state that while the canonical link of the Inverse Gaussian is indeed 1/μ²
, the "inverse" is (-2θ)^(-1/2)
which is not the inverse of θ(μ)=1/μ²
but has the property that dμ/dθ=μ³
. The current version here doesn't have that property so it appears to me that the optimization in updateμ!
is invalid for the Inverse Gaussian with the canonical link with the current version of linkinv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the careful review.
I used the expressions returned by inverse.gaussian()
in R as the basis for my code.
julia> R"print(names(fam <- inverse.gaussian()))";
[1] "family" "link" "linkfun" "linkinv" "variance"
[6] "dev.resids" "aic" "mu.eta" "initialize" "validmu"
[11] "valideta" "simulate"
julia> R"fam$linkfun"
RCall.RObject{RCall.ClosSxp}
function (mu)
1/mu^2
<environment: namespace:stats>
julia> R"fam$linkinv"
RCall.RObject{RCall.ClosSxp}
function (eta)
1/sqrt(eta)
<environment: namespace:stats>
julia> R"fam$mu.eta"
RCall.RObject{RCall.ClosSxp}
function (eta)
-1/(2 * eta^1.5)
<environment: namespace:stats>
julia> R"fam$variance"
RCall.RObject{RCall.ClosSxp}
function (mu)
mu^3
<bytecode: 0x560013a5a328>
<environment: 0x560013a5c4a8>
and it looks as if you are correct - the variance function differs from dμ/dη
by a factor of -1/2
. I guess this must be the case because the variance function should be positive but this link and its inverse is monotone decreasing.
Of course the same is true for InverseLink
which is the canonical link for the Gamma
distribution.
This raises the interesting question of how the examples using these distributions and canonical links worked in the first place.
I'll see what I can find.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the bad news is that my iscanonical
methods don't work the way that I thought (I know, if I had written unit tests I would have found this out earlier).
julia> using DataFrames, GLM
julia> clotting = DataFrame(u = log.([5,10,15,20,30,40,60,80,100]),
lot1 = [118,58,42,35,27,25,21,19,18])
9×2 DataFrames.DataFrame
│ Row │ u │ lot1 │
├─────┼─────────┼──────┤
│ 1 │ 1.60944 │ 118 │
│ 2 │ 2.30259 │ 58 │
│ 3 │ 2.70805 │ 42 │
│ 4 │ 2.99573 │ 35 │
│ 5 │ 3.4012 │ 27 │
│ 6 │ 3.68888 │ 25 │
│ 7 │ 4.09434 │ 21 │
│ 8 │ 4.38203 │ 19 │
│ 9 │ 4.60517 │ 18 │
julia> gm8 = fit(GeneralizedLinearModel, @formula(lot1 ~ 1 + u), clotting, Gamma(), verbose=true)
1: 0.016729715388848446, 0.0021177181489758666
2: 0.016729715178483907, 1.2574304824954926e-8
DataFrames.DataFrameRegressionModel{GLM.GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Distributions.Gamma{Float64},GLM.InverseLink},GLM.DensePredChol{Float64,Base.LinAlg.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
Formula: lot1 ~ 1 + u
Coefficients:
Estimate Std.Error z value Pr(>|z|)
(Intercept) -0.0165544 0.000927542 -17.8476 <1e-70
u 0.0153431 0.000414958 36.9751 <1e-99
julia> gm8.model.rr
GLM.GlmResp{Array{Float64,1},Distributions.Gamma{Float64},GLM.InverseLink}([118.0, 58.0, 42.0, 35.0, 27.0, 25.0, 21.0, 19.0, 18.0], Distributions.Gamma{Float64}(α=1.0, θ=1.0), [0.00160669, 0.00746689, 0.00240188, 0.000843898, 0.00147963, 1.23783e-6, 0.000823452, 0.00141054, 0.000695503], [0.00813941, 0.0187744, 0.0249955, 0.0294095, 0.0356306, 0.0400445, 0.0462656, 0.0506796, 0.0541033], [122.859, 53.2639, 40.0071, 34.0026, 28.0658, 24.9722, 21.6143, 19.7318, 18.4832], Float64[], [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], [15094.3, 2837.04, 1600.57, 1156.18, 787.688, 623.611, 467.179, 389.345, 341.628], [0.000321911, -0.00166938, -0.0012451, -0.000862636, 0.00135305, -4.45692e-5, 0.00131496, 0.00187963, 0.00141432])
julia> GLM.iscanonical(gm8.model.rr)
false
The good news is that the apparently redundant calculations in the working weights are not being short circuited and the desired answer was obtained.
I still need to figure out why the methods for iscanonical
weren't providing the answers I thought they would and how to properly evaluate the working weights.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks as if I need to define the methods as
julia> iscanonical{V,T}(::GlmResp{V,Gamma{T},InverseLink}) = true
iscanonical (generic function with 1 method)
julia> iscanonical(gm8.model.rr)
true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes. I've made that mistake many times. There is a slightly more convenient syntax for this now. You should be able to do iscanonical(::GlmResp{<:Any,<:Gamma},InverseLink}) = true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've looked more carefully at the inverse Gaussian case now. The thing is that there is a bit of freedom in how you can choose in the canonical parameters in the inverse Gaussian model. You can e.g. set θ = 1/μ²
or θ = -1/(2μ²)
which have different signs. The former choice implies a negative variance function but therefore also a negative dispersion. The latter version has positive variance function and dispersion but θ<0
.
However, it seems that you can scale the variance function with any (non-zero) constant and it will still work since it will end up as the same factor in the score and the information so it will cancel in the update. When the variance is calculated, the estimation of the dispersion will capture the "wrong" factor. Bottomline is that a fast version should also work just fine even if we keep R's choices of functions. However, it is still a bit confusing that the functions don't follow the theory here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can always add the optimization later. It doesn't have to be in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the function name to cancancel
and added some explanation in the documentation.
I also added tests for it and for fitting an InverseGaussian
model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually still think I prefer iscanonical
. Do you think it is better to avoid "canonical"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A link can be the canonical link for a distribution but not provide for cancellation of terms. E.g. Gamma/InverseLink
and InverseGaussian/InverseSquareLink
I would say that iscanonical
should return true
for those cases but cancancel
returns false
.
The cancancel
function is only used in the evaluation of the working weights in the updateμ!
methods.
@ararslan Thanks for the review. Regarding the |
The canonical links for `Gamma` and `InverseGaussian` are flagged at present because it is not true that the variance function is equal to `dμ/dη`. The variance function is a negative multiple of the derivative.
Ah, sorry for the oversight. I forgot that |
We're green. @andreasnoack could you do the honors, please? |
Am I right that there isn't a (non-perf) test for fitting an |
Also use `L()` instead of `Link(r)`.
@@ -0,0 +1,30 @@ | |||
site_name: GLM.jl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file isn't needed nor used with the standard setup where the native HTML backend is used. See for example in CategoricalArrays: https://github.com/JuliaData/CategoricalArrays.jl/blob/master/docs/make.jl
I am proposing these changes in place of #190
During the iterations of the IRLS algorithm both the inverse link function and its derivative are evaluated. Separate evaluations can result in a small amount of repeated evaluation which can be avoided by having both values returned by the method for
inverselink
.The more important issue is ensuring that the evaluation of the derivative and of the variance function returns a non-zero value, because the derivative appears in the denominator of the working residual and the variance function appears in the denominator of the working weights.
It turns out that this is not a problem for most combinations of distribution and link. When the link is canonical for the distribution the variance function is the derivative. That is
and cancellation yields a simpler form for the working weights. See the use of
iscanonical(r)
in theupdateμ!
methods forr
.The only remaining problematic cases are the
Bernoulli
orBinomial
distributions with non-canonical link functions becauseand the value of μ may round off to 1.0 even when the product, μ * (1 - μ), can be evaluated stably from η.
inverselink
methods for subtypes ofLink01
therefore return a 3-tuple of the mean, the derivative and the variance function.Doing so avoids most of the clamping required in other approaches. The only exception is the
CloglogLink
. The inverse link in aLink01
type is a continuous, differentiable, invertible (hence monotone) function on 𝕽 into (0,1). Without loss of generality it can be assumed to be monotone increasing which means it corresponds to the cdf of a continuous univariate distribution with support on 𝕽. The pdf is the derivative function. A symmetric Link01 type has a symmetric pdf.ProbitLink
andCauchitLink
, corresponding to the standardNormal
andCauchy
distributions, are symmetric.CloglogLink
, corresponding to the extreme-value distribution, is highly asymmetric. However, in this approach the only clamping usesrealmin()
, noteps()
. (I found out the hard way that this clamping was still needed.)The Travis runs are failing on a test in
predict
. I feel that the test is at fault, not the code. The test fits a Binomial glm to data that are not binomial responses. The responses are random uniform variates, not fractions of successes. This is the moral equivalent of a least squares fit of data simulated as Xβ without any random variation. The test failure is essentially due to round off because the test is phrased as==
notisapprox
.I think it is the test that should be fixed. The simplest fix is to replace the
==
with an approximate comparison. I feel it would be better to go further and create a simulated value that actually corresponds to the model.