Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evaluate inverse link and its derivative simultaneously #192

Merged
merged 22 commits into from
Aug 16, 2017

Conversation

dmbates
Copy link
Contributor

@dmbates dmbates commented Aug 9, 2017

I am proposing these changes in place of #190

During the iterations of the IRLS algorithm both the inverse link function and its derivative are evaluated. Separate evaluations can result in a small amount of repeated evaluation which can be avoided by having both values returned by the method for inverselink.

The more important issue is ensuring that the evaluation of the derivative and of the variance function returns a non-zero value, because the derivative appears in the denominator of the working residual and the variance function appears in the denominator of the working weights.

It turns out that this is not a problem for most combinations of distribution and link. When the link is canonical for the distribution the variance function is the derivative. That is

glmvar(D, linkinv(η)) == mueta(canonicallink(D), η)

and cancellation yields a simpler form for the working weights. See the use of iscanonical(r) in the updateμ! methods for r.

The only remaining problematic cases are the Bernoulli or Binomial distributions with non-canonical link functions because

glmvar(::Union{Bernoulli,Binomial}, μ) = μ * (1 - μ)

and the value of μ may round off to 1.0 even when the product, μ * (1 - μ), can be evaluated stably from η. inverselink methods for subtypes of Link01 therefore return a 3-tuple of the mean, the derivative and the variance function.

Doing so avoids most of the clamping required in other approaches. The only exception is the CloglogLink. The inverse link in a Link01 type is a continuous, differentiable, invertible (hence monotone) function on 𝕽 into (0,1). Without loss of generality it can be assumed to be monotone increasing which means it corresponds to the cdf of a continuous univariate distribution with support on 𝕽. The pdf is the derivative function. A symmetric Link01 type has a symmetric pdf. ProbitLink and CauchitLink, corresponding to the standard Normal and Cauchy distributions, are symmetric. CloglogLink, corresponding to the extreme-value distribution, is highly asymmetric. However, in this approach the only clamping uses realmin(), not eps(). (I found out the hard way that this clamping was still needed.)

function inverselink(::CloglogLink, η)
    expη = exp(η)
    μ = -expm1(-expη)
    omμ = exp(-expη)   # the complement, 1 - μ
    μ, max(realmin(μ), expη * omμ), max(realmin(μ), μ * omμ)
end

The Travis runs are failing on a test in predict. I feel that the test is at fault, not the code. The test fits a Binomial glm to data that are not binomial responses. The responses are random uniform variates, not fractions of successes. This is the moral equivalent of a least squares fit of data simulated as Xβ without any random variation. The test failure is essentially due to round off because the test is phrased as == not isapprox.

I think it is the test that should be fixed. The simplest fix is to replace the == with an approximate comparison. I feel it would be better to go further and create a simulated value that actually corresponds to the model.

Modify tests to use approximate comparisons.
The ftest tables now fail locally, perhaps due to tab/spaces
inconsistency. I will change them to doctests.
test/runtests.jl Outdated
"""
Res. DOF DOF ΔDOF SSR ΔSSR R² ΔR² F* p(>F)
Model 1 10 3 0.1283 0.9603
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trailing whitespace here will need to be preserved since it's inside of a string. Below (line 391) as well.

.travis.yml Outdated
@@ -5,7 +5,7 @@ os:
julia:
- 0.5
- 0.6
- nightly
# - nightly Still a problem with StatsFuns package on nightly
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just fixed minutes ago when the Compat.jl PR to METADATA got merged

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I'll wait to see if this version gets through CI cleanly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my machine the package passes its tests on nightly with the Compat PR merged.

@dmbates
Copy link
Contributor Author

dmbates commented Aug 12, 2017

@andreasnoack @ararslan Can this be merged now?

@andreasnoack
Copy link
Member

I'll try to get through this PR over the weekend.

@ararslan
Copy link
Member

Likewise

@ararslan
Copy link
Member

I was able to do a preliminary look over the changes and they look good in general. However, the code blocks in the docstrings are annotated as jldoctest likely won't pass as tests without an extra newline between output and the next prompt; Documenter expects jldoctest blocks to look identical to a REPL session. I'm also not familiar with an mkdocs-based Documenter setup so I can't really speak to the correctness of that part (I typically do the native HTML generation from Documenter) but it looks reasonable at least.

end

linkfun(::InverseSquareLink, μ) = inv(abs2(μ))
linkinv(::InverseSquareLink, η) = inv(sqrt(η))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused by the theory here. In Table 2.1 of McCullagh and Nelder, they state that while the canonical link of the Inverse Gaussian is indeed 1/μ², the "inverse" is (-2θ)^(-1/2) which is not the inverse of θ(μ)=1/μ² but has the property that dμ/dθ=μ³. The current version here doesn't have that property so it appears to me that the optimization in updateμ! is invalid for the Inverse Gaussian with the canonical link with the current version of linkinv

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the careful review.

I used the expressions returned by inverse.gaussian() in R as the basis for my code.

julia> R"print(names(fam <- inverse.gaussian()))";
 [1] "family"     "link"       "linkfun"    "linkinv"    "variance"
 [6] "dev.resids" "aic"        "mu.eta"     "initialize" "validmu"
[11] "valideta"   "simulate"

julia> R"fam$linkfun"
RCall.RObject{RCall.ClosSxp}
function (mu)
1/mu^2
<environment: namespace:stats>


julia> R"fam$linkinv"
RCall.RObject{RCall.ClosSxp}
function (eta)
1/sqrt(eta)
<environment: namespace:stats>


julia> R"fam$mu.eta"
RCall.RObject{RCall.ClosSxp}
function (eta)
-1/(2 * eta^1.5)
<environment: namespace:stats>


julia> R"fam$variance"
RCall.RObject{RCall.ClosSxp}
function (mu)
mu^3
<bytecode: 0x560013a5a328>
<environment: 0x560013a5c4a8>

and it looks as if you are correct - the variance function differs from dμ/dη by a factor of -1/2. I guess this must be the case because the variance function should be positive but this link and its inverse is monotone decreasing.

Of course the same is true for InverseLink which is the canonical link for the Gamma distribution.

This raises the interesting question of how the examples using these distributions and canonical links worked in the first place.

I'll see what I can find.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the bad news is that my iscanonical methods don't work the way that I thought (I know, if I had written unit tests I would have found this out earlier).

julia> using DataFrames, GLM

julia> clotting = DataFrame(u = log.([5,10,15,20,30,40,60,80,100]),
                            lot1 = [118,58,42,35,27,25,21,19,18])
9×2 DataFrames.DataFrame
│ Row │ u       │ lot1 │
├─────┼─────────┼──────┤
│ 11.60944118  │
│ 22.3025958   │
│ 32.7080542   │
│ 42.9957335   │
│ 53.401227   │
│ 63.6888825   │
│ 74.0943421   │
│ 84.3820319   │
│ 94.6051718   │

julia> gm8 = fit(GeneralizedLinearModel, @formula(lot1 ~ 1 + u), clotting, Gamma(), verbose=true)
1: 0.016729715388848446, 0.0021177181489758666
2: 0.016729715178483907, 1.2574304824954926e-8
DataFrames.DataFrameRegressionModel{GLM.GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Distributions.Gamma{Float64},GLM.InverseLink},GLM.DensePredChol{Float64,Base.LinAlg.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: lot1 ~ 1 + u

Coefficients:
               Estimate   Std.Error  z value Pr(>|z|)
(Intercept)  -0.0165544 0.000927542 -17.8476   <1e-70
u             0.0153431 0.000414958  36.9751   <1e-99


julia> gm8.model.rr
GLM.GlmResp{Array{Float64,1},Distributions.Gamma{Float64},GLM.InverseLink}([118.0, 58.0, 42.0, 35.0, 27.0, 25.0, 21.0, 19.0, 18.0], Distributions.Gamma{Float64}=1.0, θ=1.0), [0.00160669, 0.00746689, 0.00240188, 0.000843898, 0.00147963, 1.23783e-6, 0.000823452, 0.00141054, 0.000695503], [0.00813941, 0.0187744, 0.0249955, 0.0294095, 0.0356306, 0.0400445, 0.0462656, 0.0506796, 0.0541033], [122.859, 53.2639, 40.0071, 34.0026, 28.0658, 24.9722, 21.6143, 19.7318, 18.4832], Float64[], [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], [15094.3, 2837.04, 1600.57, 1156.18, 787.688, 623.611, 467.179, 389.345, 341.628], [0.000321911, -0.00166938, -0.0012451, -0.000862636, 0.00135305, -4.45692e-5, 0.00131496, 0.00187963, 0.00141432])

julia> GLM.iscanonical(gm8.model.rr)
false

The good news is that the apparently redundant calculations in the working weights are not being short circuited and the desired answer was obtained.

I still need to figure out why the methods for iscanonical weren't providing the answers I thought they would and how to properly evaluate the working weights.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks as if I need to define the methods as

julia> iscanonical{V,T}(::GlmResp{V,Gamma{T},InverseLink}) = true
iscanonical (generic function with 1 method)

julia> iscanonical(gm8.model.rr)
true

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes. I've made that mistake many times. There is a slightly more convenient syntax for this now. You should be able to do iscanonical(::GlmResp{<:Any,<:Gamma},InverseLink}) = true

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've looked more carefully at the inverse Gaussian case now. The thing is that there is a bit of freedom in how you can choose in the canonical parameters in the inverse Gaussian model. You can e.g. set θ = 1/μ² or θ = -1/(2μ²) which have different signs. The former choice implies a negative variance function but therefore also a negative dispersion. The latter version has positive variance function and dispersion but θ<0.

However, it seems that you can scale the variance function with any (non-zero) constant and it will still work since it will end up as the same factor in the score and the information so it will cancel in the update. When the variance is calculated, the estimation of the dispersion will capture the "wrong" factor. Bottomline is that a fast version should also work just fine even if we keep R's choices of functions. However, it is still a bit confusing that the functions don't follow the theory here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can always add the optimization later. It doesn't have to be in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the function name to cancancel and added some explanation in the documentation.

I also added tests for it and for fitting an InverseGaussian model.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually still think I prefer iscanonical. Do you think it is better to avoid "canonical"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A link can be the canonical link for a distribution but not provide for cancellation of terms. E.g. Gamma/InverseLink and InverseGaussian/InverseSquareLink I would say that iscanonical should return true for those cases but cancancel returns false.

The cancancel function is only used in the evaluation of the working weights in the updateμ! methods.

@dmbates
Copy link
Contributor Author

dmbates commented Aug 14, 2017

@ararslan Thanks for the review. Regarding the jldoctest blocks, I used showcompact, which does not add a blank line when printing a vector. I think you will find that they do pass doctest - at least on my machine they do. Or you can just cut-and-paste those statements into the REPL.

The canonical links for `Gamma` and `InverseGaussian` are flagged at
present because it is not true that the variance function is equal to
`dμ/dη`.  The variance function is a negative multiple of the
derivative.
@ararslan
Copy link
Member

Ah, sorry for the oversight. I forgot that showcompact omits the newline.

@dmbates
Copy link
Contributor Author

dmbates commented Aug 14, 2017

We're green. @andreasnoack could you do the honors, please?

@andreasnoack
Copy link
Member

Am I right that there isn't a (non-perf) test for fitting an InverseGaussian?

@andreasnoack andreasnoack merged commit 3d22e3b into master Aug 16, 2017
@andreasnoack andreasnoack deleted the db/morecloglog branch August 16, 2017 02:00
@@ -0,0 +1,30 @@
site_name: GLM.jl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file isn't needed nor used with the standard setup where the native HTML backend is used. See for example in CategoricalArrays: https://github.com/JuliaData/CategoricalArrays.jl/blob/master/docs/make.jl

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants