-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GLM fails to match certified values for the Filip dataset #558
Comments
Moreover,
It might be because of the limitation of LINPACK.
and finally,
|
I don't think it's surprising that a coefficient is dropped since the default is julia> cond(X)
1.7679681910558162e15 It's surprising, though, that we error out when setting julia> lm(@formula(y~1+x+x^2+x^3+x^4+x^5+x^6+x^7+x^8+x^9+x^10.0), df_filip; method=:qr, dropcollinear=false)
ERROR: RankDeficientException(10)
Stacktrace:
[1] delbeta!(p::GLM.DensePredQR{Float64, LinearAlgebra.QRCompactWY}, r::Vector{Float64})
@ GLM ~/.julia/dev/GLM/src/linpred.jl:66
[2] fit!(obj::LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredQR{Float64, LinearAlgebra.QRCompactWY}})
@ GLM ~/.julia/dev/GLM/src/lm.jl:97
[3] fit(::Type{…}, f::FormulaTerm{…}, data::DataFrame, allowrankdeficient_dep::Nothing; wts::Nothing, dropcollinear::Bool, method::Symbol, contrasts::Dict{…})
@ GLM ~/.julia/dev/GLM/src/lm.jl:0
[4] fit
@ ~/.julia/dev/GLM/src/lm.jl:149 [inlined]
[5] lm
@ ~/.julia/dev/GLM/src/lm.jl:179 [inlined]
[6] top-level scope
@ REPL[42]:1
Some type information was truncated. Use `show(err)` to see complete types. The problem is the check in Line 66 in e2f6c98
dropcollinear = false then we shouldn't try to outsmart her by computing a numerical rank. We should just check that there isn't a zero element in the diagonal of R .
However, to my surprise, it looks like fixing this still isn't sufficient to match the reference estimates. It looks like LAPACK's pivoted QR loses some precision relative to the standard QR. julia> qr(X, ColumnNorm()) \ df_filip.y
11-element Vector{Float64}:
9.01342654713381
1.6525459423443027
-5.767606630529379
-3.8636659613827122
-0.6703658653069542
0.1806043161686472
0.10552342922775357
0.0214449396217032
0.0022774832947497475
0.00012622643173822303
2.8896433329375396e-6
julia> qr(X) \ df_filip.y
11-element Vector{Float64}:
-1467.4896018652016
-2772.179569812206
-2316.3710641177045
-1127.9739329347135
-354.4782313151358
-75.12420126161706
-10.875317970213466
-1.0622149798557083
-0.06701911509851412
-0.002467810770121917
-4.029625231108671e-5 |
I was also a bit surprised when I checked the diagonal elements of R and rank(R).
which is again a little bit different from what is generated by |
The QR based least squares solution is julia> F = qr(X);
julia> F.R\(F.Q'*df_filip.y)[1:size(F, 2)]
11-element Vector{Float64}:
-1467.4896018652016
-2772.179569812206
-2316.3710641177045
-1127.9739329347135
-354.4782313151358
-75.12420126161706
-10.875317970213466
-1.0622149798557083
-0.06701911509851412
-0.002467810770121917
-4.029625231108671e-5 |
Oh. Now I realize what is going on in the pivoted case. There is a rank determination by default, so the relevant comparison is julia> ldiv!(qr(X, ColumnNorm()), df_filip.y[:,:], 0.0)[1][1:size(X, 2)]
11-element Vector{Float64}:
-1467.4896118403967
-2772.1795879957995
-2316.371078886713
-1127.9739399822226
-354.4782335067247
-75.12420172645771
-10.875318038405625
-1.062214986693476
-0.06701911554715413
-0.002467810787511543
-4.029625261329208e-5 where the third argument to Line 66 in e2f6c98
dropcollinear = false .
|
filip.csv
In response to industrial concerns about the numerical accuracy of computations from statistical software, the Statistical Engineering and Mathematical and Computational Sciences Divisions of NIST's Information Technology Laboratory are providing datasets with certified values for various statistical methods
.One of them for linear regression is the Filip dataset.
https://www.itl.nist.gov/div898/strd/lls/data/Filip.shtml
For this dataset, the
lm
function with theQR
decomposition method fails to estimate all parameterswhereas the
lm
function with theCholesky
decomposition method fails to estimate 6 parametersThe following table shows that the estimates from the
lm
function inR
are close to the certified values.The text was updated successfully, but these errors were encountered: