
Fixed linear model with perfectly collinear rhs variables and weights #432

Open · wants to merge 11 commits into base: master

Conversation

dwinkler1

This is my (probably not very efficient) suggestion to fix #420.
Suggestions very welcome!

@dwinkler1
Author

Please let me know if I should change anything in the PR. :)

@andreasnoack
Member

Looks right to me. Could you please add a test case as well?

@dwinkler1
Author

I tried to add a testset, but for me the earlier testset "linear model with weights" already errors. I feel like it is related to the PR, but I am not sure how we even end up with a CholeskyPivoted in this case.

This would be my testset

@testset "collinearity and weights" begin
    rng = StableRNG(1234321)
    x1 = randn(rng, 100)
    x1_2 = 3 * x1
    x2 = 10 * randn(rng, 100)
    x2_2 = -2.4 * x2
    y = 1 .+ randn(rng) * x1 + randn(rng) * x2 + 2 * randn(rng, 100)
    df = DataFrame(y = y, x1 = x1, x2 = x1_2, x3 = x2, x4 = x2_2, weights = repeat([1, 0.5],50))
    f = @formula(y ~ x1 + x2 + x3 + x4)
    lm_model = lm(f, df, wts = df.weights)#, dropcollinear = true)
    X = [ones(length(y)) x1_2 x2_2]
    W = Diagonal(df.weights)
    coef_naive = (X'W*X)\X'W*y
    @test lm_model.model.pp.chol isa CholeskyPivoted
    @test rank(lm_model.model.pp.chol) == 3
    @test isapprox(filter(!=(0.0), coef(lm_model)), coef_naive)
end
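The `coef_naive` line in the testset is the textbook closed-form weighted least-squares estimate, beta = (X'WX)⁻¹ X'Wy with W = diag(weights), which the fitted model is compared against. As a language-agnostic illustration of just that formula (a pure-Python sketch, not GLM.jl or the test code; the 2-parameter example data are invented):

```python
# Hedged sketch of the "naive" weighted least-squares reference:
#     beta = (X' W X)^{-1} X' W y,  W = diag(w)
# mirroring the Julia line `coef_naive = (X'W*X)\X'W*y`.
# Pure Python, 2-parameter case only; the example data are made up.

def wls(X, y, w):
    """Solve the 2-parameter weighted normal equations (X'WX) beta = X'Wy."""
    n = len(y)
    # A = X'WX (2x2) and b = X'Wy (length 2), accumulated entry by entry
    A = [[sum(w[i] * X[i][r] * X[i][c] for i in range(n)) for c in range(2)]
         for r in range(2)]
    b = [sum(w[i] * X[i][r] * y[i] for i in range(n)) for r in range(2)]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]  # 2x2 solve via Cramer's rule
    return [(A[1][1] * b[0] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

# y = 1 + 2x exactly, so the weights cannot change the answer: beta ~ [1, 2]
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
w = [1.0, 0.5, 1.0, 0.5]
beta = wls(X, y, w)
```

Because the example response is exactly linear in x, any choice of positive weights recovers the same coefficients, which makes it a convenient sanity check for the formula itself.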

This is what goes wrong earlier

julia>     lm_model = lm(f, df, wts = df.weights)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, CholeskyPivoted{Float64, Matrix{Float64}}}}, Matrix{Float64}}

FoodExp ~ 1 + Income

Coefficients:
──────────────────────────────────────────────────────────────────────────────
                     Coef.  Std. Error      t  Pr(>|t|)   Lower 95%  Upper 95%
──────────────────────────────────────────────────────────────────────────────
(Intercept)  -2794.53        2.92818e5  -0.01    0.9924  -5.77698e5  5.72109e5
Income           3.63222e6   3.36375e8   0.01    0.9914  -6.56787e8  6.64051e8
──────────────────────────────────────────────────────────────────────────────

@nalimilan
Member

I tried to add a testset, but for me the earlier testset "linear model with weights" already errors. I feel like it is related to the PR, but I am not sure how we even end up with a CholeskyPivoted in this case.

We use dropcollinear=true by default, so you get a CholeskyPivoted unless you pass dropcollinear=false explicitly. From a quick check, it appears that the new failure is due to changes this PR makes for the if rnk == length(ch.p) case. Do you really need to change the code in that branch?
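For readers unfamiliar with the mechanism: `CholeskyPivoted` in Julia wraps a pivoted Cholesky factorization (LAPACK's `pstrf`), and the point where the factorization's remaining diagonal hits zero is the detected numerical rank, which determines which coefficients get pinned to zero. A minimal pure-Python sketch of the idea (an illustration only, not the actual GLM.jl or LAPACK code; the tolerance and demo matrix are invented):

```python
# Hedged sketch of a pivoted Cholesky factorization, the algorithm behind
# Julia's CholeskyPivoted (LAPACK pstrf). It factors a symmetric positive
# semidefinite matrix as P' A P = L L' and stops once the largest remaining
# diagonal entry is numerically zero -- that stopping point is the rank.
# Pure Python; the tolerance and demo matrix are made up for illustration.

def pivoted_cholesky(A, tol=1e-10):
    """Return (L, piv, rank) with A[piv[i]][piv[j]] ~= (L @ L')[i][j]."""
    n = len(A)
    A = [row[:] for row in A]              # work on a copy
    piv = list(range(n))
    rank = n
    for k in range(n):
        # Pivot: move the largest remaining diagonal entry to position k.
        j = max(range(k, n), key=lambda i: A[i][i])
        if A[j][j] <= tol:                 # trailing block is numerically zero
            rank = k
            break
        for row in A:                      # swap columns k and j
            row[k], row[j] = row[j], row[k]
        A[k], A[j] = A[j], A[k]            # swap rows k and j
        piv[k], piv[j] = piv[j], piv[k]
        A[k][k] = A[k][k] ** 0.5
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]
        for i in range(k + 1, n):          # rank-1 update of the trailing block
            for c in range(k + 1, i + 1):
                A[i][c] -= A[i][k] * A[c][k]
    L = [[A[i][c] if c <= i and c < rank else 0.0 for c in range(n)]
         for i in range(n)]
    return L, piv, rank

# Gram matrix X'X of two exactly collinear columns x and 3x with x = [1, 2, 3]:
# the detected rank is 1, so one of the two coefficients would be dropped.
G = [[14.0, 42.0], [42.0, 126.0]]
L, piv, rank = pivoted_cholesky(G)
```

In the weighted case discussed in this PR, the same factorization is applied to X'WX rather than X'X, which is why the collinearity handling and the weights interact.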

@dwinkler1 dwinkler1 changed the title WIP: Fixed linear model with perfectly collinear rhs variables and weights Fixed linear model with perfectly collinear rhs variables and weights Feb 4, 2022
@dwinkler1
Author

dwinkler1 commented Feb 4, 2022

@nalimilan you were of course correct. I fixed that and added tests (all tests passing now). However, I made a mess with autoformatting in VS Code. What would be the best way to get just the changes I made in, without the unnecessary indentation changes? Should I just fork again?
Sorry, my experience with git pull requests is quite limited.

EDIT: I gave it a try let me know if that is ok!

@codecov-commenter

codecov-commenter commented Feb 6, 2022

Codecov Report

Base: 87.08% // Head: 87.27% // Increases project coverage by +0.19% 🎉

Coverage data is based on head (b8eb789) compared to base (7299d18).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #432      +/-   ##
==========================================
+ Coverage   87.08%   87.27%   +0.19%     
==========================================
  Files           7        7              
  Lines         929      943      +14     
==========================================
+ Hits          809      823      +14     
  Misses        120      120              
Impacted Files Coverage Δ
src/linpred.jl 85.29% <100.00%> (+1.68%) ⬆️



@nalimilan
Member

Thanks. There are still some failures which may well be random given that they happen on different versions. Any idea what may be going on? Can you reproduce these locally?

@dwinkler1
Author

dwinkler1 commented Feb 7, 2022

Looks like numerical instability to me. I'll investigate; maybe I can write a more stable version of the test case. The failures do not happen on my machine (Ubuntu with Julia 1.6).

@nalimilan
Member

Bump. Maybe try a different set of values?

@alecloudenback

Bump on this PR. I am encountering the issue in #420 as well.

On my local machine (Julia 1.8-rc4, Mac M1) the tests pass. I agree with @danielw2904 that this appears to just be a numerical difference, which stands out because the failing tests were printing ±0.0.

All four failing test cases in the "F test for model comparison" testset differed only by ±0.0, for example:

[Screen recording attachment: Screen.Recording.2022-08-13.at.11.53.45.PM.mov]

@dwinkler1
Author

Sorry, I am unable to work on this anymore.

@dwinkler1 dwinkler1 closed this Aug 14, 2022
@alecloudenback alecloudenback mentioned this pull request Aug 14, 2022
@nalimilan nalimilan reopened this Aug 28, 2022
@nalimilan
Member

Unfortunately it seems to me that something more serious is going on. Locally, tests always fail, even on Julia 1.7. Maybe this has to do with BLAS and the CPU? The following models give completely different coefficients here:

julia> using GLM, StableRNGs

julia> rng = StableRNG(1234321);

julia> x1 = randn(rng, 100);

julia> x1_2 = 3 * x1;

julia> x2 = 10 * randn(rng, 100);

julia> x2_2 = -2.4 * x2;

julia> y = 1 .+ randn(rng) * x1 + randn(rng) * x2 + 2 * randn(rng, 100);

julia> df = DataFrame(y = y, x1 = x1, x2 = x1_2, x3 = x2, x4 = x2_2, weights = repeat([1, 0.5],50));

julia> lm_collinear = lm(@formula(y ~ x1 + x2 + x3 + x4), df, wts=df.weights)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, CholeskyPivoted{Float64, Matrix{Float64}}}}, Matrix{Float64}}

y ~ 1 + x1 + x2 + x3 + x4

Coefficients:
───────────────────────────────────────────────────────────────────────────
                  Coef.  Std. Error       t  Pr(>|t|)  Lower 95%  Upper 95%
───────────────────────────────────────────────────────────────────────────
(Intercept)  0.00389006    7.74397     0.00    0.9996  -15.4334    15.4412
x1           0.0         NaN         NaN       NaN     NaN        NaN
x2           0.594857      2.78844     0.21    0.8317   -4.96379    6.1535
x3           0.0         NaN         NaN       NaN     NaN        NaN
x4           4.11564       0.351608   11.71    <1e-17    3.41472    4.81656
───────────────────────────────────────────────────────────────────────────

julia> lm_reduced = lm(@formula(y ~ x2 + x4), df, wts=df.weights)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, CholeskyPivoted{Float64, Matrix{Float64}}}}, Matrix{Float64}}

y ~ 1 + x2 + x4

Coefficients:
────────────────────────────────────────────────────────────────────────
                Coef.  Std. Error      t  Pr(>|t|)  Lower 95%  Upper 95%
────────────────────────────────────────────────────────────────────────
(Intercept)  1.29678    0.265978    4.88    <1e-05  0.766565    1.827
x2           0.215149   0.0973285   2.21    0.0302  0.0211286   0.40917
x4           0.648579   0.0124034  52.29    <1e-58  0.623853    0.673305
────────────────────────────────────────────────────────────────────────

julia> deviance(lm_collinear)
410484.4884591147

julia> deviance(lm_reduced)
355.2246843374244

On current master, both models have the same deviance and the same predictions, so it seems that this PR introduces a regression at least in some cases.

@nalimilan
Member

Though if #487 (comment) fixes it, then maybe we don't need this PR after all?

@nalimilan
Member

@alecloudenback If you're interested in making a PR, you could extract the relevant part from #487 so that we can fix this bug without waiting for the more general improvements of weighting.

Development

Successfully merging this pull request may close these issues.

OLS with collinearity and weights