Negative r2 for linear model with no intercept #481
Codecov Report
@@ Coverage Diff @@
## master #481 +/- ##
==========================================
+ Coverage 85.12% 87.13% +2.00%
==========================================
Files 7 7
Lines 827 847 +20
==========================================
+ Hits 704 738 +34
+ Misses 123 109 -14
@nalimilan @ViralBShah the PR is only failing on Julia nightly on Windows. I don't think it has anything to do with what we've changed. What should we do in this situation? It's happening with PR 482 as well.
|
Best would be to isolate a minimal test case and file a bug against Julia. |
|
Actually do check the logs. This has to do with timezones, so ignore. Need to see what is going on though |
|
I think the TimeZones issue is a network problem, I've seen it before. Regarding the PR itself, I think we'll have to adjust the definition of |
…definition of adjusted r-squared, added test cases, added more metrics in test cases
kleinschmidt
left a comment
Probably good to also test the case when the model is not fit to a formula, e.g. the `lm(X::Matrix, y::Vector)` method. AFAICT it should Just Work (TM) (there are methods for `hasintercept` for formula-less models), but good to make sure it doesn't break :)
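A minimal sketch of such a test, assuming the corrected `nulldeviance` from this PR (GLM.jl 1.8 or later) and reusing the NIST NoInt1 data from the PR description. The variable names are illustrative and this is not the test that was actually committed:

```julia
using GLM, Test

# NIST NoInt1 data as a plain design matrix with no column of ones,
# so the model has no intercept
X = reshape(collect(60.0:70.0), :, 1)
y = collect(130.0:140.0)

# Formula-less method: lm(X::Matrix, y::Vector)
mdl = lm(X, y)

# Assumption: with the fix, r2 matches the NIST certified value
@test r2(mdl) ≈ 0.999365492298663
```

Because no column of `X` is constant, the model should be treated as having no intercept, which is the code path this PR changes.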
|
(For some reason I can't comment in the thread above.) I think |
Thanks for your suggestion. It has been committed. Co-authored-by: Alex Arslan <ararslan@comcast.net>
Removed indexing as suggested. Co-authored-by: Alex Arslan <ararslan@comcast.net>
…` function + added one more test case without providing any formula in `lm` function
|
Thanks, looks good! I just wonder what release strategy we should adopt. Technically this could be considered as a breaking change (because it changes the definition of |
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Removed `;`. The suggestion is committed. Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
|
@nalimilan I like the idea of issuing a warning for models without intercepts and a minor version bump -- it's a good thing to call out anyway that a lot of classical summary definitions are "weird" for models without intercepts. |
            m = mean(y, weights(wts))
        end
    else
        m = zero(eltype(y))
OK, let's try something like this then? It would also be good to adapt Project.toml to require StatsAPI 1.4, which I'll tag shortly (JuliaStats/StatsAPI.jl#14), so that the docstring is consistent with what we do here.
Suggested change:

    -        m = zero(eltype(y))
    +        @warn("Starting from GLM.jl 1.8, null model is defined as having no predictor at all " *
    +              "when a model without an intercept is passed.")
    +        m = zero(eltype(y))
|
@kleinschmidt Technically, it's possible to fit a model which has an intercept but for which |
|
I don't have rights to edit your branch so I pushed the commit directly to master with the warning. Thanks! |
The Julia `lm` produces a negative r2 (-0.15702479338842856) for the following:

    data = DataFrame(x = 60:70, y = 130:140)
    mdl = lm(@formula(y ~ 0 + x), data)
    r2(mdl)

The certified value is r2 = 0.999365492298663:
https://www.itl.nist.gov/div898/strd/lls/data/LINKS/DATA/NoInt1.dat
While investigating the reason, we found that the `nulldeviance` calculation is different. Ideally, if the model has an intercept term, then `nulldeviance = y'y - n * mean(y)^2`, and if the model does not have an intercept, then `nulldeviance = y'y`.

In this PR, we have attempted to correct the `nulldeviance` calculation.

The following is the summary of the changes:
- `hasintercept` of type `Bool` in `nulldeviance` function with existing parameter type `LinResp`.
- `nulldeviance(obj::LinearModel) = nulldeviance(obj.rr, hasintercept(obj))`
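
The two null-deviance definitions from the description can be checked by hand on the NoInt1 data without GLM.jl. The following is only an illustrative sketch using Base Julia and the `Statistics` standard library. The variable names are made up for this example and it does not reproduce the package's internal code:

```julia
using Statistics  # standard library, for mean

# NIST NoInt1 data, as in the reproduction above
x = collect(60.0:70.0)
y = collect(130.0:140.0)

# Least-squares slope for y ~ 0 + x (single predictor, no intercept)
β = (x' * y) / (x' * x)

# Residual sum of squares, i.e. the deviance of the fitted linear model
rss = sum(abs2, y .- β .* x)

# Null model with an intercept only: y'y - n * mean(y)^2
null_with_intercept = y' * y - length(y) * mean(y)^2

# Null model with no predictor at all: y'y
null_without_intercept = y' * y

r2_centered   = 1 - rss / null_with_intercept     # ≈ -0.157025 (the reported value)
r2_uncentered = 1 - rss / null_without_intercept  # ≈ 0.999365 (the NIST certified value)

println((r2_centered, r2_uncentered))
```

Using the centered null deviance for a model without an intercept gives a residual sum of squares larger than the null deviance, which is why r2 comes out negative.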