[SPARK-8601][ML] Add an option to disable standardization for linear regression #7037
Conversation
Test build #35853 has finished for PR 7037 at commit
Test build #35879 has finished for PR 7037 at commit
extra line
NAVER - http://www.naver.com/
The mail you sent to sujkh@naver.com <Re: [spark] [SPARK-8601][ML] Add an option to disable standardization for linear regression (#7037)> could not be delivered for the following reason:
The recipient has blocked your mail.
You don't cover all the test cases, including with/without intercept. Also, for regParam = 0, they should converge to the same solution.
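A minimal sketch of the kind of check being asked for here (not the PR's actual test), assuming the option is exposed as `setStandardization` and using the current `LinearRegressionModel` API (`coefficients`, `intercept`); `dataset` is a hypothetical DataFrame with "features" and "label" columns:

```scala
import org.apache.spark.ml.regression.LinearRegression

// With regParam = 0 there is no penalty term, so standardizing or not
// should not change the optimum: both fits should converge to
// (numerically) the same intercept and coefficients.
val lrStd = new LinearRegression()
  .setRegParam(0.0)
  .setFitIntercept(true)
  .setStandardization(true)

val lrNoStd = new LinearRegression()
  .setRegParam(0.0)
  .setFitIntercept(true)
  .setStandardization(false)

val modelStd = lrStd.fit(dataset)
val modelNoStd = lrNoStd.fit(dataset)

assert(math.abs(modelStd.intercept - modelNoStd.intercept) < 1e-3)
modelStd.coefficients.toArray.zip(modelNoStd.coefficients.toArray).foreach {
  case (a, b) => assert(math.abs(a - b) < 1e-3)
}
```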
Test build #36173 has finished for PR 7037 at commit
… an option to disable standardization (but for LoR).
Test build #36177 has finished for PR 7037 at commit
@dbtsai I've extended the test coverage.
@holdenk Cool. I'll work on this tonight. Thanks.
Test build #36975 has finished for PR 7037 at commit
Test build #36983 has finished for PR 7037 at commit
jenkins, retest this please
Test build #36985 has finished for PR 7037 at commit
jenkins, retest this please
Test build #29 has finished for PR 7037 at commit
Test build #37572 has finished for PR 7037 at commit
@holdenk can you merge master? Thanks.
…park-8601-in-Linear_regression
Test build #37926 has finished for PR 7037 at commit
All compressed sensing applications, and some regression use cases, get better results when feature scaling is turned off. However, if we implement this naively by training on the dataset without any standardization, the rate of convergence will be poor. Instead, we can still standardize the training dataset but penalize each component differently, which yields effectively the same objective function as the unstandardized problem while keeping the optimization well conditioned. As a result, columns with high variance are penalized less, and vice versa. Without this option, all features are standardized and therefore penalized equally.
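A rough sketch (not the PR's actual code) of the per-feature penalty scaling described above: training still happens on standardized features, but when standardization is disabled each coefficient's penalty is computed on the original scale, i.e. divided by that feature's standard deviation. Names such as `effectiveL2Penalty`, `coefScaled`, and `featuresStd` are illustrative only:

```scala
// Sketch only: scale the L2 penalty per feature so that optimizing in the
// standardized space is equivalent to the unstandardized objective.
// coefScaled(j) is the coefficient of standardized feature j,
// i.e. w'_j = w_j * sigma_j, where sigma_j = featuresStd(j).
def effectiveL2Penalty(
    coefScaled: Array[Double],
    featuresStd: Array[Double],
    regParam: Double,
    standardization: Boolean): Double = {
  var penalty = 0.0
  var j = 0
  while (j < coefScaled.length) {
    if (standardization) {
      // Penalize the standardized coefficient directly (the usual case).
      penalty += regParam * coefScaled(j) * coefScaled(j) / 2.0
    } else if (featuresStd(j) != 0.0) {
      // Penalize the coefficient on the original scale: w_j = w'_j / sigma_j.
      // High-variance columns therefore receive a smaller effective penalty
      // in the standardized space, matching the description above.
      val wOrig = coefScaled(j) / featuresStd(j)
      penalty += regParam * wOrig * wOrig / 2.0
    }
    j += 1
  }
  penalty
}
```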
In R, there is an option for this:

standardize: Logical flag for x variable standardization, prior to fitting the model sequence. The coefficients are always returned on the original scale. Default is standardize=TRUE. If variables are in the same units already, you might not wish to standardize. See details below for y standardization with family="gaussian".
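For comparison, a minimal usage sketch of the option this PR adds on the Spark ML side, assuming the parameter is exposed as `setStandardization` (coefficients are returned on the original scale either way); `training` is a hypothetical DataFrame with "features" and "label" columns:

```scala
import org.apache.spark.ml.regression.LinearRegression

// Elastic-net regularized linear regression with feature standardization
// turned off during training.
val lr = new LinearRegression()
  .setRegParam(0.1)
  .setElasticNetParam(0.5)
  .setStandardization(false)

val model = lr.fit(training)
println(s"Coefficients: ${model.coefficients}, intercept: ${model.intercept}")
```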