Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-13132] [MLlib] cache standardization param value in LogisticRegression #11027

Closed

Conversation

idigary
Copy link
Contributor

@idigary idigary commented Feb 2, 2016

cache the value of the standardization Param in LogisticRegression, rather than re-fetching it from the ParamMap for every index and every optimization step in the quasi-newton optimizer

also, fix Param#toString to cache the stringified representation, rather than re-interpolating it on every call, so any other implementations that have similar repeated access patterns will see a benefit.

this change improves training times for one of my test sets from ~7m30s to ~4m30s

improve LogisticRegression training times by ~35%-45% by caching
the model standardization enable parameter within the regularization
closure, rather than repeatedly referencing it from the set /
default maps
repeated lookup of paramter values within ParamMaps was causing
a significant (35-45%) performance hit within LogisticRegression
(SPARK-13132) due to the string interpolation performed by every
call to hashCode.

cache the stringified representation of the Param in a private
instance variable, so that the string interpolation only happens
once
@holdenk
Copy link
Contributor

holdenk commented Feb 2, 2016

This looks good to test, maybe @srowen who has been active on the JIRA could whitelist the test?

@srowen
Copy link
Member

srowen commented Feb 3, 2016

Jenkins add to whitelist

@srowen
Copy link
Member

srowen commented Feb 3, 2016

Jenkins test this please

@SparkQA
Copy link

SparkQA commented Feb 3, 2016

Test build #50636 has finished for PR 11027 at commit 6790e35.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Feb 7, 2016

Merged to master

@asfgit asfgit closed this in bc8890b Feb 7, 2016
asfgit pushed a commit that referenced this pull request Feb 25, 2016
…ram value

## What changes were proposed in this pull request?
Like #11027 for ```LogisticRegression```, ```LinearRegression``` with L1 regularization should also cache the value of the ```standardization``` rather than re-fetching it from the ```ParamMap``` for every OWLQN iteration.
cc srowen

## How was this patch tested?
No extra tests are added. It should pass all existing tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #11367 from yanboliang/spark-13490.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants