[SPARK-13132] [MLlib] cache standardization param value in LogisticRegression #11027

idigary · 2016-02-02T16:27:24Z

Cache the value of the standardization Param in LogisticRegression rather than re-fetching it from the ParamMap for every index and every optimization step in the quasi-Newton optimizer.

Also, fix Param#toString to cache the stringified representation rather than re-interpolating it on every call, so that any other implementations with similar repeated access patterns will also benefit.

This change improves training time on one of my test sets from ~7m30s to ~4m30s.
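Converting the quoted timings to seconds (7m30s = 450 s, 4m30s = 270 s) shows the reported improvement is roughly a 40% reduction in training time, consistent with the 35-45% range given in the commit message:

$$
\frac{450\,\text{s} - 270\,\text{s}}{450\,\text{s}} = \frac{180}{450} = 0.40
$$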

Improve LogisticRegression training times by ~35-45% by caching the model's standardization parameter value within the regularization closure, rather than repeatedly looking it up in the set/default param maps.
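A minimal sketch of the caching pattern, assuming a hypothetical `ParamStore` class that stands in for the set/default param maps (it is not Spark's `ParamMap` API, and it counts lookups only to make the cost visible). The per-coefficient function is loosely modeled on the kind of L1 regularization function a quasi-Newton (OWLQN) optimizer calls once per coefficient index; the uncached version does a map lookup on every call, while the cached version reads the flag once and captures a plain `Boolean` in the closure.

```scala
object CachedParamSketch {
  // Hypothetical stand-in for a param holder with explicitly set and default values.
  // This is NOT Spark's ParamMap; it only counts lookups so the difference is observable.
  final class ParamStore(explicit: Map[String, Any], defaults: Map[String, Any]) {
    var lookups: Int = 0

    def get[T](name: String): T = {
      lookups += 1
      explicit.getOrElse(name, defaults(name)).asInstanceOf[T]
    }
  }

  // Uncached: the standardization flag is re-fetched on every call,
  // i.e. for every coefficient index on every optimizer iteration.
  def l1RegUncached(store: ParamStore, regL1: Double, featuresStd: Array[Double]): Int => Double =
    (index: Int) => {
      if (store.get[Boolean]("standardization")) regL1
      else if (featuresStd(index) != 0.0) regL1 / featuresStd(index)
      else 0.0
    }

  // Cached: read the flag once; the closure captures a plain Boolean.
  def l1RegCached(store: ParamStore, regL1: Double, featuresStd: Array[Double]): Int => Double = {
    val standardization: Boolean = store.get[Boolean]("standardization")
    (index: Int) => {
      if (standardization) regL1
      else if (featuresStd(index) != 0.0) regL1 / featuresStd(index)
      else 0.0
    }
  }
}
```

Because the optimizer may invoke this function for every coefficient on every iteration, hoisting the lookup out of the closure replaces a number of lookups proportional to iterations × features with a single one, while computing identical penalties.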

Repeated lookup of parameter values within ParamMaps was causing a significant (35-45%) performance hit in LogisticRegression (SPARK-13132), because every call to hashCode performed string interpolation. Cache the stringified representation of the Param in a private instance variable so that the interpolation happens only once.
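The `Param#toString` change can be sketched the same way. The class below is a hypothetical, simplified stand-in for a param key whose `hashCode` is derived from its string form, as the description above says; it is not Spark's `Param` class. The string is built once in a `private[this] val`, so hashing the key for a map lookup no longer re-interpolates.

```scala
// Hypothetical, simplified param key; NOT Spark's org.apache.spark.ml.param.Param.
final class SimpleParam(val parent: String, val name: String) {
  // Interpolated once at construction instead of on every toString/hashCode call
  private[this] val stringRepresentation: String = s"${parent}__$name"

  override def toString: String = stringRepresentation

  // Delegates to the cached String, so map lookups keyed by this param stay cheap
  override def hashCode: Int = stringRepresentation.hashCode

  override def equals(obj: Any): Boolean = obj match {
    case p: SimpleParam => p.parent == parent && p.name == name
    case _              => false
  }
}
```

A side benefit of caching the `String` instance itself is that `java.lang.String` caches its own hash code after the first computation, so repeated `hashCode` calls on the same key avoid rehashing the characters as well.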

holdenk · 2016-02-02T22:39:32Z

This looks good to test; maybe @srowen, who has been active on the JIRA, could whitelist the test?

srowen · 2016-02-03T03:21:01Z

Jenkins add to whitelist

srowen · 2016-02-03T03:21:33Z

Jenkins test this please

SparkQA · 2016-02-03T04:15:59Z

Test build #50636 has finished for PR 11027 at commit 6790e35.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2016-02-07T09:13:45Z

Merged to master

…ram value

## What changes were proposed in this pull request?

Like #11027 for `LogisticRegression`, `LinearRegression` with L1 regularization should also cache the value of the `standardization` param rather than re-fetching it from the `ParamMap` for every OWLQN iteration.

cc srowen

## How was this patch tested?

No extra tests are added. It should pass all existing tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #11367 from yanboliang/spark-13490.

idigary added 2 commits February 2, 2016 08:14

[spark-13132/ml] optimize parameter fetch in LogisticRegression

895facf


asfgit closed this in bc8890b Feb 7, 2016

yanboliang mentioned this pull request Feb 25, 2016

[SPARK-13490] [ML] ML LinearRegression should cache standardization param value #11367

Closed
