@zhengruifeng
Contributor

What changes were proposed in this pull request?

Expose intermediateStorageLevel in mllib:
1. Add a new shared param, HasIntermediateStorageLevel.
2. Make LinearSVCParams, LogisticRegressionParams, MultilayerPerceptronParams, AFTSurvivalRegressionParams, and LinearRegressionParams extend HasIntermediateStorageLevel.
3. Make DecisionTreeParams extend HasIntermediateStorageLevel for all tree models.
4. Make FactorizationMachinesParams extend HasIntermediateStorageLevel for FMRegressor and FMClassifier.
5. Make ALSParams extend HasIntermediateStorageLevel.
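
To illustrate the shared-param idea in the list above, here is a minimal, self-contained sketch in plain Python. It is not the code from this PR and does not use pyspark: the setter and getter names, the `ToyEstimator` class, and the choice to reject "NONE" are all assumptions made for illustration. Only the param name and the "MEMORY_AND_DISK" default come from this description.

```python
# Illustrative stand-in only; this is not the pyspark.ml shared-param code from this PR.

# Commonly used Spark storage level names. "NONE" is deliberately left out:
# rejecting it is an assumption of this sketch, since an "intermediate" dataset
# that is never persisted would defeat the purpose of the param.
VALID_LEVELS = frozenset({
    "DISK_ONLY", "DISK_ONLY_2",
    "MEMORY_ONLY", "MEMORY_ONLY_2",
    "MEMORY_ONLY_SER", "MEMORY_ONLY_SER_2",
    "MEMORY_AND_DISK", "MEMORY_AND_DISK_2",
    "MEMORY_AND_DISK_SER", "MEMORY_AND_DISK_SER_2",
    "OFF_HEAP",
})


class HasIntermediateStorageLevel:
    """Hypothetical mixin that stores a storage level name for intermediate datasets."""

    DEFAULT_LEVEL = "MEMORY_AND_DISK"

    def __init__(self) -> None:
        super().__init__()
        self._intermediateStorageLevel = self.DEFAULT_LEVEL

    def setIntermediateStorageLevel(self, value: str):
        # Validate eagerly so a typo fails at configuration time, not mid-training.
        if value not in VALID_LEVELS:
            raise ValueError(f"unsupported intermediateStorageLevel: {value!r}")
        self._intermediateStorageLevel = value
        return self  # returning self allows setter chaining

    def getIntermediateStorageLevel(self) -> str:
        return self._intermediateStorageLevel


class ToyEstimator(HasIntermediateStorageLevel):
    """Hypothetical estimator that picks up the param by extending the mixin."""

    def fit(self, rows):
        level = self.getIntermediateStorageLevel()
        # A real estimator would persist its intermediate dataset at `level` here.
        return {"rows": len(rows), "persisted_at": level}
```

With this sketch, `ToyEstimator().setIntermediateStorageLevel("MEMORY_ONLY_SER").fit(data)` would report the chosen level, while an unknown name raises `ValueError` before any work starts.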

Why are the changes needed?

Existing mllib implementations persist intermediate datasets at the "MEMORY_AND_DISK" level. It would be useful to expose this level to end users.

Does this PR introduce any user-facing change?

Yes, a new param is added.

How was this patch tested?

Updated the Python doc tests.

@zhengruifeng zhengruifeng changed the title [SPARK-33773][ML][PYSPARK] expose intermediateStorageLevel in mllib [SPARK-33773][ML][PYSPARK] expose intermediateStorageLevel in mllib - als,clf,reg Dec 14, 2020
@SparkQA

SparkQA commented Dec 14, 2020

Test build #132758 has finished for PR 30758 at commit 8c0bce6.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class HasIntermediateStorageLevel(Params):
  • class _ALSParams(_ALSModelParams, HasMaxIter, HasRegParam, HasCheckpointInterval, HasSeed,
  • class _DecisionTreeParams(HasCheckpointInterval, HasSeed, HasWeightCol,

@SparkQA

SparkQA commented Dec 14, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37360/

@SparkQA

SparkQA commented Dec 14, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37360/

@SparkQA

SparkQA commented Dec 14, 2020

Test build #132770 has finished for PR 30758 at commit d8dac13.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 14, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37372/

@SparkQA

SparkQA commented Dec 14, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37372/

@zhengruifeng zhengruifeng changed the title [SPARK-33773][ML][PYSPARK] expose intermediateStorageLevel in mllib - als,clf,reg [SPARK-33773][ML][PYSPARK][WIP] expose intermediateStorageLevel in mllib - als,clf,reg Dec 15, 2020
fix mima
@github-actions github-actions bot added the BUILD label Dec 15, 2020
@SparkQA

SparkQA commented Dec 15, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37391/

@SparkQA

SparkQA commented Dec 15, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37391/

@SparkQA

SparkQA commented Dec 15, 2020

Test build #132790 has finished for PR 30758 at commit 782b51d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Dec 15, 2020

How often does it matter to change this?

@zhengruifeng
Contributor Author

@srowen
In SPARK-29967, @huaxingao and I found a performance degradation in KMeans when the dataset's persistence level is changed from OFF_HEAP to the default MEMORY_AND_DISK (due to double caching).
And in some practical cases, I just want to (or at least try to) save some RAM by using MEMORY_ONLY_SER.
In general, I think this does no harm and will be helpful in some cases.
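
For readers less familiar with the levels mentioned here, the sketch below tabulates how they differ. The flag values are written from memory of the Scala-side `StorageLevel` definitions and should be checked against the Spark version in use. `LevelFlags`, `LEVELS`, and `describe` are hypothetical names for this illustration, not Spark APIs.

```python
from typing import NamedTuple


class LevelFlags(NamedTuple):
    """Hypothetical record of the flags that distinguish storage levels."""
    use_disk: bool
    use_memory: bool
    use_off_heap: bool
    deserialized: bool


# Assumed to mirror the Scala-side StorageLevel constants (replication factor 1).
LEVELS = {
    "MEMORY_ONLY":     LevelFlags(False, True, False, True),
    "MEMORY_ONLY_SER": LevelFlags(False, True, False, False),
    "MEMORY_AND_DISK": LevelFlags(True, True, False, True),
    "OFF_HEAP":        LevelFlags(True, True, True, False),
}


def describe(name: str) -> str:
    """Summarize where a level keeps data and in what form."""
    flags = LEVELS[name]
    places = [
        label
        for label, enabled in (
            ("disk", flags.use_disk),
            ("memory", flags.use_memory),
            ("off-heap", flags.use_off_heap),
        )
        if enabled
    ]
    form = "deserialized objects" if flags.deserialized else "serialized bytes"
    return f"{name}: {' + '.join(places)}, {form}"
```

Under these assumptions, MEMORY_ONLY_SER trades CPU time for a smaller footprint by keeping serialized bytes in memory, while the MEMORY_AND_DISK default keeps deserialized objects and spills to disk when memory runs out.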

@zhengruifeng
Contributor Author

retest this please

@SparkQA

SparkQA commented Dec 18, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37600/

@SparkQA

SparkQA commented Dec 18, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37600/

@SparkQA

SparkQA commented Dec 18, 2020

Test build #133001 has finished for PR 30758 at commit 782b51d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhengruifeng zhengruifeng deleted the share_storage_clf branch January 8, 2021 09:33