[SPARK-18356] [ML] KMeans should cache RDD before training #16295

ZakariaHili · 2016-12-15T09:15:07Z

What changes were proposed in this pull request?

At the request of Joseph Bradley, I updated my PR #15965 to eliminate the extra fit() method.

@jkbradley

How was this patch tested?

Passes existing tests.

… zakbranch Conflicts: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala

srowen · 2016-12-15T09:23:10Z

mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala

-  }
-
-  @Since("2.2.0")
-  protected def fit(dataset: Dataset[_], handlePersistence: Boolean): KMeansModel = {


CC @hhbyyh who I believe suggested this method for consistency

hhbyyh · 2016-12-15

I'm OK with the change.

ZakariaHili · 2016-12-16

@hhbyyh, so does that mean we should do the same for the other algorithms, i.e. cache the RDD before training for:
- GaussianMixture.scala
- BisectingKMeans.scala
- LDA.scala
and remove the extra train() method from all classes in the classification package?
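For readers following the caching discussion, here is a minimal sketch of the "cache before training, release afterwards" pattern being proposed for these algorithms. It is an illustration under stated assumptions, not the code in this PR: `PersistenceSketch` and `withPersistence` are hypothetical names, and `handlePersistence` is borrowed from the parameter of the removed method shown in the diff above. Only standard RDD calls (`getStorageLevel`, `persist`, `unpersist`) are used.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical helper, not part of Spark: illustrates the caching pattern only.
object PersistenceSketch {

  /** Runs `train` on `input`, caching it first only if the caller has not. */
  def withPersistence[T, M](input: RDD[T])(train: RDD[T] => M): M = {
    // Take responsibility for caching only when the RDD is not yet persisted.
    val handlePersistence = input.getStorageLevel == StorageLevel.NONE
    if (handlePersistence) input.persist(StorageLevel.MEMORY_AND_DISK)
    try {
      // Iterative training re-reads the input on every pass,
      // so it benefits from the cached copy.
      train(input)
    } finally {
      // Release only what this helper cached; leave the caller's caching alone.
      if (handlePersistence) input.unpersist()
    }
  }
}
```

Assuming an `RDD[Vector]` named `vectors`, a call might look like `PersistenceSketch.withPersistence(vectors)(rdd => new org.apache.spark.mllib.clustering.KMeans().setK(3).run(rdd))`. Checking the storage level first avoids double caching when the caller has already persisted the data.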

hhbyyh · 2016-12-16

There's another JIRA focusing on that issue: SPARK-18608. Feel free to share your suggestions in the JIRA discussion.

SparkQA · 2016-12-16T17:52:10Z

Test build #3508 has started for PR 16295 at commit 7c4883f.

srowen · 2016-12-17T13:48:53Z

CC @jkbradley for approval

SparkQA · 2016-12-19T10:18:19Z

Test build #3510 has finished for PR 16295 at commit 7c4883f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2016-12-19T10:30:57Z

Merged to master

jkbradley · 2016-12-19T18:24:52Z

This looks fine, thanks!

## What changes were proposed in this pull request?

At the request of Joseph Bradley, I updated my PR apache#15965 to eliminate the extra fit() method.

jkbradley

## How was this patch tested?

Passes existing tests.

Author: Zakaria_Hili <zakahili@gmail.com>
Author: HILI Zakaria <zakahili@gmail.com>

Closes apache#16295 from ZakariaHili/zakbranch.

ZakariaHili added 8 commits November 22, 2016 10:33

[SPARK-18356] [ML] Improve MLKmeans Performance

6b596dc

[SPARK-18356] [ML] Improve MLKmeans Performance

d49da76

[SPARK-18356] [ML] Improve MLKmeans Performance

ce596e8

[SPARK-18356] [ML] Improve MLKmeans Performance

fd4543d

Update KMeans.scala

f17a54c

KMeans should cache RDD before training

58de549

Merge branch 'zakbranch' of https://github.com/ZakariaHili/spark into…

37963ec

… zakbranch Conflicts: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala

Eliminate the extra fit() method

7c4883f

srowen reviewed Dec 15, 2016

asfgit closed this in 7db09ab Dec 19, 2016
