[SPARK-18356] [ML] KMeans should cache RDD before training #16295
… zakbranch

Conflicts:
mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
}

@Since("2.2.0")
protected def fit(dataset: Dataset[_], handlePersistence: Boolean): KMeansModel = {
CC @hhbyyh, who I believe suggested this method for consistency.
I'm OK with the change.
@hhbyyh, so that means we should do the same for the other algorithms, caching the RDD before training in:
- GaussianMixture.scala
- BisectingKMeans.scala
- LDA.scala

and remove the extra train() method from all algorithms in the classification package.
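To make the suggestion concrete, here is a minimal, Spark-free sketch of how the cache-before-training logic could live in one shared helper that estimators such as GaussianMixture, BisectingKMeans, or LDA reuse. `Persistable`, `InMemoryInput`, `CachingSupport.withCachedInput`, and the two toy estimators are hypothetical names introduced only for illustration; in Spark itself the check would compare the input's storage level with `StorageLevel.NONE` and call `persist`/`unpersist` on the RDD.

```scala
// Hypothetical abstraction over "something that can be cached" (Spark would use an RDD).
trait Persistable {
  def isPersisted: Boolean
  def persist(): Unit
  def unpersist(): Unit
}

// Hypothetical in-memory input that records how often persist() is called.
final class InMemoryInput(val values: Seq[Double]) extends Persistable {
  private var persisted = false
  var persistCalls = 0
  def isPersisted: Boolean = persisted
  def persist(): Unit = { persisted = true; persistCalls += 1 }
  def unpersist(): Unit = { persisted = false }
}

object CachingSupport {
  // Cache the input only if the caller has not already done so, run training,
  // and release the cache afterwards (even if training throws).
  def withCachedInput[D <: Persistable, M](input: D)(train: D => M): M = {
    val handlePersistence = !input.isPersisted
    if (handlePersistence) input.persist()
    try {
      train(input)
    } finally {
      if (handlePersistence) input.unpersist()
    }
  }
}

// Two toy "estimators" sharing the same helper (illustrative only).
object ToyMeanEstimator {
  def fit(input: InMemoryInput): Double =
    CachingSupport.withCachedInput(input)(in => in.values.sum / in.values.size)
}

object ToyRangeEstimator {
  def fit(input: InMemoryInput): (Double, Double) =
    CachingSupport.withCachedInput(input) { in =>
      (in.values.reduce((a, b) => math.min(a, b)), in.values.reduce((a, b) => math.max(a, b)))
    }
}
```

The helper leaves a cache the caller created untouched and only unpersists what it persisted itself, which is the property each estimator would otherwise have to re-implement.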
There's another JIRA focusing on that issue: SPARK-18608. Feel free to share your suggestions in the JIRA discussion.
Test build #3508 has started for PR 16295 at commit
CC @jkbradley for approval
Test build #3510 has finished for PR 16295 at commit
Merged to master
This looks fine, thanks!
Author: Zakaria_Hili <zakahili@gmail.com>
Author: HILI Zakaria <zakahili@gmail.com>

Closes apache#16295 from ZakariaHili/zakbranch.
What changes were proposed in this pull request?
At the request of Mr. Joseph Bradley, I updated my PR #15965 to eliminate the extra fit() method.
@jkbradley
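As a rough illustration of the idea, here is a sketch assuming a mocked dataset rather than Spark's `Dataset`/`RDD`. `fit()` itself decides whether it is responsible for caching (`handlePersistence`), persists the input before the iterative loop, and unpersists it afterwards, so no separate `fit(dataset, handlePersistence)` overload is needed. `MockData`, `NoneLevel`, `MemoryAndDisk`, and `ToyKMeans` are hypothetical stand-ins. The toy one-dimensional Lloyd's iteration only exists to show that every iteration rescans the data, which is why caching pays off.

```scala
// Hypothetical stand-ins for Spark's StorageLevel values.
sealed trait StorageLevel
case object NoneLevel extends StorageLevel
case object MemoryAndDisk extends StorageLevel

// Hypothetical lazily computed dataset: each scan recomputes unless persisted.
final class MockData(compute: () => Vector[Double]) {
  var storageLevel: StorageLevel = NoneLevel
  var computeCount = 0 // how many times the source data was (re)computed
  private var cached: Option[Vector[Double]] = None

  def persist(): Unit = { storageLevel = MemoryAndDisk }
  def unpersist(): Unit = { storageLevel = NoneLevel; cached = None }

  def scan(): Vector[Double] = cached match {
    case Some(v) => v
    case None =>
      computeCount += 1
      val v = compute()
      if (storageLevel != NoneLevel) cached = Some(v)
      v
  }
}

object ToyKMeans {
  // Toy 1-D Lloyd's algorithm; each iteration scans the input once.
  def fit(data: MockData, k: Int, maxIter: Int): Vector[Double] = {
    // Decide inside fit() whether this call owns the caching.
    val handlePersistence = data.storageLevel == NoneLevel
    if (handlePersistence) data.persist()
    try {
      // Initialise with the first k distinct values (illustration only).
      var centers = data.scan().distinct.take(k)
      for (_ <- 1 to maxIter) {
        val points = data.scan()
        centers = centers.indices.map { i =>
          val members = points.filter(p => nearest(p, centers) == i)
          if (members.isEmpty) centers(i) else members.sum / members.size
        }.toVector
      }
      centers
    } finally {
      // Only release a cache that this call created.
      if (handlePersistence) data.unpersist()
    }
  }

  // Index of the closest center; ties keep the lower index.
  private def nearest(p: Double, centers: Vector[Double]): Int =
    centers.indices.reduce { (a, b) =>
      if (math.abs(p - centers(b)) < math.abs(p - centers(a))) b else a
    }
}
```

Without the `persist()` call, every `scan()` would re-evaluate the source, so `computeCount` would grow with the number of iterations instead of staying at 1.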
How was this patch tested?
Pass existing tests