[SPARK-18356] [ML] KMeans should cache RDD before training #16295

Closed
wants to merge 8 commits into from

Conversation

ZakariaHili
Contributor

What changes were proposed in this pull request?

At the request of Joseph Bradley, I updated my PR #15965 to eliminate the extra fit() method.

@jkbradley

How was this patch tested?

Passes existing tests.
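
For readers unfamiliar with the pattern named in the title, caching an RDD before training can be sketched roughly as below. This is an illustration under assumptions, not the code changed in this PR: `CachingSketch`, `trainWithCaching`, and the `train` parameter are hypothetical names, and only `getStorageLevel`, `persist`, `unpersist`, and `StorageLevel` are actual Spark APIs.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object CachingSketch {
  // Hypothetical helper (not part of Spark): runs `train` on `instances`,
  // caching the RDD first only if the caller has not already persisted it.
  def trainWithCaching[T, M](instances: RDD[T])(train: RDD[T] => M): M = {
    // Take over persistence only when the input is not cached yet.
    val handlePersistence = instances.getStorageLevel == StorageLevel.NONE
    if (handlePersistence) instances.persist(StorageLevel.MEMORY_AND_DISK)
    try {
      train(instances)
    } finally {
      // Release only what this helper cached; leave the caller's cache alone.
      if (handlePersistence) instances.unpersist()
    }
  }
}
```

An iterative algorithm such as KMeans scans its input once per iteration, so an uncached input may be recomputed on every pass. Checking the storage level first avoids caching a second time when the caller has already persisted the data.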

}

@Since("2.2.0")
protected def fit(dataset: Dataset[_], handlePersistence: Boolean): KMeansModel = {
Member

CC @hhbyyh who I believe suggested this method for consistency

Contributor

I'm OK with the change.

Contributor Author

ZakariaHili commented Dec 16, 2016

@hhbyyh, so that means we should do the same for the other algorithms, i.e. cache the RDD before training in:
- GaussianMixture.scala
- BisectingKMeans.scala
- LDA.scala
and remove the extra train() method for all algorithms in the classification package.

Contributor

There's another JIRA focusing on that issue: SPARK-18608. Feel free to share your suggestions in the JIRA discussion.


SparkQA commented Dec 16, 2016

Test build #3508 has started for PR 16295 at commit 7c4883f.

Member

srowen commented Dec 17, 2016

CC @jkbradley for approval


SparkQA commented Dec 19, 2016

Test build #3510 has finished for PR 16295 at commit 7c4883f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

srowen commented Dec 19, 2016

Merged to master

@asfgit asfgit closed this in 7db09ab Dec 19, 2016
@jkbradley
Member

This looks fine, thanks!

uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
## What changes were proposed in this pull request?

At the request of Joseph Bradley, I updated my PR apache#15965 to eliminate the extra fit() method.

jkbradley
## How was this patch tested?
Pass existing tests

Author: Zakaria_Hili <zakahili@gmail.com>
Author: HILI Zakaria <zakahili@gmail.com>

Closes apache#16295 from ZakariaHili/zakbranch.
5 participants