[SPARK-13035] [ML] [PySpark] PySpark ml.clustering support export/import #10999

Closed
wants to merge 1 commit into master from the spark-13035 branch

Conversation

yanboliang (Contributor)

PySpark ml.clustering support export/import.
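
For context, a minimal sketch of the save/load round trip this change enables. It is illustrative only: it uses the SparkSession-style API of later PySpark releases rather than the SQLContext setup of this era, and the data, paths, and app name are made up, not taken from the patch.

    # Illustrative sketch of KMeans/KMeansModel persistence in pyspark.ml
    # (not the actual doctest added in this PR).
    import shutil
    import tempfile

    from pyspark.ml.clustering import KMeans, KMeansModel
    from pyspark.ml.linalg import Vectors
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[2]").appName("kmeans-io-sketch").getOrCreate()
    df = spark.createDataFrame(
        [(Vectors.dense([0.0, 0.0]),), (Vectors.dense([1.0, 1.0]),),
         (Vectors.dense([9.0, 8.0]),), (Vectors.dense([8.0, 9.0]),)],
        ["features"])

    model = KMeans(k=2, seed=1).fit(df)

    path = tempfile.mkdtemp()
    try:
        model_path = path + "/kmeans_model"
        model.save(model_path)                   # provided by MLWritable
        restored = KMeansModel.load(model_path)  # provided by MLReadable
        assert [list(c) for c in restored.clusterCenters()] == \
               [list(c) for c in model.clusterCenters()]
    finally:
        shutil.rmtree(path, ignore_errors=True)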

SparkQA commented Jan 31, 2016

Test build #50461 has finished for PR 10999 at commit dffafbf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class KMeansModel(JavaModel, MLWritable, MLReadable):
    • class KMeans(JavaEstimator, HasFeaturesCol, HasPredictionCol, HasMaxIter, HasTol, HasSeed,

In python/pyspark/ml/clustering.py:
@@ -69,6 +70,25 @@ class KMeans(JavaEstimator, HasFeaturesCol, HasPredictionCol, HasMaxIter, HasTol
True
>>> rows[2].prediction == rows[3].prediction
True
>>> import os, tempfile
holdenk (Contributor):

this is maybe a bit much for a doctest since in general they are supposed to be example-ish. Maybe this should be in the tests file instead? Just a suggestion though.

yanboliang (Contributor, Author):

hmm... Here we combine the test and example functions. I do not have a strong preference about whether this should live here or in the tests file. @jkbradley

mengxr (Contributor):

I agree with @holdenk. This may be too verbose for a doctest. We can move the temp directory setup into the test preparation (where we initialize sqlContext) and clean it up there. We can do that in a separate PR. @holdenk Could you create a JIRA for it? Thanks!

holdenk (Contributor):

Sure thing will do


asfgit closed this in 30e0095 on Feb 11, 2016
mengxr (Contributor) commented Feb 11, 2016

LGTM. Merged into master. Thanks!

yanboliang deleted the spark-13035 branch on February 12, 2016 08:11
asfgit pushed a commit that referenced this pull request Feb 20, 2016
… outside of the doctests

Some of the new doctests in ml/clustering.py have a lot of setup code, move the setup code to the general test init to keep the doctest more example-style looking.
In part this is a follow up to #10999
Note that the same pattern is followed in regression & recommendation - might as well clean up all three at the same time.

Author: Holden Karau <holden@us.ibm.com>

Closes #11197 from holdenk/SPARK-13302-cleanup-doctests-in-ml-clustering.
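
As a rough illustration of the pattern that commit describes, here is a hypothetical, simplified module-level test helper: the temp directory is created once, handed to the doctests via globs, and removed afterwards, so the doctest body itself stays example-style. (PySpark's real _test() helpers also set up a SparkContext/SQLContext in globs; that part is omitted here.)

    # Hypothetical, simplified "setup outside the doctest" helper.
    # Doctests can then just do e.g. model.save(temp_path + "/kmeans_model")
    # without creating or cleaning up the directory themselves.
    import doctest
    import shutil
    import tempfile


    def _test():
        import pyspark.ml.clustering
        globs = pyspark.ml.clustering.__dict__.copy()
        temp_path = tempfile.mkdtemp()
        globs['temp_path'] = temp_path  # shared scratch dir visible to every doctest
        try:
            (failure_count, test_count) = doctest.testmod(
                pyspark.ml.clustering, globs=globs, optionflags=doctest.ELLIPSIS)
        finally:
            shutil.rmtree(temp_path, ignore_errors=True)
        if failure_count:
            exit(-1)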