[SPARK-13035] [ML] [PySpark] PySpark ml.clustering support export/import #10999

Closed
wants to merge 1 commit into master from the spark-13035 branch

Conversation

yanboliang (Contributor)

PySpark ml.clustering support export/import.
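
For context, a minimal sketch of the save/load round trip this change enables. It is illustrative only: it uses the SparkSession-style API of later PySpark releases rather than the SQLContext setup of this era, and the data, paths, and app name are made up, not taken from the patch.

    # Illustrative sketch of KMeans/KMeansModel persistence in pyspark.ml
    # (not the actual doctest added in this PR).
    import shutil
    import tempfile

    from pyspark.ml.clustering import KMeans, KMeansModel
    from pyspark.ml.linalg import Vectors
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[2]").appName("kmeans-io-sketch").getOrCreate()
    df = spark.createDataFrame(
        [(Vectors.dense([0.0, 0.0]),), (Vectors.dense([1.0, 1.0]),),
         (Vectors.dense([9.0, 8.0]),), (Vectors.dense([8.0, 9.0]),)],
        ["features"])

    model = KMeans(k=2, seed=1).fit(df)

    path = tempfile.mkdtemp()
    try:
        model_path = path + "/kmeans_model"
        model.save(model_path)                   # provided by MLWritable
        restored = KMeansModel.load(model_path)  # provided by MLReadable
        assert [list(c) for c in restored.clusterCenters()] == \
               [list(c) for c in model.clusterCenters()]
    finally:
        shutil.rmtree(path, ignore_errors=True)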

SparkQA commented Jan 31, 2016

Test build #50461 has finished for PR 10999 at commit dffafbf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class KMeansModel(JavaModel, MLWritable, MLReadable):
    • class KMeans(JavaEstimator, HasFeaturesCol, HasPredictionCol, HasMaxIter, HasTol, HasSeed,

In python/pyspark/ml/clustering.py:
@@ -69,6 +70,25 @@ class KMeans(JavaEstimator, HasFeaturesCol, HasPredictionCol, HasMaxIter, HasTol
True
>>> rows[2].prediction == rows[3].prediction
True
>>> import os, tempfile
holdenk (Contributor):

this is maybe a bit much for a doctest since in general they are supposed to be example-ish. Maybe this should be in the tests file instead? Just a suggestion though.

yanboliang (Contributor, Author):

hmm... Here we combine the test and example functions. I do not have a strong preference about whether this should live here or in the tests file. @jkbradley

mengxr (Contributor):

I agree with @holdenk. This may be too verbose for a doctest. We can move the temp directory setup into the test preparation (where we initialize sqlContext) and clean it up there. We can do that in a separate PR. @holdenk Could you create a JIRA for it? Thanks!

holdenk (Contributor):

Sure thing will do


asfgit closed this in 30e0095 on Feb 11, 2016
mengxr (Contributor) commented Feb 11, 2016

LGTM. Merged into master. Thanks!

yanboliang deleted the spark-13035 branch on February 12, 2016 08:11
asfgit pushed a commit that referenced this pull request Feb 20, 2016
… outside of the doctests

Some of the new doctests in ml/clustering.py have a lot of setup code, move the setup code to the general test init to keep the doctest more example-style looking.
In part this is a follow up to #10999
Note that the same pattern is followed in regression & recommendation - might as well clean up all three at the same time.

Author: Holden Karau <holden@us.ibm.com>

Closes #11197 from holdenk/SPARK-13302-cleanup-doctests-in-ml-clustering.
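
As a rough illustration of the pattern that commit describes, here is a hypothetical, simplified module-level test helper: the temp directory is created once, handed to the doctests via globs, and removed afterwards, so the doctest body itself stays example-style. (PySpark's real _test() helpers also set up a SparkContext/SQLContext in globs; that part is omitted here.)

    # Hypothetical, simplified "setup outside the doctest" helper.
    # Doctests can then just do e.g. model.save(temp_path + "/kmeans_model")
    # without creating or cleaning up the directory themselves.
    import doctest
    import shutil
    import tempfile


    def _test():
        import pyspark.ml.clustering
        globs = pyspark.ml.clustering.__dict__.copy()
        temp_path = tempfile.mkdtemp()
        globs['temp_path'] = temp_path  # shared scratch dir visible to every doctest
        try:
            (failure_count, test_count) = doctest.testmod(
                pyspark.ml.clustering, globs=globs, optionflags=doctest.ELLIPSIS)
        finally:
            shutil.rmtree(temp_path, ignore_errors=True)
        if failure_count:
            exit(-1)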