[SPARK-13672] [ML] Add python examples of BisectingKMeans in ML and MLLIB #11515

zhengruifeng · 2016-03-04T08:26:02Z

JIRA: https://issues.apache.org/jira/browse/SPARK-13672

What changes were proposed in this pull request?

Add two Python examples of BisectingKMeans, one for ml and one for mllib.

How was this patch tested?

manual tests

SparkQA · 2016-03-04T08:29:12Z

Test build #52455 has finished for PR 11515 at commit 5ed2a47.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-03-04T09:29:49Z

Test build #52456 has finished for PR 11515 at commit e6da291.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-03-04T09:30:54Z

Test build #52457 has finished for PR 11515 at commit ebce780.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

MLnick · 2016-03-09T10:07:34Z

examples/src/main/python/mllib/bisecting_k_means_example.py

+    parsedData = data.map(lambda line: array([float(x) for x in line.split(' ')]))
+
+    # Build the model (cluster the data)
+    clusters = BisectingKMeans.train(parsedData, 2, maxIterations=5)


While trying to run this, I got an exception:

TypeError: unbound method train() must be called with BisectingKMeans instance as first argument (got PipelinedRDD instance instead)

train is missing a @classmethod decorator here. You can just add that in this PR.

The @classmethod decorator was added.
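
The failure in this thread comes down to how Python binds the first argument of a method. A minimal sketch of the same call made with and without the decorator, using hypothetical stand-in classes (WithoutDecorator, WithDecorator) rather than the real pyspark code:

```python
class WithoutDecorator(object):
    # Hypothetical stand-in: train is a plain method, so calling it on the
    # class does not supply the class as the first argument.
    def train(cls, data, k):
        return (cls, data, k)


class WithDecorator(object):
    # Same signature, but @classmethod makes Python pass the class itself
    # as the first argument when called as WithDecorator.train(...).
    @classmethod
    def train(cls, data, k):
        return (cls, data, k)


points = [[0.0, 0.0], [9.0, 9.0]]

try:
    WithoutDecorator.train(points, 2)
    undecorated_failed = False
except TypeError:
    # Python 2 reports an "unbound method" error here; Python 3 reports a
    # missing positional argument. Both are TypeError.
    undecorated_failed = True

decorated_result = WithDecorator.train(points, 2)
```

With the decorator in place, the class-level call receives the class first and the data second, which is the calling convention the example script relies on.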

SparkQA · 2016-03-09T13:26:04Z

Test build #52751 has finished for PR 11515 at commit cea8ddf.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

MLnick · 2016-03-10T08:53:00Z

examples/src/main/python/ml/bisecting_k_means_example.py

+    sqlContext = SQLContext(sc)
+
+    # $example on$
+    training = sqlContext.createDataFrame([


Could we make this example more consistent with the style of the other one (and the ML kmeans example):

from pyspark.sql.types import Row
from pyspark.mllib.linalg import Vectors
...
data = sc.textFile("data/mllib/kmeans_data.txt")
parsedData = data.map(lambda line: Row(features=Vectors.dense([float(x) for x in line.split(' ')])))
training = sqlContext.createDataFrame(parsedData)
...
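
The parsing step in this suggestion can be tried without a Spark cluster. A minimal sketch in plain Python, assuming input lines of space-separated numbers (the layout the lambda implies). The helper parse_line and the sample lines are hypothetical, and plain lists and dicts stand in for Vectors.dense and Row:

```python
def parse_line(line):
    # Split on single spaces and convert each token to float, mirroring
    # the body of the lambda in the suggestion.
    return [float(x) for x in line.split(' ')]


# Hypothetical sample input; the real example reads a text file instead.
sample_lines = ["0.0 0.0 0.0", "0.1 0.1 0.1", "9.0 9.0 9.0"]

# One record per line, each with a single "features" field.
rows = [{"features": parse_line(line)} for line in sample_lines]
```

In the suggested Spark version each record would be a Row holding a dense vector, so that createDataFrame produces a DataFrame with a features column.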

MLnick · 2016-03-10T08:59:39Z

Please add an include_example for the Python example in mllib-clustering.md

zhengruifeng · 2016-03-11T03:16:07Z

@MLnick I have added an include_example in mllib-clustering.md, and made some changes according to your comments.

SparkQA · 2016-03-11T03:19:10Z

Test build #52890 has finished for PR 11515 at commit 399290c.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-03-11T03:49:11Z

Test build #52892 has finished for PR 11515 at commit d441511.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-03-11T04:18:42Z

Test build #52894 has finished for PR 11515 at commit 165a4fe.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

zhengruifeng · 2016-03-11T06:46:19Z

Jenkins test this please

SparkQA · 2016-03-11T07:02:18Z

Test build #52906 has finished for PR 11515 at commit 165a4fe.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

MLnick · 2016-03-11T07:23:26Z

Thanks! Merging to master.

JIRA: https://issues.apache.org/jira/browse/SPARK-13672

## What changes were proposed in this pull request?

add two python examples of BisectingKMeans for ml and mllib

## How was this patch tested?

manual tests

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes apache#11515 from zhengruifeng/mllib_bkm_pe.

rmchurch · 2017-03-30T17:47:06Z

This example doesn't seem to work in Spark 2.0.0, and judging from mllib/clustering.py on master, I don't expect it to work in the latest code either. Specifically, the Python BisectingKMeansModel class does not have a save method (the KMeansModel class does), so the last three lines of the following code do not work:

# Build the model (cluster the data)
model = BisectingKMeans.train(parsedData, 2, maxIterations=5)

# Evaluate clustering
cost = model.computeCost(parsedData)
print("Bisecting K-means Cost = " + str(cost))

# Save and load model
path = "target/org/apache/spark/PythonBisectingKMeansExample/BisectingKMeansModel"
model.save(sc, path)
sameModel = BisectingKMeansModel.load(sc, path)
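
While a model class lacks a save method, a caller can guard the persistence step instead of failing with an AttributeError. A minimal sketch with hypothetical stub classes (ModelWithSave and ModelWithoutSave are not pyspark classes, and try_save is an illustrative helper) that checks for the method before calling it:

```python
class ModelWithSave(object):
    # Hypothetical stub for a model class that supports persistence.
    def save(self, sc, path):
        return "saved to " + path


class ModelWithoutSave(object):
    # Hypothetical stub for a model class that lacks a save method.
    pass


def try_save(model, sc, path):
    # Call save only when the object actually provides it.
    save = getattr(model, "save", None)
    if callable(save):
        return save(sc, path)
    return None


path = "target/tmp/model"  # hypothetical path
with_save = try_save(ModelWithSave(), None, path)
without_save = try_save(ModelWithoutSave(), None, path)
```

This only sidesteps the error; it does not persist anything for a class without save.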

zhengruifeng · 2017-03-31T03:41:14Z

@rmchurch I think this bug has been resolved in #16515

MLnick reviewed Mar 9, 2016

zhengruifeng added 8 commits March 9, 2016 20:48

create bkm_example

0619127

update db

871c5c0

update path

948f50c

add to ml

31fead0

format

b05680f

add example off

6bce85f

format

be718be

del unnecessary dataset,import and add missing annotation

cea8ddf

zhengruifeng force-pushed the mllib_bkm_pe branch from ebce780 to cea8ddf Compare March 9, 2016 13:03

MLnick reviewed Mar 10, 2016

add include_example

399290c

zhengruifeng added 2 commits March 11, 2016 11:45

reformat

3ab7533

reformat

d441511

fix python style

165a4fe

asfgit closed this in d18276c Mar 11, 2016

zhengruifeng deleted the mllib_bkm_pe branch March 11, 2016 07:26
