[SPARK-13672] [ML] Add python examples of BisectingKMeans in ML and MLLIB #11515
Test build #52455 has finished for PR 11515 at commit
Test build #52456 has finished for PR 11515 at commit
Test build #52457 has finished for PR 11515 at commit
parsedData = data.map(lambda line: array([float(x) for x in line.split(' ')]))

# Build the model (cluster the data)
clusters = BisectingKMeans.train(parsedData, 2, maxIterations=5)
While trying to run this, I got an exception:
TypeError: unbound method train() must be called with BisectingKMeans instance as first argument (got PipelinedRDD instance instead)
train is missing a @classmethod decorator here. You can just add that in this PR.
The @classmethod decorator was added.
ebce780 to cea8ddf
Test build #52751 has finished for PR 11515 at commit
sqlContext = SQLContext(sc)

# $example on$
training = sqlContext.createDataFrame([
Could we make this example more consistent with the style of the other one (and the ML kmeans example):
from pyspark.sql.types import Row
from pyspark.mllib.linalg import Vectors
...
data = sc.textFile("data/mllib/kmeans_data.txt")
parsedData = data.map(lambda line: Row(features=Vectors.dense([float(x) for x in line.split(' ')])))
training = sqlContext.createDataFrame(parsedData)
...
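A fuller sketch of the style suggested in this comment might look like the following. It is written under stated assumptions and is not the code that was merged: it assumes Spark 2.x or later, so it uses `SparkSession` and `pyspark.ml.linalg.Vectors` where the comment uses `sqlContext` and `pyspark.mllib.linalg`. The helper `parse_line`, the function `main`, and the application name are illustrative names only. The Spark imports live inside `main()` so that the parsing step can be tried without Spark installed.

```python
def parse_line(line):
    """Turn one whitespace-separated line of numbers into a list of floats."""
    # split() with no argument tolerates repeated or trailing spaces
    return [float(x) for x in line.split()]


def main(path="data/mllib/kmeans_data.txt"):
    # Assumption: Spark 2.x+ is installed and this runs under spark-submit or pyspark
    from pyspark.sql import SparkSession, Row
    from pyspark.ml.linalg import Vectors
    from pyspark.ml.clustering import BisectingKMeans

    spark = SparkSession.builder.appName("BisectingKMeansSketch").getOrCreate()

    # One Row per input line, with a single "features" vector column
    lines = spark.sparkContext.textFile(path)
    rows = lines.map(lambda line: Row(features=Vectors.dense(parse_line(line))))
    training = spark.createDataFrame(rows)

    # Fit the estimator and print the learned cluster centers
    model = BisectingKMeans(k=2, maxIter=5).fit(training)
    for center in model.clusterCenters():
        print(center)

    spark.stop()


# The parsing step on its own, without Spark
sample = parse_line("0.0 0.1 9.2")
```

Calling `main()` runs the Spark part. Loading the data from the shared text file instead of hard-coding rows keeps the ML example in line with the MLlib one, which is what the comment asks for.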
Please add an
@MLnick I have added an include_example in mllib-clustering.md and made some changes according to your comments.
Test build #52890 has finished for PR 11515 at commit
Test build #52892 has finished for PR 11515 at commit
Test build #52894 has finished for PR 11515 at commit
Jenkins test this please
Test build #52906 has finished for PR 11515 at commit
Thanks! Merging to master.
JIRA: https://issues.apache.org/jira/browse/SPARK-13672
## What changes were proposed in this pull request?
add two python examples of BisectingKMeans for ml and mllib
## How was this patch tested?
manual tests
Author: Zheng RuiFeng <ruifengz@foxmail.com>
Closes apache#11515 from zhengruifeng/mllib_bkm_pe.
This example doesn't seem to work in Spark 2.0.0, and from the master mllib/clustering.py, I don't expect it to work in the most updated code either. Specifically, the Python BisectingKMeansModel class does not have a save method (the KMeansModel class does), so that the last three lines of the following code do not work:
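The snippet that comment refers to is not reproduced on this page. As a defensive pattern for the situation it describes, a caller could check for a `save` method before using it. The helper `save_if_supported` and the two stand-in classes below are hypothetical and exist only for illustration; the `save(sc, path)` call shape is assumed to match the MLlib `KMeansModel` that the comment mentions.

```python
def save_if_supported(model, sc, path):
    """Call model.save(sc, path) if the model exposes save; report whether it did."""
    save = getattr(model, "save", None)
    if callable(save):
        save(sc, path)
        return True
    return False


class _ModelWithSave(object):
    """Stand-in for a model class that implements save()."""

    def __init__(self):
        self.saved_to = None

    def save(self, sc, path):
        self.saved_to = path


class _ModelWithoutSave(object):
    """Stand-in for a model class that lacks save()."""


with_save = _ModelWithSave()
saved = save_if_supported(with_save, None, "target/model")
skipped = save_if_supported(_ModelWithoutSave(), None, "target/model")
```

This only avoids the `AttributeError`; it does not persist a model whose class has no `save`, so an example that needs persistence would still have to wait for that method to exist.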