[SPARK-5988][MLlib] add save/load for PowerIterationClusteringModel #5450

yinxusen · 2015-04-10T09:01:14Z

See JIRA issue SPARK-5988.

SparkQA · 2015-04-10T10:32:00Z

Test build #30022 has finished for PR 5450 at commit b1dd24c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.
This patch does not change any dependencies.

mengxr · 2015-04-13T06:22:18Z

mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala

+        ("class" -> thisClassName) ~ ("version" -> thisFormatVersion) ~ ("k" -> model.k)))
+      sc.parallelize(Seq(metadata), 1).saveAsTextFile(Loader.metadataPath(path))
+
+      val dataRDD = model.assignments.map(x => (x.id, x.cluster)).toDF()


assignments.toDF() should be sufficient and correct. Otherwise, the output column names would be "_1" and "_2".

Assignment is not a case class, so we cannot call toDF() directly. Shall I change Assignment into a case class?

Then call toDF("id", "cluster").
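To illustrate where the "_1"/"_2" names come from, here is a small plain-Scala sketch. It assumes Scala 2.13 or later, where Product.productElementNames exists, and it does not use Spark. Spark's encoders derive DataFrame column names from a product type's constructor fields through their own reflection, not through this method, but the field names involved are the same: a tuple's fields are _1 and _2, while a case class's fields carry their declared names. ColumnNames is a hypothetical name used only for illustration.

```scala
object ColumnNames {
  // Mirrors the case class introduced in this PR (illustration only).
  final case class Assignment(id: Long, cluster: Int)

  def main(args: Array[String]): Unit = {
    // A tuple's fields are named positionally, which is why
    // map(x => (x.id, x.cluster)).toDF() yields columns "_1" and "_2".
    val tuple = (7L, 2)
    println(tuple.productElementNames.toList) // List(_1, _2)

    // A case class keeps its declared field names, so toDF() on it
    // yields columns "id" and "cluster" without renaming.
    println(Assignment(7L, 2).productElementNames.toList) // List(id, cluster)
  }
}
```

For an ordinary (non-case) class, the tuple route followed by toDF("id", "cluster") is what supplies meaningful names.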

Changing Assignment into a case class does not seem to cause any problems, so I changed it. Otherwise, I would need to write extra code to check the schema, because Loader.checkSchema[]() does not work for an ordinary class.

That's okay. In general, the issue with case classes is that they are hard to extend. For example, changing case class Assignment(id: Long, cluster: Int) to case class Assignment(id: Long, cluster: Int, confidence: Double) breaks binary compatibility.
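A small sketch of why adding a field breaks binary compatibility. The compiler-generated members of a case class (the constructor, apply, unapply, copy) take the full field list, so their JVM signatures change when a field is added. The sketch uses Java reflection to print the parameter types of the generated copy method for two hypothetical versions of Assignment. CaseClassCompat, AssignmentV1, and AssignmentV2 are illustration names only, not Spark classes.

```scala
object CaseClassCompat {
  // Hypothetical "before" and "after" versions of the same case class.
  final case class AssignmentV1(id: Long, cluster: Int)
  final case class AssignmentV2(id: Long, cluster: Int, confidence: Double)

  // Returns the JVM parameter types of every public method named "copy".
  def copySignature(cls: Class[_]): List[String] =
    cls.getMethods.toList
      .filter(_.getName == "copy")
      .map(_.getParameterTypes.map(_.getName).mkString("(", ",", ")"))

  def main(args: Array[String]): Unit = {
    // Bytecode compiled against V1 calls copy(long, int) ...
    println(copySignature(classOf[AssignmentV1])) // List((long,int))
    // ... which no longer exists once the third field is added.
    println(copySignature(classOf[AssignmentV2])) // List((long,int,double))
  }
}
```

The same change happens to the constructor and to apply/unapply, which is why a caller compiled against the two-field version fails at link time against the three-field one.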

SparkQA · 2015-04-13T11:31:32Z

Test build #30156 has finished for PR 5450 at commit cb1ecfa.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class Assignment(id: Long, cluster: Int)
This patch does not change any dependencies.

mengxr · 2015-04-13T18:53:33Z

LGTM. Merged into master. Thanks!

yinxusen added 2 commits April 10, 2015 16:13

add save load for power iteration clustering

63c3923

add test suite

b1dd24c

mengxr reviewed Apr 13, 2015

change Assignment into case class

cb1ecfa

asfgit closed this in 1e340c3 Apr 13, 2015
