[SPARK-5988][MLlib] add save/load for PowerIterationClusteringModel #5450

Closed
wants to merge 3 commits into from

yinxusen
Contributor

See JIRA issue SPARK-5988.

SparkQA commented Apr 10, 2015

Test build #30022 has finished for PR 5450 at commit b1dd24c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

("class" -> thisClassName) ~ ("version" -> thisFormatVersion) ~ ("k" -> model.k)))
sc.parallelize(Seq(metadata), 1).saveAsTextFile(Loader.metadataPath(path))

val dataRDD = model.assignments.map(x => (x.id, x.cluster)).toDF()
Contributor

model.assignments.toDF() should be sufficient and correct. Mapping to tuples first makes the output column names "_1" and "_2".

Contributor Author

Assignment is not a case class, so we cannot call toDF() on the RDD directly. Shall I change Assignment into a case class?

Contributor

Then call toDF("id", "cluster").
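The three options discussed in this thread can be compared side by side. The following is only a minimal sketch, not the code from the patch: it assumes Spark SQL is on the classpath and uses the later SparkSession entry point rather than the SQLContext of the Spark version under review, and the Assignment class and object name here are hypothetical stand-ins.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical stand-in for the Assignment class discussed in this PR.
// It is declared at the top level so Spark can derive an encoder for it.
case class Assignment(id: Long, cluster: Int)

object ToDFColumnNames {
  def main(args: Array[String]): Unit = {
    // Local single-threaded session; master and app name are illustrative only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("toDF-column-names")
      .getOrCreate()
    import spark.implicits._

    val assignments = spark.sparkContext.parallelize(
      Seq(Assignment(0L, 1), Assignment(1L, 0)))

    // Option 1: map to tuples, then toDF() with no arguments.
    // The field names are lost, so the columns take the tuple accessor names.
    val fromTuples = assignments.map(x => (x.id, x.cluster)).toDF()

    // Option 2: map to tuples, but pass the column names explicitly.
    val named = assignments.map(x => (x.id, x.cluster)).toDF("id", "cluster")

    // Option 3: rely on the case class; column names come from its fields.
    val fromCaseClass = assignments.toDF()

    println(fromTuples.columns.mkString(", "))
    println(named.columns.mkString(", "))
    println(fromCaseClass.columns.mkString(", "))

    spark.stop()
  }
}
```

Options 2 and 3 should produce the same column names; option 2 does not require Assignment to be a case class.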

Contributor Author

There seems to be nothing wrong with changing Assignment into a case class, so I changed it. Otherwise I would have to write more code to check the schema, because Loader.checkSchema[]() does not work for an ordinary class.

Contributor

That's okay. In general, the issue with case classes is that they are hard to extend. For example, changing case class Assignment(id: Long, cluster: Int) to case class Assignment(id: Long, cluster: Int, confidence: Double) breaks binary compatibility.
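To make the extensibility concern concrete, here is a rough sketch under stated assumptions. The names AssignmentV1, PlainAssignment, and CompatSketch are hypothetical and do not appear in MLlib, and the NaN default is an arbitrary choice for illustration. A case class gets compiler-generated apply, unapply, and copy methods whose signatures follow the field list, so adding a field changes all of them. An ordinary class exposes only what is written down, so it can gain a field while keeping a constructor with the old signature.

```scala
// Hypothetical names throughout; this is not the MLlib source.

// Version 1 as a case class. The compiler also generates apply, unapply and
// copy, and each of their signatures mentions exactly (Long, Int).
case class AssignmentV1(id: Long, cluster: Int)

// An ordinary class after a field has been added in a later version.
class PlainAssignment(val id: Long, val cluster: Int, val confidence: Double) {
  // Constructor with the original two-argument signature, kept so that
  // callers written against the earlier version still have something to call.
  // Using NaN for the missing value is an assumption made for this sketch.
  def this(id: Long, cluster: Int) = this(id, cluster, Double.NaN)
}

object CompatSketch {
  def main(args: Array[String]): Unit = {
    // Call site written against the original two-argument constructor.
    val a = new PlainAssignment(1L, 0)
    println(s"${a.id} ${a.cluster} ${a.confidence.isNaN}")

    // Pattern matching uses the generated unapply, which is tied to the
    // field list; a third field would require every such pattern to change.
    AssignmentV1(2L, 1) match {
      case AssignmentV1(id, cluster) => println(s"$id $cluster")
    }
  }
}
```

A case class can also be given an extra constructor, but its generated copy and unapply would still change shape when a field is added, which is the compatibility break the comment above describes.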

SparkQA commented Apr 13, 2015

Test build #30156 has finished for PR 5450 at commit cb1ecfa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Assignment(id: Long, cluster: Int)
  • This patch does not change any dependencies.

Contributor

mengxr commented Apr 13, 2015

LGTM. Merged into master. Thanks!

@asfgit asfgit closed this in 1e340c3 Apr 13, 2015