Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-11940][PYSPARK][ML] Python API for ml.clustering.LDA PR2 #12723

Closed
wants to merge 9 commits into from

Conversation

jkbradley
Copy link
Member

What changes were proposed in this pull request?

pyspark.ml API for LDA

  • LDA, LDAModel, LocalLDAModel, DistributedLDAModel
  • includes persistence

This replaces [https://github.com//pull/10242]

How was this patch tested?

  • doc test for LDA, including Param setters
  • unit test for persistence

@jkbradley
Copy link
Member Author

CC: @yanboliang Would you have time to take a look at this PR? Thanks!

@SparkQA
Copy link

SparkQA commented Apr 27, 2016

Test build #57081 has finished for PR 12723 at commit 4f807e8.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class DistributedLDAModel(LDAModel, JavaMLReadable, JavaMLWritable):
    • class LocalLDAModel(LDAModel, JavaMLReadable, JavaMLWritable):
    • class LDA(JavaEstimator, HasFeaturesCol, HasMaxIter, HasSeed, HasCheckpointInterval,

pass


class LDA(JavaEstimator, HasFeaturesCol, HasMaxIter, HasSeed, HasCheckpointInterval,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@inherit_doc

@yanboliang
Copy link
Contributor

Made one pass, thanks.

@jkbradley
Copy link
Member Author

@yanboliang Thanks! I think I addressed everything so far.

@SparkQA
Copy link

SparkQA commented Apr 27, 2016

Test build #57163 has finished for PR 12723 at commit 46979e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Copy link
Contributor

I saw you put save/load tests at tests.py rather than in doctest. I think the doctest is not only unit tests but also used for illustrating how to use this Estimator/Model. It's better to have simple test for save/load in doc test. LGTM otherwise. Thanks!

@jkbradley
Copy link
Member Author

I've wondered about the save/load in doc tests. On the one hand, it's nice to have that example, but on the other hand, it's going to be an extra save/load test for every run of the Jenkins tests (once we beef up the actual unit tests). I'll add save/load for now but we should re-evaluate in the future.

@jkbradley
Copy link
Member Author

Updated!

@SparkQA
Copy link

SparkQA commented Apr 28, 2016

Test build #57268 has finished for PR 12723 at commit f37c1c1.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member Author

test this please

@SparkQA
Copy link

SparkQA commented Apr 28, 2016

Test build #57281 has finished for PR 12723 at commit f37c1c1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member Author

Ready now?

@yanboliang
Copy link
Contributor

LGTM, thanks!

@jkbradley
Copy link
Member Author

Thanks @yanboliang and @zjffdu !
Merging with master

@asfgit asfgit closed this in 775772d Apr 29, 2016
@jkbradley jkbradley deleted the zjffdu-SPARK-11940 branch April 29, 2016 18:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants