Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-16446] [SparkR] [ML] Gaussian Mixture Model wrapper in SparkR #14392

Closed
wants to merge 10 commits into from

Conversation

yanboliang
Copy link
Contributor

What changes were proposed in this pull request?

Gaussian Mixture Model wrapper in SparkR, similarly to R's mvnormalmixEM.

How was this patch tested?

Unit test.

@SparkQA
Copy link

SparkQA commented Jul 28, 2016

Test build #62968 has finished for PR 14392 at commit 96e50a5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Copy link
Contributor Author

yanboliang commented Jul 28, 2016

cc @mengxr @felixcheung

@SparkQA
Copy link

SparkQA commented Jul 29, 2016

Test build #63004 has finished for PR 14392 at commit 230672f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

#' @rdname spark.mvnormalmixEM
#' @name spark.mvnormalmixEM
#' @export
#' @examples
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps a @Seealso to R mvnormalmixEM?
ah - I see it below, nit: could you more that up?

@SparkQA
Copy link

SparkQA commented Aug 1, 2016

Test build #63082 has finished for PR 14392 at commit 620e7f8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 2, 2016

Test build #63124 has finished for PR 14392 at commit 6bc6d96.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Copy link
Contributor Author

@felixcheung @junyangq Any thoughts?

@felixcheung
Copy link
Member

btw, I think it'll be great to get some feedback on the naming of this.
As per SPARK-14831, should we go with a more Spark specific name like gaussianmixture rather than a R one? How well known is mvnormalmixEM? How close is the Spark implementation to that?

@shivaram
Copy link
Contributor

Yeah I am not sure mvnormalmixEM is very descriptive. @junyangq Any opinions on the name here ?

@yanboliang
Copy link
Contributor Author

yanboliang commented Aug 15, 2016

@felixcheung @shivaram @junyangq I changed the name to spark.gaussianMixture following other SparkR ML wrappers such as spark.naiveBayes. The Spark implementation is very similar with R mvnormalmixEM, but I agree it's less descriptive as @shivaram said. Any more comments, please feel free to let me know. Thanks!

@SparkQA
Copy link

SparkQA commented Aug 15, 2016

Test build #63771 has finished for PR 14392 at commit 90fcb79.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Copy link
Contributor

spark.gaussianMixture sounds good to me.

#' @export
setGeneric("spark.gaussianMixture",
function(data, formula, ...) {
standardGeneric("spark.gaussianMixture")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it fit one line, like the others?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can not fit one line, since lint-r requires lines should not be more than 100 characters.

@SparkQA
Copy link

SparkQA commented Aug 16, 2016

Test build #63825 has finished for PR 14392 at commit cc708b5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 16, 2016

Test build #63827 has finished for PR 14392 at commit 05afe23.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

#' @param object A fitted gaussian mixture model
#' @return \code{summary} returns the model's lambda, mu, sigma and posterior
#' @param object a fitted gaussian mixture model.
#' @param ... additional argument(s) passed to the method.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say "Currently not used" instead for this case.

@felixcheung
Copy link
Member

only one last comment, LGTM.

@SparkQA
Copy link

SparkQA commented Aug 16, 2016

Test build #63836 has finished for PR 14392 at commit 0abdecf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@junyangq
Copy link
Contributor

Sorry for being late. I would vote for spark.gaussianMixture. I don't think the name mvnormalmixEM itself is well known enough except for people who frequently use that.

#' @examples
#' \dontrun{
#' sparkR.session()
#' library(mvtnorm)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be concerned about the fact that the package is not in the package dependency?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like it's only needed to build sample data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I think it's OK since users can load other library in SparkR session, and this is not necessary if users have their own dataset.

0.1641373, -0.1673806, -0.1673806, 0.7508951)
expect_equal(stats$lambda, rLambda)
expect_equal(as.vector(unlist(stats$mu)), rMu, tolerance = 1e-3)
expect_equal(as.vector(unlist(stats$sigma)), rSigma, tolerance = 1e-3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@SparkQA
Copy link

SparkQA commented Aug 17, 2016

Test build #63923 has finished for PR 14392 at commit 2f8930e.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 17, 2016

Test build #63925 has finished for PR 14392 at commit da464f9.

  • This patch fails some tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 17, 2016

Test build #63927 has finished for PR 14392 at commit da3c549.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Copy link
Contributor Author

@felixcheung @shivaram I resolved the merge conflicts. Please let me know whether it's ok to go. Since this involves changes of many files and it's hard to solve conflicts, it's better we can have a high priority to get this in. Thanks!

@mengxr
Copy link
Contributor

mengxr commented Aug 17, 2016

LGTM. Merged into master. Thanks!

@asfgit asfgit closed this in 4d92af3 Aug 17, 2016
@yanboliang yanboliang deleted the spark-16446 branch August 18, 2016 02:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants