Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-16710] [SparkR] [ML] spark.glm should support weightCol #14346

Closed
wants to merge 2 commits into from

Conversation

yanboliang
Copy link
Contributor

@yanboliang yanboliang commented Jul 25, 2016

What changes were proposed in this pull request?

Training GLMs on weighted dataset is very important use cases, but it is not supported by SparkR currently. Users can pass argument weights to specify the weights vector in native R. For spark.glm, we can pass in the weightCol which is consistent with MLlib.

How was this patch tested?

Unit test.

@SparkQA
Copy link

SparkQA commented Jul 25, 2016

Test build #62824 has finished for PR 14346 at commit 4e92737.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Copy link
Contributor Author

cc @mengxr

@mengxr
Copy link
Contributor

mengxr commented Jul 30, 2016

cc: @junyangq

@@ -119,7 +121,7 @@ NULL
#' @note spark.glm since 2.0.0
#' @seealso \link{glm}, \link{read.ml}
setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
function(data, formula, family = gaussian, tol = 1e-6, maxIter = 25) {
function(data, formula, family = gaussian, weightCol = NULL, tol = 1e-6, maxIter = 25) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you might not want to add a parameter in the middle of the list. if someone has existing code calling this function in parameter order it might misalign

@SparkQA
Copy link

SparkQA commented Aug 1, 2016

Test build #63081 has finished for PR 14346 at commit 5f96b6e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@junyangq
Copy link
Contributor

junyangq commented Aug 1, 2016

LGTM

1 similar comment
@felixcheung
Copy link
Member

LGTM

@yanboliang
Copy link
Contributor Author

ping @mengxr @shivaram

@shivaram
Copy link
Contributor

LGTM. Merging this to master.

@mengxr @felixcheung I didn't merge this into branch-2.0 as having Scala + R changes could affect the CRAN package we are building to match the 2.0 release. We can do a round of backports after that is done if required.

@asfgit asfgit closed this in d4a9122 Aug 10, 2016
@yanboliang yanboliang deleted the spark-16710 branch August 11, 2016 01:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants