[SPARK-16710] [SparkR] [ML] spark.glm should support weightCol #14346

yanboliang · 2016-07-25T13:35:44Z

What changes were proposed in this pull request?

Training GLMs on weighted datasets is an important use case, but SparkR does not currently support it. In native R, users can pass the weights argument to specify the weights vector. For spark.glm, we can pass in weightCol, which is consistent with MLlib.
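For comparison, here is a minimal sketch of the native R behaviour this change mirrors. The data frame and its column names (x, y, w) are made up for illustration, and the commented-out spark.glm call assumes that weightCol takes the name of the weight column as a string, as MLlib's weightCol parameter does.

```r
# Base R only; no Spark session is needed to run this part.
df <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(1.1, 1.9, 3.2, 3.9, 5.1),
  w = c(1, 1, 2, 2, 4)  # hypothetical per-row weights
)

# Native R: weights are supplied through the `weights` argument,
# which is evaluated inside `data`.
fit <- glm(y ~ x, family = gaussian, data = df, weights = w)
print(coef(fit))

# Assumed SparkR equivalent under this patch (not run here):
#   sdf <- createDataFrame(df)
#   model <- spark.glm(sdf, y ~ x, family = gaussian, weightCol = "w")
```

With a gaussian family and identity link, the weighted fit should agree with weighted least squares from lm on the same data, which is a quick way to sanity-check the result.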

How was this patch tested?

Unit test.

SparkQA · 2016-07-25T14:27:46Z

Test build #62824 has finished for PR 14346 at commit 4e92737.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yanboliang · 2016-07-28T13:27:22Z

cc @mengxr

mengxr · 2016-07-30T00:07:21Z

cc: @junyangq

felixcheung · 2016-07-30T02:28:51Z

R/pkg/R/mllib.R

@@ -119,7 +121,7 @@ NULL
 #' @note spark.glm since 2.0.0
 #' @seealso \link{glm}, \link{read.ml}
 setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
-          function(data, formula, family = gaussian, tol = 1e-6, maxIter = 25) {
+          function(data, formula, family = gaussian, weightCol = NULL, tol = 1e-6, maxIter = 25) {


You might not want to add a parameter in the middle of the list. If someone has existing code that calls this function with arguments in positional order, those arguments would be misaligned.
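A small sketch of the concern, using hypothetical stand-in functions (glm_old, glm_mid, glm_end) that only report how their arguments were bound. They are not the real spark.glm; family is a plain string here to keep the example self-contained.

```r
# Hypothetical stand-ins for the old signature and the two candidate
# placements of weightCol. None of them fits a model.
glm_old <- function(data, formula, family = "gaussian", tol = 1e-6, maxIter = 25) {
  list(tol = tol, maxIter = maxIter)
}
glm_mid <- function(data, formula, family = "gaussian", weightCol = NULL,
                    tol = 1e-6, maxIter = 25) {
  list(weightCol = weightCol, tol = tol, maxIter = maxIter)
}
glm_end <- function(data, formula, family = "gaussian", tol = 1e-6, maxIter = 25,
                    weightCol = NULL) {
  list(weightCol = weightCol, tol = tol, maxIter = maxIter)
}

# An existing caller that passes tol and maxIter by position.
old <- glm_old(NULL, y ~ x, "gaussian", 1e-4, 50)
mid <- glm_mid(NULL, y ~ x, "gaussian", 1e-4, 50)
end <- glm_end(NULL, y ~ x, "gaussian", 1e-4, 50)

print(mid$weightCol)  # the tolerance value was bound to weightCol
print(mid$tol)        # the iteration count was bound to tol
print(end$tol)        # same binding as the old signature
```

Appending the new parameter at the end keeps existing positional calls bound as before, while callers who want weights can pass weightCol by name.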

SparkQA · 2016-08-01T16:49:54Z

Test build #63081 has finished for PR 14346 at commit 5f96b6e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

junyangq · 2016-08-01T18:55:21Z

LGTM

felixcheung · 2016-08-09T17:33:29Z

LGTM

yanboliang · 2016-08-10T14:41:49Z

ping @mengxr @shivaram

shivaram · 2016-08-10T17:53:32Z

LGTM. Merging this to master.

@mengxr @felixcheung I didn't merge this into branch-2.0 as having Scala + R changes could affect the CRAN package we are building to match the 2.0 release. We can do a round of backports after that is done if required.

spark.glm should support weightCol

4e92737

felixcheung reviewed Jul 30, 2016

Move weightCol to the end of arguments.

5f96b6e

asfgit closed this in d4a9122 Aug 10, 2016

yanboliang deleted the spark-16710 branch August 11, 2016 01:08
