
[SPARK-18865][SparkR] SparkR vignettes MLP and LDA updates #16284

Closed
wants to merge 8 commits

Conversation

wangmiao1981
Contributor

What changes were proposed in this pull request?

While doing the QA work, I found the following issues:

1). `spark.mlp` doesn't include an example;
2). `spark.mlp` and `spark.lda` have redundant parameter explanations;
3). `spark.lda` document misses default values for some parameters.

I also changed the `spark.logit` regParam in the examples, as we discussed in #16222.

How was this patch tested?

Manual test
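For context, a minimal sketch of the kind of `spark.mlp` example the vignettes were missing (illustrative only; it assumes a running SparkR session, and the layer sizes and iteration count are arbitrary choices, not the values from this PR):

```r
library(SparkR)
sparkR.session()

# Build a SparkDataFrame from the iris dataset
df <- createDataFrame(iris)

# Fit a multilayer perceptron classifier: 4 input features,
# one hidden layer of 5 units, 3 output classes
model <- spark.mlp(df, Species ~ ., layers = c(4, 5, 3), maxIter = 10)
summary(model)

# Predict on the training data and inspect a few rows
predictions <- predict(model, df)
head(select(predictions, "Species", "prediction"))
```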

@@ -1218,11 +1218,11 @@ setMethod("spark.survreg", signature(data = "SparkDataFrame", formula = "formula
#' @param data A SparkDataFrame for training.
#' @param features Features column name. Either libSVM-format column or character-format column is
#' valid.
-#' @param k Number of topics.
 #' @param maxIter Maximum iterations.
+#' @param k Number of topics, default is 10.
Member

We discussed this earlier: we should avoid having default values in the doc since they are in the function signature already, unless there is additional information or explanation we are adding.

Ditto below.
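The convention being described can be illustrated with a hypothetical roxygen snippet (the function body and values here are made up for illustration): the default lives in the function signature, which roxygen renders in the Usage section, so the `@param` text need not repeat it.

```r
# Hypothetical illustration of the convention discussed above.

# Avoid: restating a default that is already visible in the signature.
#' @param k Number of topics, default is 10.

# Prefer: let the signature document the default.
#' @param k Number of topics.
spark.lda.sketch <- function(data, k = 10, maxIter = 20) {
  # ... implementation elided ...
}
```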

Contributor Author

OK. There are other places in the document that contain default values. Shall I remove them all?

Member

That shouldn't be the case; do you have an example?

Contributor Author

Please see the examples below.

@felixcheung
Member

could you include a screenshot of the vignettes output?

@wangmiao1981
Contributor Author

I will attach the screenshot.

@wangmiao1981
Contributor Author

mlp

@@ -1233,7 +1233,7 @@ setMethod("spark.survreg", signature(data = "SparkDataFrame", formula = "formula
#' docConcentration. Only 1-size or \code{k}-size numeric is accepted.
#' @param customizedStopWords stopwords that need to be removed from the given corpus. Ignore the
#' parameter if libSVM-format column is used as the features column.
-#' @param maxVocabSize maximum vocabulary size, default 1 << 18
+#' @param maxVocabSize maximum vocabulary size, default is 1 << 18
Contributor Author

This is one example of a default value. See the deleted line.

Random forest:
#' @param impurity Criterion used for information gain calculation.
#'        For regression, must be "variance". For classification, must be one of
#'        "entropy" and "gini", default is "gini".

Contributor Author

Gradient Boosted Tree Model for Regression and Classification
#' @param lossType Loss function which GBT tries to minimize.
#'        For classification, must be "logistic". For regression, must be one of
#'        "squared" (L2) and "absolute" (L1), default is "squared".

Member

Those are kept because they don't have an actual single value listed in the function signature (the signature has = NULL).
Also, the default is chosen based on another parameter ("type"), so we should explain it.

Member

"maxVocabSize maximum vocabulary size, default 1 << 18" could still be useful, since the value in the signature is bitwShiftL(1, 18).

You are welcome to remove this if you think it's obvious.
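The point here is that R has no `<<` shift operator, so the signature cannot literally say `1 << 18`; it uses `bitwShiftL` instead, which evaluates to the same value:

```r
# "1 << 18" in the doc corresponds to the base R bit-shift function:
bitwShiftL(1, 18)
#> [1] 262144
```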

Contributor Author

Got it. Let me go through the document again based on the rules.

Member

Let's stick to vignettes for now. I'm sure there's more cleanup we could do later.

Contributor Author

@wangmiao1981 wangmiao1981 left a comment


See the two examples

@wangmiao1981
Contributor Author

wangmiao1981 commented Dec 14, 2016

lda
LDA output. The documentation link ?spark.lda doesn't work in my browser. How do I refer to the API documentation in the vignettes?


@felixcheung
Member

felixcheung commented Dec 14, 2016

?spark.lda is for your REPL, i.e. your R shell or RStudio when the SparkR package is loaded; it's not a link.
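In other words, the help topic is only resolvable interactively; a sketch of what that looks like (assuming SparkR is installed and attached):

```r
# In an interactive R session with SparkR loaded, either form
# opens the help page; neither renders as a hyperlink in vignette text.
library(SparkR)
?spark.lda
help("spark.lda")
```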

@felixcheung
Member

do you have screen shot of logit?

@wangmiao1981
Contributor Author

logit

@@ -717,7 +716,7 @@ setMethod("predict", signature(object = "KMeansModel"),
#' @param regParam the regularization parameter.
#' @param elasticNetParam the ElasticNet mixing parameter. For alpha = 0.0, the penalty is an L2 penalty.
#' For alpha = 1.0, it is an L1 penalty. For 0.0 < alpha < 1.0, the penalty is a combination
#' of L1 and L2. Default is 0.0 which is an L2 penalty.
Member

since there is an explanation I wouldn't mind keeping it.
ditto the next one
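The alpha mixing behavior documented in that hunk can be made concrete with a hedged sketch (the regParam/elasticNetParam values here are arbitrary, chosen only to show the three regimes):

```r
# Illustrative spark.logit calls showing the elasticNetParam (alpha) mixing
# described above; assumes a running SparkR session.
df <- createDataFrame(iris)
# Restrict to two classes for a binary logistic regression
binary <- filter(df, df$Species != "setosa")

m_l2  <- spark.logit(binary, Species ~ ., regParam = 0.1, elasticNetParam = 0.0)  # pure L2 penalty
m_l1  <- spark.logit(binary, Species ~ ., regParam = 0.1, elasticNetParam = 1.0)  # pure L1 penalty
m_mix <- spark.logit(binary, Species ~ ., regParam = 0.1, elasticNetParam = 0.5)  # L1/L2 mix
```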

@felixcheung
Member

The mlp example output is a bit long. Could you truncate the output like it is done here?

@SparkQA

SparkQA commented Dec 15, 2016

Test build #70152 has finished for PR 16284 at commit 74b2bb9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 15, 2016

Test build #70151 has finished for PR 16284 at commit 6a9ded1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Member

Looks good; it would be good to check the output for mlp and logit if you could add screenshots.

@wangmiao1981
Contributor Author

mlp

@wangmiao1981
Contributor Author

logit

@wangmiao1981
Contributor Author

lda

@SparkQA

SparkQA commented Dec 15, 2016

Test build #70157 has finished for PR 16284 at commit c474025.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 15, 2016

Test build #70160 has finished for PR 16284 at commit 63a3d51.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 15, 2016

Test build #70158 has finished for PR 16284 at commit 97b77a8.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 15, 2016

Test build #70162 has finished for PR 16284 at commit 930c4c3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Member

LGTM.

asfgit pushed a commit that referenced this pull request Dec 15, 2016

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes #16284 from wangmiao1981/ks.

(cherry picked from commit 3243885)
Signed-off-by: Felix Cheung <felixcheung@apache.org>
@felixcheung
Member

Merging with master and branch-2.1
thanks @wangmiao1981

@asfgit asfgit closed this in 3243885 Dec 15, 2016
robert3005 pushed a commit to palantir/spark that referenced this pull request Dec 15, 2016
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017