
[SPARK-18865][SparkR] SparkR vignettes MLP and LDA updates #16284

Closed
wants to merge 8 commits

Conversation

wangmiao1981
Contributor

What changes were proposed in this pull request?

While doing the QA work, I found the following issues:

1). `spark.mlp` doesn't include an example;
2). `spark.mlp` and `spark.lda` have redundant parameter explanations;
3). `spark.lda` document misses default values for some parameters.

I also changed the `spark.logit` regParam in the examples, as we discussed in #16222.

How was this patch tested?

Manual test
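For context, a minimal sketch of the kind of `spark.mlp` example the vignettes were missing (illustrative only; it assumes a running SparkR session, and the layer sizes and iteration count are arbitrary choices, not the values from this PR):

```r
library(SparkR)
sparkR.session()

# Build a SparkDataFrame from the iris dataset
df <- createDataFrame(iris)

# Fit a multilayer perceptron classifier: 4 input features,
# one hidden layer of 5 units, 3 output classes
model <- spark.mlp(df, Species ~ ., layers = c(4, 5, 3), maxIter = 10)
summary(model)

# Predict on the training data and inspect a few rows
predictions <- predict(model, df)
head(select(predictions, "Species", "prediction"))
```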

@@ -1218,11 +1218,11 @@ setMethod("spark.survreg", signature(data = "SparkDataFrame", formula = "formula
#' @param data A SparkDataFrame for training.
#' @param features Features column name. Either libSVM-format column or character-format column is
#' valid.
-#' @param k Number of topics.
 #' @param maxIter Maximum iterations.
+#' @param k Number of topics, default is 10.
Member

We discussed this earlier: we should avoid having default values in the doc since they are in the function signature already, unless there is additional information or explanation we are adding.

Ditto below.
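The convention being described can be illustrated with a hypothetical roxygen snippet (the function body and values here are made up for illustration): the default lives in the function signature, which roxygen renders in the Usage section, so the `@param` text need not repeat it.

```r
# Hypothetical illustration of the convention discussed above.

# Avoid: restating a default that is already visible in the signature.
#' @param k Number of topics, default is 10.

# Prefer: let the signature document the default.
#' @param k Number of topics.
spark.lda.sketch <- function(data, k = 10, maxIter = 20) {
  # ... implementation elided ...
}
```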

Contributor Author

OK. There are other places in the document that contain default values. Shall I remove them all?

Member

That shouldn't be the case; do you have an example?

Contributor Author

Please see the examples below.

@felixcheung
Member

could you include a screenshot of the vignettes output?

@wangmiao1981
Contributor Author

I will attach the screenshot.

@wangmiao1981
Contributor Author

mlp

@@ -1233,7 +1233,7 @@ setMethod("spark.survreg", signature(data = "SparkDataFrame", formula = "formula
#' docConcentration. Only 1-size or \code{k}-size numeric is accepted.
#' @param customizedStopWords stopwords that need to be removed from the given corpus. Ignore the
#' parameter if libSVM-format column is used as the features column.
-#' @param maxVocabSize maximum vocabulary size, default 1 << 18
+#' @param maxVocabSize maximum vocabulary size, default is 1 << 18
Contributor Author

This is one example of a default value. See the deleted line.

Random forest:
#' @param impurity Criterion used for information gain calculation.
#'        For regression, must be "variance". For classification, must be one of
#'        "entropy" and "gini", default is "gini".

Contributor Author

Gradient Boosted Tree Model for Regression and Classification
#' @param lossType Loss function which GBT tries to minimize.
#'        For classification, must be "logistic". For regression, must be one of
#'        "squared" (L2) and "absolute" (L1), default is "squared".

Member

Those are kept because they don't have an actual single value listed in the function signature (the signature has = NULL).
Also, the default is chosen based on another parameter ("type"), so we should explain it.

Member

"maxVocabSize maximum vocabulary size, default 1 << 18" could still be useful, since the value in the signature is bitwShiftL(1, 18).

You are welcome to remove this if you think it's obvious.
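The point here is that R has no `<<` shift operator, so the signature cannot literally say `1 << 18`; it uses `bitwShiftL` instead, which evaluates to the same value:

```r
# "1 << 18" in the doc corresponds to the base R bit-shift function:
bitwShiftL(1, 18)
#> [1] 262144
```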

Contributor Author

Got it. Let me go through the document again based on the rules.

Member

Let's stick to vignettes for now. I'm sure there's more cleanup we could do later.

Contributor Author

@wangmiao1981 wangmiao1981 left a comment


See the two examples

@wangmiao1981
Contributor Author

wangmiao1981 commented Dec 14, 2016

lda
LDA output. The documentation link ?spark.lda doesn't work in my browser. How do I refer to the API documentation in the vignettes?


@felixcheung
Member

felixcheung commented Dec 14, 2016

?spark.lda is for your REPL, i.e. your R shell or RStudio when the SparkR package is loaded; it's not a link.
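In other words, the help topic is only resolvable interactively; a sketch of what that looks like (assuming SparkR is installed and attached):

```r
# In an interactive R session with SparkR loaded, either form
# opens the help page; neither renders as a hyperlink in vignette text.
library(SparkR)
?spark.lda
help("spark.lda")
```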

@felixcheung
Member

do you have screen shot of logit?

@wangmiao1981
Contributor Author

logit

@@ -717,7 +716,7 @@ setMethod("predict", signature(object = "KMeansModel"),
#' @param regParam the regularization parameter.
#' @param elasticNetParam the ElasticNet mixing parameter. For alpha = 0.0, the penalty is an L2 penalty.
#' For alpha = 1.0, it is an L1 penalty. For 0.0 < alpha < 1.0, the penalty is a combination
#' of L1 and L2. Default is 0.0 which is an L2 penalty.
Member

since there is an explanation I wouldn't mind keeping it.
ditto the next one
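The alpha mixing behavior documented in that hunk can be made concrete with a hedged sketch (the regParam/elasticNetParam values here are arbitrary, chosen only to show the three regimes):

```r
# Illustrative spark.logit calls showing the elasticNetParam (alpha) mixing
# described above; assumes a running SparkR session.
df <- createDataFrame(iris)
# Restrict to two classes for a binary logistic regression
binary <- filter(df, df$Species != "setosa")

m_l2  <- spark.logit(binary, Species ~ ., regParam = 0.1, elasticNetParam = 0.0)  # pure L2 penalty
m_l1  <- spark.logit(binary, Species ~ ., regParam = 0.1, elasticNetParam = 1.0)  # pure L1 penalty
m_mix <- spark.logit(binary, Species ~ ., regParam = 0.1, elasticNetParam = 0.5)  # L1/L2 mix
```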

@felixcheung
Member

The mlp example output is a bit long. Could you truncate the output like it is done here?

@SparkQA

SparkQA commented Dec 15, 2016

Test build #70152 has finished for PR 16284 at commit 74b2bb9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 15, 2016

Test build #70151 has finished for PR 16284 at commit 6a9ded1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Member

Looks good; it would be good to check the output for mlp and logit if you could add screenshots.

@wangmiao1981
Contributor Author

mlp

@wangmiao1981
Contributor Author

logit

@wangmiao1981
Contributor Author

lda

@SparkQA

SparkQA commented Dec 15, 2016

Test build #70157 has finished for PR 16284 at commit c474025.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 15, 2016

Test build #70160 has finished for PR 16284 at commit 63a3d51.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 15, 2016

Test build #70158 has finished for PR 16284 at commit 97b77a8.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 15, 2016

Test build #70162 has finished for PR 16284 at commit 930c4c3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Member

LGTM.

asfgit pushed a commit that referenced this pull request Dec 15, 2016

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes #16284 from wangmiao1981/ks.

(cherry picked from commit 3243885)
Signed-off-by: Felix Cheung <felixcheung@apache.org>
@felixcheung
Member

Merging with master and branch-2.1
thanks @wangmiao1981

@asfgit asfgit closed this in 3243885 Dec 15, 2016
robert3005 pushed a commit to palantir/spark that referenced this pull request Dec 15, 2016
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017