[SPARK-18865][SparkR] SparkR vignettes MLP and LDA updates #16284
Conversation
@@ -1218,11 +1218,11 @@ setMethod("spark.survreg", signature(data = "SparkDataFrame", formula = "formula
 #' @param data A SparkDataFrame for training.
 #' @param features Features column name. Either libSVM-format column or character-format column is
 #'                 valid.
-#' @param k Number of topics.
+#' @param k Number of topics, default is 10.
 #' @param maxIter Maximum iterations.
we discussed this earlier: we should avoid having default values in the doc since they are in the function signature already, unless there is additional information or explanation we are adding.
Ditto below.
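As a sketch of the convention being suggested (the function and parameter names below are made up for illustration, not from this PR): keep the default in the signature, and repeat it in the `@param` doc only when it needs extra explanation.

```r
#' Fit a toy topic model (hypothetical example).
#'
#' @param k Number of topics. No default repeated here, since it is
#'        visible in the signature below.
#' @param maxVocabSize Maximum vocabulary size, default is 1 << 18;
#'        spelled out because the signature's bitwShiftL(1, 18) is
#'        not self-explanatory.
fitTopics <- function(k = 10, maxVocabSize = bitwShiftL(1, 18)) {
  # a stub body; a real fitter would train a model here
  list(k = k, maxVocabSize = maxVocabSize)
}
```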
OK. There are other places in the document that contain default values. Shall I remove them all?
that shouldn't be the case, do you have an example?
Please see the examples below.
could you include a screenshot of the vignettes output?
I will attach the screenshot.
@@ -1233,7 +1233,7 @@ setMethod("spark.survreg", signature(data = "SparkDataFrame", formula = "formula
 #' docConcentration. Only 1-size or \code{k}-size numeric is accepted.
 #' @param customizedStopWords stopwords that need to be removed from the given corpus. Ignore the
 #' parameter if libSVM-format column is used as the features column.
-#' @param maxVocabSize maximum vocabulary size, default 1 << 18
+#' @param maxVocabSize maximum vocabulary size, default is 1 << 18
This is one example of a default in the docs; see the deleted line.
Random forest:
#' @param impurity Criterion used for information gain calculation.
#'                 For regression, must be "variance". For classification, must be one of
#'                 "entropy" and "gini", default is "gini".
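For reference, the two classification criteria named in that `@param` have their standard textbook definitions; a generic R sketch (not Spark's implementation):

```r
# Impurity of a node, given class proportions p (must sum to 1).
gini <- function(p) 1 - sum(p^2)
# Entropy in bits; the ifelse guards against log2(0).
entropy <- function(p) -sum(ifelse(p > 0, p * log2(p), 0))

gini(c(0.5, 0.5))     # 0.5
entropy(c(0.5, 0.5))  # 1
```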
Gradient Boosted Tree Model for Regression and Classification:
#' @param lossType Loss function which GBT tries to minimize.
#'                 For classification, must be "logistic". For regression, must be one of
#'                 "squared" (L2) and "absolute" (L1), default is "squared".
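The two regression losses mentioned there, in their generic per-observation form (plain R for illustration, not Spark's code):

```r
squaredLoss  <- function(y, yhat) (y - yhat)^2   # "squared" (L2), the default
absoluteLoss <- function(y, yhat) abs(y - yhat)  # "absolute" (L1)

squaredLoss(3, 1)   # 4
absoluteLoss(3, 1)  # 2
```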
they are there because they don't have an actual single value listed in the function signature (the signature has `= NULL`).
Also, the default is chosen based on another parameter ("type"), so we should explain that.
maxVocabSize maximum vocabulary size, default 1 << 18
could still be useful, since the value in the signature is bitwShiftL(1, 18).
You are welcome to remove this if you think it's obvious.
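For anyone comparing the two notations: they denote the same value, which a quick check in base R confirms:

```r
bitwShiftL(1, 18)  # base R's bitwise left shift, as in the signature
2^18               # what C-style "1 << 18" means
# both evaluate to 262144
```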
Got it. Let me go through the document again based on the rules.
let's stick to vignettes for now. I'm sure there's more clean up we could do later.
See the two examples
do you have a screenshot of logit?
@@ -717,7 +716,7 @@ setMethod("predict", signature(object = "KMeansModel"),
 #' @param regParam the regularization parameter.
 #' @param elasticNetParam the ElasticNet mixing parameter. For alpha = 0.0, the penalty is an L2 penalty.
 #' For alpha = 1.0, it is an L1 penalty. For 0.0 < alpha < 1.0, the penalty is a combination
 #' of L1 and L2. Default is 0.0 which is an L2 penalty.
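For intuition, the mixing behavior described in that `@param` text can be sketched as the generic elastic-net penalty (a plain-R illustration of the textbook form, not Spark's internals):

```r
# alpha = 0 gives a pure L2 penalty, alpha = 1 a pure L1 penalty,
# and values in between blend the two.
elasticNetPenalty <- function(w, alpha, lambda) {
  l1 <- sum(abs(w))       # L1 term
  l2 <- sum(w^2) / 2      # L2 term (conventional 1/2 factor)
  lambda * (alpha * l1 + (1 - alpha) * l2)
}

w <- c(1, -2)
elasticNetPenalty(w, alpha = 0, lambda = 1)  # 2.5 (L2 only)
elasticNetPenalty(w, alpha = 1, lambda = 1)  # 3   (L1 only)
```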
since there is an explanation, I wouldn't mind keeping it.
Ditto the next one.
mlp example output is a bit long. Could you truncate the output like it is done here
Test build #70152 has finished for PR 16284 at commit
Test build #70151 has finished for PR 16284 at commit
looks good; it would be good to check the output for mlp and logit if you could add a screenshot
Test build #70157 has finished for PR 16284 at commit
Test build #70160 has finished for PR 16284 at commit
Test build #70158 has finished for PR 16284 at commit
Test build #70162 has finished for PR 16284 at commit
LGTM.
## What changes were proposed in this pull request?

While doing the QA work, I found the following issues:
1) `spark.mlp` doesn't include an example;
2) `spark.mlp` and `spark.lda` have redundant parameter explanations;
3) the `spark.lda` document misses default values for some parameters.

I also changed the `spark.logit` regParam in the examples, as we discussed in #16222.

## How was this patch tested?

Manual test

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes #16284 from wangmiao1981/ks.

(cherry picked from commit 3243885)
Signed-off-by: Felix Cheung <felixcheung@apache.org>
Merging with master and branch-2.1