
[SPARK-12632][PYSPARK][DOC] PySpark fpm and als parameter desc to consistent format #11186

Conversation

BryanCutler
Member

Part of the SPARK-11219 task to make PySpark MLlib parameter description formatting consistent. This PR covers the fpm and recommendation modules.

Closes #10602
Closes #10897
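For reference, the consistent format being adopted puts each :param tag on its own line with its description indented beneath it, followed by the default value where applicable. A rough sketch of that layout against pyspark.mllib.fpm.FPGrowth.train (descriptions paraphrased here for illustration, not the exact text in this PR):

    class FPGrowth(object):
        @classmethod
        def train(cls, data, minSupport=0.3, numPartitions=-1):
            """
            Train an FP-growth model.

            :param data:
              The input data set, each element contains a transaction.
            :param minSupport:
              The minimal support level.
              (default: 0.3)
            :param numPartitions:
              The number of partitions used by parallel FP-growth.
              (default: -1)
            """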

@SparkQA

SparkQA commented Feb 12, 2016

Test build #51195 has finished for PR 11186 at commit 744e37d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

itemsets.
:param minSupport:
The minimal support level of the sequential pattern, any
pattern appears more than (minSupport * size-of-the-dataset)
Contributor

Can we change this from appears -> appearing (or "... pattern that appears ...")?
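With that change, the quoted fragment would read roughly as follows (only as far as the snippet above goes; wording approximate):

    :param minSupport:
      The minimal support level of the sequential pattern, any
      pattern that appears more than (minSupport * size-of-the-dataset)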

@MLnick
Contributor

MLnick commented Feb 16, 2016

@BryanCutler made a quick pass. While we're doing the format change, we may as well make a few little doc clean ups as per my comments.

@BryanCutler
Member Author

Thanks for taking a look @MLnick, I agree we might as well clean up the docs while we're at it. I made the corrections you suggested and some others that I found, and also tried to sync up the Scala side.

ping @mengxr to take a look also

@BryanCutler
Member Author

jenkins retest please

@BryanCutler
Member Author

Jenkins retest this please

@SparkQA

SparkQA commented Feb 17, 2016

Test build #51401 has finished for PR 11186 at commit 37de948.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

*
* @param ratings RDD of (userID, productID, rating) pairs
* @param ratings RDD of [[Rating]] objects with userID, productID, and rating
* @param rank number of features to use
* @param iterations number of iterations of ALS (recommended: 10-20)
* @param lambda regularization factor (recommended: 0.01)
Contributor

I'd prefer regularization parameter rather than factor. Could we also change the PySpark doc from smoothing parameter -> regularization parameter, to be consistent?

Contributor

While we're at it, I don't like the (recommended: 0.01) in the comment for lambda. This is the default for the method call without specifying lambda, but I certainly don't think this is the recommended regularization. Lambda can vary widely depending on the dataset characteristics, and should be selected in the usual manner (e.g. through cross-validation).
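For illustration, the usual selection procedure the comment alludes to could be a simple hold-out search over a few lambda values with the pyspark.mllib ALS API; the grid, rank, and iteration count below are placeholders, not recommendations from this PR:

    from pyspark.mllib.recommendation import ALS

    # `ratings` is assumed to be an existing RDD of Rating(user, product, rating)
    training, validation = ratings.randomSplit([0.8, 0.2], seed=42)
    user_product = validation.map(lambda r: (r.user, r.product))
    truth = validation.map(lambda r: ((r.user, r.product), r.rating))

    best_lambda, best_rmse = None, float("inf")
    for reg in [0.001, 0.01, 0.1, 1.0]:  # illustrative grid
        model = ALS.train(training, rank=10, iterations=10, lambda_=reg)
        predictions = model.predictAll(user_product) \
                           .map(lambda r: ((r.user, r.product), r.rating))
        # root-mean-square error on the held-out ratings
        rmse = (truth.join(predictions)
                     .map(lambda kv: (kv[1][0] - kv[1][1]) ** 2)
                     .mean()) ** 0.5
        if rmse < best_rmse:
            best_lambda, best_rmse = reg, rmse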

Member Author

Good suggestions, I agree. The recommendation for the number of iterations is also a bit strange. I can see giving a ballpark figure so people don't use a huge number, since ALS can converge quickly and each iteration takes a bit longer than in other algorithms. But the PySpark default is 5 while the Scala doc recommends 10-20. Maybe it would be better to just put a sentence about this in the online docs and remove the recommendation here?

Contributor

Agreed, let's remove the recommended part of the iterations doc, and the doc here can be amended accordingly to something like: "iterations is the number of iterations to run. ALS typically converges to a reasonable solution within 10-20 iterations."
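Put together, the amended Scala parameter descriptions would read roughly as follows (a sketch of the agreed wording, not necessarily the exact text that was merged):

    * @param ratings    RDD of [[Rating]] objects with userID, productID, and rating
    * @param rank       number of features to use
    * @param iterations number of iterations to run. ALS typically converges to a
    *                   reasonable solution within 10-20 iterations.
    * @param lambda     regularization parameter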

@SparkQA

SparkQA commented Feb 19, 2016

Test build #51572 has finished for PR 11186 at commit 5d59b4e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit closed this in e298ac9 on Feb 22, 2016
@MLnick
Contributor

MLnick commented Feb 22, 2016

LGTM, merged into master. Thanks @BryanCutler!

BryanCutler deleted the param-desc-consistent-fpmrecc-SPARK-12632 branch on December 2, 2016