[SPARK-12632][PYSPARK][DOC] PySpark fpm and als parameter desc to consistent format #11186
Conversation
…ark MLlib FPM and Recommendation]
Test build #51195 has finished for PR 11186 at commit
itemsets.
:param minSupport:
  The minimal support level of the sequential pattern, any
  pattern appears more than (minSupport * size-of-the-dataset)
Can we change this from "appears" -> "appearing" (or "... pattern that appears ...")?
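Applied to the PySpark docstring, the revised description might read roughly as follows. This is only a sketch of the suggested wording; the closing "times will be output." line is an assumption, since the diff excerpt above is truncated:

  :param minSupport:
    The minimal support level of the sequential pattern, any
    pattern that appears more than (minSupport * size-of-the-dataset)
    times will be output.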
@BryanCutler made a quick pass. While we're doing the format change, we may as well make a few little doc clean ups as per my comments.
jenkins retest please
Jenkins retest this please
Test build #51401 has finished for PR 11186 at commit
*
* @param ratings RDD of (userID, productID, rating) pairs
* @param ratings RDD of [[Rating]] objects with userID, productID, and rating
* @param rank number of features to use
* @param iterations number of iterations of ALS (recommended: 10-20)
* @param lambda regularization factor (recommended: 0.01)
I'd prefer "regularization parameter" rather than "factor". Could we also change the PySpark doc from "smoothing parameter" -> "regularization parameter", to be consistent?
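On the PySpark side, that rename might end up looking something like the fragment below. This is a sketch, not an exact excerpt from mllib/recommendation.py; the "(default: 0.01)" line reflects the default mentioned later in this thread:

  :param lambda_:
    Regularization parameter.
    (default: 0.01)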
While we're at it, I don't like the "(recommended: 0.01)" in the comment for lambda. This is the default for the method call without specifying lambda, but I certainly don't think this is the recommended regularization. Lambda can vary widely depending on the dataset characteristics, and should be selected in the usual manner (e.g. through cross-validation).
Good suggestions, I agree with what you said. The recommendation for the number of iterations is also a bit strange. I can see giving a ballpark number so people don't use a huge value, since ALS can converge quickly and each iteration takes a bit longer relative to other algorithms. But the PySpark default is 5, while the Scala doc recommends 10-20. Maybe it would be better to just put a sentence about this in the online docs and remove the recommendation here?
Agreed, let's remove the recommended part of the iterations doc, and the doc here can be amended accordingly to something like: "iterations is the number of iterations to run. ALS typically converges to a reasonable solution within 10-20 iterations."
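With the recommendation dropped and the 10-20 iterations guidance moved to the online ALS documentation, the PySpark description could read roughly like the sketch below (the exact wording is an assumption; the "(default: 5)" line matches the PySpark default mentioned above):

  :param iterations:
    Number of iterations of ALS.
    (default: 5)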
…ed iteration to online docs
Test build #51572 has finished for PR 11186 at commit
LGTM, merged into master. Thanks @BryanCutler!
Part of task for SPARK-11219 to make PySpark MLlib parameter description formatting consistent. This is for the fpm and recommendation modules.
Closes #10602
Closes #10897
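For context, the consistent format targeted by SPARK-11219 puts each parameter description on indented lines under its ":param name:" tag, with the default value on its own line. A minimal, purely illustrative sketch (the method and parameter names here are hypothetical, not taken from the Spark source):

  def train(cls, data, exampleParam=1.0):
      """
      Train a model on the given RDD.

      :param data:
        The input RDD of training records.
      :param exampleParam:
        Description of the parameter, wrapped onto indented lines
        under the parameter name rather than on the same line.
        (default: 1.0)
      """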