[SPARK-2786][mllib] Python correlations #1713

dorx · 2014-08-01T06:28:31Z

No description provided.

SparkQA · 2014-08-01T06:34:02Z

QA tests have started for PR 1713. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17653/consoleFull

SparkQA · 2014-08-01T06:34:51Z

QA results for PR 1713:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class Statistics(object):

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17653/consoleFull

SparkQA · 2014-08-01T06:54:07Z

QA tests have started for PR 1713. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17657/consoleFull

mengxr · 2014-08-01T07:10:03Z

mllib/src/main/scala/org/apache/spark/mllib/stat/correlation/Correlation.scala

+ * After new correlation algorithms are added, please update the documentation here and in
+ * Statistics.scala for the correlation APIs.
+ */
+object CorrelationNames {


Is it supposed to be a public API?

dorx · 2014-08-01

I originally planned on using it inside of pyspark, but private[mllib] is sufficient scope now.

SparkQA · 2014-08-01T07:48:05Z

QA results for PR 1713:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class Statistics(object):

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17657/consoleFull

srowen · 2014-08-01T10:06:14Z

mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala

+   * Returns the correlation matrix serialized into a byte array understood by deserializers in
+   * pyspark.
+   */
+  def corr(X: JavaRDD[Array[Byte]], method: String): Array[Byte] = {


You can ignore this if you like, but if I'm being picky, why not spell out "correlations"?

dorx · 2014-08-01

This was designed to match R's method name. R was a good candidate here since it also lets you pass the method name as a string after the input arrays/matrix.
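To illustrate the calling convention discussed here (inputs first, then the correlation method as a string), here is a minimal pure-Python sketch. It is not the pyspark implementation from this PR: the two-list signature, the `_METHODS` table, and the helper names are assumptions made for illustration, and the choice of "pearson" and "spearman" as the accepted names is likewise assumed.

```python
import math


def _pearson(x, y):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(var_x * var_y)
    # A constant input has zero variance, so the correlation is undefined.
    return cov / denom if denom != 0 else float("nan")


def _ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def _spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return _pearson(_ranks(x), _ranks(y))


# Hypothetical lookup table from method name to implementation.
_METHODS = {"pearson": _pearson, "spearman": _spearman}


def corr(x, y, method="pearson"):
    """Dispatch on a string method name, given after the inputs."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if method not in _METHODS:
        raise ValueError("unknown correlation method: %r" % (method,))
    return _METHODS[method](x, y)
```

A call then reads `corr(xs, ys, method="spearman")`. Keeping the name-to-function table in one place means a new algorithm only needs a new entry, which is the same concern the `CorrelationNames` comment raises on the Scala side.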

SparkQA · 2014-08-01T18:34:18Z

QA tests have started for PR 1713. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17688/consoleFull

SparkQA · 2014-08-01T19:27:46Z

QA results for PR 1713:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class Statistics(object):

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17688/consoleFull

mengxr · 2014-08-01T20:31:29Z

mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala

+   * pyspark.
+   */
+  def corr(X: JavaRDD[Array[Byte]], method: String): Array[Byte] = {
+    val inputMatrix = X.rdd.map(deserializeDoubleVector(_))


nit: X.rdd.map(deserializeDoubleVector) ((_) is not necessary)

dorx · 2014-08-01

Actually it is in this case, since deserializeDoubleVector has 2 arguments (with offset being optional).

SparkQA · 2014-08-01T20:54:27Z

QA tests have started for PR 1713. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17698/consoleFull

SparkQA · 2014-08-01T21:58:36Z

QA results for PR 1713:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class Statistics(object):

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17698/consoleFull

mengxr · 2014-08-01T22:02:35Z

LGTM. Merged into master. Thanks!!

Author: Doris Xin <doris.s.xin@gmail.com>

Closes apache#1713 from dorx/pythonCorrelation and squashes the following commits:

5f1e60c [Doris Xin] reviewer comments.
46ff6eb [Doris Xin] reviewer comments.
ad44085 [Doris Xin] style fix
e69d446 [Doris Xin] fixed missed conflicts.
eb5bf56 [Doris Xin] merge master
cc9f725 [Doris Xin] units passed.
9141a63 [Doris Xin] WIP2
d199f1f [Doris Xin] Moved correlation names into a public object
cd163d6 [Doris Xin] WIP

dorx added 6 commits July 29, 2014 11:44

cd163d6 WIP
d199f1f Moved correlation names into a public object
9141a63 WIP2
cc9f725 units passed.
eb5bf56 merge master
e69d446 fixed missed conflicts.
ad44085 style fix

mengxr reviewed Aug 1, 2014

srowen reviewed Aug 1, 2014

46ff6eb reviewer comments.

mengxr reviewed Aug 1, 2014

5f1e60c reviewer comments.

asfgit closed this in d88e695 Aug 1, 2014
