Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-2786][mllib] Python correlations #1713

Closed
wants to merge 9 commits into from

Conversation

dorx
Copy link
Contributor

@dorx dorx commented Aug 1, 2014

No description provided.

@SparkQA
Copy link

SparkQA commented Aug 1, 2014

QA tests have started for PR 1713. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17653/consoleFull

@SparkQA
Copy link

SparkQA commented Aug 1, 2014

QA results for PR 1713:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class Statistics(object):

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17653/consoleFull

@SparkQA
Copy link

SparkQA commented Aug 1, 2014

QA tests have started for PR 1713. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17657/consoleFull

* After new correlation algorithms are added, please update the documentation here and in
* Statistics.scala for the correlation APIs.
*/
object CorrelationNames {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it supposed to be a public API?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally planned on using it inside of pyspark but private[mllib] is sufficient scope now.

@SparkQA
Copy link

SparkQA commented Aug 1, 2014

QA results for PR 1713:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class Statistics(object):

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17657/consoleFull

* Returns the correlation matrix serialized into a byte array understood by deserializers in
* pyspark.
*/
def corr(X: JavaRDD[Array[Byte]], method: String): Array[Byte] = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can ignore this if you like, but if I'm being picky, why not spell out "correlations"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was designed to match R's method name. R was a good candidate here since it also allows you to pass in the method name as a string after the input arrays/matrix.

@SparkQA
Copy link

SparkQA commented Aug 1, 2014

QA tests have started for PR 1713. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17688/consoleFull

@SparkQA
Copy link

SparkQA commented Aug 1, 2014

QA results for PR 1713:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class Statistics(object):

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17688/consoleFull

* pyspark.
*/
def corr(X: JavaRDD[Array[Byte]], method: String): Array[Byte] = {
val inputMatrix = X.rdd.map(deserializeDoubleVector(_))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: X.rdd.map(deserializeDoubleVector) ((_) is not necessary)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it is in this case, since deserializeDoubleVector has 2 arguments (with offset being optional).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

@SparkQA
Copy link

SparkQA commented Aug 1, 2014

QA tests have started for PR 1713. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17698/consoleFull

@SparkQA
Copy link

SparkQA commented Aug 1, 2014

QA results for PR 1713:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class Statistics(object):

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17698/consoleFull

@mengxr
Copy link
Contributor

mengxr commented Aug 1, 2014

LGTM. Merged into master. Thanks!!

@asfgit asfgit closed this in d88e695 Aug 1, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
Author: Doris Xin <doris.s.xin@gmail.com>

Closes apache#1713 from dorx/pythonCorrelation and squashes the following commits:

5f1e60c [Doris Xin] reviewer comments.
46ff6eb [Doris Xin] reviewer comments.
ad44085 [Doris Xin] style fix
e69d446 [Doris Xin] fixed missed conflicts.
eb5bf56 [Doris Xin] merge master
cc9f725 [Doris Xin] units passed.
9141a63 [Doris Xin] WIP2
d199f1f [Doris Xin] Moved correlation names into a public object
cd163d6 [Doris Xin] WIP
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants