[SPARK-12962] [SQL] [PySpark] PySpark support covar_samp and covar_pop #10876
Conversation
Test build #49909 has finished for PR 10876 at commit
>>> df = sqlContext.createDataFrame(zip(a, b), ["a", "b"])
>>> covDf = df.agg(covar_pop("a", "b").alias('c'))
>>> covDf.select("c").collect()
[Row(c=565.25)]
Should we maybe compare with a tolerance, as done in the other doctests, since this is floating point?
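For illustration, a tolerance-based check might look like the following (a standalone sketch against the Spark 2.0+ SparkSession API, not the exact doctest; the revised hunk below shows the change actually made in the PR):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import covar_pop

spark = SparkSession.builder.getOrCreate()

# Same data as the doctest above.
a = [x * x - 2 * x + 3.5 for x in range(20)]
b = list(range(20))
df = spark.createDataFrame(list(zip(a, b)), ["a", "b"])

# Compare within a tolerance rather than requiring exact float equality.
c = df.agg(covar_pop("a", "b").alias("c")).collect()[0].c
assert abs(c - 565.25) < 1e-6
```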
Test build #50164 has finished for PR 10876 at commit
This looks good to me, maybe @davies or @jkbradley could take a look?
>>> b = range(20)
>>> df = sqlContext.createDataFrame(zip(a, b), ["a", "b"])
>>> covDf = df.agg(covar_pop("a", "b").alias('c'))
>>> covDf.selectExpr('abs(c - 565.25) < 1e-16 as t').collect()
Since these are going to be part of the API docs, I'd like them to be as simple as possible.
Also, the Python API is only a wrapper around the Scala one, so we don't need to test for correctness here.
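For context on why the Python side needs little correctness testing: each function is a thin wrapper that forwards to the JVM implementation. A sketch of the wrapper pattern used in `pyspark/sql/functions.py` (illustrative, not the merged code verbatim):

```python
from pyspark import SparkContext
from pyspark.sql.column import Column, _to_java_column


def covar_pop(col1, col2):
    """Returns a new Column for the population covariance of col1 and col2."""
    sc = SparkContext._active_spark_context
    # Delegate directly to the Scala implementation; correctness is
    # covered by the Scala-side tests.
    jc = sc._jvm.functions.covar_pop(_to_java_column(col1), _to_java_column(col2))
    return Column(jc)
```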
So just as a side note: it seems the other functions (including corr) do have these correctness tests in their doctests. We should probably be consistent with them one way or the other, yes?
Yeah, we should clean them up.
@davies OK, I will clean them up.
@yanboliang Could you also add a Python API for corr?
@felixcheung I see, thanks!
Test build #50344 has finished for PR 10876 at commit
ping @davies
@yanboliang I'm sorry that I didn't make it clear: it's good to have a simple test, which could also serve as an example demonstrating how to use these functions (it should be as simple as possible).
Force-pushed from ad15ff4 to 8029911
Test build #51104 has finished for PR 10876 at commit
Test build #51106 has finished for PR 10876 at commit
>>> a = [x * x - 2 * x + 3.5 for x in range(20)]
>>> b = range(20)
>>> df = sqlContext.createDataFrame(zip(a, b), ["a", "b"])
How about making a == b? Then the result will be zero.
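A minimal doctest along these lines uses constant (hence equal) columns, so the population covariance is exactly 0.0 and no floating-point tolerance is needed (a sketch of the suggestion, not the merged code verbatim):

```python
>>> a = [1] * 10
>>> b = [1] * 10
>>> df = sqlContext.createDataFrame(list(zip(a, b)), ["a", "b"])
>>> df.agg(covar_pop("a", "b").alias('c')).collect()
[Row(c=0.0)]
```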
Test build #51181 has finished for PR 10876 at commit
Test build #51183 has finished for PR 10876 at commit
LGTM, merging into master, thanks!
PySpark support `covar_samp` and `covar_pop`.

cc @rxin @davies @marmbrus
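For reference, a usage sketch of the two new functions (assumes Spark 2.0+ with a SparkSession; the column names and data are illustrative):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import covar_pop, covar_samp

spark = SparkSession.builder.getOrCreate()

# b is exactly 2 * a, so cov(a, b) = 2 * var(a).
df = spark.createDataFrame(
    [(float(x), float(2 * x)) for x in range(10)], ["a", "b"])

# covar_pop divides by n; covar_samp divides by n - 1.
df.agg(covar_pop("a", "b").alias("pop"),        # 16.5 for this data
       covar_samp("a", "b").alias("samp")).show()  # ~18.33 for this data
```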