Skip to content

Commit

Permalink
[MINOR][DOC] Fix python variance() documentation
Browse files Browse the repository at this point in the history
## What changes were proposed in this pull request?

The Python documentation incorrectly says that `variance()` acts as `var_pop` whereas it acts like `var_samp` here: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.functions.variance

It was not the case in Spark 1.6 doc but it is in Spark 2.0 doc:
https://spark.apache.org/docs/1.6.0/api/java/org/apache/spark/sql/functions.html
https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/functions.html

The Scala documentation is correct: https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/functions.html#variance-org.apache.spark.sql.Column-

The alias is set on this line:
https://github.com/apache/spark/blob/v2.4.3/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L786

## How was this patch tested?
Using variance() in pyspark 2.4.3 returns:
```
>>> spark.createDataFrame([(1, ), (2, ), (3, )], "a: int").select(variance("a")).show()
+-----------+
|var_samp(a)|
+-----------+
|        1.0|
+-----------+
```

Closes #24895 from tools4origins/patch-1.

Authored-by: tools4origins <tools4origins@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
  • Loading branch information
tools4origins authored and dongjoon-hyun committed Jun 20, 2019
1 parent ea0e119 commit 25c5d57
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions python/pyspark/sql/functions.py
Expand Up @@ -209,14 +209,14 @@ def _():
"""
_functions_1_6_over_column = {
# unary math functions
'stddev': 'Aggregate function: returns the unbiased sample standard deviation of' +
' the expression in a group.',
'stddev': 'Aggregate function: alias for stddev_samp.',
'stddev_samp': 'Aggregate function: returns the unbiased sample standard deviation of' +
' the expression in a group.',
'stddev_pop': 'Aggregate function: returns population standard deviation of' +
' the expression in a group.',
'variance': 'Aggregate function: returns the population variance of the values in a group.',
'var_samp': 'Aggregate function: returns the unbiased variance of the values in a group.',
'variance': 'Aggregate function: alias for var_samp.',
'var_samp': 'Aggregate function: returns the unbiased sample variance of' +
' the values in a group.',
'var_pop': 'Aggregate function: returns the population variance of the values in a group.',
'skewness': 'Aggregate function: returns the skewness of the values in a group.',
'kurtosis': 'Aggregate function: returns the kurtosis of the values in a group.',
Expand Down

0 comments on commit 25c5d57

Please sign in to comment.