[SPARK-29821][SQL] Allow calling non-aggregate SQL functions with column name #26435

MaxHaertwig · 2019-11-08T10:03:58Z

What changes were proposed in this pull request?

I added overloads that take a column name instead of a Column for the functions in the non-aggregate functions section of functions.scala:

isnan(columnName: String): Column
isnull(columnName: String): Column
nanvl(col1Name: String, col2Name: String): Column
negate(columnName: String): Column
not(columnName: String): Column
bitwiseNOT(columnName: String): Column

Why are the changes needed?

This pull request makes it possible to check for NaN values in column x by calling isnan("x") instead of isnan($"x") (in PySpark: isnan("x") instead of isnan(col("x"))). Users then don't need to remember to convert the name to a column. This is consistent with other functions, such as sqrt, that can already be called with a column name.

Does this PR introduce any user-facing change?

Yes
See previous section.

How was this patch tested?

I couldn't find a test file where the SQL functions and PySpark SQL functions are tested. Please point me in the right direction.

AmplabJenkins · 2019-11-08T10:05:48Z

Can one of the admins verify this patch?

maropu · 2019-11-09T00:58:42Z

Can you file a JIRA first and add the JIRA ID to the title? See: https://spark.apache.org/contributing.html

HyukjinKwon · 2019-11-11T01:25:44Z

python/pyspark/sql/functions.py

    [Row(r1=False, r2=False), Row(r1=True, r2=True)]
    """
    sc = SparkContext._active_spark_context
+    if type(col) is str:


This already seems to work:

    >>> from pyspark.sql.functions import isnan
    >>> df = spark.createDataFrame([(1.0, float('nan')), (float('nan'), 2.0)], ("a", "b"))
    >>> df.select(isnan("a")).collect()
    [Row(isnan(a)=False), Row(isnan(a)=True)]
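One plausible reason the string form already works in PySpark is that the Python wrappers normalize their argument to a column before calling into the JVM. The sketch below models that pattern in plain Python with no Spark dependency. The `Column` class, `to_column` helper, and row-based `isnan` are hypothetical stand-ins for illustration, not the real pyspark classes or internals.

```python
import math
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a column reference (not pyspark.sql.Column)."""
    name: str


ColumnOrName = Union[Column, str]


def to_column(col: ColumnOrName) -> Column:
    """Normalize a column name or a Column into a Column (illustrative helper)."""
    if isinstance(col, Column):
        return col
    if isinstance(col, str):
        return Column(col)
    raise TypeError(f"expected Column or str, got {type(col).__name__}")


def isnan(col: ColumnOrName, row: dict) -> bool:
    """Toy isnan: resolve the column, then test that row's value for NaN."""
    value = row[to_column(col).name]
    return isinstance(value, float) and math.isnan(value)


# Same data as the session above, modeled as plain dicts.
rows = [{"a": 1.0, "b": float("nan")}, {"a": float("nan"), "b": 2.0}]
by_name = [isnan("a", r) for r in rows]            # called with a column name
by_column = [isnan(Column("a"), r) for r in rows]  # called with a Column
```

When the normalization lives in one shared helper like this, every wrapper accepts both forms and no per-function type check (such as the `type(col) is str` branch in the diff) is needed.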

HyukjinKwon · 2019-11-11T01:26:37Z

sql/core/src/main/scala/org/apache/spark/sql/functions.scala

+   * @group normal_funcs
+   * @since 1.6.0
+   */
+  def isnan(columnName: String): Column = isnan(Column(columnName))


We won't add this, per the comments at the top of this file.

spark/sql/core/src/main/scala/org/apache/spark/sql/functions.scala

Lines 58 to 60 in f8b1424

 * This function APIs usually have methods with `Column` signature only because it can support not
 * only `Column` but also other types such as a native string. The other variants currently exist
 * for historical reasons.

Max Härtwig added 2 commits November 8, 2019 10:42

Allow calling non-aggregate sql functions with column name

16bb9a2

Allow calling non-aggregate pyspark sql functions with column name

f8b1424

MaxHaertwig changed the title ~~Allow calling non-aggregate SQL functions with column name~~ [SPARK-29821][SQL] Allow calling non-aggregate SQL functions with column name Nov 9, 2019

HyukjinKwon reviewed Nov 11, 2019

MaxHaertwig closed this Jan 3, 2020

dongjoon-hyun added the SQL label Feb 5, 2020

5 participants
