
[SPARK-43937][CONNECT][PYTHON] Add ifnull,isnotnull,equal_null,nullif,nvl,nvl2 to Scala and Python#41534

Closed
panbingkun wants to merge 11 commits into apache:master from panbingkun:SPARK-43937

Conversation

@panbingkun (Contributor) commented Jun 9, 2023

What changes were proposed in this pull request?

Add the following functions (see the usage sketch after the lists below):

  • not: already exists on the Scala side, but is still missing on the Python side due to a keyword conflict
  • if: newly added on the Scala side; however, it conflicts with the Python keyword if and cannot be added on the Python side
  • ifnull
  • isnotnull
  • equal_null
  • nullif
  • nvl
  • nvl2

to:

  • Scala API
  • Python API
  • Spark Connect Scala Client
  • Spark Connect Python Client
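
For reference, a minimal usage sketch of the new Scala column functions (e.g. in spark-shell on Spark 3.5.0+, where these functions exist in org.apache.spark.sql.functions; the SparkSession, data, and column names are illustrative, and the semantics follow the existing SQL functions of the same names):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").appName("null-funcs").getOrCreate()
import spark.implicits._

// Illustrative data with NULLs in both columns.
val data: Seq[(Option[Int], Option[Int])] =
  Seq((Some(1), Some(1)), (None, Some(2)), (Some(3), None))
val df = data.toDF("a", "b")

df.select(
  ifnull($"a", $"b").as("ifnull_a_b"),     // b when a is NULL, otherwise a
  isnotnull($"a").as("a_is_not_null"),     // true when a is not NULL
  equal_null($"a", $"b").as("a_eq_b"),     // null-safe equality (<=>)
  nullif($"a", $"b").as("nullif_a_b"),     // NULL when a equals b, otherwise a
  nvl($"a", $"b").as("nvl_a_b"),           // same semantics as ifnull
  nvl2($"a", $"b", lit(-1)).as("nvl2_a")   // b when a is not NULL, otherwise -1
).show()
```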

Why are the changes needed?

For parity between the SQL functions and the Scala/Python (and Spark Connect) APIs.

Does this PR introduce any user-facing change?

Yes, new functions.

How was this patch tested?

  • Added new unit tests.

@panbingkun (Contributor Author):
cc @zhengruifeng

Contributor Author:

if: newly added on the Scala side; however, it conflicts with the Python keyword if and cannot be added on the Python side

Contributor:

I think we should use a different name in this case, like map in Scala <-> create_map in Python.

cc @HyukjinKwon @beliefer, do you have a good name for it?

@beliefer (Contributor) commented Jun 10, 2023

I encountered a similar problem. I wanted to add the any API to PySpark, but it is a Python keyword too.
Personally, I used the name py_any instead. How about py_if or py_replaceable_if? cc @HyukjinKwon @zhengruifeng

Contributor:

Friendly ping @HyukjinKwon for the function names for any and if?

Contributor:

@beliefer @panbingkun sorry, but I'd suggest we exclude any and if for now: we have some/bool_or to replace any, and when/otherwise to replace if.

Let's add them in separate PRs later if we really need them.
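
For readers wondering what the suggested replacements look like, a minimal sketch (the DataFrame and column names are illustrative): SQL's if(cond, x, y) maps onto when(cond, x).otherwise(y), and SQL's any(col) onto bool_or / some, invoked here through expr.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((1, true), (-2, false)).toDF("a", "flag")

// SQL `if(cond, x, y)`  ->  when(cond, x).otherwise(y)
df.select(when($"a" > 0, "pos").otherwise("non-pos").as("sign")).show()

// SQL `any(col)`  ->  bool_or / some (invoked here through expr)
df.agg(expr("bool_or(flag)").as("any_flag")).show()
```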

Contributor Author:

OK, let me remove it.

Contributor:

Maybe name it after misc non-aggregate functions.


Contributor:

The current ifnull uses the same expression as nvl; let's discuss it in #41516 (comment) first.

Contributor:

Let's keep the current implementation, which directly invokes the ifnull alias.
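
As background for this thread, ifnull and nvl share the same semantics, which is observable at the SQL level; a small check (data and column names are illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val data: Seq[(Option[Int], Option[Int])] = Seq((Some(1), Some(2)), (None, Some(3)))
val df = data.toDF("a", "b")

// Both return `a` when it is non-NULL and fall back to `b` otherwise.
df.selectExpr("ifnull(a, b) AS via_ifnull", "nvl(a, b) AS via_nvl").show()
```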


Contributor:

I don't know the reason for putting EqualNull into the group misc_funcs. It seems EqualNull should be put into the group predicate_funcs. cc @cloud-fan @zhengruifeng

Contributor:

I see a few functions in FunctionRegistry (and other places) that were placed in the wrong group, e.g. Abs should be in math instead of misc non-aggregate functions.

Contributor:

I think we should fix it together.

Contributor Author:

So, let's put these functions in 'predicate_funcs' first, and later I will propose a new PR to fix the grouping and other trivial issues. @zhengruifeng @beliefer

Contributor:

I think it is fine; let's fix the grouping in a separate PR.

Contributor:

@group predicate_funcs

Contributor:

Let's also exclude if on the Scala side.

Contributor Author:

OK.

Contributor:

Let's not touch unrelated files.

Contributor Author:

OK, let me revert it.

@panbingkun panbingkun requested a review from zhengruifeng June 15, 2023 03:43
* @group predicates_funcs
* @since 3.5.0
*/
def ifnull(col1: Column, col2: Column): Column = withExpr {
Contributor:

Is the replacement in ifnull/nvl/nvl2/nullif missing?

Contributor Author:

According to my understanding, 'replacement: Expression' is only an internal Spark mechanism that exists to cooperate with 'RuntimeReplaceable' and is designed to reuse expressions. I guess it is actually not needed in our function.

Contributor:

@zhengruifeng the default constructor with replacement: Expression will be replaced by rules. We only need to consider the other constructor without replacement.

Contributor:

Oh, thanks for the explanation! @panbingkun @beliefer
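
For readers following this thread, the pattern under discussion is roughly the following: the primary constructor carries a replacement expression that the analyzer swaps in at resolution time, while callers use the secondary constructor that derives it. This is a simplified sketch modeled on Catalyst's Nvl expression; the class name is illustrative, not the exact code touched by this PR.

```scala
import org.apache.spark.sql.catalyst.expressions.{Coalesce, Expression, InheritAnalysisRules, RuntimeReplaceable}

// Simplified sketch of a RuntimeReplaceable expression with a derived
// `replacement`; names are illustrative, not the exact Spark source.
case class IfNullLike(left: Expression, right: Expression, replacement: Expression)
  extends RuntimeReplaceable with InheritAnalysisRules {

  // Callers (e.g. the DataFrame function) use this constructor; the
  // `replacement` is derived here, so it never needs to be passed explicitly.
  def this(left: Expression, right: Expression) =
    this(left, right, Coalesce(Seq(left, right)))

  // The original arguments, used for pretty-printing and analysis rules.
  override def parameters: Seq[Expression] = Seq(left, right)

  override protected def withNewChildInternal(newChild: Expression): IfNullLike =
    copy(replacement = newChild)
}
```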

@zhengruifeng (Contributor):

Merged to master, thank you @panbingkun @beliefer.

czxm pushed a commit to czxm/spark that referenced this pull request Jun 19, 2023
[SPARK-43937][CONNECT][PYTHON] Add ifnull,isnotnull,equal_null,nullif,nvl,nvl2 to Scala and Python


Closes apache#41534 from panbingkun/SPARK-43937.

Lead-authored-by: panbingkun <pbk1982@gmail.com>
Co-authored-by: panbingkun <84731559@qq.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>