[SPARK-43934][SQL][PYTHON][CONNECT] Add regexp_* functions to Scala and Python #41515

Closed

beliefer commented Jun 8, 2023

What changes were proposed in this pull request?

This PR adds the regexp_* functions to the Scala, Python, and Connect APIs.
The functions are listed below.

  • rlike

  • regexp

  • regexp_count

  • regexp_extract_all

  • regexp_instr

  • regexp_like

  • regexp_substr
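
To make the intent of each function concrete, here is a minimal plain-Python sketch of their behavior on a single string. It is an approximation built on Python's `re` module, not the Spark implementation: Spark evaluates Java regular expressions over columns, and the helper bodies below (including the choice to return an empty string for a group that did not participate in a match) are assumptions made for illustration.

```python
import re
from typing import List, Optional


def regexp_like(s: str, pattern: str) -> bool:
    """Return True if the pattern matches anywhere in s."""
    return re.search(pattern, s) is not None


# In this sketch, rlike and regexp are treated as aliases of regexp_like.
rlike = regexp_like
regexp = regexp_like


def regexp_count(s: str, pattern: str) -> int:
    """Count the non-overlapping matches of the pattern in s."""
    return sum(1 for _ in re.finditer(pattern, s))


def regexp_extract_all(s: str, pattern: str, idx: int = 1) -> List[str]:
    """Collect capture group idx from every match (idx=0 is the whole match)."""
    # Assumption: a group that did not participate yields an empty string.
    return [m.group(idx) or "" for m in re.finditer(pattern, s)]


def regexp_instr(s: str, pattern: str) -> int:
    """Return the 1-based position of the first match, or 0 if there is none."""
    m = re.search(pattern, s)
    return m.start() + 1 if m else 0


def regexp_substr(s: str, pattern: str) -> Optional[str]:
    """Return the first matching substring, or None if there is none."""
    m = re.search(pattern, s)
    return m.group(0) if m else None


text = "100-200, 300-400"
print(regexp_count(text, r"\d+"))                    # 4
print(regexp_extract_all(text, r"(\d+)-(\d+)", 1))   # ['100', '300']
print(regexp_instr(text, r"-"))                      # 4
print(regexp_substr(text, r"\d+-\d+"))               # 100-200
```

Python's `re` syntax differs from Java's in some corner cases, so treat the sketch as a guide to the semantics rather than a drop-in equivalent.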

Why are the changes needed?

Add regexp_* functions to Scala, Python and Connect API.

Does this PR introduce any user-facing change?

'No'.
New feature.

How was this patch tested?

New test cases.

beliefer commented Jun 9, 2023

ping @HyukjinKwon @zhengruifeng cc @cloud-fan

@@ -1988,13 +1988,70 @@ def split(str: "ColumnOrName", pattern: str, limit: int = -1) -> Column:
split.__doc__ = pysparkfuncs.split.__doc__


def rlike(string: "ColumnOrName", pattern: Union[str, Column]) -> Column:

For the Python side, let's use ColumnOrName arguments; that is, when pattern is a str, treat it as a column name.

see discussion here #41505 (comment)

@zhengruifeng

Looks pretty good. Let's wait for the discussion in #41516 (comment), since regexp_like and regexp are aliases of rlike.

@beliefer

@HyukjinKwon @zhengruifeng The GA failure is unrelated to this PR.

@zhengruifeng zhengruifeng left a comment

Would you mind rebasing onto the latest master?

@@ -7596,6 +7721,48 @@ def regexp_extract(str: "ColumnOrName", pattern: str, idx: int) -> Column:
return _invoke_function("regexp_extract", _to_java_column(str), pattern, idx)


@try_remote_functions
def regexp_extract_all(
str: "ColumnOrName", regexp: "ColumnOrName", idx: Optional[Union[int, Column]] = None

Does it make sense for idx to also support a column name?

I'm not entirely sure, but this follows @HyukjinKwon's suggestion.

@beliefer

Would you mind rebasing onto the latest master?

It has been rebased. Please wait for the GA run to pass.

@zhengruifeng

merged to master

@beliefer

@zhengruifeng Thank you very much!

czxm pushed a commit to czxm/spark that referenced this pull request Jun 19, 2023
…nd Python

### What changes were proposed in this pull request?
This PR adds the regexp_* functions to the Scala, Python, and Connect APIs.
The functions are listed below.

- rlike

- regexp

- regexp_count

- regexp_extract_all

- regexp_instr

- regexp_like

- regexp_substr

### Why are the changes needed?
Add regexp_* functions to Scala, Python and Connect API.

### Does this PR introduce _any_ user-facing change?
'No'.
New feature.

### How was this patch tested?
New test cases.

Closes apache#41515 from beliefer/SPARK-43934.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>