[SPARK-43934][SQL][PYTHON][CONNECT] Add regexp_* functions to Scala and Python #41515
ping @HyukjinKwon @zhengruifeng cc @cloud-fan
@@ -1988,13 +1988,70 @@ def split(str: "ColumnOrName", pattern: str, limit: int = -1) -> Column:
split.__doc__ = pysparkfuncs.split.__doc__


def rlike(string: "ColumnOrName", pattern: Union[str, Column]) -> Column:
For the Python side, let's use `ColumnOrName` arguments; that is, when `pattern` is a `str`, treat it as a column name. See the discussion here: #41505 (comment)
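To make the reviewer's convention concrete, here is a minimal, self-contained sketch of how a `ColumnOrName` argument can be resolved: a `Column` passes through unchanged, while a plain `str` is read as a column name rather than as a regex literal, so a literal pattern has to be wrapped explicitly. The `Column` class and the `col`, `lit`, `to_column` and `rlike` definitions below are hypothetical stand-ins, not PySpark's own classes (PySpark performs this resolution inside helpers such as `_to_java_column`).

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a column expression (not pyspark.sql.Column)."""
    kind: str     # "name" for a column reference, "lit" for a literal value
    value: object


def col(name: str) -> Column:
    # Hypothetical: reference an existing column by name.
    return Column("name", name)


def lit(value: object) -> Column:
    # Hypothetical: wrap a literal value, e.g. a regex pattern string.
    return Column("lit", value)


ColumnOrName = Union[Column, str]


def to_column(arg: ColumnOrName) -> Column:
    """Resolve a ColumnOrName: a Column passes through, a str is a column name."""
    if isinstance(arg, Column):
        return arg
    if isinstance(arg, str):
        return col(arg)
    raise TypeError(f"expected Column or str, got {type(arg).__name__}")


def rlike(string: ColumnOrName, pattern: ColumnOrName) -> tuple:
    # Hypothetical function: both arguments follow the ColumnOrName rule,
    # so callers pass lit(...) when they mean a literal regex.
    return ("rlike", to_column(string), to_column(pattern))
```

Under this convention `rlike("s", "p")` compares against the values in column `p`, while `rlike("s", lit(r"\d+"))` uses the literal pattern `\d+`. The benefit is that every `ColumnOrName` parameter behaves the same way across functions.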
Looks pretty good; let's wait for the discussion #41516 (comment), since
@HyukjinKwon @zhengruifeng The GA failure is unrelated to this PR.
Would you mind rebasing onto the latest master?
@@ -7596,6 +7721,48 @@ def regexp_extract(str: "ColumnOrName", pattern: str, idx: int) -> Column:
    return _invoke_function("regexp_extract", _to_java_column(str), pattern, idx)


@try_remote_functions
def regexp_extract_all(
    str: "ColumnOrName", regexp: "ColumnOrName", idx: Optional[Union[int, Column]] = None
Does it make sense to make `idx` also support a column name?
I'm not entirely sure, but this follows @HyukjinKwon's suggestion.
Rebased. Please wait for the GA checks to pass.
Merged to master.
@zhengruifeng Thank you very much!
…nd Python

Closes apache#41515 from beliefer/SPARK-43934.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
What changes were proposed in this pull request?
This PR adds the following regexp_* functions to the Scala, Python and Connect APIs:
- rlike
- regexp
- regexp_count
- regexp_extract_all
- regexp_instr
- regexp_like
- regexp_substr
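For readers unfamiliar with these functions, the sketch below approximates their behavior on plain Python strings using the standard `re` module. It is only an illustration, not Spark's implementation: Spark evaluates patterns with Java regular expressions, whose syntax and edge cases differ from Python's, and it propagates SQL NULLs, which this sketch ignores. The treatment of an unmatched group as an empty string in `regexp_extract_all` is an assumption of the sketch.

```python
import re
from typing import List, Optional


def regexp_like(s: str, pattern: str) -> bool:
    # rlike / regexp / regexp_like: true if the pattern matches anywhere in s.
    return re.search(pattern, s) is not None


def regexp_count(s: str, pattern: str) -> int:
    # Number of non-overlapping matches of the pattern in s.
    return sum(1 for _ in re.finditer(pattern, s))


def regexp_extract_all(s: str, pattern: str, idx: int = 1) -> List[str]:
    # Group `idx` of every match (default 1); idx=0 means the whole match.
    # Assumption: an unmatched optional group is returned as "".
    return [m.group(idx) or "" for m in re.finditer(pattern, s)]


def regexp_instr(s: str, pattern: str) -> int:
    # 1-based start position of the first match, or 0 if there is none.
    m = re.search(pattern, s)
    return m.start() + 1 if m else 0


def regexp_substr(s: str, pattern: str) -> Optional[str]:
    # First matching substring, or None (standing in for SQL NULL).
    m = re.search(pattern, s)
    return m.group(0) if m else None


text = "Steven Jones and Stephen Smith are the best players"
print(regexp_count(text, "Ste(v|ph)en"))                        # 2
print(regexp_substr(text, "Ste(v|ph)en"))                       # Steven
print(regexp_extract_all("100-200, 300-400", r"(\d+)-(\d+)"))   # ['100', '300']
print(regexp_instr("user@spark.apache.org", r"@[^.]*"))         # 5
print(regexp_like(text, "Jones"))                               # True
```

Note that `rlike`, `regexp` and `regexp_like` share one helper here because all three test for a match anywhere in the string rather than requiring the whole string to match.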
Why are the changes needed?
To expose the existing regexp_* SQL functions in the Scala, Python and Connect APIs.
Does this PR introduce any user-facing change?
No. New feature.
How was this patch tested?
New test cases.