[SPARK-43773][CONNECT][PYTHON] Implement 'levenshtein(str1, str2[, threshold])' functions in python client #41296
Waiting for #41293
python/pyspark/sql/functions.py
Outdated
    """Computes the Levenshtein distance of the two given strings.

    .. versionadded:: 1.5.0

    .. versionchanged:: 3.4.0
        Supports Spark Connect.

    .. versionchanged:: 3.5.0
        Supports Spark Connect.
I'd prefer another versionadded placed after the threshold parameter; you can refer to spark/python/pyspark/sql/pandas/map_ops.py, lines 55 to 67 in ab4693d:
    Parameters
    ----------
    func : function
        a Python native function that takes an iterator of `pandas.DataFrame`\\s, and
        outputs an iterator of `pandas.DataFrame`\\s.
    schema : :class:`pyspark.sql.types.DataType` or str
        the return type of the `func` in PySpark. The value can be either a
        :class:`pyspark.sql.types.DataType` object or a DDL-formatted type string.
    barrier : bool, optional, default True
        Use barrier mode execution.

        .. versionchanged: 3.5.0
            Added ``barrier`` argument.
This is done.
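For illustration, here is a hypothetical sketch of the docstring layout the review comment asks for, with the version note attached to the `threshold` parameter instead of the top of the docstring. The function name `levenshtein_doc_sketch`, the parameter names `left` and `right`, and the wording of every parameter description are assumptions made for this example and are not taken from the merged change. The directive mirrors the `barrier` example quoted above.

```python
from typing import Optional


def levenshtein_doc_sketch(left: str, right: str, threshold: Optional[int] = None) -> int:
    """Computes the Levenshtein distance of the two given strings.

    .. versionadded:: 1.5.0

    .. versionchanged:: 3.4.0
        Supports Spark Connect.

    Parameters
    ----------
    left : str
        first string (hypothetical name and description).
    right : str
        second string (hypothetical name and description).
    threshold : int, optional
        maximum distance of interest (hypothetical description).

        .. versionchanged:: 3.5.0
            Added ``threshold`` argument.
    """
    # Layout sketch only: the body is intentionally not implemented.
    raise NotImplementedError("documentation layout sketch only")
```

Placing the note under the parameter keeps the version information next to the argument it describes, which is the pattern the quoted `map_ops.py` lines follow.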
Co-authored-by: Ruifeng Zheng <ruifengz@foxmail.com>
@panbingkun you would need
Ok, let me try. Thanks!
This is done.
merged to master
…r1, str2)' functions in python client

### What changes were proposed in this pull request?
The pr aims to implement 'levenshtein(str1, str2[, threshold])' functions in python client

### Why are the changes needed?
After Add a max distance argument to the levenshtein() function We have already implemented it on the scala side, so we need to align it on `pyspark`.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Manual testing
  python/run-tests --testnames 'python.pyspark.sql.tests.test_functions FunctionsTests.test_levenshtein_function'
- Pass GA

Closes apache#41296 from panbingkun/SPARK-43773.

Lead-authored-by: panbingkun <pbk1982@gmail.com>
Co-authored-by: panbingkun <84731559@qq.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
What changes were proposed in this pull request?
This PR implements the 'levenshtein(str1, str2[, threshold])' function in the Python client.
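To make the optional third argument concrete, here is a minimal pure-Python sketch of a thresholded Levenshtein distance. It is not the PySpark implementation: the name `levenshtein_reference` is hypothetical, and the convention of returning -1 when the distance exceeds `threshold` is an assumption about the intended semantics that should be checked against the Spark documentation.

```python
from typing import Optional


def levenshtein_reference(left: str, right: str, threshold: Optional[int] = None) -> int:
    """Edit distance between two strings, or -1 if it exceeds ``threshold``."""
    # Dynamic programming over prefixes, keeping only the previous row.
    previous = list(range(len(right) + 1))
    for i, left_char in enumerate(left, start=1):
        current = [i]
        for j, right_char in enumerate(right, start=1):
            substitution = previous[j - 1] + (left_char != right_char)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    distance = previous[-1]
    # Assumed semantics: a distance above the threshold is reported as -1.
    if threshold is not None and distance > threshold:
        return -1
    return distance
```

With these assumptions, `levenshtein_reference("kitten", "sitting")` gives 3, and passing `threshold=2` for the same pair gives -1 because the distance is larger than the threshold.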
Why are the changes needed?
After "Add a max distance argument to the levenshtein() function", the argument is already implemented on the Scala side, so we need to align `pyspark` with it.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
python/run-tests --testnames 'python.pyspark.sql.tests.test_functions FunctionsTests.test_levenshtein_function'