[SPARK-43773][CONNECT][PYTHON] Implement 'levenshtein(str1, str2[, threshold])' functions in python client #41296

Closed
panbingkun wants to merge 7 commits

panbingkun
Contributor

What changes were proposed in this pull request?

This PR implements the 'levenshtein(str1, str2[, threshold])' function in the Python client.
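
For readers unfamiliar with the optional third argument, here is a minimal pure-Python sketch of the behaviour being exposed. It is not the Spark implementation: the function name, the parameter names `left` and `right`, and the `None` default are illustrative assumptions, and it follows the documented Spark semantics that the distance is returned when it is at most `threshold` and -1 otherwise. Null handling is not modelled.

```python
from typing import Optional


def levenshtein(left: str, right: str, threshold: Optional[int] = None) -> int:
    """Edit distance between two strings; -1 if it exceeds ``threshold``.

    Illustrative sketch only, not the code used by Spark.
    """
    # Two-row dynamic programme: previous[j] holds the distance between
    # the prefix of `left` processed so far and the first j chars of `right`.
    previous = list(range(len(right) + 1))
    for i, left_char in enumerate(left, start=1):
        current = [i]
        for j, right_char in enumerate(right, start=1):
            cost = 0 if left_char == right_char else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    distance = previous[-1]
    # Assumed threshold semantics: report -1 when the distance is too large.
    if threshold is not None and distance > threshold:
        return -1
    return distance


print(levenshtein("kitten", "sitting"))     # 3
print(levenshtein("kitten", "sitting", 2))  # -1
```

In PySpark the same call shape would apply to columns rather than Python strings, with `threshold` passed as an optional third argument.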

Why are the changes needed?

The Scala side already supports this, following the change "Add a max distance argument to the levenshtein() function", so PySpark needs to be aligned with it.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • Manual testing
    python/run-tests --testnames 'python.pyspark.sql.tests.test_functions FunctionsTests.test_levenshtein_function'
  • Pass GA

@panbingkun
Contributor Author

Waiting for #41293

"""Computes the Levenshtein distance of the two given strings.

.. versionadded:: 1.5.0

.. versionchanged:: 3.4.0
Supports Spark Connect.

.. versionchanged:: 3.5.0
Supports Spark Connect.
Contributor

I would prefer a separate version note placed right after the description of the threshold parameter. You can refer to this example:

Parameters
----------
func : function
    a Python native function that takes an iterator of `pandas.DataFrame`\\s, and
    outputs an iterator of `pandas.DataFrame`\\s.
schema : :class:`pyspark.sql.types.DataType` or str
    the return type of the `func` in PySpark. The value can be either a
    :class:`pyspark.sql.types.DataType` object or a DDL-formatted type string.
barrier : bool, optional, default True
    Use barrier mode execution.

    .. versionchanged:: 3.5.0
        Added ``barrier`` argument.

Contributor Author

This is done.
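
Applied to `levenshtein`, the reviewer's suggestion might look roughly like the stub below. This is a layout sketch only: the parameter names, the parameter descriptions, and the wording of the version note are assumptions, and the docstring that was actually merged may differ.

```python
from typing import Optional


def levenshtein(left: str, right: str, threshold: Optional[int] = None) -> int:
    """Computes the Levenshtein distance of the two given strings.

    .. versionadded:: 1.5.0

    .. versionchanged:: 3.4.0
        Supports Spark Connect.

    Parameters
    ----------
    left : str
        first string (hypothetical description).
    right : str
        second string (hypothetical description).
    threshold : int, optional
        maximum distance to report (hypothetical description).

        .. versionchanged:: 3.5.0
            Added ``threshold`` argument.
    """
    # Stub body: this block only illustrates where the version note goes.
    raise NotImplementedError("docstring layout sketch only")
```

Keeping the version note under the parameter it concerns avoids a second top-level "Supports Spark Connect." entry and tells readers exactly what changed in that release.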

python/pyspark/sql/tests/connect/test_connect_function.py (outdated)
@zhengruifeng
Contributor

@panbingkun you need to run dev/reformat-python to fix the Python linter issue.

@panbingkun
Contributor Author

> @panbingkun you would need dev/reformat-python to fix python linter issue

Ok, let me try. Thanks!

@panbingkun
Contributor Author

> @panbingkun you would need dev/reformat-python to fix python linter issue

This is done.

panbingkun changed the title from "[SPARK-43773][PYTHON] Implement 'levenshtein(str1, str2[, threshold])' functions in python client" to "[SPARK-43773][CONNECT][PYTHON] Implement 'levenshtein(str1, str2[, threshold])' functions in python client" on May 27, 2023
@zhengruifeng
Contributor

merged to master

czxm pushed a commit to czxm/spark that referenced this pull request Jun 12, 2023
…r1, str2)' functions in python client

### What changes were proposed in this pull request?
This PR implements the 'levenshtein(str1, str2[, threshold])' function in the Python client.

### Why are the changes needed?
The Scala side already supports this, following the change "Add a max distance argument to the levenshtein() function", so `pyspark` needs to be aligned with it.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Manual testing
python/run-tests --testnames 'python.pyspark.sql.tests.test_functions FunctionsTests.test_levenshtein_function'
- Pass GA

Closes apache#41296 from panbingkun/SPARK-43773.

Lead-authored-by: panbingkun <pbk1982@gmail.com>
Co-authored-by: panbingkun <84731559@qq.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
2 participants