[SPARK-54722][PYTHON][SQL] Register Pandas Grouped Iter Aggregate UDF for SQL usage #53493
What changes were proposed in this pull request?
This PR adds SQL_GROUPED_AGG_PANDAS_ITER_UDF to the list of supported eval types in the UDFRegistration.register() method, allowing users to register Pandas Grouped Iter Aggregate UDFs for SQL usage.
Why are the changes needed?
Currently, the iterator API for grouped aggregate Pandas UDFs cannot be registered for SQL usage via spark.udf.register(). This is inconsistent with other UDF types such as SQL_GROUPED_AGG_ARROW_ITER_UDF, which is already supported.
With this change, users can register iterator-based grouped aggregate UDFs and use them in SQL queries.
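A minimal sketch of what such a registration might look like, assuming a Spark build that includes this change and that the iterator variant is selected through the Iterator[pd.Series] -> float type hints on a pandas_udf. The names mean_over_batches, iter_mean, register_and_query, the view t, and the columns id and v are illustrative and not taken from the PR. The aggregation logic lives in a plain function so it can be checked without a Spark session.

```python
from typing import Iterable, Sequence


def mean_over_batches(batches: Iterable[Sequence[float]]) -> float:
    """Running mean over an iterator of batches (hypothetical helper).

    Each batch may be a pandas Series or any sized iterable of numbers.
    """
    total = 0.0
    count = 0
    for batch in batches:
        total += float(sum(batch))
        count += len(batch)
    return total / count if count else float("nan")


def register_and_query(spark):
    """Register the UDF for SQL and run a grouped query (illustrative only).

    Assumes `spark` is an active SparkSession on a build with this PR applied.
    """
    # Imported lazily so the helper above stays usable without Spark installed.
    from typing import Iterator

    import pandas as pd
    from pyspark.sql.functions import pandas_udf

    # Assumption: these type hints select the grouped-aggregate iterator
    # eval type (SQL_GROUPED_AGG_PANDAS_ITER_UDF) described in this PR.
    @pandas_udf("double")
    def iter_mean(batches: Iterator[pd.Series]) -> float:
        return mean_over_batches(batches)

    # The registration step this PR enables for the iterator variant.
    spark.udf.register("iter_mean", iter_mean)

    df = spark.createDataFrame([(1, 1.0), (1, 2.0), (2, 3.0)], ["id", "v"])
    df.createOrReplaceTempView("t")
    return spark.sql("SELECT id, iter_mean(v) AS mean_v FROM t GROUP BY id")
```

Calling register_and_query(spark).show() would then run the aggregate per id group through the SQL path rather than the DataFrame API.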
Does this PR introduce any user-facing change?
Yes. Users can now register Pandas Grouped Iter Aggregate UDFs (Iterator[pd.Series] -> scalar) for SQL usage.
How was this patch tested?
Added a new test case, test_register_grouped_agg_iter_udf, in python/pyspark/sql/tests/pandas/test_pandas_udf_grouped_agg.py.
Was this patch authored or co-authored using generative AI tooling?
No.