[SPARK-54722][PYTHON][SQL] Register Pandas Grouped Iter Aggregate UDF for SQL usage #53493

Yicong-Huang · 2025-12-17T00:29:00Z

What changes were proposed in this pull request?

This PR adds SQL_GROUPED_AGG_PANDAS_ITER_UDF to the list of supported eval types in the UDFRegistration.register() method, so that users can register Pandas Grouped Iter Aggregate UDFs for SQL usage.

Why are the changes needed?

Currently, grouped aggregate Pandas UDFs written with the iterator API cannot be registered for SQL usage via spark.udf.register(). This is inconsistent with other UDF types such as SQL_GROUPED_AGG_ARROW_ITER_UDF, which is already supported.
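To illustrate the kind of gap described here, the following is a simplified model of an eval-type allow-list check. It is not Spark's actual code: the `EvalType` enum, the `REGISTERABLE` set, and `check_registerable` are hypothetical names, and only the two eval type names come from this PR.

```python
from enum import Enum, auto


class EvalType(Enum):
    # Names taken from the PR text; the values assigned here are arbitrary.
    SQL_GROUPED_AGG_ARROW_ITER_UDF = auto()
    SQL_GROUPED_AGG_PANDAS_ITER_UDF = auto()
    # Hypothetical type, present only to show the rejection path.
    UNSUPPORTED_EXAMPLE_UDF = auto()


# Hypothetical allow-list. The change in this PR corresponds to adding
# the PANDAS_ITER entry next to the already supported ARROW_ITER entry.
REGISTERABLE = {
    EvalType.SQL_GROUPED_AGG_ARROW_ITER_UDF,
    EvalType.SQL_GROUPED_AGG_PANDAS_ITER_UDF,
}


def check_registerable(eval_type: EvalType) -> None:
    """Raise if the eval type is not on the allow-list."""
    if eval_type not in REGISTERABLE:
        raise ValueError(f"Cannot register UDF with eval type {eval_type.name}")
```

Under this model, a UDF whose eval type is missing from the set is rejected at registration time, which is the behavior users would have seen for the Pandas iterator variant before the change.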

With this change, users can now register iterator-based grouped aggregate UDFs and use them in SQL queries:

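from typing import Iterator

import pandas as pd
from pyspark.sql.functions import pandas_udf

# Grouped aggregate UDF: receives one group's values as batches, returns one scalar.
@pandas_udf("double")
def sum_iter_udf(it: Iterator[pd.Series]) -> float:
    total = 0.0
    for series in it:
        total += series.sum()
    return total

# `spark` is an active SparkSession; register the UDF under a SQL-visible name.
spark.udf.register("sum_iter_udf", sum_iter_udf)
spark.sql("SELECT sum_iter_udf(v) FROM table GROUP BY id")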
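The iterator contract used by sum_iter_udf can be tried without a Spark session. The sketch below is an assumption-laden stand-in: plain Python lists play the role of the pd.Series batches, and `split_into_batches` is a hypothetical helper that mimics an engine delivering one group's values in chunks. How Spark actually sizes batches is not stated in this PR.

```python
from typing import Iterable, Iterator, List


def sum_iter(batches: Iterator[Iterable[float]]) -> float:
    """Same shape as sum_iter_udf: consume batches one at a time, return one scalar."""
    total = 0.0
    for batch in batches:
        # The built-in sum accepts a list here; a pd.Series would also work.
        total += sum(batch)
    return total


def split_into_batches(values: List[float], batch_size: int) -> Iterator[List[float]]:
    """Hypothetical stand-in for an engine chunking one group's values."""
    for start in range(0, len(values), batch_size):
        yield values[start:start + batch_size]


# One group's values, delivered as batches of at most two elements.
group_values = [1.0, 2.0, 3.0, 4.0, 5.0]
result = sum_iter(split_into_batches(group_values, batch_size=2))
```

Because the function only keeps a running total, it never needs the whole group in memory at once, which is the usual motivation for an iterator-style aggregate.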

Does this PR introduce any user-facing change?

Yes. Users can now register Pandas Grouped Iter Aggregate UDFs (Iterator[pd.Series] -> scalar) for SQL usage.

How was this patch tested?

Added a new test case test_register_grouped_agg_iter_udf in python/pyspark/sql/tests/pandas/test_pandas_udf_grouped_agg.py.

Was this patch authored or co-authored using generative AI tooling?

No.

zhengruifeng · 2025-12-18T02:24:38Z

merged to master

feat: register Pandas Grouped Iter Aggregate UDF for SQL usage

30c83c6

github-actions bot added SQL PYTHON labels Dec 17, 2025

test: add test case

034758a

Yicong-Huang changed the title ~~[SPARK-54722][PYTHON] Register Pandas Grouped Iter Aggregate UDF for SQL usage~~ [SPARK-54722][PYTHON][SQL] Register Pandas Grouped Iter Aggregate UDF for SQL usage Dec 17, 2025

allisonwang-db approved these changes Dec 17, 2025

Yicong-Huang added 2 commits December 17, 2025 11:36

fix: update error message

a21ff44

fix: connect version

73a3df1

github-actions bot added the CONNECT label Dec 17, 2025

zhengruifeng approved these changes Dec 18, 2025

zhengruifeng closed this in 81540fa Dec 18, 2025
