@Yicong-Huang
Contributor

What changes were proposed in this pull request?

  • spark.python.profile: Add SQL_GROUPED_AGG_PANDAS_ITER_UDF to the profiler warning list in udf.py so that when spark.python.profile is enabled, users will see appropriate warnings consistent with other iterator-based UDFs.
  • spark.sql.pyspark.udf.profiler: No changes needed. This UDF type already works correctly: it takes an iterator as input but returns a scalar (not an iterator), so it goes through the non-iterator profiler branch in wrap_perf_profiler and wrap_memory_profiler.
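
The second bullet can be illustrated outside Spark. The following is a minimal sketch, not the code in wrap_perf_profiler: `wrap_scalar_profiler` and `mean_over_batches` are hypothetical names, and plain Python lists stand in for the pandas batches. It shows why a function that consumes an iterator but returns a scalar can be profiled with a single wrapped call: all of its work happens before the call returns, whereas a generator-returning function would do its work later, while being iterated.

```python
import cProfile
import pstats
from typing import Callable, Iterator, List


def mean_over_batches(batches: Iterator[List[float]]) -> float:
    """Hypothetical grouped-agg style function: iterator in, scalar out."""
    total, count = 0.0, 0
    for batch in batches:
        total += sum(batch)
        count += len(batch)
    return total / count if count else float("nan")


def wrap_scalar_profiler(f: Callable, results: list) -> Callable:
    """Profile one complete call of f (assumed to return a non-iterator)."""

    def wrapper(*args, **kwargs):
        pr = cProfile.Profile()
        # The whole body of f runs inside runcall, so one call captures it all.
        out = pr.runcall(f, *args, **kwargs)
        results.append(pstats.Stats(pr))
        return out

    return wrapper


collected: list = []
profiled_mean = wrap_scalar_profiler(mean_over_batches, collected)
result = profiled_mean(iter([[1.0, 2.0], [3.0]]))
```

After the call, `collected` holds one `pstats.Stats` object covering the full aggregation, including the iteration over the input batches.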

Why are the changes needed?

To make profiler support for SQL_GROUPED_AGG_PANDAS_ITER_UDF consistent with other UDF types.

Does this PR introduce any user-facing change?

Yes. When users enable spark.python.profile with SQL_GROUPED_AGG_PANDAS_ITER_UDF, they will now see a warning message consistent with other iterator-based UDFs.
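
A rough sketch of the kind of check this describes, under stated assumptions: `ITERATOR_EVAL_TYPES`, `should_profile`, and the warning text are hypothetical stand-ins, eval types are modeled as strings, and the assumption that profiling is skipped after the warning is not confirmed by this description. The actual logic lives in udf.py and may differ.

```python
import warnings

# Hypothetical stand-in for the warning list in udf.py; eval types are
# modeled as plain strings here rather than PythonEvalType constants.
ITERATOR_EVAL_TYPES = {
    "SQL_SCALAR_PANDAS_ITER_UDF",
    "SQL_MAP_PANDAS_ITER_UDF",
    "SQL_GROUPED_AGG_PANDAS_ITER_UDF",  # the type this PR adds to the list
}


def should_profile(eval_type: str, profile_enabled: bool) -> bool:
    """Return True if the UDF should be profiled; warn for listed types."""
    if not profile_enabled:
        return False
    if eval_type in ITERATOR_EVAL_TYPES:
        # Illustrative message only; not the exact text emitted by Spark.
        warnings.warn(
            f"Profiling is not supported for {eval_type}; skipping.",
            UserWarning,
        )
        return False
    return True
```

With such a check, enabling the profiler for a listed eval type produces a single `UserWarning` instead of failing silently.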

How was this patch tested?

Added a test case test_perf_profiler_pandas_udf_grouped_agg_iter to verify that spark.sql.pyspark.udf.profiler works correctly with this UDF type. Also verified in test_unsupported that the spark.python.profile warning is triggered for this UDF type.

Was this patch authored or co-authored using generative AI tooling?

No.

@Yicong-Huang Yicong-Huang force-pushed the SPARK-54738/feat/add-profiler-support-for-grouped-iter-agg-udf branch from 5297bc1 to 9227760 Compare December 17, 2025 23:08
@github-actions github-actions bot removed the CORE label Dec 17, 2025
@zhengruifeng
Contributor

merged to master
