[SPARK-55579][PYTHON][4.1] Rename PySpark error classes to be eval-type-agnostic by Yicong-Huang · Pull Request #55147 · apache/spark

Yicong-Huang · 2026-04-01T22:42:33Z

What changes were proposed in this pull request?

Backport of #54996 to branch-4.1.

Rename six PySpark error conditions to be generic and not tied to specific UDF eval types:

| Old Name | New Name |
|---|---|
| `PANDAS_UDF_OUTPUT_EXCEEDS_INPUT_ROWS` | `OUTPUT_EXCEEDS_INPUT_ROWS` |
| `RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF` | `RESULT_ROWS_MISMATCH` |
| `STOP_ITERATION_OCCURRED_FROM_SCALAR_ITER_PANDAS_UDF` | `INPUT_NOT_FULLY_CONSUMED` |
| `RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF` | `RESULT_COLUMN_SCHEMA_MISMATCH` |
| `RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF` | `RESULT_COLUMN_NAMES_MISMATCH` |
| `RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF` | `RESULT_COLUMN_NAMES_MISMATCH` (merged) |

The error messages were also updated so they no longer reference specific eval types or data structures (e.g., "pandas.DataFrame" and "pyarrow.Table" are replaced with "data").
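Code or tests that look for the old names will stop matching after this rename; the upstream commit message for #54996 notes that users who catch specific error conditions by name need to update their references. A minimal plain-Python sketch of tolerating both spellings during a migration follows. It assumes the condition name appears in square brackets in the error text (e.g. `[RESULT_ROWS_MISMATCH] ...`), which is how PySpark usually renders these messages; the helpers `equivalent_names` and `mentions_condition` are hypothetical and not part of PySpark.

```python
# Old-to-new condition names, copied from the rename table in this PR.
RENAMED_CONDITIONS = {
    "PANDAS_UDF_OUTPUT_EXCEEDS_INPUT_ROWS": "OUTPUT_EXCEEDS_INPUT_ROWS",
    "RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF": "RESULT_ROWS_MISMATCH",
    "STOP_ITERATION_OCCURRED_FROM_SCALAR_ITER_PANDAS_UDF": "INPUT_NOT_FULLY_CONSUMED",
    "RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF": "RESULT_COLUMN_SCHEMA_MISMATCH",
    "RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF": "RESULT_COLUMN_NAMES_MISMATCH",
    # Two old names were merged into one new name.
    "RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF": "RESULT_COLUMN_NAMES_MISMATCH",
}


def equivalent_names(condition):
    """Return every name (new plus all old aliases) for the same condition.

    Names not in the rename table map only to themselves.
    """
    new = RENAMED_CONDITIONS.get(condition, condition)
    olds = [old for old, mapped in RENAMED_CONDITIONS.items() if mapped == new]
    return {new, *olds}


def mentions_condition(message, condition):
    """True if `message` contains `[NAME]` for any equivalent old or new name.

    The brackets keep one name from matching as a prefix of another.
    """
    return any(f"[{name}]" in message for name in equivalent_names(condition))
```

Matching on the message text, rather than on an exception attribute, is a deliberate choice in this sketch: an error raised inside a UDF may reach the driver wrapped in a different exception type, so the original condition name might only be visible in the message.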

Why are the changes needed?

These error conditions were originally created for Pandas UDFs, but are now shared by Arrow UDFs as well. The names and messages should be generic so they can be reused across different eval types without confusion.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests updated to match new error class names and messages.

Was this patch authored or co-authored using generative AI tooling?

No.

…nostic

Part of [SPARK-55388](https://issues.apache.org/jira/browse/SPARK-55388).

Does this PR introduce any user-facing change? Yes. Error condition names and messages are updated. Users who catch specific error conditions by name will need to update their references.

Closes apache#54996 from Yicong-Huang/SPARK-55579/rename-error-classes.

Authored-by: Yicong Huang <17627829+Yicong-Huang@users.noreply.github.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>

…pe-agnostic

Closes #55147 from Yicong-Huang/SPARK-55579-backport-4.1.

Lead-authored-by: Yicong Huang <17627829+Yicong-Huang@users.noreply.github.com>
Co-authored-by: Yicong-Huang <17627829+Yicong-Huang@users.noreply.github.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>

zhengruifeng · 2026-04-02T12:08:20Z

merged to 4.1 for CI

Yicong-Huang added 2 commits April 1, 2026 22:41

fix: sort error-conditions.json alphabetically

af293e3
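The second commit re-sorts `error-conditions.json`. A minimal sketch of detecting and restoring alphabetical order of the top-level keys is shown below, assuming a flat JSON object keyed by condition name and plain code-point ordering (Python's default string comparison). The function names and sample entries are hypothetical, and any ordering check Spark itself runs may differ.

```python
import json


def unsorted_keys(json_text):
    """Return top-level keys that come after a larger key (empty if sorted)."""
    keys = list(json.loads(json_text))  # json.loads preserves the file's key order
    return [b for a, b in zip(keys, keys[1:]) if b < a]


def sort_top_level(json_text, indent=2):
    """Re-serialize with top-level keys sorted; nested values keep their order."""
    data = json.loads(json_text)
    ordered = {key: data[key] for key in sorted(data)}
    return json.dumps(ordered, indent=indent, ensure_ascii=False) + "\n"


# Hypothetical two-entry sample, out of order.
sample = (
    '{"RESULT_ROWS_MISMATCH": {"message": ["..."]},'
    ' "OUTPUT_EXCEEDS_INPUT_ROWS": {"message": ["..."]}}'
)
print(unsorted_keys(sample))
print(unsorted_keys(sort_top_level(sample)))
```

Sorting only the top level, instead of passing `sort_keys=True` to `json.dumps`, leaves the inner structure of each entry exactly as written.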

zhengruifeng closed this Apr 2, 2026

Yicong-Huang wants to merge 2 commits into apache:branch-4.1 from Yicong-Huang:SPARK-55579-backport-4.1
