[SPARK-55579][PYTHON][4.0] Rename PySpark error classes to be eval-type-agnostic#55169

Closed
Yicong-Huang wants to merge 1 commit into apache:branch-4.0 from Yicong-Huang:SPARK-55579-backport-4.0
Conversation

@Yicong-Huang
Contributor

What changes were proposed in this pull request?

Backport of #54996 to branch-4.0.

Rename six PySpark error conditions to be generic and not tied to specific UDF eval types:

| Old Name | New Name |
|---|---|
| `PANDAS_UDF_OUTPUT_EXCEEDS_INPUT_ROWS` | `OUTPUT_EXCEEDS_INPUT_ROWS` |
| `RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF` | `RESULT_ROWS_MISMATCH` |
| `STOP_ITERATION_OCCURRED_FROM_SCALAR_ITER_PANDAS_UDF` | `INPUT_NOT_FULLY_CONSUMED` |
| `RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF` | `RESULT_COLUMN_SCHEMA_MISMATCH` |
| `RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF` | `RESULT_COLUMN_NAMES_MISMATCH` |
| `RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF` | `RESULT_COLUMN_NAMES_MISMATCH` (merged) |

Also updated error messages to not reference specific eval types or data structures (e.g., "pandas.DataFrame", "pyarrow.Table" -> "data").
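
To illustrate the scope of the rename, the mapping above can be expressed as a small lookup table. The helper below is a hypothetical migration aid for code that matches on error-condition names, not part of the PySpark API:

```python
# Hypothetical mapping of the six renamed PySpark error conditions
# (old name -> new name), as listed in this PR. Not a PySpark API.
OLD_TO_NEW = {
    "PANDAS_UDF_OUTPUT_EXCEEDS_INPUT_ROWS": "OUTPUT_EXCEEDS_INPUT_ROWS",
    "RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF": "RESULT_ROWS_MISMATCH",
    "STOP_ITERATION_OCCURRED_FROM_SCALAR_ITER_PANDAS_UDF": "INPUT_NOT_FULLY_CONSUMED",
    "RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF": "RESULT_COLUMN_SCHEMA_MISMATCH",
    "RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF": "RESULT_COLUMN_NAMES_MISMATCH",
    # The Arrow variant is merged into the same new condition as the Pandas one.
    "RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF": "RESULT_COLUMN_NAMES_MISMATCH",
}

def migrate_error_condition(name: str) -> str:
    """Return the new error-condition name for an old one, or the name unchanged."""
    return OLD_TO_NEW.get(name, name)
```

Tests or user code that assert on one of the old pandas- or arrow-specific names could be translated through such a table; note that the two `RESULT_COLUMNS_MISMATCH_FOR_*` conditions collapse into a single eval-type-agnostic name.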

Why are the changes needed?

These error conditions were originally created for Pandas UDFs, but are now shared by Arrow UDFs as well. The names and messages should be generic so they can be reused across different eval types without confusion.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests were updated to match the new error condition names and messages.

Was this patch authored or co-authored using generative AI tooling?

No.

[SPARK-55579][PYTHON] Rename PySpark error classes to be eval-type-agnostic

Rename six PySpark error conditions to be generic and not tied to specific UDF eval types:

| Old Name | New Name |
|---|---|
| `PANDAS_UDF_OUTPUT_EXCEEDS_INPUT_ROWS` | `OUTPUT_EXCEEDS_INPUT_ROWS` |
| `RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF` | `RESULT_ROWS_MISMATCH` |
| `STOP_ITERATION_OCCURRED_FROM_SCALAR_ITER_PANDAS_UDF` | `INPUT_NOT_FULLY_CONSUMED` |
| `RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF` | `RESULT_COLUMN_SCHEMA_MISMATCH` |
| `RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF` | `RESULT_COLUMN_NAMES_MISMATCH` |
| `RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF` | `RESULT_COLUMN_NAMES_MISMATCH` (merged) |

Also updated error messages to not reference specific eval types or data structures (e.g., "pandas.DataFrame", "pyarrow.Table" -> "data").

These error conditions were originally created for Pandas UDFs, but are now shared by Arrow UDFs as well. The names and messages should be generic so they can be reused across different eval types without confusion.

Part of [SPARK-55388](https://issues.apache.org/jira/browse/SPARK-55388).

Yes. Error condition names and messages are updated. Users who catch specific error conditions by name will need to update their references.

Existing tests updated to match new error condition names and messages.

No

Closes apache#54996 from Yicong-Huang/SPARK-55579/rename-error-classes.

Authored-by: Yicong Huang <17627829+Yicong-Huang@users.noreply.github.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
@Yicong-Huang force-pushed the SPARK-55579-backport-4.0 branch from 3166216 to def68f3 on April 2, 2026 at 21:25
zhengruifeng pushed a commit that referenced this pull request Apr 3, 2026
[SPARK-55579][PYTHON][4.0] Rename PySpark error classes to be eval-type-agnostic

Closes #55169 from Yicong-Huang/SPARK-55579-backport-4.0.

Authored-by: Yicong Huang <17627829+Yicong-Huang@users.noreply.github.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
@zhengruifeng
Contributor

merged to 4.0
