[SPARK-55579][PYTHON][4.1] Rename PySpark error classes to be eval-type-agnostic #55147

Closed
Yicong-Huang wants to merge 2 commits into apache:branch-4.1 from Yicong-Huang:SPARK-55579-backport-4.1

@Yicong-Huang
Contributor

What changes were proposed in this pull request?

Backport of #54996 to branch-4.1.

Rename six PySpark error conditions to be generic and not tied to specific UDF eval types:

| Old Name | New Name |
|---|---|
| `PANDAS_UDF_OUTPUT_EXCEEDS_INPUT_ROWS` | `OUTPUT_EXCEEDS_INPUT_ROWS` |
| `RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF` | `RESULT_ROWS_MISMATCH` |
| `STOP_ITERATION_OCCURRED_FROM_SCALAR_ITER_PANDAS_UDF` | `INPUT_NOT_FULLY_CONSUMED` |
| `RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF` | `RESULT_COLUMN_SCHEMA_MISMATCH` |
| `RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF` | `RESULT_COLUMN_NAMES_MISMATCH` |
| `RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF` | `RESULT_COLUMN_NAMES_MISMATCH` (merged) |

Also updated error messages to not reference specific eval types or data structures (e.g., "pandas.DataFrame", "pyarrow.Table" -> "data").
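For code or tests that match on these condition names as strings, the rename table can be written as a plain lookup. The following is only a sketch: the helper names `RENAMED_CONDITIONS`, `to_new_condition`, and `matches_condition` are hypothetical and are not part of PySpark; only the old and new condition names come from this PR.

```python
# Old -> new condition names, copied from the rename table in this PR.
# The dictionary and helpers below are illustrative, not a PySpark API.
RENAMED_CONDITIONS = {
    "PANDAS_UDF_OUTPUT_EXCEEDS_INPUT_ROWS": "OUTPUT_EXCEEDS_INPUT_ROWS",
    "RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF": "RESULT_ROWS_MISMATCH",
    "STOP_ITERATION_OCCURRED_FROM_SCALAR_ITER_PANDAS_UDF": "INPUT_NOT_FULLY_CONSUMED",
    "RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF": "RESULT_COLUMN_SCHEMA_MISMATCH",
    "RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF": "RESULT_COLUMN_NAMES_MISMATCH",
    "RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF": "RESULT_COLUMN_NAMES_MISMATCH",
}


def to_new_condition(name: str) -> str:
    """Return the renamed condition, or the input unchanged if it was not renamed."""
    return RENAMED_CONDITIONS.get(name, name)


def matches_condition(observed: str, expected: str) -> bool:
    """Compare two condition names regardless of which naming scheme each uses."""
    return to_new_condition(observed) == to_new_condition(expected)
```

Because the two `RESULT_COLUMNS_MISMATCH_FOR_*` names collapse into a single `RESULT_COLUMN_NAMES_MISMATCH`, a check written this way can no longer tell from the condition name alone whether the failure came from a Pandas UDF or an Arrow UDF.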

Why are the changes needed?

These error conditions were originally created for Pandas UDFs, but are now shared by Arrow UDFs as well. The names and messages should be generic so they can be reused across different eval types without confusion.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests updated to match new error class names and messages.

Was this patch authored or co-authored using generative AI tooling?

No.

…nostic

### What changes were proposed in this pull request?

Rename six PySpark error conditions to be generic and not tied to specific UDF eval types:

| Old Name | New Name |
|---|---|
| `PANDAS_UDF_OUTPUT_EXCEEDS_INPUT_ROWS` | `OUTPUT_EXCEEDS_INPUT_ROWS` |
| `RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF` | `RESULT_ROWS_MISMATCH` |
| `STOP_ITERATION_OCCURRED_FROM_SCALAR_ITER_PANDAS_UDF` | `INPUT_NOT_FULLY_CONSUMED` |
| `RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF` | `RESULT_COLUMN_SCHEMA_MISMATCH` |
| `RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF` | `RESULT_COLUMN_NAMES_MISMATCH` |
| `RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF` | `RESULT_COLUMN_NAMES_MISMATCH` (merged) |

Also updated error messages to not reference specific eval types or data structures (e.g., "pandas.DataFrame", "pyarrow.Table" -> "data").

### Why are the changes needed?

These error conditions were originally created for Pandas UDFs, but are now shared by Arrow UDFs as well. The names and messages should be generic so they can be reused across different eval types without confusion.

Part of [SPARK-55388](https://issues.apache.org/jira/browse/SPARK-55388).

### Does this PR introduce _any_ user-facing change?

Yes. Error condition names and messages are updated. Users who catch specific error conditions by name will need to update their references.

### How was this patch tested?

Existing tests updated to match new error condition names and messages.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#54996 from Yicong-Huang/SPARK-55579/rename-error-classes.

Authored-by: Yicong Huang <17627829+Yicong-Huang@users.noreply.github.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
zhengruifeng pushed a commit that referenced this pull request Apr 2, 2026
…pe-agnostic

Closes #55147 from Yicong-Huang/SPARK-55579-backport-4.1.

Lead-authored-by: Yicong Huang <17627829+Yicong-Huang@users.noreply.github.com>
Co-authored-by: Yicong-Huang <17627829+Yicong-Huang@users.noreply.github.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
@zhengruifeng
Contributor

merged to 4.1 for CI
