[SPARK-55723][PYTHON] Generalize enforce_schema error to PySparkTypeError#54736
Yicong-Huang wants to merge 1 commit into apache:master
Conversation
I'm okay with this in general, but what's our policy on removing an error class? Can we just do it? @HyukjinKwon and @zhengruifeng
```
raise PySparkTypeError(
    f"Result type of column '{field.name}' does not "
```
We don't want to use error class here?
Nothing against error classes; it's just that the previous `RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDTF` error class was too UDTF-specific.
This falls squarely into the type-error range, so I feel creating a new error class is not necessary. Any strong objection to using the general `PySparkTypeError`?
We don't have a strict policy on removing error classes, and normally it won't be treated as a behavior change (it should be a rare case that a job relies on the error class). I think it is generally fine to remove duplicated ones and reuse existing ones.
Merged to master.
### What changes were proposed in this pull request?
Replace the `PySparkRuntimeError` raised with the `RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDTF` error class in `enforce_schema` and `ArrowStreamArrowUDTFSerializer` with a general `PySparkTypeError` that reports the column name, expected type, and actual type without being specific to any UDF type.
### Why are the changes needed?
The `RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDTF` error class was UDTF-specific, but `enforce_schema` is a general utility used across UDF types. The error message ("Column names ... do not match specified schema") was also misleading: the actual failure is a type cast error, not a column name mismatch.
### Does this PR introduce _any_ user-facing change?
Yes. The error type changes from `PySparkRuntimeError` to `PySparkTypeError`, and the message now accurately describes the type mismatch:
**Before:**
```
PySparkRuntimeError: [RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDTF] Column names of the returned pyarrow.Table or pyarrow.RecordBatch do not match specified schema. Expected: int32 Actual: string
```
**After:**
```
PySparkTypeError: Result type of column 'id' does not match the expected type. Expected: int32, got: string.
```
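The shape of the new check can be sketched as follows. This is a hypothetical, simplified stand-in (the real `enforce_schema` lives in PySpark and compares pyarrow field types against the declared schema); the class and function here exist only to illustrate the message format above.

```python
class PySparkTypeError(TypeError):
    """Stand-in for pyspark.errors.PySparkTypeError, for illustration only."""


def enforce_schema(column_names, actual_types, expected_types):
    # Compare each result column's actual type against the expected schema
    # type, and raise a type error that names the offending column.
    for name, actual, expected in zip(column_names, actual_types, expected_types):
        if actual != expected:
            raise PySparkTypeError(
                f"Result type of column '{name}' does not match the expected "
                f"type. Expected: {expected}, got: {actual}."
            )


# Example: a result column 'id' comes back as string instead of int32.
try:
    enforce_schema(["id"], ["string"], ["int32"])
except PySparkTypeError as e:
    print(e)
```

Because the message is parameterized only on column name and types, the same code path serves Arrow UDTFs and any other UDF variant that routes through `enforce_schema`.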
### How was this patch tested?
Updated existing test in `test_arrow_udtf.py`.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes apache#54736 from Yicong-Huang/SPARK-55723/fix/enforce-schema-error.
Authored-by: Yicong Huang <17627829+Yicong-Huang@users.noreply.github.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>