
[SPARK-56937][PYTHON] Raise error on wrong column count in Arrow grouped/cogrouped map UDF #55978

Closed
Yicong-Huang wants to merge 1 commit into apache:master from Yicong-Huang:SPARK-56937

@Yicong-Huang (Contributor) commented May 19, 2026

What changes were proposed in this pull request?

In `verify_arrow_result` (`python/pyspark/worker.py`), the positional branch zips expected and actual columns without a length check, so `zip` silently truncates to the shorter list. This PR raises `RESULT_COLUMN_SCHEMA_MISMATCH` when the lengths differ.
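A minimal sketch of the kind of length check described above, assuming plain Python lists stand in for Arrow columns. `pair_columns_positionally` and `ResultColumnSchemaMismatch` are hypothetical names used only for illustration; the actual fix lives in `verify_arrow_result` and raises PySpark's `RESULT_COLUMN_SCHEMA_MISMATCH` error instead.

```python
from typing import List, Sequence, Tuple


class ResultColumnSchemaMismatch(RuntimeError):
    """Hypothetical stand-in for PySpark's RESULT_COLUMN_SCHEMA_MISMATCH error."""


def pair_columns_positionally(
    expected_names: Sequence[str], actual_columns: Sequence[object]
) -> List[Tuple[str, object]]:
    """Pair each expected field with the returned column at the same position.

    Hypothetical helper sketching the check; not the real worker.py code.
    """
    if len(expected_names) != len(actual_columns):
        # Fail loudly instead of letting zip() truncate to the shorter input.
        raise ResultColumnSchemaMismatch(
            f"expected {len(expected_names)} columns, got {len(actual_columns)}"
        )
    return list(zip(expected_names, actual_columns))


# Plain zip() silently drops the third column:
print(list(zip(["id", "v"], ["c0", "c1", "c2"])))  # [('id', 'c0'), ('v', 'c1')]

# The guarded version reports the mismatch instead:
try:
    pair_columns_positionally(["id", "v"], ["c0"])
except ResultColumnSchemaMismatch as exc:
    print(exc)  # expected 2 columns, got 1
```

On Python 3.10+, `zip(..., strict=True)` would also detect the mismatch, but it raises a generic `ValueError`; an explicit length check lets the caller raise a specific, user-facing error class.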

Why are the changes needed?

Latent since SPARK-40559. Under `assignColumnsByName=false`, a UDF returning the wrong number of columns either silently drops data (too many columns) or surfaces a JVM `ArrayIndexOutOfBoundsException` (too few columns). The name-based branch already raises a friendly error; the positional branch should behave the same way.
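To make the two pre-fix failure modes concrete, here is a toy model in plain Python. It is a deliberate simplification, not Spark's actual code path: `python_side_before_fix` and `jvm_side_read` are hypothetical names, lists of ints stand in for Arrow columns, and the `IndexError` merely plays the role of the JVM `ArrayIndexOutOfBoundsException`.

```python
from typing import Dict, List, Sequence

Column = List[int]


def python_side_before_fix(expected: Sequence[str], returned: Sequence[Column]) -> List[Column]:
    # Toy model of the pre-fix positional branch: zip() pairs columns by
    # position and stops at the shorter input, so a mismatch goes unnoticed.
    return [column for _, column in zip(expected, returned)]


def jvm_side_read(expected: Sequence[str], columns: Sequence[Column]) -> Dict[str, Column]:
    # Toy stand-in for the JVM reader: one column per declared field, by
    # ordinal. A missing ordinal raises IndexError, playing the role of the
    # JVM's ArrayIndexOutOfBoundsException.
    return {name: columns[i] for i, name in enumerate(expected)}


# Too many columns: the third returned column is silently dropped.
print(jvm_side_read(["id", "v"], python_side_before_fix(["id", "v"], [[1], [2], [3]])))
# {'id': [1], 'v': [2]}

# Too few columns: reading ordinal 2 fails on the "JVM" side.
try:
    jvm_side_read(["id", "v", "w"], python_side_before_fix(["id", "v", "w"], [[1], [2]]))
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```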

This affects `SQL_GROUPED_MAP_ARROW_UDF`, `SQL_GROUPED_MAP_ARROW_ITER_UDF`, and `SQL_COGROUPED_MAP_ARROW_UDF`.

Does this PR introduce any user-facing change?

Yes. Under positional mode, returning the wrong number of columns now raises `RESULT_COLUMN_SCHEMA_MISMATCH` instead of silently truncating or failing with a JVM error.

How was this patch tested?

Added `test_apply_in_arrow_returning_wrong_column_count_positional_assignment` to `test_arrow_grouped_map.py` (which also covers the iterator variant via `function_variations`) and to `test_arrow_cogrouped_map.py`, exercising both the too-many and too-few column cases. The full grouped/cogrouped Arrow map suites pass.

Was this patch authored or co-authored using generative AI tooling?

No.

@Yicong-Huang changed the title from "[SPARK-56937][PYTHON] Raise error on wrong column count in Arrow grouped/cogrouped map UDF (positional mode)" to "[SPARK-56937][PYTHON] Raise error on wrong column count in Arrow grouped/cogrouped map UDF" on May 19, 2026
@zhengruifeng (Contributor)

Do these queries also fail before this PR?
Is this only a change in the error message?

@Yicong-Huang (Contributor, Author)

> Do these queries also fail before this PR?
>
> Is this only a change in the error message?

No, it is not only an error-message change. Before this PR, these queries either silently drop data (too many columns) or surface a JVM `ArrayIndexOutOfBoundsException` (too few columns).

zhengruifeng pushed a commit that referenced this pull request May 20, 2026
…ped/cogrouped map UDF

Closes #55978 from Yicong-Huang/SPARK-56937.

Authored-by: Yicong Huang <17627829+Yicong-Huang@users.noreply.github.com>
Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>
(cherry picked from commit 4306d02)
Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>
@zhengruifeng (Contributor)

thanks, merged into master/4.x
