[SPARK-56081][PS] Align idxmax and idxmin NA handling with pandas 3 by ueshin · Pull Request #54908 · apache/spark

ueshin · 2026-03-19T18:58:11Z

What changes were proposed in this pull request?

This PR updates pandas-on-Spark idxmax and idxmin behavior to follow the pandas 3 semantics when NA values are involved.

In python/pyspark/pandas/series.py, Series.idxmax(skipna=False) and Series.idxmin(skipna=False) now raise ValueError on pandas 3 instead of returning np.nan with a FutureWarning, while preserving the existing behavior on pandas versions below 3.0.
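
As a rough illustration of the version gate described above, the following pure-Python sketch models what `skipna=False` returns under each pandas major version. The helper names `_is_na` and `idxmax_no_skipna`, the `pandas_major` parameter, and the error and warning messages are all hypothetical and exist only for this example. The actual change lives in `python/pyspark/pandas/series.py` and is not reproduced here.

```python
import math
import warnings


def _is_na(value):
    """Treat None and float NaN as NA for the purposes of this sketch."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def idxmax_no_skipna(index, values, pandas_major):
    """Model Series.idxmax(skipna=False) for a given pandas major version.

    Hypothetical helper: `pandas_major` stands in for the installed pandas
    version; the real code inspects the pandas version itself.
    """
    if len(index) != len(values):
        raise ValueError("index and values must have the same length")
    if any(_is_na(v) for v in values):
        if pandas_major >= 3:
            # pandas 3 semantics per the PR description: fail explicitly.
            raise ValueError("Encountered an NA value with skipna=False")
        # Older semantics per the PR description: NaN plus a FutureWarning.
        warnings.warn("NaN result is deprecated", FutureWarning, stacklevel=2)
        return float("nan")
    # No NA present: return the label of the first maximum.
    best = max(range(len(values)), key=lambda i: values[i])
    return index[best]
```

Under this model, a caller on pandas 3 who previously checked the result for `np.nan` would instead need to catch `ValueError`.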

In python/pyspark/pandas/frame.py, DataFrame.idxmax(axis=1) and DataFrame.idxmin(axis=1) now raise when pandas 3 would reject rows with all-NA values, instead of silently materializing None.
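
The row-wise case can be sketched the same way. This is a minimal pure-Python model, not the code in `python/pyspark/pandas/frame.py`: the names `_is_na` and `row_idxmax`, the `pandas_major` parameter, and the error message are assumptions made for illustration.

```python
import math


def _is_na(value):
    """Treat None and float NaN as NA for the purposes of this sketch."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def row_idxmax(rows, columns, pandas_major):
    """Model DataFrame.idxmax(axis=1): one column label per row.

    Hypothetical helper: `pandas_major` stands in for the installed pandas
    version. NA values are skipped within each row, as with the default
    skipna=True.
    """
    result = []
    for row in rows:
        present = [(v, c) for v, c in zip(row, columns) if not _is_na(v)]
        if not present:
            if pandas_major >= 3:
                # pandas 3 semantics per the PR description: reject all-NA rows.
                raise ValueError("Encountered all NA values")
            # Older semantics per the PR description: a null result for the row.
            result.append(None)
            continue
        best_value = max(v for v, _ in present)
        # Pick the first column that holds the row maximum.
        result.append(next(c for v, c in present if v == best_value))
    return result
```

Rows that contain at least one non-NA value behave the same under both versions in this model; only all-NA rows diverge.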

The related pandas-on-Spark tests were updated to assert the version-specific behavior in:

- python/pyspark/pandas/tests/computation/test_idxmax_idxmin.py
- python/pyspark/pandas/tests/series/test_index.py

Why are the changes needed?

pandas 3 changed the NA handling for idxmax and idxmin. The existing pandas-on-Spark implementation still followed the older behavior in these paths, which caused mismatches against pandas 3 expectations.

Aligning these code paths keeps pandas-on-Spark behavior consistent with the pandas version it is running against and makes the failure mode explicit instead of returning a deprecated result.

Does this PR introduce any user-facing change?

Yes.

When pandas 3 is used:

- Series.idxmax(skipna=False) and Series.idxmin(skipna=False) now raise ValueError when an NA is encountered instead of returning np.nan.
- DataFrame.idxmax(axis=1) and DataFrame.idxmin(axis=1) now raise on rows with all NA values instead of returning a null result.

For pandas versions below 3.0, the existing behavior is preserved.

How was this patch tested?

Updated the related tests for the pandas-version-specific idxmax and idxmin behavior.

Was this patch authored or co-authored using generative AI tooling?

No.

ueshin · 2026-03-19T18:58:21Z

cc @gaogaotiantian @HyukjinKwon @zhengruifeng

HyukjinKwon · 2026-03-19T23:26:10Z

Merged to master.

Closes apache#54908 from ueshin/issues/SPARK-56081/idxmax_idxmin.

Authored-by: Takuya Ueshin <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

Align idxmax and idxmin NA handling with pandas 3

26b400f

Fix.

5320af1

ueshin closed this Mar 19, 2026

ueshin reopened this Mar 19, 2026

ueshin added 2 commits March 19, 2026 14:04

Test

e5a19f1

Fix.

5ca3da0

HyukjinKwon approved these changes Mar 19, 2026


HyukjinKwon closed this in b2536be Mar 19, 2026

ueshin wants to merge 4 commits into apache:master from ueshin:issues/SPARK-56081/idxmax_idxmin
