[SPARK-56118][PS] Match pandas 3.0 bool handling in GroupBy.quantile #54929

Closed
ueshin wants to merge 2 commits into apache:master from ueshin:issues/SPARK-56118/quantile

@ueshin (Member) commented Mar 20, 2026

What changes were proposed in this pull request?

This PR updates pandas API on Spark GroupBy.quantile to align its bool-dtype behavior with pandas 3.0.

Currently, GroupBy.quantile allows bool input and emits a FutureWarning. With this change, pandas API on Spark keeps that warning-based behavior when running against pandas versions earlier than 3.0, but raises TypeError("Cannot use quantile with bool dtype") when running with pandas 3.0 or later.
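
As a rough illustration of the version gate described above, here is a minimal sketch. It is not the actual pandas API on Spark code: the helper names `parse_major` and `check_bool_quantile` are hypothetical, the dtype is passed as a plain string, and the warning text is illustrative. Only the `TypeError` message is taken from this PR.

```python
import warnings


def parse_major(version: str) -> int:
    """Return the major component of a version string such as '2.2.3'."""
    return int(version.split(".")[0])


def check_bool_quantile(dtype_name: str, pandas_version: str) -> None:
    """Hypothetical gate mirroring the behavior described in this PR.

    Assumption: the installed pandas version is passed in as a string,
    so the sketch runs without pandas or Spark installed.
    """
    if dtype_name != "bool":
        return  # non-bool input is not affected by this gate
    if parse_major(pandas_version) >= 3:
        # pandas 3.0 or later: bool input is rejected outright
        raise TypeError("Cannot use quantile with bool dtype")
    # pandas earlier than 3.0: keep accepting bool input, but warn
    warnings.warn(
        "bool dtype in quantile is deprecated",  # illustrative wording
        FutureWarning,
    )


if __name__ == "__main__":
    check_bool_quantile("int64", "3.0.0")  # passes silently
    check_bool_quantile("bool", "2.2.3")   # emits a FutureWarning
```

Keeping the check in a single branch on the major version means the pre-3.0 warning path stays untouched, which matches the intent that behavior only changes when pandas 3.0 or later is installed.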

The related groupby quantile tests were also updated to cover both version-dependent paths:

  • normal quantile behavior for numeric data
  • warning-compatible behavior for bool data on pandas < 3.0
  • TypeError for bool data on pandas >= 3.0
  • existing invalid q input checks
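
The four paths listed above could be exercised along the following lines. This is only a sketch against a hypothetical stand-in, `quantile_stub`, not the real test suite: the stub models argument and dtype checks only, takes the pandas major version as a parameter, and its `q` validation rules and messages are assumptions made for illustration.

```python
import unittest
import warnings


def quantile_stub(dtype_name, q=0.5, pandas_major=2):
    """Hypothetical stand-in that models only the checks, not the math."""
    if isinstance(q, (list, tuple)):
        raise NotImplementedError("list-like q is not supported in this sketch")
    if isinstance(q, bool) or not isinstance(q, (int, float)):
        raise TypeError("q must be a real number")
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if dtype_name == "bool":
        if pandas_major >= 3:
            raise TypeError("Cannot use quantile with bool dtype")
        warnings.warn("bool dtype in quantile is deprecated", FutureWarning)
    return "ok"


class QuantilePathsTest(unittest.TestCase):
    def test_numeric(self):
        # Numeric data takes the normal path on any pandas version.
        self.assertEqual(quantile_stub("int64", pandas_major=2), "ok")
        self.assertEqual(quantile_stub("int64", pandas_major=3), "ok")

    def test_bool_before_pandas_3(self):
        # Earlier than 3.0: still accepted, but with a FutureWarning.
        with self.assertWarns(FutureWarning):
            quantile_stub("bool", pandas_major=2)

    def test_bool_on_pandas_3(self):
        # 3.0 or later: rejected with the message quoted in this PR.
        with self.assertRaisesRegex(TypeError, "bool dtype"):
            quantile_stub("bool", pandas_major=3)

    def test_invalid_q(self):
        with self.assertRaises(ValueError):
            quantile_stub("int64", q=1.5)
        with self.assertRaises(NotImplementedError):
            quantile_stub("int64", q=[0.25, 0.75])


if __name__ == "__main__":
    unittest.main()
```

Passing the major version in as a parameter lets both version-dependent paths run in one environment; a real test would instead branch on the installed pandas version.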

Why are the changes needed?

pandas 3.0 no longer allows quantile on bool dtype. pandas API on Spark should follow that behavior so groupby quantile results stay consistent with the pandas version it targets.

Without this change, pandas API on Spark would continue accepting bool input under pandas 3.0 and diverge from pandas behavior.

Does this PR introduce any user-facing change?

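Yes. When running with pandas 3.0 or later, GroupBy.quantile on bool data now raises TypeError("Cannot use quantile with bool dtype") instead of emitting a FutureWarning. Behavior with earlier pandas versions is unchanged.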

How was this patch tested?

Updated the related tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@ueshin (Member, Author) commented Mar 20, 2026

ueshin force-pushed the issues/SPARK-56118/quantile branch from 8f20c92 to 4b23626 on March 20, 2026 22:24
ueshin force-pushed the issues/SPARK-56118/quantile branch from 4b23626 to 34b83e5 on March 20, 2026 22:25

@dongjoon-hyun (Member) commented

Merged to master.

terana pushed a commit to terana/spark that referenced this pull request Mar 23, 2026

Closes apache#54929 from ueshin/issues/SPARK-56118/quantile.

Authored-by: Takuya Ueshin <ueshin@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>