[SPARK-56188][PS] Align Series.map({}) with pandas 3 empty-dict behavior by ueshin · Pull Request #54991 · apache/spark

ueshin · 2026-03-24T22:41:35Z

What changes were proposed in this pull request?

This PR updates pandas-on-Spark Series.map for the pandas 3 case when the mapper is an empty plain dict.

In pandas 3, pandas.Series.map({}) returns an all-NaN float64 Series. pandas-on-Spark was preserving the input Spark string type for this path, so a string Series produced a pandas StringDtype result instead. This PR adds a pandas-3-only branch in pyspark.pandas.Series.map so Series.map({}) returns a null DoubleType column for an empty plain dict, which materializes as the same all-NaN float64 result as pandas.
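For reference, the target pandas behavior described above can be checked with plain pandas. This is a minimal sketch, not part of the patch: the sample values are arbitrary, and the pandas-on-Spark counterpart (the same call on a `pyspark.pandas` Series) is not shown because it needs a running Spark session.

```python
import pandas as pd

# Illustrative string Series; the values themselves are arbitrary.
s = pd.Series(["a", "b", "c"])

# An empty plain dict has no key for any element, so every lookup misses
# and every value in the result is missing.
result = s.map({})

print(result.dtype)         # per the PR description, this should be float64
print(result.isna().all())  # every element should be NaN
```

The dtype is the point of the comparison: the result should not keep the string type of the input.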

The related test was updated to check the pandas 3 expectation directly, while keeping the pre-pandas-3 expectation unchanged. The existing defaultdict coverage remains in place so dict subclasses with __missing__ continue to follow the existing behavior. This patch also includes a small test adjustment so the local helper used in test_map only uppercases string inputs.
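The distinction between an empty plain dict and a dict subclass with `__missing__` can be illustrated in plain pandas. This is a sketch under the assumption that a `defaultdict` is a representative mapper; the default value `"unknown"` and the variable names are made up for illustration and do not come from the test suite.

```python
from collections import defaultdict

import pandas as pd

s = pd.Series(["a", "b", "c"])

# A defaultdict defines __missing__, so a lookup that misses falls back to
# the default factory instead of producing NaN.
fallback = defaultdict(lambda: "unknown")
with_default = s.map(fallback)

# A plain empty dict has no fallback, so every value becomes missing.
without_default = s.map({})

print(with_default.tolist())
print(without_default.isna().all())
```

Because the two mappers produce different results even when both are empty, the new branch applies only to plain dicts.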

Why are the changes needed?

Without this change, pandas-on-Spark diverges from pandas 3 for Series.map({}). For example, a string Series currently returns a pandas StringDtype result in pandas-on-Spark, while pandas returns an all-NaN float64 Series.

Aligning this narrow case avoids a pandas-version-specific mismatch in pandas-on-Spark behavior and keeps the test expectation consistent with pandas 3 behavior.

Does this PR introduce any user-facing change?

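Yes. With pandas 3, pandas-on-Spark Series.map({}) now returns an all-NaN float64 Series, matching pandas, instead of preserving the input type.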

How was this patch tested?

Updated the related tests.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Codex (GPT-5)

dongjoon-hyun

+1, LGTM.

dongjoon-hyun · 2026-03-25T04:43:18Z

Merged to master for Apache Spark 4.2.0.

Align Series.map({}) with pandas 3 empty-dict behavior

27ac2f6

HyukjinKwon approved these changes Mar 24, 2026

dongjoon-hyun approved these changes Mar 25, 2026

dongjoon-hyun closed this in f9b838b Mar 25, 2026

ueshin wants to merge 1 commit into apache:master from ueshin:issues/SPARK-56188/series_map
