[SPARK-56188][PS] Align Series.map({}) with pandas 3 empty-dict behavior #54991

Closed
ueshin wants to merge 1 commit into apache:master from ueshin:issues/SPARK-56188/series_map

ueshin (Member) commented Mar 24, 2026

What changes were proposed in this pull request?

This PR updates pandas-on-Spark Series.map for the pandas 3 case when the mapper is an empty plain dict.

In pandas 3, pandas.Series.map({}) returns an all-NaN float64 Series. pandas-on-Spark was preserving the input Spark string type for this path, so a string Series produced a pandas StringDtype result instead. This PR adds a pandas-3-only branch in pyspark.pandas.Series.map so Series.map({}) returns a null DoubleType column for an empty plain dict, which materializes as the same all-NaN float64 result as pandas.
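The pandas side of this comparison can be checked with a short sketch. It uses plain pandas only (pandas-on-Spark is not exercised here, since that would need a Spark session), and the sample values are made up for illustration. Per the description above, the result should be an all-NaN float64 Series.

```python
import pandas as pd

# Illustrative string Series; the values themselves are arbitrary.
s = pd.Series(["a", "b", "c"])

# An empty plain dict has no keys to match, so every element maps to missing.
result = s.map({})

print(result.dtype)         # dtype of the mapped Series
print(result.isna().all())  # whether every element is missing
```

This is the behavior the new pandas-3-only branch in pyspark.pandas.Series.map is meant to reproduce for a string input.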

The related test was updated to check the pandas 3 expectation directly, while keeping the pre-pandas-3 expectation unchanged. The existing defaultdict coverage remains in place so dict subclasses with __missing__ continue to follow the existing behavior. This patch also includes a small test adjustment so the local helper used in test_map only uppercases string inputs.
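For contrast, here is a minimal sketch of why dict subclasses with __missing__ are treated separately. It again uses plain pandas rather than pandas-on-Spark, and the `fallback` mapper and its default value are hypothetical names chosen for illustration, not taken from the Spark test suite.

```python
from collections import defaultdict

import pandas as pd

s = pd.Series(["a", "b"])

# Hypothetical mapper: a defaultdict is empty here, but __missing__
# supplies a value for every key that is looked up.
fallback = defaultdict(lambda: "missing")

# Each element goes through fallback[...], so nothing maps to NaN.
mapped = s.map(fallback)

print(mapped.tolist())
```

Because such a mapper can produce a value for any key, an empty defaultdict is not equivalent to an empty plain dict, which is why the all-NaN shortcut is limited to plain dicts.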

Why are the changes needed?

Without this change, pandas-on-Spark diverges from pandas 3 for Series.map({}). For example, a string Series currently returns a pandas StringDtype result in pandas-on-Spark, while pandas returns an all-NaN float64 Series.

Aligning this narrow case avoids a pandas-version-specific mismatch in pandas-on-Spark behavior and keeps the test expectation consistent with pandas 3 behavior.

Does this PR introduce any user-facing change?

Yes. With pandas 3, Series.map({}) in pandas-on-Spark now returns an all-NaN float64 Series for an empty plain dict, matching pandas.

How was this patch tested?

Updated the related tests.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Codex (GPT-5)

dongjoon-hyun (Member) left a comment

+1, LGTM.

dongjoon-hyun (Member) commented

Merged to master for Apache Spark 4.2.0.
