[SPARK-56014][PS][TESTS] Fix to_numeric ignore test for pandas 3.0#54836

Closed
ueshin wants to merge 1 commit into apache:master from ueshin:issues/SPARK-56014/to_numeric

Conversation

@ueshin
Member

@ueshin ueshin commented Mar 16, 2026

What changes were proposed in this pull request?

This PR updates pyspark.pandas.tests.test_namespace.NamespaceTests.test_to_numeric for the pandas 3.0 behavior of to_numeric(..., errors="ignore") with non-Series inputs.

In this code path, ps.to_numeric delegates to pd.to_numeric for non-Series inputs. The existing test assumed that errors="ignore" returns the original input, but pandas 3.0 now raises ValueError("invalid error value specified") instead.

This patch makes the test follow the pandas version in use:

  • for pandas < 3.0.0, keep the existing equality check
  • for pandas >= 3.0.0, assert the ValueError

No implementation behavior is changed.
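As a minimal illustration of the version gate described above (the helper name `ignore_raises` is hypothetical, not part of the patch), the expectation flips at pandas 3.0.0:

```python
# Hypothetical helper illustrating the version-gated expectation:
# under pandas < 3.0.0, to_numeric(..., errors="ignore") on non-Series
# inputs returns the original input; under pandas >= 3.0.0 it raises
# ValueError("invalid error value specified").
def ignore_raises(pandas_version: str) -> bool:
    """Return True if errors="ignore" is expected to raise ValueError."""
    major = int(pandas_version.split(".")[0])
    return major >= 3


print(ignore_raises("2.2.3"))  # False: keep the existing equality check
print(ignore_raises("3.0.0"))  # True: assert the ValueError
```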

Why are the changes needed?

The current test fails under the pandas 3.0 test environment because its expectation no longer matches upstream pandas behavior.

Since pandas-on-Spark delegates this non-Series case to pandas, the test should reflect the version-specific pandas behavior rather than hard-coding the pre-3.0 result.

Does this PR introduce any user-facing change?

Yes, it will behave more like pandas 3.

How was this patch tested?

Updated the related test.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: OpenAI Codex (GPT-5)

@ueshin
Member Author

ueshin commented Mar 16, 2026

```python
                pd.to_numeric(data, errors="ignore"), ps.to_numeric(data, errors="ignore")
            )
        else:
            with self.assertRaisesRegex(ValueError, "invalid error value specified"):
```
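For context, the version-gated assertion pattern in the excerpt can be sketched in isolation as follows (a self-contained sketch; `to_numeric_ignore` is a stand-in mimicking the pandas behavior, not the real test code):

```python
import unittest


class ToNumericIgnoreSketch(unittest.TestCase):
    def to_numeric_ignore(self, data, pandas_major):
        # Stand-in mimicking pd.to_numeric(..., errors="ignore"):
        # pre-3.0 returns the input unchanged; 3.0+ raises ValueError.
        if pandas_major < 3:
            return data
        raise ValueError("invalid error value specified")

    def test_version_gated(self):
        data = ["1", "x"]
        # pandas < 3.0.0: errors="ignore" returns the input, so compare equal
        self.assertEqual(self.to_numeric_ignore(data, pandas_major=2), data)
        # pandas >= 3.0.0: assert the ValueError instead
        with self.assertRaisesRegex(ValueError, "invalid error value specified"):
            self.to_numeric_ignore(data, pandas_major=3)
```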
Member


Since the default value was `raise`, this looks okay to me.

BTW, do you think we need to refresh the docs, @ueshin? The docs say `ignore` doesn't work for pandas-on-Spark Series args.

```python
def to_numeric(arg, errors="raise"):
    """
    Convert argument to a numeric type.

    Parameters
    ----------
    arg : scalar, list, tuple, 1-d array, or Series
        Argument to be converted.
    errors : {'raise', 'coerce'}, default 'raise'
        * If 'coerce', then invalid parsing will be set as NaN.
        * If 'raise', then invalid parsing will raise an exception.
        * If 'ignore', then invalid parsing will return the input.

        .. note:: 'ignore' doesn't work yet when `arg` is pandas-on-Spark Series.
```

Member Author


Yes, eventually we should update the docs, but pandas 3 is not fully supported yet and I'm still not sure whether we can make it by the 4.2.0 release, so I think we should keep it as-is for now.

Also, I see another PR that will show a warning with pandas 3.

Member


Got it~

Member

@HyukjinKwon HyukjinKwon left a comment


Looks good except one comment above.

Member

@dongjoon-hyun dongjoon-hyun left a comment


+1, LGTM. Thank you, @ueshin .

@HyukjinKwon
Member

Merged to master.
