[SPARK-56014][PS][TESTS] Fix to_numeric ignore test for pandas 3.0#54836

Closed
ueshin wants to merge 1 commit into apache:master from ueshin:issues/SPARK-56014/to_numeric

Conversation

@ueshin
Member

@ueshin ueshin commented Mar 16, 2026

What changes were proposed in this pull request?

This PR updates pyspark.pandas.tests.test_namespace.NamespaceTests.test_to_numeric for the pandas 3.0 behavior of to_numeric(..., errors="ignore") with non-Series inputs.

In this code path, ps.to_numeric delegates to pd.to_numeric for non-Series inputs. The existing test assumed that errors="ignore" returns the original input, but pandas 3.0 now raises ValueError("invalid error value specified") instead.

This patch makes the test follow the pandas version in use:

  • for pandas < 3.0.0, keep the existing equality check
  • for pandas >= 3.0.0, assert the ValueError

No implementation behavior is changed.
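As a minimal illustration of the version gate described above (the helper name `ignore_raises` is hypothetical, not part of the patch), the expectation flips at pandas 3.0.0:

```python
# Hypothetical helper illustrating the version-gated expectation:
# under pandas < 3.0.0, to_numeric(..., errors="ignore") on non-Series
# inputs returns the original input; under pandas >= 3.0.0 it raises
# ValueError("invalid error value specified").
def ignore_raises(pandas_version: str) -> bool:
    """Return True if errors="ignore" is expected to raise ValueError."""
    major = int(pandas_version.split(".")[0])
    return major >= 3


print(ignore_raises("2.2.3"))  # False: keep the existing equality check
print(ignore_raises("3.0.0"))  # True: assert the ValueError
```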

Why are the changes needed?

The current test fails under the pandas 3.0 test environment because its expectation no longer matches upstream pandas behavior.

Since pandas-on-Spark delegates this non-Series case to pandas, the test should reflect the version-specific pandas behavior rather than hard-coding the pre-3.0 result.

Does this PR introduce any user-facing change?

Yes, it will behave more like pandas 3.

How was this patch tested?

Updated the related test.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: OpenAI Codex (GPT-5)

@ueshin
Member Author

ueshin commented Mar 16, 2026

```python
                pd.to_numeric(data, errors="ignore"), ps.to_numeric(data, errors="ignore")
            )
        else:
            with self.assertRaisesRegex(ValueError, "invalid error value specified"):
```
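For context, the version-gated assertion pattern in the excerpt can be sketched in isolation as follows (a self-contained sketch; `to_numeric_ignore` is a stand-in mimicking the pandas behavior, not the real test code):

```python
import unittest


class ToNumericIgnoreSketch(unittest.TestCase):
    def to_numeric_ignore(self, data, pandas_major):
        # Stand-in mimicking pd.to_numeric(..., errors="ignore"):
        # pre-3.0 returns the input unchanged; 3.0+ raises ValueError.
        if pandas_major < 3:
            return data
        raise ValueError("invalid error value specified")

    def test_version_gated(self):
        data = ["1", "x"]
        # pandas < 3.0.0: errors="ignore" returns the input, so compare equal
        self.assertEqual(self.to_numeric_ignore(data, pandas_major=2), data)
        # pandas >= 3.0.0: assert the ValueError instead
        with self.assertRaisesRegex(ValueError, "invalid error value specified"):
            self.to_numeric_ignore(data, pandas_major=3)
```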
Member


Since the default value was `raise`, this looks okay to me.

BTW, do you think we need to refresh the docs, @ueshin? The docs say `ignore` doesn't work for pandas-on-Spark Series args.

```python
def to_numeric(arg, errors="raise"):
    """
    Convert argument to a numeric type.

    Parameters
    ----------
    arg : scalar, list, tuple, 1-d array, or Series
        Argument to be converted.
    errors : {'raise', 'coerce'}, default 'raise'
        * If 'coerce', then invalid parsing will be set as NaN.
        * If 'raise', then invalid parsing will raise an exception.
        * If 'ignore', then invalid parsing will return the input.

        .. note:: 'ignore' doesn't work yet when `arg` is pandas-on-Spark Series.
```

Member Author


Yes, eventually we should update the docs, but pandas 3 is not fully supported yet and I'm still not sure whether we can make it by the 4.2.0 release, so I think we should keep it as-is for now.

Also, I see another PR that will show a warning with pandas 3.

Member


Got it~

Member

@HyukjinKwon HyukjinKwon left a comment


Looks good except one comment above.

Member

@dongjoon-hyun dongjoon-hyun left a comment


+1, LGTM. Thank you, @ueshin .

@HyukjinKwon
Member

Merged to master.
