[SPARK-56014][PS][TESTS] Fix to_numeric ignore test for pandas 3.0 #54836
ueshin wants to merge 1 commit into apache:master
Conversation
pd.to_numeric(data, errors="ignore"), ps.to_numeric(data, errors="ignore")
)
else:
with self.assertRaisesRegex(ValueError, "invalid error value specified"):
Since the default value was `raise`, this looks okay to me.
BTW, do you think we need to refresh the docs, @ueshin? The docs say `ignore` doesn't work for pandas-on-Spark Series arguments.
spark/python/pyspark/pandas/namespace.py
Lines 3595 to 3608 in 7eef6f7
Yes, eventually we should update the docs. However, pandas 3 is not fully supported yet, and I'm still not sure whether we can get there by the 4.2.0 release, so I think we should keep the docs as-is for now.
Also, I see another PR that shows a warning with pandas 3.
HyukjinKwon
left a comment
Looks good except for one comment above.
dongjoon-hyun
left a comment
+1, LGTM. Thank you, @ueshin.
Merged to master.
What changes were proposed in this pull request?
This PR updates `pyspark.pandas.tests.test_namespace.NamespaceTests.test_to_numeric` for the pandas 3.0 behavior of `to_numeric(..., errors="ignore")` with non-Series inputs.

In this code path, `ps.to_numeric` delegates to `pd.to_numeric` for non-Series inputs. The existing test assumed that `errors="ignore"` returns the original input, but pandas 3.0 now raises `ValueError("invalid error value specified")` instead.

This patch makes the test follow the pandas version in use:
- `< 3.0.0`, keep the existing equality check
- `>= 3.0.0`, assert the `ValueError`

No implementation behavior is changed.
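The version gate described in this section can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual test code: `fake_to_numeric` is a hypothetical stand-in that models only the `errors="ignore"` behavior described in this PR, and `parse_version` is an illustrative helper that handles plain `X.Y.Z` strings only. The real test calls `pd.to_numeric` and `ps.to_numeric` directly and checks the installed pandas version.

```python
import unittest


def parse_version(version):
    """Hypothetical helper: turn a plain 'X.Y.Z' string into a comparable tuple."""
    return tuple(int(part) for part in version.split(".")[:3])


def fake_to_numeric(data, errors="raise", pandas_version="3.0.0"):
    """Hypothetical stand-in for pd.to_numeric (simplified model, not the pandas API)."""
    if errors == "ignore":
        if parse_version(pandas_version) >= (3, 0, 0):
            # Behavior described in the PR for pandas 3.0.
            raise ValueError("invalid error value specified")
        try:
            return [float(item) for item in data]
        except ValueError:
            # Pre-3.0 behavior assumed by the old test: return the input as-is.
            return data
    return [float(item) for item in data]


class ToNumericIgnoreSketch(unittest.TestCase):
    def check_ignore(self, pandas_version):
        data = ["1", "apple"]
        if parse_version(pandas_version) < (3, 0, 0):
            # Keep the existing equality check.
            result = fake_to_numeric(data, errors="ignore", pandas_version=pandas_version)
            self.assertEqual(result, data)
        else:
            # Assert the ValueError instead.
            with self.assertRaisesRegex(ValueError, "invalid error value specified"):
                fake_to_numeric(data, errors="ignore", pandas_version=pandas_version)

    def test_before_pandas_3(self):
        self.check_ignore("2.2.3")

    def test_pandas_3(self):
        self.check_ignore("3.0.0")
```

Saved as a module, the sketch can be run with `python -m unittest`. Branching on the version inside a single test keeps one test name valid across both pandas lines, which appears to be the approach taken here.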
Why are the changes needed?
The current test fails under the pandas 3.0 test environment because its expectation no longer matches upstream pandas behavior.
Since pandas-on-Spark delegates this non-Series case to pandas, the test should reflect the version-specific pandas behavior rather than hard-coding the pre-3.0 result.
Does this PR introduce any user-facing change?
Yes, it will behave more like pandas 3.
How was this patch tested?
Updated the related test.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: OpenAI Codex (GPT-5)