[SPARK-36231][PYTHON] Support arithmetic operations of decimal(nan) series #34687

Yikun · 2021-11-23T03:15:42Z

What changes were proposed in this pull request?

This patch makes the following changes to match pandas behavior:

Handle NaN values in _non_fractional_astype: following the pandas to-string conversion, the result should be "NaN" rather than str(np.nan) ("nan"). This is covered by self.assert_eq(pser.astype(str), psser.astype(str)).
Handle null values in rpow, which is covered by def test_rpow(self).
Add an index_ops.hasnans check in astype, which is covered by test_astype.

This patch also moves numeric_w_nan_pdf into numeric_pdf, which means all the separate float_nan/decimal_nan test cases have been cleaned up and merged into the numeric tests.
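To illustrate the first two items, here is a minimal standard-library sketch (not code from this patch) of the Python-level behavior they refer to: a Decimal NaN stringifies as "NaN" while a float NaN stringifies as "nan", and a base of 1 raised to a NaN exponent yields 1. The variable names are illustrative only, and the assumption is that these are the cases the astype and rpow changes are meant to reproduce.

```python
import math
from decimal import Decimal

float_nan = float("nan")
# Decimal accepts a float NaN and produces Decimal('NaN').
decimal_nan = Decimal(float_nan)

# The two NaN flavours stringify differently.
print(str(float_nan))    # nan
print(str(decimal_nan))  # NaN

# Power with a NaN exponent: a base of 1 still yields 1,
# while any other base propagates NaN.
one_pow_nan = 1 ** float_nan
two_pow_nan = 2 ** float_nan
print(one_pow_nan)              # 1.0
print(math.isnan(two_pow_nan))  # True
```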

Why are the changes needed?

Follow the pandas behavior

Does this PR introduce any user-facing change?

Yes, it corrects the results for null values to follow the pandas behavior.

How was this patch tested?

Unit tests cover all changes.
Passed all Python test cases with pandas v1.1.x
Passed all Python test cases with pandas v1.2.x

HyukjinKwon · 2021-11-23T03:20:47Z

python/pyspark/pandas/tests/data_type_ops/testing_utils.py

+            "decimal",
+            "float_nan",
+        ]
+        if LooseVersion(pd.__version__) >= LooseVersion("1.3.0"):


Just to clarify, old versions of pandas fail the same way with or without this PR, right? Supporting it only with 1.3.0 is okay, as it's at least not a regression.

pandas on Spark with old versions of pandas fails the same way with or without this PR (it's very weird, see https://gist.github.com/Yikun/6b88920652fc535b336a03746fe3b04f). I added a note: # Skip decimal_nan test before v1.3.0, it not supported by pandas on spark yet.

pandas on Spark with pandas v1.3.0+ passes with this PR.

Old versions of native pandas support decimal.

This PR only enables the test case for pandas on Spark with pandas v1.3.0 and later.
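The version gate discussed above can be sketched roughly as follows. This is a hedged illustration, not the code in testing_utils.py: the helper names version_tuple and numeric_column_names are hypothetical, the column list is an assumption apart from "decimal" and "float_nan", and adding "decimal_nan" under the gate is inferred from the skip note. The actual diff compares with LooseVersion; a plain tuple comparison is swapped in here to keep the sketch dependency-free.

```python
from itertools import takewhile


def version_tuple(version):
    """Parse up to three leading numeric components, e.g. '1.3.0' -> (1, 3, 0)."""
    parts = []
    for piece in version.split(".")[:3]:
        # Keep only the leading digits so '0rc1' is read as 0.
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '1.3' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def numeric_column_names(pandas_version):
    """Hypothetical helper: pick test columns based on the pandas version."""
    names = ["int", "float", "decimal", "float_nan"]  # assumed column list
    if version_tuple(pandas_version) >= (1, 3, 0):
        # Assumption: decimal_nan is only exercised on pandas 1.3.0 and later.
        names.append("decimal_nan")
    return names


print(numeric_column_names("1.2.5"))
print(numeric_column_names("1.3.0"))
```

With a gate like this, older pandas versions simply skip the decimal_nan column rather than fail, which matches the "not a regression" point raised in the review.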

Thanks, the changes look pretty good to me. Thanks for redoing this @Yikun.

SparkQA · 2021-11-23T04:23:51Z

Test build #145529 has finished for PR 34687 at commit 4d9dc87.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-11-23T04:35:56Z

Test build #145532 has finished for PR 34687 at commit b5a1008.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-11-23T04:44:32Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50001/

SparkQA · 2021-11-23T04:59:46Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50004/

SparkQA · 2021-11-23T05:29:18Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50001/

SparkQA · 2021-11-23T06:02:02Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50004/

SparkQA · 2021-11-23T07:28:21Z

Test build #145537 has finished for PR 34687 at commit a643b5e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-11-23T07:43:31Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50010/

SparkQA · 2021-11-23T08:28:11Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50010/

HyukjinKwon · 2021-11-23T09:52:24Z

Merged to master.

Support arithmetic operations of decimal(nan) series

040e195

github-actions bot added CORE PYTHON labels Nov 23, 2021

Yikun mentioned this pull request Nov 23, 2021

[SPARK-36231][PYTHON] Support arithmetic operations of decimal(nan) series #34314

Closed

HyukjinKwon reviewed Nov 23, 2021


Yikun force-pushed the SPARK-36337-skip branch from 4d9dc87 to b5a1008 Compare November 23, 2021 03:59

Skip decimal nan test before v1.3.0

a643b5e

Yikun force-pushed the SPARK-36337-skip branch from b5a1008 to a643b5e Compare November 23, 2021 06:29

Yikun marked this pull request as ready for review November 23, 2021 08:14

HyukjinKwon approved these changes Nov 23, 2021


HyukjinKwon closed this in 3235edb Nov 23, 2021
