Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-36231][PYTHON] Support arithmetic operations of decimal(nan) series #34687

Closed
wants to merge 2 commits into from

Conversation

Yikun
Copy link
Member

@Yikun Yikun commented Nov 23, 2021

What changes were proposed in this pull request?

This patch has changes as below to follow the pandas behavior:

  • Add nan value process in _non_fractional_astype: Follow the pandas to_string covert method, it should be "NaN" rather than str(np.nan)("nan"), which is covered by self.assert_eq(pser.astype(str), psser.astype(str)).
  • Add null value process in rpow, which is covered by def test_rpow(self)
  • Add index_ops.hasnans in astype, which is covered by test_astype.

This patch also move numeric_w_nan_pdf into numeric_pdf, that means all float_nan/decimal_nan separated test case have been cleaned up and merged into numeric test.

Why are the changes needed?

Follow the pandas behavior

Does this PR introduce any user-facing change?

Yes, correct the null value result to follow the pandas behavior

How was this patch tested?

  1. ut to cover all changes
  2. Passed all python test case with pandas v1.1.x
  3. Passed all python test case with pandas v1.2.x

"decimal",
"float_nan",
]
if LooseVersion(pd.__version__) >= LooseVersion("1.3.0"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, old versions of pandas fails same with/without this PR, right? Supporting it only w/ 1.3.0 is okay as it's not a regression at least.

Copy link
Member Author

@Yikun Yikun Nov 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • pandas on spark with old versions of pandas fails same with/without this PR. (it's very weird, https://gist.github.com/Yikun/6b88920652fc535b336a03746fe3b04f), I added note: # Skip decimal_nan test before v1.3.0, it not supported by pandas on spark yet.
  • pandas on spark with v1.3.0+ of pandas passed with this PR.
  • old versions of pandas (native pandas) support decimal.

This PR only enable the test case of pandas on spark with pandas v1.3.0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks the changes look pretty good to me. Thanks for redoing this @Yikun.

@SparkQA
Copy link

SparkQA commented Nov 23, 2021

Test build #145529 has finished for PR 34687 at commit 4d9dc87.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 23, 2021

Test build #145532 has finished for PR 34687 at commit b5a1008.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 23, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50001/

@SparkQA
Copy link

SparkQA commented Nov 23, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50004/

@SparkQA
Copy link

SparkQA commented Nov 23, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50001/

@SparkQA
Copy link

SparkQA commented Nov 23, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50004/

@SparkQA
Copy link

SparkQA commented Nov 23, 2021

Test build #145537 has finished for PR 34687 at commit a643b5e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 23, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50010/

@Yikun Yikun marked this pull request as ready for review November 23, 2021 08:14
@SparkQA
Copy link

SparkQA commented Nov 23, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50010/

@HyukjinKwon
Copy link
Member

Merged to master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants