[SPARK-36231][PYTHON] Support arithmetic operations of decimal(nan) series #34687
"decimal", | ||
"float_nan", | ||
] | ||
if LooseVersion(pd.__version__) >= LooseVersion("1.3.0"): |
Just to clarify, old versions of pandas fail the same way with or without this PR, right? Supporting it only with 1.3.0 is okay, since at least it's not a regression.
- pandas on Spark with old versions of pandas fails the same way with or without this PR (it's very weird, see https://gist.github.com/Yikun/6b88920652fc535b336a03746fe3b04f). I added a note:
# Skip decimal_nan test before v1.3.0, it not supported by pandas on spark yet.
- pandas on Spark with pandas v1.3.0+ passes with this PR.
- Old versions of native pandas support decimal.
This PR only enables the pandas-on-Spark test case for pandas v1.3.0+.
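The version gate discussed above can be sketched with the standard library alone. This is only an illustration, not the code in the PR: `parse_version`, `columns_under_test`, and the `"int"` and `"float"` column names are hypothetical, and the installed pandas version is passed in as a plain string rather than read from `pd.__version__`.

```python
def parse_version(text):
    """Turn a plain dotted version such as "1.3.0" into a tuple of ints.

    Assumption: the string has no pre-release suffix (e.g. "1.3.0rc1"),
    which this simple parser does not handle.
    """
    return tuple(int(part) for part in text.split("."))


def columns_under_test(installed_version):
    """Return the column names to exercise for a given pandas version.

    "decimal" and "float_nan" appear in the diff above; "int" and "float"
    are hypothetical placeholders for the remaining columns.
    """
    columns = ["int", "float", "decimal", "float_nan"]
    # Only add the decimal_nan case when pandas is new enough,
    # mirroring the >= 1.3.0 check in the diff.
    if parse_version(installed_version) >= parse_version("1.3.0"):
        columns.append("decimal_nan")
    return columns
```

Comparing tuples of ints rather than raw strings matters here: as strings, "1.10.0" sorts before "1.3.0", while as tuples (1, 10, 0) correctly compares greater than (1, 3, 0).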
Thanks, the changes look pretty good to me. Thanks for redoing this, @Yikun.
Force-pushed from 4d9dc87 to b5a1008.
Test build #145529 has finished for PR 34687 at commit
Test build #145532 has finished for PR 34687 at commit
Kubernetes integration test starting
Kubernetes integration test starting
Kubernetes integration test status failure
Kubernetes integration test status failure
Force-pushed from b5a1008 to a643b5e.
Test build #145537 has finished for PR 34687 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
Merged to master.
What changes were proposed in this pull request?
This patch makes the following changes to follow the pandas behavior:
- ... "NaN" rather than str(np.nan) ("nan"), which is covered by self.assert_eq(pser.astype(str), psser.astype(str)).
- ... def test_rpow(self)
- ... astype, which is covered by test_astype.
This patch also moves numeric_w_nan_pdf into numeric_pdf, which means all the separate float_nan/decimal_nan test cases have been cleaned up and merged into the numeric tests.
Why are the changes needed?
To follow the pandas behavior.
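The "NaN" versus "nan" distinction mentioned in the description can be seen with the standard library alone. The snippet below is a rough sketch, not the patch itself: `render_nullable_decimals` is a hypothetical helper, and it assumes a missing decimal value arrives as None (standing in for a Spark null) and should be rendered the way str() renders a Decimal NaN.

```python
from decimal import Decimal

# A float NaN and a Decimal NaN stringify differently in plain Python.
float_nan_text = str(float("nan"))      # lowercase "nan"
decimal_nan_text = str(Decimal("NaN"))  # "NaN"


def render_nullable_decimals(values):
    """Stringify decimals, rendering None (a stand-in for null) as "NaN".

    Hypothetical helper for illustration only; it is not the function
    changed by this PR.
    """
    return ["NaN" if value is None else str(value) for value in values]


rendered = render_nullable_decimals([Decimal("1.5"), None])
```

Here `rendered` holds "1.5" followed by "NaN", which matches what applying str() to `Decimal("1.5")` and `Decimal("NaN")` would give.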
Does this PR introduce any user-facing change?
Yes, it corrects the null-value results to follow the pandas behavior.
How was this patch tested?