[SPARK-40512][SPARK-40896][PS][INFRA] Upgrade pandas to 1.5.0 #37955
What about also changing
Maybe we don't need to set an upper bound for pandas in
I think the upper bound is there to avoid pulling in later pandas versions that may not be tested/supported yet?
As Sean mentioned, it seems to fail. Do we need to fix the failures or set the upper bound (< 1.5)? cc @HyukjinKwon, too.
Yeah, I think we should make the tests pass, since pandas-on-Spark should follow the behavior of the latest pandas. Let me take a look. Thanks!
Generally speaking, almost every pandas upgrade causes the PS CI to fail. For the stability of CI, our strategy was using @itholic FYI, these two test cases failed due to:
I have not taken a deeper look; you could investigate further and fix either the code or the tests, just like SPARK-38819.
I was hit several times by conflicts caused by the version differences between CI and
Thanks for the comments! Let me investigate the test failures and create an umbrella ticket if there are many of them. If there are only a few failures, I will handle them all in this PR.
35087c2 to cf8c95d
@@ -867,7 +867,7 @@ def isin(self: IndexOpsLike, values: Sequence[Any]) -> IndexOpsLike:
 Name: animal, dtype: bool

 >>> s.rename("a").to_frame().set_index("a").index.isin(['lama'])
-Index([True, False, True, False, True, False], dtype='object', name='a')
+Index([True, False, True, False, True, False], dtype='bool', name='a')
FYI: I included the fix for SPARK-40896 here and below since it's a fairly minor fix, and I believe it's the last one.
python/pyspark/pandas/strings.py (outdated)
-0 0-1
+0 -01
This bug was fixed in pandas 1.5.0
pandas-dev/pandas#20868
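For context, the difference between the two doctest outputs can be illustrated with plain Python strings. This is a sketch, not pandas code: `str.zfill` is the standard-library behavior that the fixed pandas output matches, while `str.rjust` with a `"0"` fill character is only a stand-in (an assumption made for illustration) for the older sign-unaware padding that produced `0-1`.

```python
# Illustration with built-in strings only; no pandas required.
value = "-1"

# Sign-aware zero padding: zeros are inserted after the leading sign.
sign_aware = value.zfill(3)

# Sign-unaware left padding, used here as a stand-in for the old doctest output.
sign_unaware = value.rjust(3, "0")

print(sign_aware)    # -01
print(sign_unaware)  # 0-1
```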
Can we skip these doctests so the tests can pass with lower pandas versions too?
+1
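One way to skip a version-dependent doctest is the standard `# doctest: +SKIP` directive. The sketch below is only an illustration: `pad` is a hypothetical helper, not a function from `python/pyspark/pandas/strings.py`, and the thread does not show how the skip was actually applied in this PR.

```python
import doctest


def pad(value: str, width: int) -> str:
    """Hypothetical helper used only to demonstrate the directive.

    The first example is skipped, so its expected output is never checked.

    >>> pad("-1", 3)  # doctest: +SKIP
    'version-dependent output'
    >>> pad("1", 3)
    '001'
    """
    return value.zfill(width)


# Parse the docstring into a DocTest and run it without touching the filesystem.
parser = doctest.DocTestParser()
test = parser.get_doctest(pad.__doc__, {"pad": pad}, "pad", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(results.failed)
```

Because the skipped example is never executed, the run reports no failures even though its expected output does not match what `pad` returns.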
Test passed.
Thanks for the review, @HyukjinKwon and @zhengruifeng! Just addressed the comments.
CI passed!
Merged into master, thank you @itholic for doing this!
Thank you all!
### What changes were proposed in this pull request?

This PR proposes to upgrade pandas version to 1.5.0 since the new pandas version is released.

Please refer to [What's new in 1.5.0](https://pandas.pydata.org/docs/whatsnew/v1.5.0.html) for more detail.

### Why are the changes needed?

Pandas API on Spark should follow the latest pandas.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The existing tests should pass

Closes apache#37955 from itholic/SPARK-40512.

Authored-by: itholic <haejoon.lee@databricks.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>