[SPARK-40512][SPARK-40896][PS][INFRA] Upgrade pandas to 1.5.0 #37955

Closed

itholic
Contributor

@itholic itholic commented Sep 21, 2022

What changes were proposed in this pull request?

This PR proposes to upgrade the pandas version to 1.5.0 now that the new pandas version has been released.

Please refer to What's new in 1.5.0 for more detail.

Why are the changes needed?

Pandas API on Spark should follow the latest pandas.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

The existing tests should pass

@zhengruifeng
Contributor

What about also changing dev/requirements.txt?

@itholic
Contributor Author

itholic commented Sep 21, 2022

> What about also changing dev/requirements.txt?

Maybe we don't need to set an upper bound for pandas in dev/requirements.txt, because pip install -r dev/requirements.txt always installs the latest pandas when the version is not specified?

@srowen
Member

srowen commented Sep 21, 2022

I think the upper bound is there to avoid pulling in later pandas versions that may not be tested or supported yet?
Indeed, it looks like the tests don't quite pass with 1.5.0 yet.

@dongjoon-hyun
Member

As Sean mentioned, it seems to fail. Do we need to fix the failures or set an upper bound (< 1.5)?

cc @HyukjinKwon , too.

@itholic
Contributor Author

itholic commented Sep 22, 2022

Yeah, I think we should make the tests pass, since pandas-on-Spark should follow the behavior of the latest pandas.

Let me take a look. Thanks!

@Yikun
Member

Yikun commented Sep 22, 2022

Generally speaking, almost every pandas upgrade causes the PS CI to fail. For CI stability, our strategy has been to use <= to pin a specific version (equivalent to == in our case) and to upgrade to a specific version manually.

@itholic FYI, these two test cases failed due to:

I haven't taken a deeper look; you could investigate further and fix either the code or the tests, as in SPARK-38819.
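
As a rough illustration of the pinning strategy described above, the following Python sketch shows how a `<=` upper bound keeps a resolver on a tested release while an unpinned requirement picks up the newest one. The helper names (`parse_version`, `allowed_by_upper_bound`, `pick_latest`) and the release list are hypothetical and not part of Spark's build tooling. The parsing assumes plain three-part `X.Y.Z` strings; pip's real resolution follows PEP 440 and is more involved.

```python
# Hypothetical helpers for illustration only; not Spark or pip code.


def parse_version(text):
    """Turn a plain 'X.Y.Z' release string into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def allowed_by_upper_bound(candidate, upper_bound):
    """Return True if `candidate` satisfies a `<= upper_bound` pin."""
    return parse_version(candidate) <= parse_version(upper_bound)


def pick_latest(available, upper_bound=None):
    """Pick the newest release, optionally capped by an upper bound.

    With no bound, the newest release always wins, which mimics an
    unpinned `pandas` line in a requirements file.
    """
    candidates = [
        v for v in available
        if upper_bound is None or allowed_by_upper_bound(v, upper_bound)
    ]
    return max(candidates, key=parse_version)


# Illustrative release list (assumption, not read from any package index).
releases = ["1.4.3", "1.4.4", "1.5.0"]

print(pick_latest(releases))                       # unpinned: newest release
print(pick_latest(releases, upper_bound="1.4.4"))  # pinned: stays on the tested release
```

With the bound in place, a new upstream release cannot silently change what CI installs; the upgrade happens only when someone edits the pin.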

@zhengruifeng
Contributor

> > What about also changing dev/requirements.txt?
>
> Maybe we don't need to set an upper bound for pandas in dev/requirements.txt, because pip install -r dev/requirements.txt always installs the latest pandas when the version is not specified?

I have been hit several times by conflicts caused by version differences between CI and dev/requirements.txt, but in the meantime it keeps us aware of dependency updates, so it is fine to leave it alone here.

@itholic itholic marked this pull request as draft September 22, 2022 01:47
@itholic
Contributor Author

itholic commented Sep 22, 2022

Thanks for the comments!

Let me investigate the test failures and create an umbrella ticket if there are many.

If there are only a few failures, let me handle them all in this PR.

@itholic itholic changed the title from "[SPARK-40512][PS][INFRA] Upgrade pandas to 1.5.0" to "[SPARK-40512][SPARK-40576][PS][INFRA] Upgrade pandas to 1.5.0" on Oct 24, 2022
@@ -867,7 +867,7 @@ def isin(self: IndexOpsLike, values: Sequence[Any]) -> IndexOpsLike:
Name: animal, dtype: bool

>>> s.rename("a").to_frame().set_index("a").index.isin(['lama'])
-Index([True, False, True, False, True, False], dtype='object', name='a')
+Index([True, False, True, False, True, False], dtype='bool', name='a')
Contributor Author

@itholic itholic Oct 24, 2022

FYI: I included the fix for SPARK-40896 here and below, since it's a fairly minor fix, and I believe it's the last one.

@itholic itholic changed the title from "[SPARK-40512][SPARK-40576][PS][INFRA] Upgrade pandas to 1.5.0" to "[SPARK-40512][SPARK-40896][PS][INFRA] Upgrade pandas to 1.5.0" on Oct 24, 2022
0 0-1
0 -01
Contributor Author

This bug was fixed in pandas 1.5.0
pandas-dev/pandas#20868

Member

Can we skip these doctests so the tests can pass with lower pandas versions too?

Contributor

+1
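
The suggestion above, skipping a doctest whose output depends on the installed library version, can be sketched with the standard library's `# doctest: +SKIP` directive. This is a minimal illustration, not the change made in this PR: `pad_left` and `run_doctests` are hypothetical names, and the skipped example only stands in for version-dependent output.

```python
import doctest


def pad_left(text, width):
    """Left-pad `text` with zeros (hypothetical example function).

    This example is stable, so it runs as a normal doctest:

    >>> pad_left("7", 3)
    '007'

    This example stands in for output that differs between library
    versions, so it is marked with SKIP and never executed:

    >>> pad_left("ab", 4)  # doctest: +SKIP
    'version-dependent output'
    """
    return text.rjust(width, "0")


def run_doctests(func):
    """Run the doctests in one function's docstring and return the results."""
    parser = doctest.DocTestParser()
    # Build a DocTest from the docstring, exposing only the function itself.
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    runner.run(test)
    return runner.summarize(verbose=False)


results = run_doctests(pad_left)
print(results)
```

The skipped example stays in the docstring as documentation, but its deliberately wrong expected output never causes a failure, so the suite passes regardless of which version produced it.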

@itholic itholic marked this pull request as ready for review October 26, 2022 11:06
@itholic
Contributor Author

itholic commented Oct 26, 2022

Test passed.

dev/requirements.txt (outdated review thread, resolved)
@itholic
Contributor Author

itholic commented Oct 27, 2022

Thanks for the review, @HyukjinKwon and @zhengruifeng !

Just addressed the comments

@itholic
Contributor Author

itholic commented Oct 28, 2022

CI passed!

@zhengruifeng
Contributor

Merged into master, thank you @itholic for doing this!

@dongjoon-hyun
Member

Thank you all!

SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
### What changes were proposed in this pull request?

This PR proposes to upgrade pandas version to 1.5.0 since the new pandas version is released.

Please refer to [What's new in 1.5.0](https://pandas.pydata.org/docs/whatsnew/v1.5.0.html) for more detail.

### Why are the changes needed?

Pandas API on Spark should follow the latest pandas.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The existing tests should pass

Closes apache#37955 from itholic/SPARK-40512.

Authored-by: itholic <haejoon.lee@databricks.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
@itholic itholic deleted the SPARK-40512 branch April 22, 2023 05:46