[SPARK-40512][SPARK-40896][PS][INFRA] Upgrade pandas to 1.5.0 #37955

itholic · 2022-09-21T06:06:32Z

What changes were proposed in this pull request?

This PR proposes to upgrade pandas version to 1.5.0 since the new pandas version is released.

Please refer to What's new in 1.5.0 for more detail.

Why are the changes needed?

Pandas API on Spark should follow the latest pandas.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

The existing tests should pass.

zhengruifeng · 2022-09-21T06:32:10Z

what about also changing dev/requirements.txt

itholic · 2022-09-21T06:55:35Z

what about also changing dev/requirements.txt

Maybe we don't need to set an upper bound for pandas in dev/requirements.txt, because pip install -r dev/requirements.txt always installs the latest pandas when the version is not specified?

srowen · 2022-09-21T13:32:28Z

I think the upper bound is there to avoid pulling in later pandas versions that may not be tested/supported yet?
Indeed, it looks like tests don't quite pass with 1.5.0 yet

dongjoon-hyun · 2022-09-21T16:22:13Z

As Sean mentioned, it seems to fail. Do we need to fix the failures, or set the upper bound (< 1.5)?

cc @HyukjinKwon , too.

itholic · 2022-09-22T00:40:52Z

Yeah, I think we should make the tests pass, since pandas-on-Spark should follow the behavior of the latest pandas.

Let me take a look. Thanks!

Yikun · 2022-09-22T01:29:33Z

Generally speaking, almost every pandas upgrade causes the PS CI to fail. For CI stability, our strategy has been to pin a specific version with <= (which is equivalent to == in our case) and to upgrade to a specific version manually.
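To illustrate why an inclusive upper bound acts like an exact pin here, below is a minimal sketch. It is a toy model only, not pip's actual resolver; the helper names `parse` and `pick_latest` and the version list are assumptions made up for the example.

```python
# Toy model of version pinning; this is NOT how pip resolves dependencies.
# The version list is illustrative, not a snapshot of what PyPI offers.

def parse(version: str) -> tuple:
    """Turn '1.5.0' into (1, 5, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pick_latest(available, upper_bound=None):
    """Return the newest version, optionally capped by an inclusive upper bound."""
    candidates = [
        v for v in available
        if upper_bound is None or parse(v) <= parse(upper_bound)
    ]
    return max(candidates, key=parse)


available = ["1.4.3", "1.4.4", "1.5.0", "1.5.1"]

# Without a bound, the newest release is picked up as soon as it appears.
unpinned = pick_latest(available)

# With "<=1.5.0", the result stays at 1.5.0 even after 1.5.1 is published,
# which is why the bound behaves like "==1.5.0" as long as 1.5.0 exists.
pinned = pick_latest(available, upper_bound="1.5.0")

print(unpinned, pinned)
```

Under this model, CI only moves to a newer pandas when someone edits the bound, which matches the manual-upgrade strategy described above.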

@itholic FYI, these two test cases fail due to:

python/pyspark/pandas/tests/indexes/test_category.py.test_append: pandas-dev/pandas@c7b470c
python/pyspark/pandas/tests/indexes/test_base.py.test_to_frame: pandas-dev/pandas@7dbfe9f

I haven't taken a deeper look; you could investigate further and fix either the code or the tests, as was done in SPARK-38819.

zhengruifeng · 2022-09-22T01:35:04Z

what about also changing dev/requirements.txt

Maybe we don't need to set an upper bound for pandas in dev/requirements.txt, because pip install -r dev/requirements.txt always installs the latest pandas when the version is not specified?

I was hit several times by conflicts caused by version differences between CI and dev/requirements.txt, but at the same time it keeps us aware of dependency updates, so it's fine to leave it alone here.

itholic · 2022-09-22T01:51:25Z

Thanks for the comments!

Let me investigate the test failures and create an umbrella ticket if there are many of them.

If there are only a few failures, let me handle them all in this PR.

itholic · 2022-10-24T08:28:03Z

python/pyspark/pandas/base.py

@@ -867,7 +867,7 @@ def isin(self: IndexOpsLike, values: Sequence[Any]) -> IndexOpsLike:
        Name: animal, dtype: bool

        >>> s.rename("a").to_frame().set_index("a").index.isin(['lama'])
-        Index([True, False, True, False, True, False], dtype='object', name='a')
+        Index([True, False, True, False, True, False], dtype='bool', name='a')


FYI: I included the fix for SPARK-40896 here and below since it's a fairly minor fix, and I believe it's the last one.

itholic · 2022-10-25T05:00:13Z

python/pyspark/pandas/strings.py

-        0     0-1
+        0     -01


This bug was fixed in pandas 1.5.0
pandas-dev/pandas#20868
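For context, the new doctest output above (`-01`) is what Python's built-in `str.zfill` produces for a signed string, while the old output (`0-1`) is what plain left-padding with zeros gives. The sketch below uses only the standard library and assumes the doctest in question covers `str.zfill`, which this excerpt does not state explicitly.

```python
signed = "-1"

# Built-in zfill keeps a leading sign in front of the padding zeros.
builtin_padded = signed.zfill(3)

# Naive left-padding treats the sign as an ordinary character.
naive_padded = signed.rjust(3, "0")

print(builtin_padded)  # -01
print(naive_padded)    # 0-1
```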

Can we skip this doctest so the tests can pass with lower pandas versions too?
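One way to skip a single doctest example is the standard `# doctest: +SKIP` directive. The following is a minimal sketch using only Python's `doctest` module; `pad_example` is a hypothetical function, and this does not claim to show how the PR actually resolved the comment.

```python
import doctest


def pad_example():
    """
    The first example has a version-dependent expected output, so it is skipped.

    >>> "-1".zfill(3)  # doctest: +SKIP
    '0-1'
    >>> "1".zfill(3)
    '001'
    """


# Parse and run the docstring examples directly, without a test module.
parser = doctest.DocTestParser()
test = parser.get_doctest(pad_example.__doc__, {}, "pad_example", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

# The skipped example is never executed, so its mismatching output
# does not count as a failure.
print(results.failed)
```

The trade-off is that a skipped example is no longer verified on any pandas version, so it only serves as documentation.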

itholic · 2022-10-26T11:07:02Z

Test passed.

python/pyspark/pandas/indexes/datetimes.py

python/pyspark/pandas/base.py

dev/requirements.txt

itholic · 2022-10-27T05:05:55Z

Thanks for the review, @HyukjinKwon and @zhengruifeng !

Just addressed the comments

itholic · 2022-10-28T01:49:09Z

CI passed!

zhengruifeng · 2022-10-28T03:24:47Z

Merged into master, thank you @itholic for doing this!

dongjoon-hyun · 2022-10-28T05:17:31Z

Thank you all!

### What changes were proposed in this pull request?

This PR proposes to upgrade pandas version to 1.5.0 since the new pandas version is released.

Please refer to [What's new in 1.5.0](https://pandas.pydata.org/docs/whatsnew/v1.5.0.html) for more detail.

### Why are the changes needed?

Pandas API on Spark should follow the latest pandas.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The existing tests should pass

Closes apache#37955 from itholic/SPARK-40512.

Authored-by: itholic <haejoon.lee@databricks.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>

[SPARK-40512][PS][INFRA] Upgrade pandas to 1.5.0

cf8c95d

github-actions bot added BUILD CORE PANDAS API ON SPARK PYTHON labels Sep 21, 2022

itholic marked this pull request as draft September 22, 2022 01:47

itholic force-pushed the SPARK-40512 branch from 35087c2 to cf8c95d Compare September 27, 2022 06:23

itholic added 5 commits September 27, 2022 15:25

set upperbound for now

f65e1c3

Merge branch 'master' of https://github.com/apache/spark into SPARK-4…

676ae90

…0512

Merge branch 'master' of https://github.com/apache/spark into SPARK-4…

a17deb0

…0512

Merge branch 'master' of https://github.com/apache/spark into SPARK-4…

9910c8f

…0512

Merge branch 'master' of https://github.com/apache/spark into SPARK-4…

bedcf3c

…0512

itholic changed the title ~~[SPARK-40512][PS][INFRA] Upgrade pandas to 1.5.0~~ [SPARK-40512][SPARK-40576][PS][INFRA] Upgrade pandas to 1.5.0 Oct 24, 2022

itholic added 2 commits October 24, 2022 17:21

fix test

69479ac

revert

ba58fe8

itholic commented Oct 24, 2022

itholic changed the title ~~[SPARK-40512][SPARK-40576][PS][INFRA] Upgrade pandas to 1.5.0~~ [SPARK-40512][SPARK-40896][PS][INFRA] Upgrade pandas to 1.5.0 Oct 24, 2022

itholic added 2 commits October 25, 2022 10:15

more fix

0bba699

bug fixed

8ab7798

itholic commented Oct 25, 2022

Merge branch 'master' of https://github.com/apache/spark into SPARK-4…

2a109c1

…0512

itholic marked this pull request as ready for review October 26, 2022 11:06

HyukjinKwon approved these changes Oct 26, 2022

HyukjinKwon reviewed Oct 26, 2022

python/pyspark/pandas/indexes/datetimes.py

HyukjinKwon reviewed Oct 26, 2022

python/pyspark/pandas/base.py

HyukjinKwon reviewed Oct 26, 2022

python/pyspark/pandas/base.py

zhengruifeng reviewed Oct 26, 2022

dev/requirements.txt

resolve the comments

15707b3

resolve the comments

e596e07

zhengruifeng approved these changes Oct 28, 2022

zhengruifeng closed this in cf086b1 Oct 28, 2022

itholic deleted the SPARK-40512 branch April 22, 2023 05:46
