[SPARK-46753][PYTHON][TESTS] Fix pypy3 python test #44778

panbingkun · 2024-01-18T03:15:00Z

What changes were proposed in this pull request?

This PR aims to fix the PyPy3 Python tests.

Why are the changes needed?

The scheduled job currently fails with PyPy3; we should fix it to improve test coverage.

Does this PR introduce any user-facing change?

No, test-only.

How was this patch tested?

Pass GA.
Manually tested.

Was this patch authored or co-authored using generative AI tooling?

No.

HyukjinKwon · 2024-01-18T04:21:19Z

python/pyspark/sql/tests/test_udf.py

@@ -917,6 +923,7 @@ def test_complex_return_types(self):
        self.assertEqual(row[1], {"a": "b"})
        self.assertEqual(row[2], Row(col1=1, col2=2))

+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
    def test_named_arguments(self):


hm, I don't think this uses Arrow.

Okay, let me check again.

After testing, we found that the assertDataFrameEqual method used in this UT requires it.
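The guard added in the diff above follows a common pattern: compute an availability flag once, then skip any test that needs the optional dependency. A minimal sketch, assuming a locally re-created `have_pyarrow` flag and message (PySpark provides its own equivalents; `DemoTests` and `run_demo` are hypothetical names used only for illustration):

```python
import importlib.util
import unittest

# Assumption: flag and message re-created here for illustration only.
have_pyarrow = importlib.util.find_spec("pyarrow") is not None
pyarrow_requirement_message = None if have_pyarrow else "pyarrow is not installed"


class DemoTests(unittest.TestCase):  # hypothetical test case
    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
    def test_needs_pyarrow(self):
        # A real test body would exercise Arrow-dependent code here.
        self.assertTrue(have_pyarrow)

    def test_always_runs(self):
        self.assertEqual(1 + 1, 2)


def run_demo():
    """Run the suite and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DemoTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

When the dependency is missing, the decorated test is reported as skipped rather than failed, which is why the scheduled PyPy3 job can stay green without it.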

@itholic can you make this test not require that method?

Let me take a look. Do we want to fix it separately after this PR is merged, or before merging?

Oh... it seems we have another problem: assertDataFrameEqual also tries to import pyspark.pandas, not only pandas. Let me try to make a quick fix.

Just realized that we check for pandas and pyarrow when importing pyspark.pandas, as below, and exit the program when they are not installed:

    # pyspark/pandas/__init__.py
    try:
        require_minimum_pandas_version()
        require_minimum_pyarrow_version()
    except ImportError as e:
        if os.environ.get("SPARK_TESTING"):
            warnings.warn(str(e))
            sys.exit()
        else:
            raise

@HyukjinKwon Maybe we should add a flag or something here to enable running tests without pandas and pyarrow on specific systems?

yeah we can do
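To make the behavior under discussion concrete: with SPARK_TESTING set, a missing dependency ends the process at import time instead of raising ImportError. The sketch below mimics that guard as a plain function. `guard_optional_imports`, `missing_dependency`, and the `allow_missing` flag are all hypothetical, and this is not necessarily how the follow-up fix is implemented.

```python
import sys
import warnings


def guard_optional_imports(checks, environ, allow_missing=False):
    """Mimic the quoted guard; return True if every dependency check passes.

    `checks` is a list of zero-argument callables that raise ImportError
    when a dependency is missing. `allow_missing` is a hypothetical opt-out
    flag that exists only in this sketch.
    """
    try:
        for check in checks:
            check()
    except ImportError as e:
        if allow_missing:
            warnings.warn(str(e))
            return False  # continue without the optional dependency
        if environ.get("SPARK_TESTING"):
            warnings.warn(str(e))
            sys.exit()  # raises SystemExit, ending the test process
        raise
    return True


def missing_dependency():
    # Placeholder check that always fails, standing in for a version check.
    raise ImportError("optional dependency is not installed")
```

Passing the environment as a mapping, rather than reading `os.environ` directly, keeps the sketch easy to exercise in isolation.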

Made a quick fix here: #44864

After this is merged, I will rebase and test it again.

python/pyspark/sql/tests/pandas/test_pandas_udf_scalar.py

panbingkun · 2024-01-18T10:41:33Z

For the record:
1. At present, the PySpark tests based on pypy3 have turned green; the corresponding GA workflow run is:
https://github.com/panbingkun/spark/runs/20605281376

2. Let's remove @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message) from the test_named_arguments method and try it out.
Based on my repeated attempts in another PR, the following line should cause this UT to fail. Let's double check.

spark/python/pyspark/sql/tests/test_udf.py

Line 938 in 38fc127

assertDataFrameEqual(df, [Row(0), Row(101)])

After testing, we found that it failed, as follows:
https://github.com/panbingkun/spark/actions/runs/7568820365/job/20610860492

3. So, let's restore it first. cc @HyukjinKwon

This reverts commit 347c6bc.

panbingkun · 2024-01-18T12:06:38Z

.github/workflows/build_and_test.yml

@@ -367,7 +367,7 @@ jobs:
            pyspark-pandas-connect-part3
    env:
      MODULES_TO_TEST: ${{ matrix.modules }}
-      PYTHON_TO_TEST: 'python3.9'
+      PYTHON_TO_TEST: 'pypy3'


After obtaining approval, I will restore this.

panbingkun · 2024-01-18T12:07:21Z

cc @dongjoon-hyun

dev/sparktestsupport/modules.py

python/pyspark/sql/tests/test_utils.py

python/pyspark/sql/tests/test_udtf.py

python/pyspark/sql/tests/test_utils.py

HyukjinKwon · 2024-01-30T03:36:02Z

Otherwise looks pretty good. Thanks for driving this @panbingkun

dongjoon-hyun

Thank you so much, @panbingkun and all! This looks much improved.

xinrong-meng · 2024-01-30T19:07:19Z

Looks great, thank you @panbingkun!

dongjoon-hyun

+1, LGTM.

@panbingkun , could you recover PYTHON_TO_TEST environment?

https://github.com/apache/spark/pull/44778/files#r1457349381

panbingkun · 2024-01-31T01:25:43Z

+1, LGTM.

@panbingkun , could you recover PYTHON_TO_TEST environment?

https://github.com/apache/spark/pull/44778/files#r1457349381

Okay, let me do this (https://github.com/apache/spark/pull/44778/files#r1472155598) first.

HyukjinKwon · 2024-01-31T01:40:23Z

Can we check #44778 (comment) before we merge? This is the last one left that I think is not related to PyArrow.

panbingkun · 2024-01-31T02:02:30Z

Can we check #44778 (comment) before we merge? This is the last one left that I think is not related to PyArrow.

Sure, I am testing it with "pypy" in platform.python_implementation().lower().
Once this succeeds, I will remove "pypy" in platform.python_implementation().lower() from the test_err_udf_init method and reproduce the issue.
I vaguely remember that this issue is similar to test_python_udf_segfault, but let me double check; please wait a moment.
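The interpreter check mentioned here can be combined with `unittest.skipIf` to skip a test only on PyPy. A minimal sketch, in which the class name, test name, `run_demo` helper, and skip reason are hypothetical and only the `platform.python_implementation()` expression comes from the discussion:

```python
import platform
import unittest

# True when the interpreter running the tests is PyPy.
is_pypy = "pypy" in platform.python_implementation().lower()


class InterpreterSpecificTests(unittest.TestCase):  # hypothetical test case
    @unittest.skipIf(is_pypy, "known to fail on PyPy")  # hypothetical reason
    def test_cpython_only(self):
        # Reached only on non-PyPy interpreters such as CPython.
        self.assertFalse(is_pypy)


def run_demo():
    """Run the suite and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(InterpreterSpecificTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Temporarily removing such a guard, as described above, is a simple way to confirm that the underlying failure still reproduces on PyPy.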

HyukjinKwon · 2024-01-31T06:10:55Z

Merged to master.

panbingkun · 2024-01-31T06:31:54Z

Merged to master.

Let me continue to observe the next build_python scheduling execution. Thanks all! ❤️

panbingkun · 2024-02-01T00:18:02Z

https://github.com/apache/spark/actions/runs/7728043635

Finally it turned green. 😄

[SPARK-46753][PYTHON][DOCS] Fix pypy3 python test

2b7138c

github-actions bot added the SQL, INFRA, and PYTHON labels Jan 18, 2024

HyukjinKwon reviewed Jan 18, 2024

python/pyspark/sql/tests/pandas/test_pandas_udf_scalar.py (outdated)

add test_utils

9bfb1ef

panbingkun changed the title from "[SPARK-46753][PYTHON][DOCS] Fix pypy3 python test" to "[SPARK-46753][PYTHON][TESTS] Fix pypy3 python test" Jan 18, 2024

exclude pyspark_testing when on pypy

357e27d

github-actions bot added the BUILD label Jan 18, 2024

panbingkun added 2 commits January 18, 2024 18:45

test test_named_arguments

347c6bc

Revert "test test_named_arguments"

8a5c890

This reverts commit 347c6bc.

panbingkun commented Jan 18, 2024

panbingkun marked this pull request as ready for review January 18, 2024 12:06

dongjoon-hyun reviewed Jan 18, 2024

dev/sparktestsupport/modules.py (outdated)

panbingkun added 2 commits January 19, 2024 10:06

Merge branch 'master' into SPARK-46753

4093e5a

fix utils.py

581efcf

panbingkun requested a review from HyukjinKwon January 22, 2024 08:36

panbingkun added 2 commits January 23, 2024 16:24

[SPARK-46753][PYTHON][TESTS] Fix pypy3 python test

9d905aa

Merge branch 'master' into SPARK-46753

4b1a9e1

github-actions bot removed the BUILD label Jan 23, 2024

[SPARK-46753][PYTHON][TESTS] Fix pypy3 python test

272c35f

HyukjinKwon reviewed Jan 24, 2024

python/pyspark/sql/tests/test_utils.py

allisonwang-db reviewed Jan 24, 2024

python/pyspark/sql/tests/test_udtf.py (outdated)

itholic mentioned this pull request Jan 24, 2024

[SPARK-46728][PYTHON] Check Pandas installation properly #44745

Closed

panbingkun added 2 commits January 24, 2024 09:42

Merge branch 'master' into SPARK-46753

b7cee5d

[SPARK-46753][PYTHON][TESTS] Fix pypy3 python test

a97b020

itholic mentioned this pull request Jan 24, 2024

[SPARK-46824][PS][BUILD] Enable Pandas-on-Spark test without optional dependency on PyPy #44864

Closed

panbingkun added 4 commits January 29, 2024 19:59

fix connectutils.py

34224cc

fix test_utils.py

543b9ce

fix test_utils.py

d1554ab

fix test_udf_profiler.py

05cd67c

HyukjinKwon reviewed Jan 30, 2024

python/pyspark/sql/tests/test_utils.py (outdated)

dongjoon-hyun reviewed Jan 30, 2024

panbingkun added 3 commits January 30, 2024 15:09

reproduce test_udf.py issue

396c9a0

try connectutils.py

fce95ca

restore connectutils.py

d7b4aba

dongjoon-hyun approved these changes Jan 30, 2024

panbingkun added 2 commits January 31, 2024 09:17

Merge branch 'master' into SPARK-46753

aedfe9c

fix test_udf.py

03d0676

fix test_udf.py

4e9be48

panbingkun added 3 commits January 31, 2024 10:37

fix test_udf.py

5bdb2a3

Merge branch 'master' into SPARK-46753

fdac041

fix

82b5787

HyukjinKwon approved these changes Jan 31, 2024

Update .github/workflows/build_and_test.yml

60fd53a

itholic approved these changes Jan 31, 2024

github-actions bot removed the INFRA label Jan 31, 2024

HyukjinKwon closed this in 0871a6f Jan 31, 2024
