Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-46753][PYTHON][TESTS] Fix pypy3 python test #44778

Closed
wants to merge 44 commits into from

Conversation

panbingkun
Copy link
Contributor

@panbingkun panbingkun commented Jan 18, 2024

What changes were proposed in this pull request?

The pr aims to fix pypy3 python tests.

Why are the changes needed?

Currently scheduled job fails (with PyPy3), we should fix it to improve test coverage.

Does this PR introduce any user-facing change?

No, test-only.

How was this patch tested?

  • Pass GA
  • Manually test.

Was this patch authored or co-authored using generative AI tooling?

No.

@@ -917,6 +923,7 @@ def test_complex_return_types(self):
self.assertEqual(row[1], {"a": "b"})
self.assertEqual(row[2], Row(col1=1, col2=2))

@unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
def test_named_arguments(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, I don't think this uses Arrow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, let me check again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After testing, we found that the assertDataFrameEqual method used in this UT requires it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@itholic can you make this test not requiring that method?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me take a look. Do we want to fix it separately after this PR merging or before merging??

Copy link
Contributor

@itholic itholic Jan 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh.... seems like we got another problem that we also try import pyspark.pandas in assertDataFrameEqual not only pandas. Let me try to made a quick fix.

Copy link
Contributor

@itholic itholic Jan 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized that we're checking pandas and pyarrow when importing pyspark.pandas as below, and exit the program when it's not installed:

# pyspark/pandas/__init__.py
try:
    require_minimum_pandas_version()
    require_minimum_pyarrow_version()
except ImportError as e:
    if os.environ.get("SPARK_TESTING"):
        warnings.warn(str(e))
        sys.exit()
    else:
        raise

@HyukjinKwon Maybe should we add some flag or something here to enable running test without pandas and pyarrow within specific system?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we can do

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a quick fix here: #44864

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this is merged, I will rebase and test it again.

@panbingkun panbingkun changed the title [SPARK-46753][PYTHON][DOCS] Fix pypy3 python test [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test Jan 18, 2024
@github-actions github-actions bot added the BUILD label Jan 18, 2024
@panbingkun
Copy link
Contributor Author

panbingkun commented Jan 18, 2024

For a record:
1.At present, the pyspark test based on pypy3 has turned green, and the corresponding GA running workflow is:
https://github.com/panbingkun/spark/runs/20605281376

2.Let's remove @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message) from the test_named_arguments method and try it out.
Based on my repeated attempts in another PR, the following should have caused this UT to fail. Let's double check again.

assertDataFrameEqual(df, [Row(0), Row(101)])

After testing, we found that it failed, as follows:
https://github.com/panbingkun/spark/actions/runs/7568820365/job/20610860492
image
image

3.So, Let's restore it first. cc @HyukjinKwon

@@ -367,7 +367,7 @@ jobs:
pyspark-pandas-connect-part3
env:
MODULES_TO_TEST: ${{ matrix.modules }}
PYTHON_TO_TEST: 'python3.9'
PYTHON_TO_TEST: 'pypy3'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After obtaining Approval, I will restore this

@panbingkun panbingkun marked this pull request as ready for review January 18, 2024 12:06
@panbingkun
Copy link
Contributor Author

cc @dongjoon-hyun

@github-actions github-actions bot removed the BUILD label Jan 23, 2024
@HyukjinKwon
Copy link
Member

Otherwise looks pretty good. Thanks for driving this @panbingkun

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much, @panbingkun and all! This looks much improved.

@xinrong-meng
Copy link
Member

Looks great, thank you @panbingkun!

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

@panbingkun , could you recover PYTHON_TO_TEST environment?

@panbingkun
Copy link
Contributor Author

+1, LGTM.

@panbingkun , could you recover PYTHON_TO_TEST environment?

Okay, let me do this(https://github.com/apache/spark/pull/44778/files#r1472155598) first

@HyukjinKwon
Copy link
Member

Can we check #44778 (comment) before we merge? This is the last one left that I think is not related to PyArrow.

@panbingkun
Copy link
Contributor Author

Can we check #44778 (comment) before we merge? This is the last one left that I think is not related to PyArrow.

Sure, I am testing it with "pypy" in platform.python_implementation().lower().
After this success, I will remove "pypy" in platform.python_implementation().lower() from the test_err_udf_init method and reproduce the issue.
I vaguely remember that this issue is similar to test_python_udf_segfault , but let me double check again and wait a moment.

@github-actions github-actions bot removed the INFRA label Jan 31, 2024
@HyukjinKwon
Copy link
Member

Merged to master.

@panbingkun
Copy link
Contributor Author

Merged to master.

Let me continue to observe the next build_python scheduling execution. Thanks all! ❤️

@panbingkun
Copy link
Contributor Author

https://github.com/apache/spark/actions/runs/7728043635
image

Finally it turned green. 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants