[SPARK-46753][PYTHON][TESTS] Fix pypy3 python test #44778
python/pyspark/sql/tests/test_udf.py
@@ -917,6 +923,7 @@ def test_complex_return_types(self):
         self.assertEqual(row[1], {"a": "b"})
         self.assertEqual(row[2], Row(col1=1, col2=2))

+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_named_arguments(self):
hm, I don't think this uses Arrow.
Okay, let me check again.
After testing, we found that the assertDataFrameEqual method used in this UT requires it.
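For reference, the skip pattern from the diff above can be sketched in isolation. This is a minimal sketch, not the PySpark test itself: have_pyarrow and pyarrow_requirement_message are recomputed here as stand-ins (an assumption about how such flags could be derived), and NamedArgumentsExample and run_example are hypothetical names.

```python
import importlib.util
import unittest

# Stand-ins for the flags used in the diff; assumed, not copied from PySpark.
have_pyarrow = importlib.util.find_spec("pyarrow") is not None
pyarrow_requirement_message = None if have_pyarrow else "pyarrow is not installed"


class NamedArgumentsExample(unittest.TestCase):
    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
    def test_needs_pyarrow(self):
        import pyarrow  # only reached when pyarrow is importable

        self.assertTrue(hasattr(pyarrow, "__version__"))


def run_example():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NamedArgumentsExample)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

On an interpreter without pyarrow (for example a PyPy3 environment where it is not installed), the test is reported as skipped rather than failed.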
@itholic can you make this test not require that method?
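One way a test could avoid that helper is to compare collected rows with plain Python. The following is only a sketch under assumptions: Row here is a namedtuple stand-in so the example runs without Spark, and assert_rows_equal is a hypothetical helper, not part of PySpark.

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.Row so the sketch runs without Spark.
Row = namedtuple("Row", ["id", "value"])


def assert_rows_equal(actual, expected, check_order=False):
    """Compare two lists of rows without pandas, pyarrow, or assertDataFrameEqual."""
    actual, expected = list(actual), list(expected)
    if not check_order:
        # Sorting assumes the row values are mutually comparable (no None mixed in).
        actual, expected = sorted(actual), sorted(expected)
    if actual != expected:
        raise AssertionError(f"rows differ: {actual!r} != {expected!r}")


# In a real test the first argument would come from something like df.collect().
collected = [Row(2, "b"), Row(1, "a")]
assert_rows_equal(collected, [Row(1, "a"), Row(2, "b")])
```

Because it only uses the standard library, a comparison like this has no dependency on pandas or pyarrow being importable.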
Let me take a look. Do we want to fix it separately after this PR is merged, or before merging?
Oh... it seems we have another problem: assertDataFrameEqual also tries to import pyspark.pandas, not only pandas. Let me try to make a quick fix.
Just realized that we check for pandas and pyarrow when importing pyspark.pandas, as shown below, and exit the program when either is not installed:
# pyspark/pandas/__init__.py
try:
    require_minimum_pandas_version()
    require_minimum_pyarrow_version()
except ImportError as e:
    if os.environ.get("SPARK_TESTING"):
        warnings.warn(str(e))
        sys.exit()
    else:
        raise
@HyukjinKwon Maybe we should add a flag or something here to enable running tests without pandas and pyarrow on specific systems?
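To make the idea of a flag concrete, here is a rough sketch of what an opt-in check could look like. It is not the fix that was eventually made: check_optional_deps and the EXAMPLE_ALLOW_MISSING_DEPS variable are hypothetical names invented for illustration, and the check only tests importability of top-level modules rather than minimum versions.

```python
import importlib.util
import warnings


def check_optional_deps(modules, environ, allow_missing_flag="EXAMPLE_ALLOW_MISSING_DEPS"):
    """Return the names of missing modules, or raise ImportError.

    `allow_missing_flag` is a hypothetical environment variable name,
    not an existing Spark setting. `modules` should be top-level names.
    """
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    if not missing:
        return []
    message = "missing optional dependencies: " + ", ".join(missing)
    if environ.get(allow_missing_flag):
        # Opt-in: warn and let the caller decide which tests to skip.
        warnings.warn(message)
        return missing
    raise ImportError(message)
```

A caller could pass os.environ as environ; with the flag set, the import continues and tests that need the missing packages can be skipped individually instead of the whole process exiting.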
yeah we can do
Made a quick fix here: #44864
After this is merged, I will rebase and test it again.
For the record:
2. Let's remove spark/python/pyspark/sql/tests/test_udf.py Line 938 in 38fc127
After testing, we found that it failed, as follows:
3. So, let's restore it first. cc @HyukjinKwon
This reverts commit 347c6bc.
.github/workflows/build_and_test.yml
@@ -367,7 +367,7 @@ jobs:
 pyspark-pandas-connect-part3
 env:
   MODULES_TO_TEST: ${{ matrix.modules }}
-  PYTHON_TO_TEST: 'python3.9'
+  PYTHON_TO_TEST: 'pypy3'
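As an illustration of how a runner script might consume a variable like PYTHON_TO_TEST, the sketch below resolves the named interpreters on PATH. It is a sketch under assumptions, not Spark's actual test runner: resolve_python_executables is a hypothetical helper, and treating the value as a comma-separated list with a "python3" default is assumed for the example.

```python
import os
import shutil


def resolve_python_executables(environ=None, default="python3"):
    """Split a PYTHON_TO_TEST-style value into names and their resolved paths."""
    environ = os.environ if environ is None else environ
    raw = environ.get("PYTHON_TO_TEST", default)
    names = [name.strip() for name in raw.split(",") if name.strip()]
    # shutil.which returns None when the interpreter is not on PATH.
    return {name: shutil.which(name) for name in names}
```

A value of None for an entry would signal that the requested interpreter (for example pypy3) is not installed on the machine running the job.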
After obtaining approval, I will restore this.
Otherwise looks pretty good. Thanks for driving this @panbingkun
Thank you so much, @panbingkun and all! This looks much improved.
Looks great, thank you @panbingkun!
+1, LGTM.
@panbingkun, could you restore the PYTHON_TO_TEST environment variable?
Okay, let me do this (https://github.com/apache/spark/pull/44778/files#r1472155598) first.
Can we check #44778 (comment) before we merge? This is the last one left that I think is not related to PyArrow.
Sure, I am testing it with
Merged to master.
Let me continue to observe the next
https://github.com/apache/spark/actions/runs/7728043635 Finally it turned green. 😄
What changes were proposed in this pull request?
This PR aims to fix the pypy3 Python tests.
Why are the changes needed?
Currently the scheduled job fails (with PyPy3); we should fix it to improve test coverage.
Does this PR introduce any user-facing change?
No, test-only.
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?
No.