[SPARK-43656][CONNECT][PS][TESTS] Enable numpy compat tests for Spark Connect #43214
Conversation
This functionality works fine in manual testing with the Python interpreter:

```python
>>> spark  # check if the current session is a Spark Connect session.
<pyspark.sql.connect.session.SparkSession object at 0x105b3fbe0>
>>> import pyspark.pandas as ps
>>> import numpy as np
>>> psdf = ps.DataFrame({"A": [1, 2, 3]})
>>> np_name = "arccosh"
>>> np_func = getattr(np, np_name)
>>> np_func(psdf)
          A
0  0.000000
1  1.316958
2  1.762747
```

But it fails in the unit test:

```
spark % python/run-tests --testnames 'pyspark.pandas.tests.connect.test_parity_numpy_compat NumPyCompatParityTests.test_np_spark_compat_frame'
...
======================================================================
FAIL [3.103s]: test_np_spark_compat_frame (pyspark.pandas.tests.connect.test_parity_numpy_compat.NumPyCompatParityTests)
----------------------------------------------------------------------
...
pyspark.errors.exceptions.base.PySparkTypeError: [NOT_COLUMN_OR_STR] Argument `col` should be a Column or str, got Column.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
...
AssertionError: Test in 'arccosh' function was failed.
----------------------------------------------------------------------
Ran 1 test in 5.956s
```

Let me test how it works on GitHub Actions.
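For context on why `np_func(psdf)` works at all: NumPy ufuncs dispatch to container types that implement `__array_ufunc__`, and pandas-on-Spark mirrors the behavior plain pandas has. A plain-pandas analogue of the interpreter session above (NumPy and pandas only, no Spark session needed):

```python
import numpy as np
import pandas as pd

# pandas DataFrames implement __array_ufunc__, so applying a NumPy ufunc
# to a DataFrame returns a DataFrame rather than a raw ndarray.
pdf = pd.DataFrame({"A": [1, 2, 3]})
out = np.arccosh(pdf)
print(out)
#           A
# 0  0.000000
# 1  1.316958
# 2  1.762747
```

The pandas-on-Spark result shown above matches this element-wise, which is exactly what the numpy compat parity tests verify against Spark Connect.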
```
@@ -20,7 +20,6 @@
from pyspark import pandas as ps
from pyspark.pandas import set_option, reset_option
from pyspark.pandas.numpy_compat import unary_np_spark_mappings, binary_np_spark_mappings
```
Some lines of `numpy_compat.py` call `pandas_udf`, which uses `is_remote()` internally, so when testing we should import `numpy_compat` only after the Spark Connect session has been properly initialized.
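A stdlib-only sketch (hypothetical names, no PySpark involved) of why the import order matters here: module-level code that evaluates a runtime flag at import time snapshots the value from *before* the session was set up, while an import deferred until after initialization sees the correct value.

```python
import sys
import types

# Hypothetical stand-in for is_remote(): reads a flag that the
# "session initialization" step flips later.
_STATE = {"remote": False}

def is_remote():
    return _STATE["remote"]

def load_compat_module(name):
    # Simulates importing a module like numpy_compat whose top-level
    # code evaluates is_remote() (e.g. via pandas_udf) at import time.
    mod = types.ModuleType(name)
    mod.REMOTE_AT_IMPORT = is_remote()
    sys.modules[name] = mod
    return mod

# Import BEFORE initialization: the flag is snapshotted as False.
early = load_compat_module("compat_early")

_STATE["remote"] = True  # ... initialize the (remote) session ...

# Import AFTER initialization: the snapshot is correct.
late = load_compat_module("compat_late")

print(early.REMOTE_AT_IMPORT, late.REMOTE_AT_IMPORT)  # False True
```

This is why the PR moves the `numpy_compat` import to run after the Spark Connect session exists in the parity tests.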
merged to master
[SPARK-43656][CONNECT][PS][TESTS] Enable numpy compat tests for Spark Connect

### What changes were proposed in this pull request?
This PR proposes to enable `test_np_spark_compat_frame` and `test_np_spark_compat_series` for Spark Connect.

### Why are the changes needed?
To increase test coverage.

### Does this PR introduce _any_ user-facing change?
No, it's test-only.

### How was this patch tested?
The existing tests should pass.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#43214 from itholic/SPARK-43656.

Authored-by: Haejoon Lee <haejoon.lee@databricks.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>