[SPARK-43656][CONNECT][PS][TESTS] Enable numpy compat tests for Spark Connect #43214

Closed
wants to merge 2 commits into from

itholic
Contributor

@itholic itholic commented Oct 4, 2023

What changes were proposed in this pull request?

This PR proposes to enable test_np_spark_compat_frame and test_np_spark_compat_series for Spark Connect.

Why are the changes needed?

To increase test coverage.

Does this PR introduce any user-facing change?

No, it's test-only.

How was this patch tested?

The existing tests should pass.

Was this patch authored or co-authored using generative AI tooling?

No.

@itholic
Contributor Author

itholic commented Oct 4, 2023

This functionality works fine when tested manually in the Python interpreter:

>>> spark  # check if the current session is Spark Connect session.
<pyspark.sql.connect.session.SparkSession object at 0x105b3fbe0>
>>> import pyspark.pandas as ps
>>> import numpy as np
>>> psdf = ps.DataFrame({"A": [1, 2, 3]})
>>> np_name = "arccosh"
>>> np_func = getattr(np, np_name)
>>> np_func(psdf)
          A
0  0.000000
1  1.316958
2  1.762747

But it fails in the unit test:

spark % python/run-tests --testnames 'pyspark.pandas.tests.connect.test_parity_numpy_compat NumPyCompatParityTests.test_np_spark_compat_frame'

...

======================================================================
FAIL [3.103s]: test_np_spark_compat_frame (pyspark.pandas.tests.connect.test_parity_numpy_compat.NumPyCompatParityTests)
----------------------------------------------------------------------
...
pyspark.errors.exceptions.base.PySparkTypeError: [NOT_COLUMN_OR_STR] Argument `col` should be a Column or str, got Column.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
...
AssertionError: Test in 'arccosh' function was failed.

----------------------------------------------------------------------
Ran 1 test in 5.956s

Let me test how it works on GitHub Actions.

@@ -20,7 +20,6 @@

from pyspark import pandas as ps
from pyspark.pandas import set_option, reset_option
from pyspark.pandas.numpy_compat import unary_np_spark_mappings, binary_np_spark_mappings
Contributor Author

Some lines of numpy_compat.py call pandas_udf, which uses is_remote() internally, so in the tests we should import numpy_compat only after the Spark Connect session has been properly initialized.
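
To illustrate why the import order matters, here is a minimal pure-Python sketch that does not use PySpark at all. Everything in it is a hypothetical stand-in: `is_remote_stub` plays the role of is_remote(), `udf_stub` plays the role of a decorator such as pandas_udf that checks the environment once when it is applied, and `define_mappings` mimics module-level code that runs at import time. It is not how numpy_compat.py is actually written, only an assumption-laden model of the effect described above.

```python
# Hypothetical stand-ins only; none of these names exist in PySpark.
_remote = False  # flips to True once the "Spark Connect session" is set up


def is_remote_stub():
    """Stand-in for an environment check such as is_remote()."""
    return _remote


def udf_stub(func):
    """Stand-in decorator that inspects the environment once, when applied."""
    mode = "connect" if is_remote_stub() else "classic"

    def wrapper(x):
        # The mode was frozen at decoration time, not at call time.
        return (mode, func(x))

    return wrapper


def define_mappings():
    """Mimics module-level code that runs when a module is imported."""

    @udf_stub
    def double(x):
        return x * 2

    return {"double": double}


early = define_mappings()  # "import" before the session is initialized
_remote = True             # the session is now initialized
late = define_mappings()   # "import" after the session is initialized

print(early["double"](2))  # ('classic', 4)
print(late["double"](2))   # ('connect', 4)
```

In this model the early "import" stays bound to the non-Connect code path even after the session exists, which is the kind of mismatch that deferring the import in the test is meant to avoid.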

@itholic itholic changed the title [WIP][SPARK-43656][CONNECT][PS] Enable numpy compat tests for Spark Connect [SPARK-43656][CONNECT][PS] Enable numpy compat tests for Spark Connect Oct 5, 2023
@itholic itholic marked this pull request as ready for review October 5, 2023 04:50
@itholic
Contributor Author

itholic commented Oct 5, 2023

cc @HyukjinKwon @zhengruifeng

@zhengruifeng zhengruifeng changed the title [SPARK-43656][CONNECT][PS] Enable numpy compat tests for Spark Connect [SPARK-43656][CONNECT][PS][TESTS] Enable numpy compat tests for Spark Connect Oct 5, 2023
@zhengruifeng
Contributor

merged to master

LuciferYang pushed a commit to LuciferYang/spark that referenced this pull request Oct 7, 2023
… Connect

Closes apache#43214 from itholic/SPARK-43656.

Authored-by: Haejoon Lee <haejoon.lee@databricks.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
@itholic itholic deleted the SPARK-43656 branch November 20, 2023 01:36