[SPARK-43656][CONNECT][PS][TESTS] Enable numpy compat tests for Spark Connect #43214
Conversation
This functionality works fine in manual testing with the Python interpreter:

```python
>>> spark  # check if the current session is a Spark Connect session.
<pyspark.sql.connect.session.SparkSession object at 0x105b3fbe0>
>>> import pyspark.pandas as ps
>>> import numpy as np
>>> psdf = ps.DataFrame({"A": [1, 2, 3]})
>>> np_name = "arccosh"
>>> np_func = getattr(np, np_name)
>>> np_func(psdf)
          A
0  0.000000
1  1.316958
2  1.762747
```

But it fails in the unit test:

```
spark % python/run-tests --testnames 'pyspark.pandas.tests.connect.test_parity_numpy_compat NumPyCompatParityTests.test_np_spark_compat_frame'
...
======================================================================
FAIL [3.103s]: test_np_spark_compat_frame (pyspark.pandas.tests.connect.test_parity_numpy_compat.NumPyCompatParityTests)
----------------------------------------------------------------------
...
pyspark.errors.exceptions.base.PySparkTypeError: [NOT_COLUMN_OR_STR] Argument `col` should be a Column or str, got Column.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
...
AssertionError: Test in 'arccosh' function was failed.
----------------------------------------------------------------------
Ran 1 test in 5.956s
```

Let me test how it works on GitHub Actions.
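For context on why `np_func(psdf)` works at all: NumPy ufuncs dispatch to container types that implement `__array_ufunc__`, and pandas-on-Spark mirrors the behavior plain pandas has. A plain-pandas analogue of the interpreter session above (NumPy and pandas only, no Spark session needed):

```python
import numpy as np
import pandas as pd

# pandas DataFrames implement __array_ufunc__, so applying a NumPy ufunc
# to a DataFrame returns a DataFrame rather than a raw ndarray.
pdf = pd.DataFrame({"A": [1, 2, 3]})
out = np.arccosh(pdf)
print(out)
#           A
# 0  0.000000
# 1  1.316958
# 2  1.762747
```

The pandas-on-Spark result shown above matches this element-wise, which is exactly what the numpy compat parity tests verify against Spark Connect.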
```
@@ -20,7 +20,6 @@
from pyspark import pandas as ps
from pyspark.pandas import set_option, reset_option
from pyspark.pandas.numpy_compat import unary_np_spark_mappings, binary_np_spark_mappings
```
Some lines of `numpy_compat.py` call `pandas_udf`, which uses `is_remote()` internally, so when testing we should import `numpy_compat` only after the Spark Connect session has been properly initialized.
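A stdlib-only sketch (hypothetical names, no PySpark involved) of why the import order matters here: module-level code that evaluates a runtime flag at import time snapshots the value from *before* the session was set up, while an import deferred until after initialization sees the correct value.

```python
import sys
import types

# Hypothetical stand-in for is_remote(): reads a flag that the
# "session initialization" step flips later.
_STATE = {"remote": False}

def is_remote():
    return _STATE["remote"]

def load_compat_module(name):
    # Simulates importing a module like numpy_compat whose top-level
    # code evaluates is_remote() (e.g. via pandas_udf) at import time.
    mod = types.ModuleType(name)
    mod.REMOTE_AT_IMPORT = is_remote()
    sys.modules[name] = mod
    return mod

# Import BEFORE initialization: the flag is snapshotted as False.
early = load_compat_module("compat_early")

_STATE["remote"] = True  # ... initialize the (remote) session ...

# Import AFTER initialization: the snapshot is correct.
late = load_compat_module("compat_late")

print(early.REMOTE_AT_IMPORT, late.REMOTE_AT_IMPORT)  # False True
```

This is why the PR moves the `numpy_compat` import to run after the Spark Connect session exists in the parity tests.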
merged to master
[SPARK-43656][CONNECT][PS][TESTS] Enable numpy compat tests for Spark Connect

### What changes were proposed in this pull request?
This PR proposes to enable `test_np_spark_compat_frame` and `test_np_spark_compat_series` for Spark Connect.

### Why are the changes needed?
To increase test coverage.

### Does this PR introduce _any_ user-facing change?
No, it's test-only.

### How was this patch tested?
The existing tests should pass.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#43214 from itholic/SPARK-43656.

Authored-by: Haejoon Lee <haejoon.lee@databricks.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>